Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Jeff Moyer	c46aaba751	net: change proto and proto_ops accept type JIRA: https://issues.redhat.com/browse/RHEL-64867 Conflicts: RHEL is missing commit 1ded5e5a5931 ("net: annotate data-races around sock->ops"), which accounts for the differences in ops structure dereferencing. commit 92ef0fd55ac80dfc2e4654edfe5d1ddfa6e070fe Author: Jens Axboe <axboe@kernel.dk> Date: Thu May 9 09:20:08 2024 -0600 net: change proto and proto_ops accept type Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information. No functional changes in this patch. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2024-12-02 11:12:33 -05:00
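The refactor above bundles the former flags, error out-pointer and kernel-invocation arguments into one struct. The userspace sketch below shows that calling convention. The field names are assumed from the commit description, and `DEMO_NONBLOCK`, `demo_accept_old` and `demo_accept_new` are hypothetical names used only for illustration. Both conventions return the same result, in line with "No functional changes".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_NONBLOCK 0x1   /* hypothetical flag bit for this sketch */

/* One struct replaces the former (flags, int *err, bool kern) parameters.
 * Field names are assumed from the commit description. */
struct proto_accept_arg {
	int flags;   /* accept flags supplied by the caller */
	int err;     /* error code passed back to the caller */
	bool kern;   /* true when the accept is a kernel-internal call */
};

/* Old convention (hypothetical handler): separate in/out parameters. */
static int demo_accept_old(int flags, int *err, bool kern)
{
	(void)kern;                                   /* unused in this toy */
	*err = (flags & DEMO_NONBLOCK) ? -EAGAIN : 0; /* pretend nothing is pending */
	return *err ? -1 : 0;
}

/* New convention (hypothetical handler): inputs and outputs travel together,
 * so later patches can add fields without changing every handler signature. */
static int demo_accept_new(struct proto_accept_arg *arg)
{
	(void)arg->kern;                              /* unused in this toy */
	arg->err = (arg->flags & DEMO_NONBLOCK) ? -EAGAIN : 0;
	return arg->err ? -1 : 0;
}
```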
Ian Kent	f7a70a9fc1	fs: port vfs_*() helpers to struct mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: There was a whitespace difference, possibly due to CentOS Stream commit `c912400e45` ("fs: Fix description of vfs_tmpfile()"). CentOS Stream commit `c4f3dd0731` ("nfsd: handle failure to collect pre/post-op attrs more sanely") is present, which caused a hunk to be rejected in fs/nfsd/nfs3proc.c and two hunks to be rejected in fs/nfsd/vfs.c; the hunks were applied manually. Upstream commit 79b05beaa5c34 ("af_unix: Acquire/Release per-netns hash table's locks.") is not present in CentOS Stream; fixed the conflict manually. Dropped ksmbd hunks; ksmbd source is not present. Upstream commit 3350607dc5637 ("security: Create file_truncate hook from path_truncate hook") is not present in CentOS Stream. commit abf08576afe31506b812c8c1be9714f78613f300 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:10 2023 +0100 fs: port vfs_*() helpers to struct mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two, eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. 
Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 08:29:51 +08:00
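The motivation above is that two namespace arguments of the same type are easy to mix up, while a dedicated type is not. The toy userspace sketch below illustrates the idea. Everything in it is hypothetical: `struct user_ns`, `struct toy_inode`, the offset-based uid mapping and the `vfs_uid_*` helpers are stand-ins, not the kernel's idmapping API. Only the idea of a distinct `struct mnt_idmap` type comes from the commit.

```c
#include <assert.h>

/* Toy namespace: maps ids by a fixed offset (not how real idmaps work). */
struct user_ns { int uid_offset; };

/* Distinct type for the mount-level mapping: passing a plain user_ns pointer
 * where a mnt_idmap is expected triggers an incompatible-pointer diagnostic. */
struct mnt_idmap { const struct user_ns *owner; };

/* The filesystem-level namespace travels with the inode. */
struct toy_inode { int uid; const struct user_ns *fs_ns; };

/* Old style: two interchangeable namespace pointers, easy to swap by mistake. */
static int vfs_uid_old(const struct user_ns *mnt_ns, const struct user_ns *fs_ns, int uid)
{
	return uid - fs_ns->uid_offset + mnt_ns->uid_offset;
}

/* New style: callers pass only the idmap; the fs namespace comes from the inode. */
static int vfs_uid_new(const struct mnt_idmap *idmap, const struct toy_inode *inode)
{
	return inode->uid - inode->fs_ns->uid_offset + idmap->owner->uid_offset;
}
```

With the old signature, calling `vfs_uid_old(&fs, &mnt, uid)` instead of `vfs_uid_old(&mnt, &fs, uid)` compiles cleanly and silently returns a wrong id.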
Lucas Zampieri	7d84201666	Merge: af_unix: Fix data races around sk->sk_shutdown. MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4606 JIRA: https://issues.redhat.com/browse/RHEL-43969 Upstream Status: linux.git CVE: CVE-2024-38596 Signed-off-by: Guillaume Nault <gnault@redhat.com> Approved-by: Davide Caratti <dcaratti@redhat.com> Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-07-08 13:00:12 +00:00
Guillaume Nault	bb6c36110b	af_unix: Fix data races in unix_release_sock/unix_stream_sendmsg JIRA: https://issues.redhat.com/browse/RHEL-43969 Upstream Status: linux.git CVE: CVE-2024-38596 commit 540bf24fba16b88c1b3b9353927204b4f1074e25 Author: Breno Leitao <leitao@debian.org> Date: Thu May 9 01:14:46 2024 -0700 af_unix: Fix data races in unix_release_sock/unix_stream_sendmsg A data-race condition has been identified in af_unix. In one data path, the write function unix_release_sock() atomically writes to sk->sk_shutdown using WRITE_ONCE. However, on the reader side, unix_stream_sendmsg() does not read it atomically. Consequently, this issue is causing the following KCSAN splat to occur: BUG: KCSAN: data-race in unix_release_sock / unix_stream_sendmsg write (marked) to 0xffff88867256ddbb of 1 bytes by task 7270 on cpu 28: unix_release_sock (net/unix/af_unix.c:640) unix_release (net/unix/af_unix.c:1050) sock_close (net/socket.c:659 net/socket.c:1421) __fput (fs/file_table.c:422) __fput_sync (fs/file_table.c:508) __se_sys_close (fs/open.c:1559 fs/open.c:1541) __x64_sys_close (fs/open.c:1541) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) read to 0xffff88867256ddbb of 1 bytes by task 989 on cpu 14: unix_stream_sendmsg (net/unix/af_unix.c:2273) __sock_sendmsg (net/socket.c:730 net/socket.c:745) ____sys_sendmsg (net/socket.c:2584) __sys_sendmmsg (net/socket.c:2638 net/socket.c:2724) __x64_sys_sendmmsg (net/socket.c:2753 net/socket.c:2750 net/socket.c:2750) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) value changed: 0x01 -> 0x03 The line numbers are related to commit dd5a440a31fa ("Linux 6.9-rc7"). Commit e1d09c2c2f57 ("af_unix: Fix data races around sk->sk_shutdown.") addressed a comparable issue in the past regarding sk->sk_shutdown. 
However, it overlooked resolving this particular data path. This patch only fixes the offending unix_stream_sendmsg() function, since the other reads seem to be protected by unix_state_lock() as discussed in Link: https://lore.kernel.org/all/20240508173324.53565-1-kuniyu@amazon.com/ Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240509081459.2807828-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Guillaume Nault <gnault@redhat.com>	2024-06-26 13:48:52 +02:00
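The sk->sk_shutdown fixes pair a locked, marked writer with a lockless, marked reader. The userspace sketch below shows that pairing. It assumes C11 relaxed atomics as the stand-in for the kernel's WRITE_ONCE()/READ_ONCE(); the two are not identical, since the kernel macros are volatile accesses rather than C11 atomics. `struct toy_sock` and its helpers are hypothetical. The shutdown bit values are assumed to mirror RCV_SHUTDOWN/SEND_SHUTDOWN, consistent with the 0x01 -> 0x03 change in the splat.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Bit values assumed to mirror the kernel's RCV_SHUTDOWN/SEND_SHUTDOWN. */
#define RCV_SHUTDOWN  1
#define SEND_SHUTDOWN 2
#define SHUTDOWN_MASK (RCV_SHUTDOWN | SEND_SHUTDOWN)

/* Hypothetical socket: sk_shutdown is written under `lock` but read without it. */
struct toy_sock {
	pthread_mutex_t lock;
	_Atomic unsigned char sk_shutdown;
};

static void toy_sock_init(struct toy_sock *sk)
{
	pthread_mutex_init(&sk->lock, NULL);
	atomic_init(&sk->sk_shutdown, 0);
}

/* Writer, like unix_release_sock(): the lock serializes writers, and the
 * store is still marked (relaxed atomic ~ WRITE_ONCE) for lockless readers. */
static void toy_release(struct toy_sock *sk)
{
	pthread_mutex_lock(&sk->lock);
	atomic_store_explicit(&sk->sk_shutdown, SHUTDOWN_MASK, memory_order_relaxed);
	pthread_mutex_unlock(&sk->lock);
}

/* Lockless reader, like the unix_stream_sendmsg() check: the load must be
 * marked too (relaxed atomic ~ READ_ONCE), otherwise it is a data race. */
static int toy_can_send(struct toy_sock *sk)
{
	unsigned char s = atomic_load_explicit(&sk->sk_shutdown, memory_order_relaxed);
	return !(s & SEND_SHUTDOWN);
}
```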
Guillaume Nault	8574f1b610	af_unix: Fix data races around sk->sk_shutdown. JIRA: https://issues.redhat.com/browse/RHEL-43969 Upstream Status: linux.git CVE: CVE-2024-38596 commit e1d09c2c2f5793474556b60f83900e088d0d366d Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Tue May 9 17:34:56 2023 -0700 af_unix: Fix data races around sk->sk_shutdown. KCSAN found a data race around sk->sk_shutdown where unix_release_sock() and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll() and unix_dgram_poll() read it locklessly. We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE(). BUG: KCSAN: data-race in unix_poll / unix_release_sock write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0: unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631 unix_release+0x59/0x80 net/unix/af_unix.c:1042 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1397 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1: unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170 sock_poll+0xcf/0x2b0 net/socket.c:1385 vfs_poll include/linux/poll.h:88 [inline] ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855 ep_send_events fs/eventpoll.c:1694 [inline] ep_poll fs/eventpoll.c:1823 [inline] do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258 __do_sys_epoll_wait fs/eventpoll.c:2270 [inline] __se_sys_epoll_wait fs/eventpoll.c:2265 [inline] __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265 do_syscall_x64 
arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x00 -> 0x03 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: `3c73419c09` ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Guillaume Nault <gnault@redhat.com>	2024-06-26 13:46:00 +02:00
Davide Caratti	c211ba403d	af_unix: fix lockdep positive in sk_diag_dump_icons() JIRA: https://issues.redhat.com/browse/RHEL-33410 Upstream Status: net.git commit 4d322dce82a1d44f8c83f0f54f95dd1b8dcf46c9 commit 4d322dce82a1d44f8c83f0f54f95dd1b8dcf46c9 Author: Eric Dumazet <edumazet@google.com> Date: Tue Jan 30 18:42:35 2024 +0000 af_unix: fix lockdep positive in sk_diag_dump_icons() syzbot reported a lockdep splat [1]. Blamed commit hinted about the possible lockdep violation, and code used unix_state_lock_nested() in an attempt to silence lockdep. It is not sufficient, because unix_state_lock_nested() is already used from unix_state_double_lock(). We need to use a separate subclass. This patch adds a distinct enumeration to make things more explicit. Also use swap() in unix_state_double_lock() as a clean up. v2: add a missing inline keyword to unix_state_lock_nested() [1] WARNING: possible circular locking dependency detected 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Not tainted syz-executor.1/2542 is trying to acquire lock: ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 but task is already holding lock: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #1 (&u->lock/1){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378 sk_diag_dump_icons net/unix/diag.c:87 [inline] sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157 sk_diag_dump net/unix/diag.c:196 [inline] unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220 netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319 sock_diag_rcv_msg+0xe3/0x400 netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_write_iter+0x39a/0x520 net/socket.c:1160 call_write_iter include/linux/fs.h:2085 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xa74/0xca0 fs/read_write.c:590 ksys_write+0x1a0/0x2c0 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b -> #0 (rlock-AF_UNIX){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg 
net/socket.c:745 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724 __do_sys_sendmmsg net/socket.c:2753 [inline] __se_sys_sendmmsg net/socket.c:2750 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&u->lock/1); lock(rlock-AF_UNIX); lock(&u->lock/1); lock(rlock-AF_UNIX); * DEADLOCK * 1 lock held by syz-executor.1/2542: #0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089 stack backtrace: CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724 __do_sys_sendmmsg net/socket.c:2753 [inline] __se_sys_sendmmsg net/socket.c:2750 [inline] __x64_sys_sendmmsg+0xa0/0xb0 
net/socket.c:2750 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7f26d887cda9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9 RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004 RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68 Fixes: `2aac7a2cb0` ("unix_diag: Pending connections IDs NLA") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240130184235.1620738-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2024-06-06 11:51:40 +02:00
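The swap() clean-up in unix_state_double_lock() relies on the usual rule for taking two locks of the same class: always acquire them in a fixed order. The pthread-based sketch below shows that ordering, assuming address order as the tie-breaker. `struct toy_unix_sock` and the helpers are hypothetical. The sketch does not model lockdep subclasses, which only annotate that the second, nested acquisition is intentional.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct toy_unix_sock {
	pthread_mutex_t lock;
};

/* Hypothetical helper: take both locks in address order so that two threads
 * locking the same pair (in either argument order) cannot deadlock. */
static void toy_double_lock(struct toy_unix_sock *a, struct toy_unix_sock *b)
{
	if (a == b || !b) {
		pthread_mutex_lock(&a->lock);
		return;
	}
	/* Compare as integers: relational compares of unrelated pointers are UB. */
	if ((uintptr_t)a > (uintptr_t)b) {
		struct toy_unix_sock *tmp = a;   /* the swap() step */
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);        /* kernel: nested, distinct subclass */
}

static void toy_double_unlock(struct toy_unix_sock *a, struct toy_unix_sock *b)
{
	pthread_mutex_unlock(&a->lock);
	if (b && b != a)
		pthread_mutex_unlock(&b->lock);
}
```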
Davide Caratti	9a7e59dd2a	af_unix: Fix data races around sk->sk_shutdown. JIRA: https://issues.redhat.com/browse/RHEL-33410 Upstream Status: net.git commit e1d09c2c2f5793474556b60f83900e088d0d366d commit e1d09c2c2f5793474556b60f83900e088d0d366d Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Tue May 9 17:34:56 2023 -0700 af_unix: Fix data races around sk->sk_shutdown. KCSAN found a data race around sk->sk_shutdown where unix_release_sock() and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll() and unix_dgram_poll() read it locklessly. We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE(). BUG: KCSAN: data-race in unix_poll / unix_release_sock write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0: unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631 unix_release+0x59/0x80 net/unix/af_unix.c:1042 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1397 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1: unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170 sock_poll+0xcf/0x2b0 net/socket.c:1385 vfs_poll include/linux/poll.h:88 [inline] ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855 ep_send_events fs/eventpoll.c:1694 [inline] ep_poll fs/eventpoll.c:1823 [inline] do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258 __do_sys_epoll_wait fs/eventpoll.c:2270 [inline] __se_sys_epoll_wait fs/eventpoll.c:2265 [inline] __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265 
do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x00 -> 0x03 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: `3c73419c09` ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2024-06-06 11:39:13 +02:00
Davide Caratti	426724b26b	af_unix: Fix a data race of sk->sk_receive_queue->qlen. JIRA: https://issues.redhat.com/browse/RHEL-33410 Upstream Status: net.git commit 679ed006d416ea0cecfe24a99d365d1dea69c683 commit 679ed006d416ea0cecfe24a99d365d1dea69c683 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Tue May 9 17:34:55 2023 -0700 af_unix: Fix a data race of sk->sk_receive_queue->qlen. KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg() updates qlen under the queue lock and sendmsg() checks qlen under unix_state_lock(), not the queue lock, so the reader side needs READ_ONCE(). BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0: __skb_unlink include/linux/skbuff.h:2347 [inline] __skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197 __skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263 __unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452 unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549 sock_recvmsg_nosec net/socket.c:1019 [inline] ____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720 ___sys_recvmsg+0xc8/0x150 net/socket.c:2764 do_recvmmsg+0x182/0x560 net/socket.c:2858 __sys_recvmmsg net/socket.c:2937 [inline] __do_sys_recvmmsg net/socket.c:2960 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1: skb_queue_len include/linux/skbuff.h:2127 [inline] unix_recvq_full net/unix/af_unix.c:229 [inline] unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445 unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x148/0x160 net/socket.c:747 ____sys_sendmsg+0x20e/0x620 net/socket.c:2503 ___sys_sendmsg+0xc6/0x140 net/socket.c:2557 
__sys_sendmmsg+0x11d/0x370 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x0000000b -> 0x00000001 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2024-06-06 11:39:13 +02:00
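The qlen race differs from the sk_shutdown one: here the reader does hold a lock, just not the one that serializes the writers, so the access still has to be marked. The hypothetical userspace sketch below again uses C11 relaxed atomics as a stand-in for READ_ONCE(). The `> max_backlog` test is modeled loosely on unix_recvq_full(), and all `toy_*` names are invented.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical receive queue: qlen writers are serialized by qlock. */
struct toy_queue {
	pthread_mutex_t qlock;
	_Atomic unsigned int qlen;
};

/* Hypothetical peer: the sender holds state_lock, which is NOT qlock. */
struct toy_peer {
	pthread_mutex_t state_lock;
	struct toy_queue rcvq;
	unsigned int max_backlog;
};

static void toy_peer_init(struct toy_peer *p, unsigned int max_backlog)
{
	pthread_mutex_init(&p->state_lock, NULL);
	pthread_mutex_init(&p->rcvq.qlock, NULL);
	atomic_init(&p->rcvq.qlen, 0);
	p->max_backlog = max_backlog;
}

/* Writers update qlen under qlock; stores are marked because some readers skip qlock. */
static void toy_enqueue(struct toy_peer *p)
{
	pthread_mutex_lock(&p->rcvq.qlock);
	unsigned int n = atomic_load_explicit(&p->rcvq.qlen, memory_order_relaxed);
	atomic_store_explicit(&p->rcvq.qlen, n + 1, memory_order_relaxed);
	pthread_mutex_unlock(&p->rcvq.qlock);
}

static void toy_dequeue(struct toy_peer *p)
{
	pthread_mutex_lock(&p->rcvq.qlock);
	unsigned int n = atomic_load_explicit(&p->rcvq.qlen, memory_order_relaxed);
	if (n)
		atomic_store_explicit(&p->rcvq.qlen, n - 1, memory_order_relaxed);
	pthread_mutex_unlock(&p->rcvq.qlock);
}

/* Reader: holding state_lock does not exclude toy_dequeue(), so the load
 * of qlen is still concurrent with a writer and must be marked. */
static int toy_recvq_full(struct toy_peer *p)
{
	pthread_mutex_lock(&p->state_lock);
	int full = atomic_load_explicit(&p->rcvq.qlen, memory_order_relaxed) > p->max_backlog;
	pthread_mutex_unlock(&p->state_lock);
	return full;
}
```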
Waiman Long	6d0328a7cf	Revert "Revert "Merge: cgroup: Backport upstream cgroup commits up to v6.8"" JIRA: https://issues.redhat.com/browse/RHEL-36683 Upstream Status: RHEL only This reverts commit `08637d76a2` which is a revert of "Merge: cgroup: Backport upstream cgroup commits up to v6.8" Signed-off-by: Waiman Long <longman@redhat.com>	2024-05-18 21:38:20 -04:00
Lucas Zampieri	08637d76a2	Revert "Merge: cgroup: Backport upstream cgroup commits up to v6.8" This reverts merge request !4128	2024-05-16 15:26:41 +00:00
Waiman Long	724656e7cf	freezer,sched: Rewrite core freezer logic JIRA: https://issues.redhat.com/browse/RHEL-34600 Conflicts: 1) A merge conflict in the kernel/signal.c hunk due to the presence of RHEL-only commit `975d318867` ("signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT."). 2) A merge conflict in the kernel/time/hrtimer.c hunk due to the presence of RHEL-only commit `5f76194136` ("time/hrtimer: Embed hrtimer mode into hrtimer_sleeper"). 3) The fs/cifs/inode.c hunk was applied to fs/smb/client/inode.c due to the presence of upstream commit 38c8a9a52082 ("smb: move client and server files to common directory fs/smb"). 4) Similarly, the fs/cifs/transport.c hunk was applied to fs/smb/client/transport.c manually due to the presence of a later upstream commit d527f51331ca ("cifs: Fix UAF in cifs_demultiplex_thread()"). Note that all the prerequisite patches in the same patch series (https://lore.kernel.org/lkml/20220822111816.760285417@infradead.org/) had already been merged into RHEL9. commit f5d39b020809146cc28e6e73369bf8065e0310aa Author: Peter Zijlstra <peterz@infradead.org> Date: Mon, 22 Aug 2022 13:18:22 +0200 freezer,sched: Rewrite core freezer logic Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically, the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freeze() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. 
That is, thawing tries to thaw things in explicit order: kernel threads and workqueues first, then bringing SMP back, then userspace, etc. However, due to the above-mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (e.g. ARMv9) where the userspace task requires a special CPU to run. As said, replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN, __TASK_STOPPED -> TASK_FROZEN, __TASK_TRACED -> TASK_FROZEN. The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW, TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states can be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org Signed-off-by: Waiman Long <longman@redhat.com>	2024-04-26 22:49:06 -04:00
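The three state transitions listed above can be modeled as a toy state machine. The state names echo the commit, but the bit values, `struct toy_task` and all helpers are invented for illustration. The sketch ignores everything else the real scheduler does, such as wait queues, signals and the actual ->jobctl bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state bits; the values are made up and do not match the kernel's. */
enum {
	T_RUNNING         = 0x00,
	T_INTERRUPTIBLE   = 0x01,
	T_UNINTERRUPTIBLE = 0x02,
	T_STOPPED         = 0x04,
	T_TRACED          = 0x08,
	T_FREEZABLE       = 0x10,  /* modifier, only meaningful with a NORMAL state */
	T_FROZEN          = 0x20,
};
#define T_NORMAL (T_INTERRUPTIBLE | T_UNINTERRUPTIBLE)

struct toy_task {
	unsigned int state;
	unsigned int jobctl_state;  /* saved stopped/traced state, like ->jobctl */
};

/* Freeze: only FREEZABLE|NORMAL, STOPPED and TRACED may become FROZEN. */
static bool toy_freeze(struct toy_task *t)
{
	bool freezable_sleep = (t->state & T_FREEZABLE) && (t->state & T_NORMAL);

	if (!freezable_sleep && t->state != T_STOPPED && t->state != T_TRACED)
		return false;
	if (t->state & (T_STOPPED | T_TRACED))
		t->jobctl_state = t->state;  /* canonical state survives freezing */
	t->state = T_FROZEN;
	return true;
}

/* An ordinary wakeup only affects NORMAL sleepers, so a frozen task ignores it. */
static void toy_wake_normal(struct toy_task *t)
{
	if (t->state & T_NORMAL)
		t->state = T_RUNNING;
}

/* Explicit thaw: restore stopped/traced from the saved state; NORMAL sleepers
 * simply run and must tolerate this as a spurious wakeup. */
static void toy_thaw(struct toy_task *t)
{
	if (t->state != T_FROZEN)
		return;
	t->state = t->jobctl_state ? t->jobctl_state : T_RUNNING;
	t->jobctl_state = 0;
}
```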
Guillaume Nault	4e2b5d2e07	af_unix: Fix null-ptr-deref in unix_stream_sendpage(). JIRA: https://issues.redhat.com/browse/RHEL-17264 Upstream Status: git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git CVE: CVE-2023-4622 commit 790c2f9d15b594350ae9bca7b236f2b1859de02c Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Mon Aug 21 10:55:05 2023 -0700 af_unix: Fix null-ptr-deref in unix_stream_sendpage(). Bing-Jhong Billy Jheng reported null-ptr-deref in unix_stream_sendpage() with detailed analysis and a nice repro. unix_stream_sendpage() tries to add data to the last skb in the peer's recv queue without locking the queue. If the peer's FD is passed to another socket and the socket's FD is passed to the peer, there is a loop between them. If we close both sockets without receiving FD, the sockets will be cleaned up by garbage collection. The garbage collection iterates such sockets and unlinks skb with FD from the socket's receive queue under the queue's lock. So, there is a race where unix_stream_sendpage() could access an skb locklessly that is being released by garbage collection, resulting in use-after-free. To avoid the issue, unix_stream_sendpage() must lock the peer's recv queue. Note the issue does not exist in 6.5+ thanks to the recent sendpage() refactoring. This patch is originally written by Linus Torvalds. 
BUG: unable to handle page fault for address: ffff988004dd6870 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 PREEMPT SMP PTI CPU: 4 PID: 297 Comm: garbage_uaf Not tainted 6.1.46 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc_node+0xa2/0x1e0 Code: c0 0f 84 32 01 00 00 41 83 fd ff 74 10 48 8b 00 48 c1 e8 3a 41 39 c5 0f 85 1c 01 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 40 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 74 a1 41 8b 44 RSP: 0018:ffffc9000079fac0 EFLAGS: 00000246 RAX: 0000000000000070 RBX: 0000000000000005 RCX: 000000000001a284 RDX: 000000000001a244 RSI: 0000000000400cc0 RDI: 000000000002eee0 RBP: 0000000000400cc0 R08: 0000000000400cc0 R09: 0000000000000003 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888003970f00 R13: 00000000ffffffff R14: ffff988004dd6800 R15: 00000000000000e8 FS: 00007f174d6f3600(0000) GS:ffff88807db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff988004dd6870 CR3: 00000000092be000 CR4: 00000000007506e0 PKRU: 55555554 Call Trace: <TASK> ? __die_body.cold+0x1a/0x1f ? page_fault_oops+0xa9/0x1e0 ? fixup_exception+0x1d/0x310 ? exc_page_fault+0xa8/0x150 ? asm_exc_page_fault+0x22/0x30 ? kmem_cache_alloc_node+0xa2/0x1e0 ? __alloc_skb+0x16c/0x1e0 __alloc_skb+0x16c/0x1e0 alloc_skb_with_frags+0x48/0x1e0 sock_alloc_send_pskb+0x234/0x270 unix_stream_sendmsg+0x1f5/0x690 sock_sendmsg+0x5d/0x60 ____sys_sendmsg+0x210/0x260 ___sys_sendmsg+0x83/0xd0 ? kmem_cache_alloc+0xc6/0x1c0 ? avc_disable+0x20/0x20 ? percpu_counter_add_batch+0x53/0xc0 ? alloc_empty_file+0x5d/0xb0 ? alloc_file+0x91/0x170 ? alloc_file_pseudo+0x94/0x100 ? 
__fget_light+0x9f/0x120 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x69/0xd3 RIP: 0033:0x7f174d639a7d Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 8a c1 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 de c1 f4 ff 48 RSP: 002b:00007ffcb563ea50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f174d639a7d RDX: 0000000000000000 RSI: 00007ffcb563eab0 RDI: 0000000000000007 RBP: 00007ffcb563eb10 R08: 0000000000000000 R09: 00000000ffffffff R10: 00000000004040a0 R11: 0000000000000293 R12: 00007ffcb563ec28 R13: 0000000000401398 R14: 0000000000403e00 R15: 00007f174d72c000 </TASK> Fixes: `869e7c6248` ("net: af_unix: implement stream sendpage support") Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reviewed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Guillaume Nault <gnault@redhat.com>	2023-12-01 17:16:48 +01:00
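The fix boils down to one rule: look up and extend the tail skb only while holding the receive-queue lock that the garbage collector also takes. The userspace sketch below follows that rule, with a pthread mutex standing in for the queue spinlock. `struct toy_skb`, `struct toy_rcvq` and the helpers are hypothetical. A `-1` from `toy_append_to_tail()` means the caller would have to allocate a new skb instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_skb {
	struct toy_skb *next;
	size_t len;
};

struct toy_rcvq {
	pthread_mutex_t lock;
	struct toy_skb *head, *tail;
};

static int toy_enqueue(struct toy_rcvq *q, size_t len)
{
	struct toy_skb *skb = calloc(1, sizeof(*skb));

	if (!skb)
		return -1;
	skb->len = len;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = skb;
	else
		q->head = skb;
	q->tail = skb;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* GC side: unlink everything under the lock, then free the detached list. */
static void toy_gc_purge(struct toy_rcvq *q)
{
	pthread_mutex_lock(&q->lock);
	struct toy_skb *skb = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);
	while (skb) {
		struct toy_skb *next = skb->next;
		free(skb);
		skb = next;
	}
}

/* Sender side: without the lock, the GC could free the tail skb between the
 * lookup and the update (the use-after-free described in the commit). */
static int toy_append_to_tail(struct toy_rcvq *q, size_t bytes)
{
	int ret = -1;

	pthread_mutex_lock(&q->lock);
	if (q->tail) {
		q->tail->len += bytes;
		ret = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return ret;
}
```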
Jan Stancek	96911a0b20	Merge: net/other: phase-1 backports for RHEL-9.4 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3264 JIRA: https://issues.redhat.com/browse/RHEL-14526 Upstream Status: all mainline in net.git Tested: boot-tested only Conflicts: None Signed-off-by: Davide Caratti <dcaratti@redhat.com> Approved-by: Hangbin Liu <haliu@redhat.com> Approved-by: Florian Westphal <fwestpha@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com>	2023-11-20 21:49:15 +01:00
Davide Caratti	d9798a3459	af_unix: Fix data-race around unix_tot_inflight. JIRA: https://issues.redhat.com/browse/RHEL-14526 Upstream Status: net.git commit ade32bd8a738d7497ffe9743c46728db26740f78 commit ade32bd8a738d7497ffe9743c46728db26740f78 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Sep 1 17:27:06 2023 -0700 af_unix: Fix data-race around unix_tot_inflight. unix_tot_inflight is changed under spin_lock(unix_gc_lock), but unix_release_sock() reads it locklessly. Let's use READ_ONCE() for unix_tot_inflight. Note that the writer side was marked by commit 9d6d7f1cb67c ("af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress") BUG: KCSAN: data-race in unix_inflight / unix_release_sock write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1: unix_inflight+0x130/0x180 net/unix/scm.c:64 unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123 unix_scm_to_skb net/unix/af_unix.c:1832 [inline] unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x148/0x160 net/socket.c:747 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2493 ___sys_sendmsg+0xc6/0x140 net/socket.c:2547 __sys_sendmsg+0x94/0x140 net/socket.c:2576 __do_sys_sendmsg net/socket.c:2585 [inline] __se_sys_sendmsg net/socket.c:2583 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2583 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0: unix_release_sock+0x608/0x910 net/unix/af_unix.c:671 unix_release+0x59/0x80 net/unix/af_unix.c:1058 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1385 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] 
exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x00000000 -> 0x00000001 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: `9305cfa444` ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2023-10-24 15:12:12 +02:00
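The fix above rests on one pattern: the writer updates `unix_tot_inflight` under `unix_gc_lock`, while `unix_release_sock()` reads it without the lock, so both sides must use marked accesses. A minimal userspace sketch of that pattern, assuming C11 relaxed atomics as a rough stand-in for the kernel's `WRITE_ONCE()`/`READ_ONCE()` and an `atomic_flag` spinlock in place of the spinlock. All names here (`gc_lock`, `tot_inflight`, `inflight_add`, `inflight_read_lockless`) are hypothetical and are not the kernel code.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for unix_gc_lock and unix_tot_inflight. */
static atomic_flag gc_lock = ATOMIC_FLAG_INIT;
static _Atomic unsigned int tot_inflight;

static void gc_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&gc_lock, memory_order_acquire))
		; /* spin */
}

static void gc_lock_release(void)
{
	atomic_flag_clear_explicit(&gc_lock, memory_order_release);
}

/* Writer: serialized by gc_lock, but the store is still a marked (relaxed
 * atomic) access, roughly what WRITE_ONCE() expresses, because readers may
 * look at the counter without taking the lock. */
void inflight_add(int delta)
{
	gc_lock_acquire();
	unsigned int v = atomic_load_explicit(&tot_inflight, memory_order_relaxed);
	atomic_store_explicit(&tot_inflight, v + (unsigned int)delta,
			      memory_order_relaxed);
	gc_lock_release();
}

/* Lockless reader: a marked relaxed load, roughly what READ_ONCE() expresses.
 * The value may be stale, but the read cannot be torn or refetched. */
unsigned int inflight_read_lockless(void)
{
	return atomic_load_explicit(&tot_inflight, memory_order_relaxed);
}
```

Marking only the writer (as commit 9d6d7f1cb67c did) still leaves a plain read racing with a concurrent store, which is exactly what KCSAN reported here.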
Chris von Recklinghausen	1f619343f6	treewide: use get_random_u32() when possible Conflicts: drivers/gpu/drm/tests/drm_buddy_test.c drivers/gpu/drm/tests/drm_mm_test.c - We already have ce28ab1380e8 ("drm/tests: Add back seed value information") so keep calls to kunit_info. drop changes to drivers/misc/habanalabs/gaudi2/gaudi2.c fs/ntfs3/fslog.c - files not in CS9 net/sunrpc/auth_gss/gss_krb5_wrap.c - We already have 7f675ca7757b ("SUNRPC: Improve Kerberos confounder generation") so code to change is gone. drivers/gpu/drm/i915/i915_gem_gtt.c drivers/gpu/drm/i915/selftests/i915_selftest.c drivers/gpu/drm/tests/drm_buddy_test.c drivers/gpu/drm/tests/drm_mm_test.c change added under `4cb818386e` ("Merge DRM changes from upstream v6.0.8..v6.1") JIRA: https://issues.redhat.com/browse/RHEL-1848 commit a251c17aa558d8e3128a528af5cf8b9d7caae4fd Author: Jason A. Donenfeld <Jason@zx2c4.com> Date: Wed Oct 5 17:43:22 2022 +0200 treewide: use get_random_u32() when possible The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2023-10-20 06:15:03 -04:00
Felix Maurer	2d92cf1f17	bpf, sockmap: Pass skb ownership through read_skb Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218483 Conflicts: - net/ipv4/udp.c: Context difference due to missing ec095263a965 ("net: remove noblock parameter from recvmsg() entities") and db39dfdc1c3b ("udp: Use WARN_ON_ONCE() in udp_read_skb()"); 31f1fbcb346c ("udp: Refactor udp_read_skb()") was adapted to reflect this - net/vmw_vsock/virtio_transport_common.c: Skipped, because the relevant code is not there, missing 634f1a7110b4 ("vsock: support sockmap") commit 78fa0d61d97a728d306b0c23d353c0e340756437 Author: John Fastabend <john.fastabend@gmail.com> Date: Mon May 22 19:56:05 2023 -0700 bpf, sockmap: Pass skb ownership through read_skb The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to use the skb it needs to inc the ref cnt so that the consume_skb() doesn't kfree the sk_buff. This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this, skb_linearize() __pskb_pull_tail() pskb_expand_head() BUG_ON(skb_shared(skb)) Because we incremented users refcnt from sk_psock_verdict_recv() we hit the bug on with refcnt > 1 and trip it. To fix lets simply pass ownership of the sk_buff through the skb_read call. Then we can drop the consume from read_skb handlers and assume the verdict recv does any required kfree. Bug found while testing in our CI which runs in VMs that hit memory constraints rather regularly. William tested TCP read_skb handlers. [ 106.536188] ------------[ cut here ]------------ [ 106.536197] kernel BUG at net/core/skbuff.c:1693! 
[ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1 [ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014 [ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330 [ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202 [ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20 [ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8 [ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000 [ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8 [ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8 [ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0 [ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.542255] Call Trace: [ 106.542383] <IRQ> [ 106.542487] __pskb_pull_tail+0x4b/0x3e0 [ 106.542681] skb_ensure_writable+0x85/0xa0 [ 106.542882] sk_skb_pull_data+0x18/0x20 [ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9 [ 106.543536] ? migrate_disable+0x66/0x80 [ 106.543871] sk_psock_verdict_recv+0xe2/0x310 [ 106.544258] ? 
sk_psock_write_space+0x1f0/0x1f0 [ 106.544561] tcp_read_skb+0x7b/0x120 [ 106.544740] tcp_data_queue+0x904/0xee0 [ 106.544931] tcp_rcv_established+0x212/0x7c0 [ 106.545142] tcp_v4_do_rcv+0x174/0x2a0 [ 106.545326] tcp_v4_rcv+0xe70/0xf60 [ 106.545500] ip_protocol_deliver_rcu+0x48/0x290 [ 106.545744] ip_local_deliver_finish+0xa7/0x150 Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reported-by: William Findlay <will@isovalent.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2023-06-29 15:45:40 +02:00
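The ownership change above can be modeled with a plain reference count. This sketch is a deliberate simplification, assuming a hypothetical `struct buf` in place of `struct sk_buff` and a `buf_can_expand()` check in place of the `BUG_ON(skb_shared(skb))` in `pskb_expand_head()`. Under the old contract the actor must take an extra reference, so it sees `users == 2` and cannot reallocate; under the new contract the reader hands its only reference to the actor.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted buffer standing in for struct sk_buff. */
struct buf {
	int users;
};

struct buf *buf_alloc(void)
{
	struct buf *b = malloc(sizeof(*b));
	if (b)
		b->users = 1;
	return b;
}

void buf_put(struct buf *b)
{
	if (--b->users == 0)
		free(b);
}

/* Models the BUG_ON(skb_shared(skb)) precondition: only the sole owner
 * may reallocate the data. */
bool buf_can_expand(const struct buf *b)
{
	return b->users == 1;
}

/* Old contract: the reader still owns the buffer and drops it after the
 * actor returns, so an actor that keeps the buffer takes its own
 * reference; while it works on the buffer, users == 2. */
static bool actor_old(struct buf *b)
{
	b->users++;                 /* like skb_get() */
	bool ok = buf_can_expand(b);
	buf_put(b);                 /* actor finishes with its reference */
	return ok;
}

bool read_old(struct buf *b)
{
	bool ok = actor_old(b);
	buf_put(b);                 /* like consume_skb() in the read handler */
	return ok;
}

/* New contract: ownership moves into the actor, which frees the buffer. */
static bool actor_new(struct buf *b)
{
	bool ok = buf_can_expand(b);
	buf_put(b);
	return ok;
}

bool read_new(struct buf *b)
{
	return actor_new(b);        /* no consume in the read handler any more */
}
```

In the real code the extra reference outlives the actor call (the skb is queued), but the refcount arithmetic that trips the check is the same.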
Felix Maurer	8058591656	af_unix: Refactor unix_read_skb() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218483 commit d6e3b27cbd2df555ff0736796ad2f9a17e74be8b Author: Peilin Ye <peilin.ye@bytedance.com> Date: Thu Sep 22 21:59:26 2022 -0700 af_unix: Refactor unix_read_skb() Similar to udp_read_skb(), delete the unnecessary while loop in unix_read_skb() for readability. Since recv_actor() cannot return a value greater than skb->len (see sk_psock_verdict_recv()), remove the redundant check. Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Link: https://lore.kernel.org/r/7009141683ad6cd3785daced3e4a80ba0eb773b5.1663909008.git.peilin.ye@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2023-06-29 15:45:40 +02:00
Davide Caratti	a94bc10366	af_unix: Fix a data-race in unix_dgram_peer_wake_me(). Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2190429 Upstream Status: net.git commit 662a80946ce1 commit 662a80946ce13633ae90a55379f1346c10f0c432 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Sun Jun 5 16:23:25 2022 -0700 af_unix: Fix a data-race in unix_dgram_peer_wake_me(). unix_dgram_poll() calls unix_dgram_peer_wake_me() without `other`'s lock held and check if its receive queue is full. Here we need to use unix_recvq_full_lockless() instead of unix_recvq_full(), otherwise KCSAN will report a data-race. Fixes: `7d267278a9` ("unix: avoid use-after-free in ep_remove_wait_queue") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220605232325.11804-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2023-04-28 14:11:35 +02:00
Davide Caratti	081d5c4598	unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2164865 Upstream Status: net.git commit 3ff8bff704f4 commit 3ff8bff704f4de125dca2262e5b5b963a3da1d87 Author: Kirill Tkhai <tkhai@ya.ru> Date: Tue Dec 13 00:05:53 2022 +0300 unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() There is a race resulting in alive SOCK_SEQPACKET socket may change its state from TCP_ESTABLISHED to TCP_CLOSE: unix_release_sock(peer) unix_dgram_sendmsg(sk) sock_orphan(peer) sock_set_flag(peer, SOCK_DEAD) sock_alloc_send_pskb() if !(sk->sk_shutdown & SEND_SHUTDOWN) OK if sock_flag(peer, SOCK_DEAD) sk->sk_state = TCP_CLOSE sk->sk_shutdown = SHUTDOWN_MASK After that socket sk remains almost normal: it is able to connect, listen, accept and recvmsg, while it can't sendmsg. Since this is the only possibility for alive SOCK_SEQPACKET to change the state in such way, we should better fix this strange and potentially danger corner case. Note, that we will return EPIPE here like this is normally done in sock_alloc_send_pskb(). Originally used ECONNREFUSED looks strange, since it's strange to return a specific retval in dependence of race in kernel, when user can't affect on this. Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock to fix race with unix_dgram_connect(): unix_dgram_connect(other) unix_dgram_sendmsg(sk) unix_peer(sk) = NULL unix_state_unlock(sk) unix_state_double_lock(sk, other) sk->sk_state = TCP_ESTABLISHED unix_peer(sk) = other unix_state_double_unlock(sk, other) sk->sk_state = TCP_CLOSED This patch fixes both of these races. Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Kirill Tkhai <tkhai@ya.ru> Link: https://lore.kernel.org/r/135fda25-22d5-837a-782b-ceee50e19844@ya.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2023-02-02 16:24:03 +01:00
Davide Caratti	c2e1968f90	af_unix: call proto_unregister() in the error path in af_unix_init() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2164865 Upstream Status: net.git commit 73e341e0281a commit 73e341e0281a35274629e9be27eae2f9b1b492bf Author: Yang Yingliang <yangyingliang@huawei.com> Date: Thu Dec 8 23:01:58 2022 +0800 af_unix: call proto_unregister() in the error path in af_unix_init() If register unix_stream_proto returns error, unix_dgram_proto needs be unregistered. Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2023-02-02 16:24:03 +01:00
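The bug fixed above is a missing step in the usual kernel `goto` unwind ladder: when the second registration fails, the first must be undone before returning. A sketch of that idiom, assuming hypothetical `register_dgram()`/`register_stream()` helpers in place of `proto_register()` on `unix_dgram_proto` and `unix_stream_proto`; it is not the actual `af_unix_init()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical registration state standing in for proto_register(). */
bool dgram_registered, stream_registered;
bool fail_stream_register;  /* test knob: force the second step to fail */

static int register_dgram(void) { dgram_registered = true; return 0; }
static void unregister_dgram(void) { dgram_registered = false; }

static int register_stream(void)
{
	if (fail_stream_register)
		return -ENOMEM;
	stream_registered = true;
	return 0;
}

int af_init_sketch(void)
{
	int rc = register_dgram();
	if (rc)
		goto out;

	rc = register_stream();
	if (rc)
		goto err_dgram;   /* the unwind step the fix adds */

	return 0;

err_dgram:
	unregister_dgram();
out:
	return rc;
}
```

Each label undoes exactly the steps that succeeded before the jump, so adding a third registration later only needs one more label.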
Felix Maurer	09faf01cb9	net: Introduce a new proto_ops ->read_skb() Bugzilla: https://bugzilla.redhat.com/2137876 Conflicts: Context difference due to not yet applied 314001f0bf927 ("af_unix: Add OOB support") and already applied 3f92a64e44e5 ("tcp: allow tls to decrypt directly from the tcp rcv queue") commit 965b57b469a589d64d81b1688b38dcb537011bb0 Author: Cong Wang <cong.wang@bytedance.com> Date: Wed Jun 15 09:20:12 2022 -0700 net: Introduce a new proto_ops ->read_skb() Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors. For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220615162014.89193-3-xiyou.wangcong@gmail.com Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2023-01-05 15:46:53 +01:00
Frantisek Hrbata	34b02be423	Merge: CNB: net: remove noblock parameter from skb_recv_datagram() MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1655 Bugzilla: https://bugzilla.redhat.com/2143360 Tested: build, boot Conflicts: - isotp: missing many commits, such as: 30ffd5332e06 ("can: isotp: return -EADDRNOTAVAIL when reading from unbound socket") 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket") e382fea8ae54 ("can: isotp: restore accidentally removed MSG_PEEK feature") - removed chunks of non existent net/mctp ``` commit f4b41f062c424209e3939a81e6da022e049a45f2 Author: Oliver Hartkopp <socketcan@hartkopp.net> Date: Mon Apr 4 18:30:22 2022 +0200 net: remove noblock parameter from skb_recv_datagram() skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)' As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags' into 'flags' and 'noblock' with finally obsolete bit operations like this: skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc); And this is not even done consistently with the 'flags' parameter. This patch removes the obsolete and costly splitting into two parameters and only performs bit operations when really needed on the caller side. One missing conversion thankfully reported by kernel test robot. I missed to enable kunit tests to build the mctp code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> ``` Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Approved-by: Ivan Vecera <ivecera@redhat.com> Approved-by: Xin Long <lxin@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-11-30 08:10:47 -05:00
Íñigo Huguet	e24462420c	net: remove noblock parameter from skb_recv_datagram() Bugzilla: https://bugzilla.redhat.com/2143360 Conflicts: - isotp: missing many commits, such as: 30ffd5332e06 ("can: isotp: return -EADDRNOTAVAIL when reading from unbound socket") 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket") e382fea8ae54 ("can: isotp: restore accidentally removed MSG_PEEK feature") - removed chunks of non existent net/mctp commit f4b41f062c424209e3939a81e6da022e049a45f2 Author: Oliver Hartkopp <socketcan@hartkopp.net> Date: Mon Apr 4 18:30:22 2022 +0200 net: remove noblock parameter from skb_recv_datagram() skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)' As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags' into 'flags' and 'noblock' with finally obsolete bit operations like this: skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc); And this is not even done consistently with the 'flags' parameter. This patch removes the obsolete and costly splitting into two parameters and only performs bit operations when really needed on the caller side. One missing conversion thankfully reported by kernel test robot. I missed to enable kunit tests to build the mctp code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>	2022-11-18 11:18:14 +01:00
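The commit message argues that splitting `MSG_DONTWAIT` out of `flags` and merging it back is pure overhead. The round trip can be checked directly; the helper names below (`merge_old_args`, `wants_nonblock`) are hypothetical, and only `MSG_DONTWAIT`/`MSG_PEEK` come from `<sys/socket.h>`.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Old convention (hypothetical helper name): callers split MSG_DONTWAIT out
 * into a separate noblock argument, and the callee merged it back. */
int merge_old_args(int flags, bool noblock)
{
	return flags | (noblock ? MSG_DONTWAIT : 0);
}

/* New convention: flags are passed through untouched and the callee tests
 * the bit only where it needs to know whether to block. */
bool wants_nonblock(int flags)
{
	return (flags & MSG_DONTWAIT) != 0;
}
```

For every `flags` value, `merge_old_args(flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT) == flags`, which is why the second parameter could be dropped without changing behavior.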
Jiri Benc	418019c715	bpf: Support bpf_(get|set)sockopt() in bpf unix iter. Bugzilla: https://bugzilla.redhat.com/2120966 commit eb7d8f1d9ebc7379f09a51bf4faa35e0bfa7437d Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Thu Jan 13 09:28:47 2022 +0900 bpf: Support bpf_(get|set)sockopt() in bpf unix iter. This patch makes bpf_(get|set)sockopt() available when iterating AF_UNIX sockets. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/r/20220113002849.4384-4-kuniyu@amazon.co.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
Jiri Benc	1e320cfd7c	bpf: af_unix: Use batching algorithm in bpf unix iter. Bugzilla: https://bugzilla.redhat.com/2120966 commit 855d8e77ffb05be6e54c34dababccb20318aec00 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Thu Jan 13 09:28:46 2022 +0900 bpf: af_unix: Use batching algorithm in bpf unix iter. The commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock") introduces the batching algorithm to iterate TCP sockets with more consistency. This patch uses the same algorithm to iterate AF_UNIX sockets. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/r/20220113002849.4384-3-kuniyu@amazon.co.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
Jiri Benc	f83ccdbf81	af_unix: Refactor unix_next_socket(). Bugzilla: https://bugzilla.redhat.com/2120966 commit 4408d55a64677febdcb50d1b44d0dc714ce4187e Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Thu Jan 13 09:28:45 2022 +0900 af_unix: Refactor unix_next_socket(). Currently, unix_next_socket() is overloaded depending on the 2nd argument. If it is NULL, unix_next_socket() returns the first socket in the hash. If not NULL, it returns the next socket in the same hash list or the first socket in the next non-empty hash list. This patch refactors unix_next_socket() into two functions unix_get_first() and unix_get_next(). unix_get_first() newly acquires a lock and returns the first socket in the list. unix_get_next() returns the next socket in a list or releases a lock and falls back to unix_get_first(). In the following patch, bpf iter holds entire sockets in a list and always releases the lock before .show(). It always calls unix_get_first() to acquire a lock in each iteration. So, this patch makes the change easier to follow. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/r/20220113002849.4384-2-kuniyu@amazon.co.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
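The split described above separates "find the first socket at or after a bucket" from "advance to the next socket, falling back to the next bucket". A lock-free userspace sketch of that iteration shape, assuming a hypothetical chained hash table; comments mark where the kernel versions take and release the bucket lock. `get_first()`/`get_next()` are illustrative analogues, not the kernel's `unix_get_first()`/`unix_get_next()`.

```c
#include <stddef.h>

#define NBUCKETS 4  /* hypothetical; the kernel's table is far larger */

struct node {
	int id;
	struct node *next;
};

struct table {
	struct node *bucket[NBUCKETS];
};

/* Like unix_get_first(): starting at bucket *b, return the head of the first
 * non-empty bucket. In the kernel this is where the bucket lock is taken. */
struct node *get_first(struct table *t, size_t *b)
{
	for (; *b < NBUCKETS; (*b)++)
		if (t->bucket[*b])
			return t->bucket[*b];
	return NULL;
}

/* Like unix_get_next(): stay in the current bucket if possible, otherwise
 * (after releasing that bucket's lock, in the kernel) move on to the next
 * bucket via get_first(). */
struct node *get_next(struct table *t, struct node *n, size_t *b)
{
	if (n->next)
		return n->next;
	(*b)++;
	return get_first(t, b);
}
```

A full walk is `for (n = get_first(t, &b); n; n = get_next(t, n, &b))` with `b` starting at 0; the batching iterator in the following commit can instead call `get_first()` afresh on each pass.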
Jiri Benc	50833f03dd	af_unix: Relax race in unix_autobind(). Bugzilla: https://bugzilla.redhat.com/2120966 commit 9acbc584c3a4e9706703039708ec947ffc152c66 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:31 2021 +0900 af_unix: Relax race in unix_autobind(). When we bind an AF_UNIX socket without a name specified, the kernel selects an available one from 0x00000 to 0xFFFFF. unix_autobind() starts searching from a number in the 'static' variable and increments it after acquiring two locks. If multiple processes try autobind, they obtain the same lock and check if a socket in the hash list has the same name. If not, one process uses it, and all except one end up retrying the _next_ number (actually not, it may be incremented by the other processes). The more we autobind sockets in parallel, the longer the latency gets. We can avoid such a race by searching for a name from a random number. These show latency in unix_autobind() while 64 CPUs are simultaneously autobind-ing 1024 sockets for each. 
Without this patch: usec : count distribution 0 : 1176 \|* \| 2 : 3655 \|******* \| 4 : 4094 \|********* \| 6 : 3831 \|******** \| 8 : 3829 \|******** \| 10 : 3844 \|******** \| 12 : 3638 \|******* \| 14 : 2992 \|***** \| 16 : 2485 \|*** \| 18 : 2230 \|*** \| 20 : 2095 \|** \| 22 : 1853 \|* \| 24 : 1827 \|* \| 26 : 1677 \|* \| 28 : 1473 \| \| 30 : 1573 \|* \| 32 : 1417 \| \| 34 : 1385 \| \| 36 : 1345 \| \| 38 : 1344 \| \| 40 : 1200 \|* \| With this patch: usec : count distribution 0 : 1855 \|**** \| 2 : 6464 \|***************** \| 4 : 9936 \|**************************** \| 6 : 12107 \|************************************\| 8 : 10441 \|****************************** \| 10 : 7264 \|******************* \| 12 : 4254 \|********** \| 14 : 2538 \|**** \| 16 : 1596 \|* \| 18 : 1088 \|* \| 20 : 800 \| \| 22 : 670 \| \| 24 : 601 \|* \| 26 : 562 \|* \| 28 : 525 \|* \| 30 : 446 \|* \| 32 : 378 \|* \| 34 : 337 \|* \| 36 : 317 \|* \| 38 : 314 \|* \| 40 : 298 \| \| Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
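The change above replaces a shared, incrementing starting point with a random one, so parallel binders rarely probe the same name. A sketch of the probe loop, assuming the caller supplies the random start and a hypothetical `name_in_use_fn` predicate; the kernel's actual loop, locking, and source of randomness may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUTOBIND_MASK 0xFFFFFu  /* names 0x00000..0xFFFFF, per the commit message */

/* Hypothetical "is this name already bound?" predicate. */
typedef bool (*name_in_use_fn)(uint32_t name, void *ctx);

/* Start probing at a random point (supplied by the caller) instead of a
 * shared static counter, and walk the name space once, wrapping at 0xFFFFF.
 * Returns the first free name, or -1 if all 2^20 names are taken. */
long autobind_pick(uint32_t random_start, name_in_use_fn in_use, void *ctx)
{
	uint32_t first = random_start & AUTOBIND_MASK;
	uint32_t name = first;

	do {
		if (!in_use(name, ctx))
			return (long)name;
		name = (name + 1) & AUTOBIND_MASK;
	} while (name != first);

	return -1;
}

/* Example predicate: every name below *(uint32_t *)ctx is taken. */
bool taken_below(uint32_t name, void *ctx)
{
	return name < *(const uint32_t *)ctx;
}
```

Because the name space is a power of two, masking with `0xFFFFF` both reduces the random start into range and wraps the probe from `0xFFFFF` back to `0`.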
Jiri Benc	ca09b582d6	af_unix: Replace the big lock with small locks. Bugzilla: https://bugzilla.redhat.com/2120966 commit afd20b9290e184c203fe22f2d6b80dc7127ba724 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:30 2021 +0900 af_unix: Replace the big lock with small locks. The hash table of AF_UNIX sockets is protected by the single lock. This patch replaces it with per-hash locks. The effect is noticeable when we handle multiple sockets simultaneously. Here is a test result on an EC2 c5.24xlarge instance. It shows latency (under 10us only) in unix_insert_unbound_socket() while 64 CPUs creating 1024 sockets for each in parallel. Without this patch: nsec : count distribution 0 : 179 \| \| 500 : 3021 \|******* \| 1000 : 6271 \|*************** \| 1500 : 6318 \|*************** \| 2000 : 5828 \|************* \| 2500 : 5124 \|*********** \| 3000 : 4426 \|********* \| 3500 : 3672 \|******* \| 4000 : 3138 \|***** \| 4500 : 2811 \|**** \| 5000 : 2384 \|*** \| 5500 : 2023 \|** \| 6000 : 1954 \|* \| 6500 : 1737 \|* \| 7000 : 1749 \|* \| 7500 : 1520 \| \| 8000 : 1469 \| \| 8500 : 1394 \| \| 9000 : 1232 \|* \| 9500 : 1138 \|* \| 10000 : 994 \|* \| With this patch: nsec : count distribution 0 : 1634 \|** \| 500 : 13170 \|************************************\| 1000 : 13156 \|*********************************** \| 1500 : 9010 \|*********************** \| 2000 : 6363 \|*************** \| 2500 : 4443 \|********* \| 3000 : 3240 \|***** \| 3500 : 2549 \|*** \| 4000 : 1872 \|* \| 4500 : 1504 \| \| 5000 : 1247 \|* \| 5500 : 1035 \|* \| 6000 : 889 \| \| 6500 : 744 \|** \| 7000 : 634 \|* \| 7500 : 498 \|* \| 8000 : 433 \|* \| 8500 : 355 \|* \| 9000 : 336 \|* \| 9500 : 284 \| \| 10000 : 243 \| \| Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
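The idea behind the latency improvement above is lock striping: each hash bucket carries its own lock, so two sockets that hash to different buckets never contend. A minimal sketch with C11 `atomic_flag` spinlocks, assuming a hypothetical `struct entry` keyed by an integer; it illustrates the technique and is not the kernel's `unix_table` code.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 8  /* hypothetical table size */

struct entry {
	unsigned int key;
	struct entry *next;
};

struct bucket {
	atomic_flag lock;       /* one lock per bucket, not one for the table */
	struct entry *head;
};

static struct bucket table[NBUCKETS];

void table_init(void)
{
	for (size_t i = 0; i < NBUCKETS; i++) {
		atomic_flag_clear(&table[i].lock);
		table[i].head = NULL;
	}
}

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		; /* spin */
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Inserts only contend with other inserts that hash to the same bucket. */
void table_insert(struct entry *e)
{
	struct bucket *b = &table[e->key % NBUCKETS];

	bucket_lock(b);
	e->next = b->head;
	b->head = e;
	bucket_unlock(b);
}

size_t bucket_len(unsigned int key)
{
	struct bucket *b = &table[key % NBUCKETS];
	size_t n = 0;

	bucket_lock(b);
	for (struct entry *e = b->head; e; e = e->next)
		n++;
	bucket_unlock(b);
	return n;
}
```

The cost is that whole-table walks (such as /proc/net/unix) must now take and drop one lock per bucket, which is why the preceding commit stores each socket's hash in `sk_hash`.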
Jiri Benc	264d6b03a0	af_unix: Save hash in sk_hash. Bugzilla: https://bugzilla.redhat.com/2120966 commit e6b4b873896f0e9298f70d25726f4bb1e1b265ba Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:29 2021 +0900 af_unix: Save hash in sk_hash. To replace unix_table_lock with per-hash locks in the next patch, we need to save a hash in each socket because /proc/net/unix or BPF prog iterate sockets while holding a hash table lock and release it later in a different function. Currently, we store a real/pseudo hash in struct unix_address. However, we do not allocate it to unbound sockets, nor should we do just for that. For this purpose, we can use sk_hash. Then, we no longer use the hash field in struct unix_address and can remove it. Also, this patch does - rename unix_insert_socket() to unix_insert_unbound_socket() - remove the redundant list argument from __unix_insert_socket() and unix_insert_unbound_socket() - use 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash() - remove 'inline' from unix_remove_socket() and unix_insert_unbound_socket(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
Jiri Benc	6450454e10	af_unix: Add helpers to calculate hashes. Bugzilla: https://bugzilla.redhat.com/2120966 commit f452be496a5c8f58b1a67cde79e89b9f1cfde31c Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:28 2021 +0900 af_unix: Add helpers to calculate hashes. This patch adds three helper functions that calculate hashes for unbound sockets and bound sockets with BSD/abstract addresses. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
Jiri Benc	9f9e8cf942	af_unix: Remove UNIX_ABSTRACT() macro and test sun_path[0] instead. Bugzilla: https://bugzilla.redhat.com/2120966 commit 5ce7ab4961a9320ca0836e06849210d088723a56 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:27 2021 +0900 af_unix: Remove UNIX_ABSTRACT() macro and test sun_path[0] instead. In BSD and abstract address cases, we store sockets in the hash table with keys between 0 and UNIX_HASH_SIZE - 1. However, the hash saved in a socket varies depending on its address type; sockets with BSD addresses always have UNIX_HASH_SIZE in their unix_sk(sk)->addr->hash. This is just for the UNIX_ABSTRACT() macro used to check the address type. The difference of the saved hashes comes from the first byte of the address in the first place. So, we can test it directly. Then we can keep a real hash in each socket and replace unix_table_lock with per-hash locks in the later patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
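As the commit notes, the address type can be read straight from the first byte of `sun_path`: abstract names begin with a NUL byte, filesystem ("BSD") names do not. A sketch of that test, assuming the address length has already been validated; `unix_addr_is_abstract()` is a hypothetical name.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Assumes the address length was already validated to cover at least
 * sun_path[0]. Abstract names start with a NUL byte; filesystem ("BSD")
 * names start with the first character of the path. */
bool unix_addr_is_abstract(const struct sockaddr_un *sun)
{
	return sun->sun_path[0] == '\0';
}
```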
Jiri Benc	bde4862c31	af_unix: Allocate unix_address in unix_bind_(bsd|abstract)(). Bugzilla: https://bugzilla.redhat.com/2120966 commit 12f21c49ad83eba93d0485b8c9edcc28201bee93 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:26 2021 +0900 af_unix: Allocate unix_address in unix_bind_(bsd|abstract)(). To terminate address with '\0' in unix_bind_bsd(), we add unix_create_addr() and call it in unix_bind_bsd() and unix_bind_abstract(). Also, unix_bind_abstract() does not return -EEXIST. Only kern_path_create() and vfs_mknod() in unix_bind_bsd() can return it, so we move the last error check in unix_bind() to unix_bind_bsd(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
Jiri Benc	9399405ae2	af_unix: Remove unix_mkname(). Bugzilla: https://bugzilla.redhat.com/2120966 commit 5c32a3ed64b4c87ed6d9978074db5f0a54c4cd20 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:25 2021 +0900 af_unix: Remove unix_mkname(). This patch removes unix_mkname() and postpones calculating a hash to unix_bind_abstract(). Some BSD stuffs still remain in unix_bind() though, the next patch packs them into unix_bind_bsd(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
Jiri Benc	9d1470dc1f	af_unix: Copy unix_mkname() into unix_find_(bsd|abstract)(). Bugzilla: https://bugzilla.redhat.com/2120966 commit d2d8c9fddb1c11ccfa73bf0ad2b1e6b4ea7afdaf Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:24 2021 +0900 af_unix: Copy unix_mkname() into unix_find_(bsd|abstract)(). We should not call unix_mkname() before unix_find_other() and instead do the same thing where necessary based on the address type: - terminating the address with '\0' in unix_find_bsd() - calculating the hash in unix_find_abstract(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:53 +02:00
Jiri Benc	9d10cafdc9	af_unix: Cut unix_validate_addr() out of unix_mkname(). Bugzilla: https://bugzilla.redhat.com/2120966 commit b8a58aa6fccc5b2940f0da18c7f02e8a1deb693a Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:23 2021 +0900 af_unix: Cut unix_validate_addr() out of unix_mkname(). unix_mkname() tests socket address length and family and does some processing based on the address type. It is called in the early stage, and therefore some instructions are redundant and can end up in vain. The address length/family tests are done twice in unix_bind(). Also, the address type is rechecked later in unix_bind() and unix_find_other(), where we can do the same processing. Moreover, in the BSD address case, the hash is set to 0 but never used and confusing. This patch moves the address tests out of unix_mkname(), and the following patches move the other part into appropriate places and remove unix_mkname() finally. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:52 +02:00
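The commit moves the address length and family tests into a standalone validator. A sketch of such a check, assuming these bounds: the length must cover the family field plus at least one byte of `sun_path`, must not exceed `sizeof(struct sockaddr_un)`, and the family must be `AF_UNIX`. The name `validate_unix_addr()` is hypothetical; consult the kernel source for the exact `unix_validate_addr()` logic.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Length and family checks, done once up front instead of inside a helper
 * that also builds the name. */
int validate_unix_addr(const struct sockaddr_un *sun, size_t addr_len)
{
	/* Need the family field plus at least one byte of sun_path, and no
	 * more bytes than the structure holds. */
	if (addr_len <= offsetof(struct sockaddr_un, sun_path) ||
	    addr_len > sizeof(*sun))
		return -EINVAL;

	if (sun->sun_family != AF_UNIX)
		return -EINVAL;

	return 0;
}
```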
Jiri Benc	5384473aa6	af_unix: Return an error as a pointer in unix_find_other(). Bugzilla: https://bugzilla.redhat.com/2120966 commit aed26f557bbc94f0c778f63d7dfe86af99208f68 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:22 2021 +0900 af_unix: Return an error as a pointer in unix_find_other(). We can return an error as a pointer and need not pass an additional argument to unix_find_other(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:52 +02:00
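Returning "an error as a pointer" refers to the kernel's `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` convention from `include/linux/err.h`: the top `MAX_ERRNO` (4095) addresses are never valid objects, so a negative errno can be encoded in the returned pointer instead of an extra `int *error` argument. A userspace re-creation, assuming hypothetical lowercase helper names and a toy `find_peer()` lookup.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct peer {
	int id;
};

static struct peer the_peer = { 42 };

/* Hypothetical lookup returning either a valid pointer or an encoded errno,
 * instead of taking an extra `int *error` out-parameter. */
struct peer *find_peer(int id)
{
	if (id != the_peer.id)
		return err_ptr(-ECONNREFUSED);
	return &the_peer;
}
```

Callers then write `p = find_peer(id); if (is_err(p)) return ptr_err(p);`, which keeps the error and the result in one return value.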
Jiri Benc	04e0d4ff01	af_unix: Factorise unix_find_other() based on address types. Bugzilla: https://bugzilla.redhat.com/2120966 commit fa39ef0e472961baef49ddb0e6f7b8ebb555bd8f Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:21 2021 +0900 af_unix: Factorise unix_find_other() based on address types. As done in the commit `fa42d910a3` ("unix_bind(): take BSD and abstract address cases into new helpers"), this patch moves BSD and abstract address cases from unix_find_other() into unix_find_bsd() and unix_find_abstract(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:52 +02:00
Jiri Benc	8bb71496a8	af_unix: Pass struct sock to unix_autobind(). Bugzilla: https://bugzilla.redhat.com/2120966 commit f7ed31f4615f4e1d97c0e4325c5b8a240e10073c Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:20 2021 +0900 af_unix: Pass struct sock to unix_autobind(). We do not use struct socket in unix_autobind() and pass struct sock to unix_bind_bsd() and unix_bind_abstract(). Let's pass it to unix_autobind() as well. Also, this patch fixes these errors by checkpatch.pl. ERROR: do not use assignment in if condition #1795: FILE: net/unix/af_unix.c:1795: + if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr CHECK: Logical continuations should be on the previous line #1796: FILE: net/unix/af_unix.c:1796: + if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr + && (err = unix_autobind(sock)) != 0) Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:52 +02:00
Jiri Benc	b1374dfe08	af_unix: Use offsetof() instead of sizeof(). Bugzilla: https://bugzilla.redhat.com/2120966 commit 755662ce78d14c1a9118df921c528b1f992ded2e Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Wed Nov 24 11:14:19 2021 +0900 af_unix: Use offsetof() instead of sizeof(). The length of the AF_UNIX socket address contains an offset to the member sun_path of struct sockaddr_un. Currently, the preceding member is just sun_family, and its type is sa_family_t and resolved to short. Therefore, the offset is represented by sizeof(short). However, it is not clear and fragile to changes in struct sockaddr_storage or sockaddr_un. This commit makes it clear and robust by rewriting sizeof() with offsetof(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:52 +02:00
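The point of the change is that the header size of an AF_UNIX address should be expressed as `offsetof(struct sockaddr_un, sun_path)`, which stays correct if members before `sun_path` ever change, rather than as `sizeof(short)`. On Linux the two currently coincide because the only preceding member is `sun_family`. A small sketch computing the length of a filesystem address; `unix_path_addrlen()` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Address length for a filesystem path: everything up to sun_path, then
 * the path bytes and the terminating NUL. */
socklen_t unix_path_addrlen(const char *path)
{
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) +
			   strlen(path) + 1);
}
```

For example, `unix_path_addrlen("/tmp/s")` is `sizeof(sa_family_t) + 7` on Linux.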
Petr Oros	21e2fb0e83	net: Don't include filter.h from net/sock.h Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101792 Conflicts: drivers/infiniband/core/cache.c - adjusted context conflict due to missing b74525f21e33ab ("RDMA/core: Delete useless module.h include") drivers/infiniband/hw/mlx5/fs.c - missing upstream commit ffa501ef196312 ("RDMA/mlx5: Add steering support in optional flow counters") adding net/inet_ecn.h. Without inet_ecn.h missing declarations for ether_addr_copy() and is_multicast_ether_addr() We add net/inet_ecn.h include in this commit. drivers/net/amt.c - Unmerged because file missing in RHEL Upstream commit(s): commit b6459415b384cb829f0b2a4268f211c789f6cf0b Author: Jakub Kicinski <kuba@kernel.org> Date: Tue Dec 28 16:49:13 2021 -0800 net: Don't include filter.h from net/sock.h sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org Signed-off-by: Petr Oros <poros@redhat.com>	2022-07-13 10:49:16 +02:00
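The technique in this commit is a forward declaration: a header that only stores or compares a struct sk_filter pointer does not need the full definition, so it can declare the struct as incomplete and skip the include. Below is a single-file sketch of that split. struct demo_sock and the sk_filter layout shown are hypothetical stand-ins. The real definition lives in include/linux/filter.h and has different members.

```c
#include <assert.h>
#include <stddef.h>

/* Header-style part: a forward declaration is enough here. */
struct sk_filter;

struct demo_sock {
	struct sk_filter *filter;	/* pointer to an incomplete type is fine */
};

static int demo_sock_has_filter(const struct demo_sock *sk)
{
	/* Comparing the pointer does not need the struct's definition. */
	return sk->filter != NULL;
}

/* Only code that looks inside the filter needs the full definition,
 * which in the kernel would come from including linux/filter.h.
 * This layout is a hypothetical stand-in, not the kernel's. */
struct sk_filter {
	unsigned int len;
};

static unsigned int demo_sock_filter_len(const struct demo_sock *sk)
{
	return sk->filter ? sk->filter->len : 0;
}
```

This approach also explains the fallout listed in the Conflicts note. Files that had silently relied on sock.h pulling in filter.h, and everything filter.h itself included, must now include what they use.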
Ivan Vecera	fa0c210030	net: drop nopreempt requirement on sock_prot_inuse_add() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377 commit b3cb764aa1d753cf6a58858f9e2097ba71e8100b Author: Eric Dumazet <edumazet@google.com> Date: Mon Nov 15 09:11:50 2021 -0800 net: drop nopreempt requirement on sock_prot_inuse_add() This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds more code than really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2022-06-13 18:35:56 +02:00
Jiri Benc	54697ceb89	af_unix: fix regression in read after shutdown Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 commit f9390b249c90a15a4d9e69fbfb7a53c860b1fcaf Author: Vincent Whitchurch <vincent.whitchurch@axis.com> Date: Fri Nov 19 13:05:21 2021 +0100 af_unix: fix regression in read after shutdown On kernels before v5.15, calling read() on a unix socket after shutdown(SHUT_RD) or shutdown(SHUT_RDWR) would return the data previously written or EOF. But now, while read() after shutdown(SHUT_RD) still behaves the same way, read() after shutdown(SHUT_RDWR) always fails with -EINVAL. This behaviour change was apparently inadvertently introduced as part of a bug fix for a different regression caused by the commit adding sockmap support to af_unix, commit 94531cfcbe79c359 ("af_unix: Add unix_stream_proto for sockmap"). Those commits, for unclear reasons, started setting the socket state to TCP_CLOSE on shutdown(SHUT_RDWR), while this state change had previously only been done in unix_release_sock(). Restore the original behaviour. The sockmap tests in tests/selftests/bpf continue to pass after this patch. Fixes: d0c6416bd7091647f60 ("unix: Fix an issue in unix_shutdown causing the other end read/write failures") Link: https://lore.kernel.org/lkml/20211111140000.GA10779@axis.com/ Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:54 +02:00
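This small userspace check exercises the behaviour being restored. It assumes a Linux host. Data written into one end of a stream socketpair() should still be readable from the other end after shutdown(SHUT_RD) or shutdown(SHUT_RDWR); otherwise read() should report EOF. On a kernel with the regression, the SHUT_RDWR case fails with EINVAL instead. read_after_shutdown() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue msg on one end of a stream socketpair, shut
 * down the other end with `how`, then read from it. Returns read()'s
 * result (data length, 0 for EOF), or -1 with errno set on failure. */
static ssize_t read_after_shutdown(int how, const char *msg,
				   char *buf, size_t buflen)
{
	int sv[2];
	ssize_t n = -1;
	int saved_errno;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[1], msg, strlen(msg)) >= 0 && shutdown(sv[0], how) == 0)
		n = read(sv[0], buf, buflen);

	saved_errno = errno;	/* close() must not clobber read()'s errno */
	close(sv[0]);
	close(sv[1]);
	errno = saved_errno;
	return n;
}
```

A result of -1 with errno == EINVAL for SHUT_RDWR, while SHUT_RD still returns the data, would match the regression described in the commit.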
Jiri Benc	5d30002e41	af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatability Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 commit 0edf0824e0dc359ed76bf96af986e6570ca2c0b9 Author: Stephen Boyd <swboyd@chromium.org> Date: Fri Oct 8 14:59:45 2021 -0700 af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatability Then name of this protocol changed in commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") because that commit added stream support to the af_unix protocol. Renaming the existing protocol makes a ChromeOS protocol test[1] fail now that the name has changed in /proc/net/protocols from "UNIX" to "UNIX-DGRAM". Let's put the name back to how it was while keeping the stream protocol as "UNIX-STREAM" so that the procfs interface doesn't change. This fixes the test and maintains backwards compatibility in proc. Cc: Jiang Wang <jiang.wang@bytedance.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Dmitry Osipenko <digetx@gmail.com> Link: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/supported_protocols.go;l=50;drc=e8b1c3f94cb40a054f4aa1ef1aff61e75dc38f18 [1] Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:53 +02:00
Jiri Benc	14f633cc1d	net: Implement ->sock_is_readable() for UDP and AF_UNIX Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 commit af493388950b6ea3a86f860cfaffab137e024fc8 Author: Cong Wang <cong.wang@bytedance.com> Date: Fri Oct 8 13:33:05 2021 -0700 net: Implement ->sock_is_readable() for UDP and AF_UNIX Yucong noticed we can't poll() sockets in sockmap even when they are the destination sockets of redirections. This is because we never poll any psock queues in ->poll(), except for TCP. With ->sock_is_readable() now we can overwrite >sock_is_readable(), invoke and implement it for both UDP and AF_UNIX sockets. Reported-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:53 +02:00
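The following is a simplified sketch of the idea. It uses hypothetical demo_* types rather than the kernel's struct sock and struct proto. poll() reports data on the ordinary receive queue and also consults an optional per-protocol sock_is_readable() hook, so data sitting in a sockmap redirection queue can wake pollers too. A protocol without the hook falls back to "not readable", which reproduces the problem the commit describes.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_sock;

/* Hypothetical per-protocol ops table with an optional readability hook. */
struct demo_proto {
	bool (*sock_is_readable)(struct demo_sock *sk);
};

struct demo_sock {
	const struct demo_proto *prot;
	size_t rx_queued;	/* bytes on the normal receive queue */
	size_t psock_queued;	/* bytes redirected into a sockmap psock queue */
};

/* Falls back to "not readable" when the protocol has no hook. */
static bool demo_sk_is_readable(struct demo_sock *sk)
{
	if (sk->prot->sock_is_readable)
		return sk->prot->sock_is_readable(sk);
	return false;
}

static bool demo_psock_readable(struct demo_sock *sk)
{
	return sk->psock_queued > 0;
}

static const struct demo_proto demo_plain_proto = { .sock_is_readable = NULL };
static const struct demo_proto demo_psock_proto = {
	.sock_is_readable = demo_psock_readable,
};

/* poll(): readable if either the receive queue or the hook says so. */
static short demo_poll(struct demo_sock *sk)
{
	short mask = 0;

	if (sk->rx_queued > 0 || demo_sk_is_readable(sk))
		mask |= POLLIN;
	return mask;
}
```

With demo_plain_proto, a socket that has only psock_queued data reports no POLLIN even though data is waiting. Switching the socket to demo_psock_proto makes the same data visible to poll().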
Jiri Benc	2044e01cf4	unix: Fix an issue in unix_shutdown causing the other end read/write failures Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 commit d0c6416bd7091647f6041599f396bfa19ae30368 Author: Jiang Wang <jiang.wang@bytedance.com> Date: Mon Oct 4 23:25:28 2021 +0000 unix: Fix an issue in unix_shutdown causing the other end read/write failures Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") sets unix domain socket peer state to TCP_CLOSE in unix_shutdown. This could happen when the local end is shutdown but the other end is not. Then, the other end will get read or write failures which is not expected. Fix the issue by setting the local state to shutdown. Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Reported-by: Casey Schaufler <casey@schaufler-ca.com> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211004232530.2377085-1-jiang.wang@bytedance.com Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:53 +02:00
Jiri Benc	11722ad22c	af_unix: fix potential NULL deref in unix_dgram_connect() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 commit dc56ad7028c5f559b3ce90d5cca2e6b7b839f1d5 Author: Eric Dumazet <edumazet@google.com> Date: Mon Aug 30 10:21:37 2021 -0700 af_unix: fix potential NULL deref in unix_dgram_connect() syzbot was able to trigger NULL deref in unix_dgram_connect() [1] This happens in if (unix_peer(sk)) sk->sk_state = other->sk_state = TCP_ESTABLISHED; // crash because @other is NULL Because locks have been dropped, unix_peer() might be non NULL, while @other is NULL (AF_UNSPEC case) We need to move code around, so that we no longer access unix_peer() and sk_state while locks have been released. [1] general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 0 PID: 10341 Comm: syz-executor239 Not tainted 5.14.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004787d0 CR3: 0000000029c0a000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __sys_connect_file+0x155/0x1a0 net/socket.c:1890 __sys_connect+0x161/0x190 net/socket.c:1907 __do_sys_connect net/socket.c:1917 [inline] __se_sys_connect net/socket.c:1914 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1914 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x446a89 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3eb0052208 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00000000004cc4d8 RCX: 0000000000446a89 RDX: 000000000000006e RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00000000004cc4d0 R08: 00007f3eb0052700 R09: 0000000000000000 R10: 00007f3eb0052700 R11: 0000000000000246 R12: 00000000004cc4dc R13: 00007ffd791e79cf R14: 00007f3eb0052300 R15: 0000000000022000 Modules linked in: ---[ end trace 4eb809357514968c ]--- RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd791fe960 CR3: 0000000029c0a000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 
0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Alexei Starovoitov <ast@kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:52 +02:00
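The pattern behind this fix is general: decide on shared state while the lock is held, and do not re-read that state after unlocking. Below is a simplified userspace sketch with a hypothetical demo_sock and a single global pthread mutex. It is not the kernel's code, which takes per-socket locks on both sk and other. A NULL other models the AF_UNSPEC disconnect case from the report.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum demo_state { DEMO_CLOSE, DEMO_ESTABLISHED };

struct demo_sock {
	struct demo_sock *peer;
	enum demo_state state;
};

/* One global lock keeps the sketch free of lock-ordering concerns. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * other == NULL models the AF_UNSPEC "disconnect" case.
 *
 * Buggy shape: set sk->peer under the lock, unlock, then
 *	if (sk->peer) sk->state = other->state = DEMO_ESTABLISHED;
 * Another thread may have set sk->peer after the unlock, so the test
 * passes while other is still NULL.
 */
static void demo_dgram_connect(struct demo_sock *sk, struct demo_sock *other)
{
	pthread_mutex_lock(&demo_lock);
	sk->peer = other;
	if (other)
		sk->state = other->state = DEMO_ESTABLISHED;
	else
		sk->state = DEMO_CLOSE;
	pthread_mutex_unlock(&demo_lock);
	/* Nothing below the unlock reads sk->peer or dereferences other. */
}
```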
Jiri Benc	f8db6053d4	af_unix: Fix NULL pointer bug in unix_shutdown Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 commit d359902d5c357b280e7a0862bb8a1ba56b3fc197 Author: Jiang Wang <jiang.wang@bytedance.com> Date: Sat Aug 21 18:07:36 2021 +0000 af_unix: Fix NULL pointer bug in unix_shutdown Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the unhash function will call prot->unhash(), which is NULL for SEQPACKET. And kernel will panic. On ARM32, it will show following messages: (it likely affects x86 too). Fix the bug by checking the prot->unhash is NULL or not first. Kernel log: <--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = 2fba1ffb *pgd=00000000 Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2 Modules linked in: CPU: 1 PID: 1999 Comm: falkon Tainted: G W 5.14.0-rc5-01175-g94531cfcbe79-dirty #9240 Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) PC is at 0x0 LR is at unix_shutdown+0x81/0x1a8 pc : [<00000000>] lr : [<c08f3311>] psr: 600f0013 sp : e45aff70 ip : e463a3c0 fp : beb54f04 r10: 00000125 r9 : e45ae000 r8 : c4a56664 r7 : 00000001 r6 : c4a56464 r5 : 00000001 r4 : c4a56400 r3 : 00000000 r2 : c5a6b180 r1 : 00000000 r0 : c4a56400 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 50c5387d Table: 05aa804a DAC: 00000051 Register r0 information: slab PING start c4a56400 pointer offset 0 Register r1 information: NULL pointer Register r2 information: slab task_struct start c5a6b180 pointer offset 0 Register r3 information: NULL pointer Register r4 information: slab PING start c4a56400 pointer offset 0 Register r5 information: non-paged memory Register r6 information: slab PING start c4a56400 pointer offset 100 Register r7 information: non-paged memory Register r8 information: slab PING start c4a56400 pointer offset 612 Register r9 information: non-slab/vmalloc memory Register r10 
information: non-paged memory Register r11 information: non-paged memory Register r12 information: slab filp start e463a3c0 pointer offset 0 Process falkon (pid: 1999, stack limit = 0x9ec48895) Stack: (0xe45aff70 to 0xe45b0000) ff60: e45ae000 c5f26a00 00000000 00000125 ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4 ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04 ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000 [<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50) [<c07f7fa3>] (__sys_shutdown) from [<c010024b>] (__sys_trace_return+0x1/0x16) Exception stack(0xe45affa8 to 0xe45afff0) Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:51 +02:00
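The fix guards an optional callback: once stream and datagram sockets use different proto tables, not every table provides unhash. The kernel fix roughly reads the peer's proto and calls unhash only if it is non-NULL. Below is a hypothetical userspace sketch of that guard. The demo_* names are illustrative, and in_map stands in for "still referenced by a sockmap".

```c
#include <assert.h>
#include <stddef.h>

struct demo_sock;

/* Hypothetical ops table: unhash is optional. */
struct demo_proto {
	void (*unhash)(struct demo_sock *sk);
};

struct demo_sock {
	const struct demo_proto *prot;
	int in_map;	/* stands in for "still referenced by a sockmap" */
};

static void demo_stream_unhash(struct demo_sock *sk)
{
	sk->in_map = 0;
}

static const struct demo_proto demo_stream_proto = { .unhash = demo_stream_unhash };
/* Models the SEQPACKET case in the commit: no unhash callback at all. */
static const struct demo_proto demo_seqpacket_proto = { .unhash = NULL };

/* Called on the peer during shutdown; the NULL check is the fix.
 * Without it, the seqpacket case jumps to address 0, matching
 * "PC is at 0x0" in the oops above. */
static void demo_shutdown_peer(struct demo_sock *other)
{
	const struct demo_proto *prot = other->prot;

	if (prot->unhash)
		prot->unhash(other);
}
```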
Jiri Benc	028135f373	af_unix: Add unix_stream_proto for sockmap Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 Conflicts: - Code difference in unix_create and context difference in unix_create1 and unix_stream_connect to out of order backport of f4bd73b5a950 "af_unix: Return errno instead of NULL in unix_create1()." The resulting code matches the current upstream. commit 94531cfcbe79c3598acf96806627b2137ca32eb9 Author: Jiang Wang <jiang.wang@bytedance.com> Date: Mon Aug 16 19:03:21 2021 +0000 af_unix: Add unix_stream_proto for sockmap Previously, sockmap for AF_UNIX protocol only supports dgram type. This patch add unix stream type support, which is similar to unix_dgram_proto. To support sockmap, dgram and stream cannot share the same unix_proto anymore, because they have different implementations, such as unhash for stream type (which will remove closed or disconnected sockets from the map), so rename unix_proto to unix_dgram_proto and add a new unix_stream_proto. Also implement stream related sockmap functions. And add dgram key words to those dgram specific functions. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:50 +02:00
Jiri Benc	e3a1d16e8b	af_unix: Add read_sock for stream socket types Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 Conflicts: - [minor] context difference due to missing af_unix OOB support (commit 314001f0bf92 "af_unix: Add OOB support") commit 77462de14a43f4d98dbd8de0f5743a4e02450b1d Author: Jiang Wang <jiang.wang@bytedance.com> Date: Mon Aug 16 19:03:20 2021 +0000 af_unix: Add read_sock for stream socket types To support sockmap for af_unix stream type, implement read_sock, which is similar to the read_sock for unix dgram sockets. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:50 +02:00
Jiri Benc	3694be0f5b	bpf: af_unix: Implement BPF iterator for UNIX domain socket. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 commit 2c860a43dd77f969bb959336a2f743d7103a8f63 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Sat Aug 14 10:57:15 2021 +0900 bpf: af_unix: Implement BPF iterator for UNIX domain socket. This patch implements the BPF iterator for the UNIX domain socket. Currently, the batch optimisation introduced for the TCP iterator in the commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock") is not used for the UNIX domain socket. It will require replacing the big lock for the hash table with small locks for each hash list not to block other processes. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210814015718.42704-2-kuniyu@amazon.co.jp Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-05-12 17:29:49 +02:00

390 Commits