Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Toke Høiland-Jørgensen	e9b6f5c14c	bpf: Add bpf_sock_destroy kfunc JIRA: https://issues.redhat.com/browse/RHEL-65787 Conflicts: Context difference due to missing af9784d007d8 ("tcp: diag: add support for TIME_WAIT sockets to tcp_abort()") and out-of-order backport of bac76cf89816 ("tcp: fix forever orphan socket caused by tcp_abort") commit 4ddbcb886268af8d12a23e6640b39d1d9c652b1b Author: Aditi Ghag <aditi.ghag@isovalent.com> Date: Fri May 19 22:51:55 2023 +0000 bpf: Add bpf_sock_destroy kfunc The socket destroy kfunc is used to forcefully terminate sockets from certain BPF contexts. We plan to use the capability in Cilium load-balancing to terminate client sockets that continue to connect to deleted backends. The other use case is on-the-fly policy enforcement where existing socket connections prevented by policies need to be forcefully terminated. The kfunc also allows terminating sockets that may or may not be actively sending traffic. The kfunc can currently be called only from BPF TCP and UDP iterators where users can filter, and terminate selected sockets. More specifically, it can only be called from BPF contexts that ensure socket locking in order to allow synchronous execution of protocol specific `diag_destroy` handlers. The previous commit that batches UDP sockets during iteration facilitated a synchronous invocation of the UDP destroy callback from BPF context by skipping socket locks in `udp_abort`. TCP iterator already supported batching of sockets being iterated. To that end, `tracing_iter_filter` callback filter is added so that verifier can restrict the kfunc to programs with `BPF_TRACE_ITER` attach type, and reject other programs. The kfunc takes `sock_common` type argument, even though it expects, and casts them to a `sock` pointer. This enables the verifier to allow the sock_destroy kfunc to be called for TCP with `sock_common` and UDP with `sock` structs. 
Furthermore, as `sock_common` only has a subset of certain fields of `sock`, casting pointer to the latter type might not always be safe for certain sockets like request sockets, but these have a special handling in the diag_destroy handlers. Additionally, the kfunc is defined with `KF_TRUSTED_ARGS` flag to avoid the cases where a `PTR_TO_BTF_ID` sk is obtained by following another pointer. eg. getting a sk pointer (may be even NULL) by following another sk pointer. The pointer socket argument passed in TCP and UDP iterators is tagged as `PTR_TRUSTED` in {tcp,udp}_reg_info. The TRUSTED arg changes are contributed by Martin KaFai Lau <martin.lau@kernel.org>. Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-8-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:54 +01:00
Toke Høiland-Jørgensen	692dba9fd0	bpf: Avoid iter->offset making backward progress in bpf_iter_udp JIRA: https://issues.redhat.com/browse/RHEL-65787 commit 2242fd537fab52d5f4d2fbb1845f047c01fad0cf Author: Martin KaFai Lau <martin.lau@kernel.org> Date: Fri Jan 12 11:05:29 2024 -0800 bpf: Avoid iter->offset making backward progress in bpf_iter_udp There is a bug in the bpf_iter_udp_batch() function that stops the userspace from making forward progress. The case that triggers the bug is the userspace passed in a very small read buffer. When the bpf prog does bpf_seq_printf, the userspace read buffer is not enough to capture the whole bucket. When the read buffer is not large enough, the kernel will remember the offset of the bucket in iter->offset such that the next userspace read() can continue from where it left off. The kernel will skip the number (== "iter->offset") of sockets in the next read(). However, the code directly decrements the "--iter->offset". This is incorrect because the next read() may not consume the whole bucket either and then the next-next read() will start from offset 0. The net effect is the userspace will keep reading from the beginning of a bucket and the process will never finish. "iter->offset" must always go forward until the whole bucket is consumed. This patch fixes it by using a local variable "resume_offset" and "resume_bucket". "iter->offset" is always reset to 0 before it may be used. "iter->offset" will be advanced to the "resume_offset" when it continues from the "resume_bucket" (i.e. "state->bucket == resume_bucket"). This brings it closer to the bpf_iter_tcp's offset handling which does not suffer the same bug. 
Cc: Aditi Ghag <aditi.ghag@isovalent.com> Fixes: c96dac8d369f ("bpf: udp: Implement batching for sockets iterator") Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20240112190530.3751661-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:54 +01:00
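The resume logic described in this commit message can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions, not the kernel code: `struct demo_iter`, `demo_read_once()` and `MAX_BATCH` are hypothetical names, sockets are represented by their index in a single bucket, and `room` stands in for how many sockets fit in the userspace read buffer.

```c
#include <assert.h>

#define MAX_BATCH 64  /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for the iterator state; not the kernel struct. */
struct demo_iter {
    int bucket;  /* bucket being iterated */
    int offset;  /* sockets of that bucket already delivered */
};

/* Simulates one read() that can deliver at most `room` sockets from a
 * bucket holding `nr_socks` sockets (identified by index 0..nr_socks-1).
 * Returns the number of sockets written to `out`. */
int demo_read_once(struct demo_iter *iter, int nr_socks, int room, int *out)
{
    int batch[MAX_BATCH];
    /* Remember where the previous read stopped in local variables. */
    int resume_bucket = iter->bucket;
    int resume_offset = iter->offset;
    int end = 0, delivered = 0;

    assert(nr_socks <= MAX_BATCH);

    /* Reset, then only ever move the offset forward. */
    iter->offset = 0;
    for (int i = 0; i < nr_socks; i++) {
        /* Skip sockets that earlier reads already delivered. The bucket
         * comparison is always true here because the sketch has a single
         * bucket; it is kept to mirror the described condition. */
        if (iter->bucket == resume_bucket && iter->offset < resume_offset) {
            ++iter->offset;
            continue;
        }
        batch[end++] = i;
    }

    /* Deliver what fits; each delivered socket advances the offset. */
    while (delivered < end && delivered < room) {
        out[delivered] = batch[delivered];
        ++delivered;
        ++iter->offset;
    }
    return delivered;
}
```

With five sockets and room for two per read, successive calls deliver sockets 0-1, then 2-3, then 4, so the offset never moves backward and the bucket is eventually consumed.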
Toke Høiland-Jørgensen	61c014209a	bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket JIRA: https://issues.redhat.com/browse/RHEL-65787 commit 19ca0823f6eaad01d18f664a00550abe912c034c Author: Martin KaFai Lau <martin.lau@kernel.org> Date: Fri Jan 12 11:05:28 2024 -0800 bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket The current logic is to use a default size 16 to batch the whole bucket. If it is too small, it will retry with a larger batch size. The current code accidentally does a state->bucket-- before retrying. This goes back to retry with the previous bucket which has already been done. This patch fixed it. It is hard to create a selftest. I added a WARN_ON(state->bucket < 0), forced a particular port to be hashed to the first bucket, created >16 sockets, and observed the for-loop went back to the "-1" bucket. Cc: Aditi Ghag <aditi.ghag@isovalent.com> Fixes: c96dac8d369f ("bpf: udp: Implement batching for sockets iterator") Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20240112190530.3751661-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:54 +01:00
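The retry behaviour this commit describes (grow the batch and retry the same bucket, rather than stepping back one bucket) can be sketched as a userspace simulation. The names `demo_walk_buckets()` and `DEFAULT_BATCH`, and the 3/2 growth factor, are assumptions made for illustration; the sketch records every batching attempt so the visiting order can be inspected.

```c
#include <assert.h>

#define DEFAULT_BATCH 16  /* default batch size mentioned in the commit message */

/* Walks `nr_buckets` buckets whose sizes are given in `bucket_len` and
 * records the bucket index of every batching attempt in `attempts`.
 * Returns the number of attempts recorded. */
int demo_walk_buckets(const int *bucket_len, int nr_buckets,
                      int *attempts, int max_attempts)
{
    int batch_cap = DEFAULT_BATCH;
    int bucket = 0;
    int n = 0;

    while (bucket < nr_buckets) {
        assert(n < max_attempts);
        attempts[n++] = bucket;

        if (bucket_len[bucket] > batch_cap) {
            /* Batch too small: grow it (growth factor is an assumption)
             * and retry the SAME bucket. There is deliberately no
             * bucket-- here, so an already finished bucket is never
             * revisited and the index cannot become -1. */
            batch_cap = bucket_len[bucket] * 3 / 2;
            continue;
        }
        bucket++;
    }
    return n;
}
```

For bucket sizes {20, 3} the recorded attempts are bucket 0, bucket 0 again with the larger batch, then bucket 1.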
Toke Høiland-Jørgensen	e08ca1e323	bpf: udp: Implement batching for sockets iterator JIRA: https://issues.redhat.com/browse/RHEL-65787 commit c96dac8d369ffd713a45f4e5c30f23c47a1671f0 Author: Aditi Ghag <aditi.ghag@isovalent.com> Date: Fri May 19 22:51:53 2023 +0000 bpf: udp: Implement batching for sockets iterator Batch UDP sockets from BPF iterator that allows for overlapping locking semantics in BPF/kernel helpers executed in BPF programs. This facilitates BPF socket destroy kfunc (introduced by follow-up patches) to execute from BPF iterator programs. Previously, BPF iterators acquired the sock lock and sockets hash table bucket lock while executing BPF programs. This prevented BPF helpers that again acquire these locks to be executed from BPF iterators. With the batching approach, we acquire a bucket lock, batch all the bucket sockets, and then release the bucket lock. This enables BPF or kernel helpers to skip sock locking when invoked in the supported BPF contexts. The batching logic is similar to the logic implemented in TCP iterator: https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com/. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-6-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:54 +01:00
Toke Høiland-Jørgensen	cde5da29d8	udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state JIRA: https://issues.redhat.com/browse/RHEL-65787 commit e4fe1bf13e09019578b9b93b942fff3d76ed5793 Author: Aditi Ghag <aditi.ghag@isovalent.com> Date: Fri May 19 22:51:52 2023 +0000 udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state This is a preparatory commit to remove the field. The field was previously shared between proc fs and BPF UDP socket iterators. As the follow-up commits will decouple the implementation for the iterators, remove the field. As for BPF socket iterator, filtering of sockets is expected to be done in BPF programs. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-5-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:54 +01:00
Toke Høiland-Jørgensen	dcbb4b88e1	bpf: udp: Encapsulate logic to get udp table JIRA: https://issues.redhat.com/browse/RHEL-65787 commit 7625d2e9741c1f6e08ee79c28a1e27bbb5071805 Author: Aditi Ghag <aditi.ghag@isovalent.com> Date: Fri May 19 22:51:51 2023 +0000 bpf: udp: Encapsulate logic to get udp table This is a preparatory commit that encapsulates the logic to get udp table in iterator inside udp_get_table_afinfo, and renames the function to `udp_get_table_seq` accordingly. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-4-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:54 +01:00
Toke Høiland-Jørgensen	6a228086f4	udp: seq_file: Helper function to match socket attributes JIRA: https://issues.redhat.com/browse/RHEL-65787 commit f44b1c515833c59701c86f92d47b4edd478fb0f3 Author: Aditi Ghag <aditi.ghag@isovalent.com> Date: Fri May 19 22:51:50 2023 +0000 udp: seq_file: Helper function to match socket attributes This is a preparatory commit to refactor code that matches socket attributes in iterators to a helper function, and use it in the proc fs iterator. Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-3-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:54 +01:00
Toke Høiland-Jørgensen	25cf0b2b68	udp: Access &udp_table via net. JIRA: https://issues.redhat.com/browse/RHEL-65787 commit ba6aac1516779dd0ced22c136a2c2c4a9c70cf29 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Mon Nov 14 13:57:56 2022 -0800 udp: Access &udp_table via net. We will soon introduce an optional per-netns hash table for UDP. This means we cannot use udp_table directly in most places. Instead, access it via net->ipv4.udp_table. The access will be valid only while initialising udp_table itself and creating/destroying each netns. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:53 +01:00
Toke Høiland-Jørgensen	c17fe8b439	udp: Set NULL to udp_seq_afinfo.udp_table. JIRA: https://issues.redhat.com/browse/RHEL-65787 commit 478aee5d6bf617c932f4e9c2981f17e86e093fc5 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Mon Nov 14 13:57:55 2022 -0800 udp: Set NULL to udp_seq_afinfo.udp_table. We will soon introduce an optional per-netns hash table for UDP. This means we cannot use the global udp_seq_afinfo.udp_table to fetch a UDP hash table. Instead, set NULL to udp_seq_afinfo.udp_table for UDP and get a proper table from net->ipv4.udp_table. Note that we still need udp_seq_afinfo.udp_table for UDP LITE. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:53 +01:00
Toke Høiland-Jørgensen	80f3b66d62	udp: Set NULL to sk->sk_prot->h.udp_table. JIRA: https://issues.redhat.com/browse/RHEL-65787 Conflicts: Context difference due to already backported 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") commit 67fb43308f4b354f13aabcc66dd5d99bfbb7e838 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Mon Nov 14 13:57:54 2022 -0800 udp: Set NULL to sk->sk_prot->h.udp_table. We will soon introduce an optional per-netns hash table for UDP. This means we cannot use the global sk->sk_prot->h.udp_table to fetch a UDP hash table. Instead, set NULL to sk->sk_prot->h.udp_table for UDP and get a proper table from net->ipv4.udp_table. Note that we still need sk->sk_prot->h.udp_table for UDP LITE. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:53 +01:00
Toke Høiland-Jørgensen	c9f94d6c3a	udp: Clean up some functions. JIRA: https://issues.redhat.com/browse/RHEL-65787 Conflicts: Context difference due to already backported 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") commit 919dfa0b20ae56060dce0436eb710717f8987d18 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Mon Nov 14 13:57:53 2022 -0800 udp: Clean up some functions. This patch adds no functional change and cleans up some functions that the following patches touch around so that we make them tidy and easy to review/revert. The change is mainly to keep reverse christmas tree order. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2025-01-28 12:51:52 +01:00
Lucas Zampieri	55f96777fb	Merge: net: backport visibility improvements MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4765 JIRA: https://issues.redhat.com/browse/RHEL-48648 Various visibility improvements; mainly around drop reasons, reset reason and improved tracepoints this time. Signed-off-by: Antoine Tenart <atenart@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-08-12 16:18:50 +00:00
CKI Backport Bot	ca2a05a2e6	udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). JIRA: https://issues.redhat.com/browse/RHEL-51033 CVE: CVE-2024-41041 commit 5c0b485a8c6116516f33925b9ce5b6104a6eadfd Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Tue Jul 9 12:13:56 2024 -0700 udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). syzkaller triggered the warning [0] in udp_v4_early_demux(). In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount of the looked-up sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace period. Currently, SOCK_RCU_FREE is flagged for a bound socket after being put into the hash table. Moreover, the SOCK_RCU_FREE check is done too early in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race window: CPU1 CPU2 ---- ---- udp_v4_early_demux() udp_lib_get_port() | |- hlist_add_head_rcu() |- sk = __udp4_lib_demux_lookup() | |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk)); `- sock_set_flag(sk, SOCK_RCU_FREE) We had the same bug in TCP and fixed it in commit 871019b22d1b ("net: set SOCK_RCU_FREE before inserting socket into hashtable"). Let's apply the same fix for UDP. 
[0]: WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Modules linked in: CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52 RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001 RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680 R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e FS: 00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 PKRU: 55555554 Call Trace: <TASK> ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349 ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624 __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738 netif_receive_skb_internal net/core/dev.c:5824 [inline] netif_receive_skb+0x271/0x300 net/core/dev.c:5884 tun_rx_batched drivers/net/tun.c:1549 [inline] tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002 tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x76f/0x8d0 
fs/read_write.c:590 ksys_write+0xbf/0x190 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x41/0x50 fs/read_write.c:652 x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fc44a68bc1f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48 RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8 RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000 </TASK> Fixes: `6acc9b432e` ("bpf: Add helper to retrieve socket in BPF") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240709191356.24010-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2024-07-30 09:48:30 +00:00
Antoine Tenart	32669d8760	udp: use sk_skb_reason_drop to free rx packets JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: net-next.git commit fc0cc9248843b37243fa5fd3287a121ec41d291f Author: Yan Zhai <yan@cloudflare.com> Date: Mon Jun 17 11:09:24 2024 -0700 udp: use sk_skb_reason_drop to free rx packets Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011751.NpVN0sSk-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:42 +02:00
Antoine Tenart	4e410b55fd	net: udp: add IP/port data to the tracepoint udp/udp_fail_queue_rcv_skb JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git commit e9669a00bba79442dd4862c57761333d6a020c24 Author: Balazs Scheidler <bazsi77@gmail.com> Date: Tue Mar 26 19:05:47 2024 +0100 net: udp: add IP/port data to the tracepoint udp/udp_fail_queue_rcv_skb The udp_fail_queue_rcv_skb() tracepoint lacks any details on the source and destination IP/port whereas this information can be critical in case of UDP/syslog. Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/0c8b3e33dbf679e190be6f4c6736603a76988a20.1711475011.git.balazs.scheidler@axoflow.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:41 +02:00
Antoine Tenart	5d9b38e8a9	udp: do not accept non-tunnel GSO skbs landing in a tunnel JIRA: https://issues.redhat.com/browse/RHEL-19729 Upstream Status: net.git commit 3d010c8031e39f5fa1e8b13ada77e0321091011f Author: Antoine Tenart <atenart@kernel.org> Date: Tue Mar 26 12:33:58 2024 +0100 udp: do not accept non-tunnel GSO skbs landing in a tunnel When rx-udp-gro-forwarding is enabled UDP packets might be GROed when being forwarded. If such packets might land in a tunnel this can cause various issues and udp_gro_receive makes sure this isn't the case by looking for a matching socket. This is performed in udp4/6_gro_lookup_skb but only in the current netns. This is an issue with tunneled packets when the endpoint is in another netns. In such cases the packets will be GROed at the UDP level, which leads to various issues later on. The same thing can happen with rx-gro-list. We saw this with geneve packets being GROed at the UDP level. In such case gso_size is set; later the packet goes through the geneve rx path, the geneve header is pulled, the offset are adjusted and frag_list skbs are not adjusted with regard to geneve. When those skbs hit skb_fragment, it will misbehave. Different outcomes are possible depending on what the GROed skbs look like; from corrupted packets to kernel crashes. One example is a BUG_ON[1] triggered in skb_segment while processing the frag_list. Because gso_size is wrong (geneve header was pulled) skb_segment thinks there is "geneve header size" of data in frag_list, although it's in fact the next packet. The BUG_ON itself has nothing to do with the issue. This is only one of the potential issues. Looking up for a matching socket in udp_gro_receive is fragile: the lookup could be extended to all netns (not speaking about performances) but nothing prevents those packets from being modified in between and we could still not find a matching socket. 
It's OK to keep the current logic there as it should cover most cases but we also need to make sure we handle tunnel packets being GROed too early. This is done by extending the checks in udp_unexpected_gso: GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must be segmented. [1] kernel BUG at net/core/skbuff.c:4408! RIP: 0010:skb_segment+0xd2a/0xf70 __udp_gso_segment+0xaa/0x560 Fixes: `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") Fixes: `36707061d6` ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-03-29 13:55:00 +01:00
Jeff Moyer	092f5d645a	net: ioctl: Use kernel memory on protocol ioctl callbacks JIRA: https://issues.redhat.com/browse/RHEL-12076 Conflicts: There are contextual differences as we're missing commit 559260fd9d9a ("ipmr: do not acquire mrt_lock in ioctl(SIOCGETVIFCNT)"). I also pulled in header changes from commit 949d6b405e61 ("net: add missing includes and forward declarations under net/") to address a build failure with this patch applied. commit e1d001fa5b477c4da46a29be1fcece91db7c7c6f Author: Breno Leitao <leitao@debian.org> Date: Fri Jun 9 08:27:42 2023 -0700 net: ioctl: Use kernel memory on protocol ioctl callbacks Most of the ioctls to net protocols operate directly on the userspace argument (arg), usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers. Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocols' ioctl callbacks are adapted to operate on kernel memory rather than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback). This changes the "struct proto" ioctl format in the following way: int (*ioctl)(struct sock *sk, int cmd, - unsigned long arg); + int *karg); (Important to say that this patch does not touch the "struct proto_ops" protocols) So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. 
There is no one-size-fits-all format (that is why I am using 'may' above), but basically, there are three types of ioctls: 1) Do not read from userspace, returns a result to userspace 2) Read an input parameter from userspace, and does not return anything to userspace 3) Read an input from userspace, and return a buffer to userspace. The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions: Protocol RAW: * cmd = SIOCGETVIFCNT: * input and output = struct sioc_vif_req * cmd = SIOCGETSGCNT * input and output = struct sioc_sg_req * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace. * Protocol RAW6: * cmd = SIOCGETMIFCNT_IN6 * input and output = struct sioc_mif_req6 * cmd = SIOCGETSGCNT_IN6 * input and output = struct sioc_sg_req6 * Protocol PHONET: * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE * input int (4 bytes) * Nothing is copied back to userspace. For the exception cases, the function sock_sk_ioctl_inout() will copy the userspace input into kernel space, and copy the result back to userspace. The wrapper that prepares the buffer and puts the buffer back to user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the caller now calls sk_ioctl(), which will handle all cases. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-11-02 15:32:16 -04:00
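The default case of this commit (the protocol callback fills a kernel-side int, and a wrapper copies it out to the caller's buffer) can be sketched in plain userspace C. Everything here is a hypothetical stand-in: `struct demo_sock`, `DEMO_CMD_QUEUED`, `demo_proto_ioctl()`, `demo_put_user()` and `demo_sk_ioctl()` are illustrative names, `memcpy` stands in for the copy to userspace, and `-ENOTTY` is used only as a placeholder error code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_CMD_QUEUED 1  /* hypothetical command number */

/* Hypothetical stand-in for a socket with a protocol ioctl callback. */
struct demo_sock {
    int queued_bytes;
    int (*ioctl)(struct demo_sock *sk, int cmd, int *karg);
};

/* Protocol callback: touches only the kernel-side buffer `karg`,
 * never the caller's ("userspace") pointer. */
int demo_proto_ioctl(struct demo_sock *sk, int cmd, int *karg)
{
    switch (cmd) {
    case DEMO_CMD_QUEUED:
        *karg = sk->queued_bytes;
        return 0;
    default:
        return -ENOTTY;  /* placeholder for an unsupported command */
    }
}

/* Stand-in for copying one int back to the caller's buffer. */
int demo_put_user(int val, void *uarg)
{
    assert(uarg != NULL);
    memcpy(uarg, &val, sizeof(val));
    return 0;
}

/* Wrapper: owns the kernel-side buffer, calls the protocol callback,
 * and copies the result out only when the callback succeeded. */
int demo_sk_ioctl(struct demo_sock *sk, int cmd, void *uarg)
{
    int karg = 0;
    int ret = sk->ioctl(sk, cmd, &karg);

    if (ret)
        return ret;
    return demo_put_user(karg, uarg);
}
```

Keeping the copy in one wrapper is what lets the callback be reused by callers that have no userspace buffer at all, which is the flexibility argument made in the commit message.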
Scott Weaver	9ec000dabc	Merge: UDP: stable backports for rhel 9.4 phase 1 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3249 JIRA: https://issues.redhat.com/browse/RHEL-14356 UDP: stable backports for rhel 9.4 phase 1 Tested: LNST, Tier1 A bunch of fixlets for the UDP protocol Signed-off-by: Paolo Abeni <pabeni@redhat.com> Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: Scott Weaver <scweaver@redhat.com>	2023-10-30 15:41:53 -04:00
Paolo Abeni	4b1fe55101	udp: re-score reuseport groups when connected sockets are present JIRA: https://issues.redhat.com/browse/RHEL-14356 Tested: LNST, Tier1 Upstream commit: commit f0ea27e7bfe1c34e1f451a63eb68faa1d4c3a86d Author: Lorenz Bauer <lmb@isovalent.com> Date: Thu Jul 20 17:30:05 2023 +0200 udp: re-score reuseport groups when connected sockets are present Contrary to TCP, UDP reuseport groups can contain TCP_ESTABLISHED sockets. To support these properly we remember whether a group has a connected socket and skip the fast reuseport early-return. In effect we continue scoring all reuseport sockets and then choose the one with the highest score. The current code fails to re-calculate the score for the result of lookup_reuseport. According to Kuniyuki Iwashima: 1) SO_INCOMING_CPU is set -> selected sk might have +1 score 2) BPF prog returns ESTABLISHED and/or SO_INCOMING_CPU sk -> selected sk will have more than 8 Using the old score could trigger more lookups depending on the order that sockets are created. sk -> sk (SO_INCOMING_CPU) -> sk (ESTABLISHED) | | `-> select the next SO_INCOMING_CPU sk | `-> select itself (We should save this lookup) Fixes: `efc6b6f6c3` ("udp: Improve load balancing for SO_REUSEPORT.") Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-1-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-20 13:05:37 +02:00
Chris von Recklinghausen	1f619343f6	treewide: use get_random_u32() when possible Conflicts: drivers/gpu/drm/tests/drm_buddy_test.c drivers/gpu/drm/tests/drm_mm_test.c - We already have ce28ab1380e8 ("drm/tests: Add back seed value information") so keep calls to kunit_info. drop changes to drivers/misc/habanalabs/gaudi2/gaudi2.c fs/ntfs3/fslog.c - files not in CS9 net/sunrpc/auth_gss/gss_krb5_wrap.c - We already have 7f675ca7757b ("SUNRPC: Improve Kerberos confounder generation") so code to change is gone. drivers/gpu/drm/i915/i915_gem_gtt.c drivers/gpu/drm/i915/selftests/i915_selftest.c drivers/gpu/drm/tests/drm_buddy_test.c drivers/gpu/drm/tests/drm_mm_test.c change added under `4cb818386e` ("Merge DRM changes from upstream v6.0.8..v6.1") JIRA: https://issues.redhat.com/browse/RHEL-1848 commit a251c17aa558d8e3128a528af5cf8b9d7caae4fd Author: Jason A. Donenfeld <Jason@zx2c4.com> Date: Wed Oct 5 17:43:22 2022 +0200 treewide: use get_random_u32() when possible The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2023-10-20 06:15:03 -04:00
Ivan Vecera	497f645693	net: move gso declarations and functions to their own files JIRA: https://issues.redhat.com/browse/RHEL-12679 commit d457a0e329b0bfd3a1450e0b1a18cd2b47a25a08 Author: Eric Dumazet <edumazet@google.com> Date: Thu Jun 8 19:17:37 2023 +0000 net: move gso declarations and functions to their own files Move declarations into include/net/gso.h and code into net/core/gso.c Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stanislav Fomichev <sdf@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2023-10-11 13:35:27 +02:00
Felix Maurer	2d92cf1f17	bpf, sockmap: Pass skb ownership through read_skb Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218483 Conflicts: - net/ipv4/udp.c: Context difference due to missing ec095263a965 ("net: remove noblock parameter from recvmsg() entities") and db39dfdc1c3b ("udp: Use WARN_ON_ONCE() in udp_read_skb()"); 31f1fbcb346c ("udp: Refactor udp_read_skb()") was adapted to reflect this - net/vmw_vsock/virtio_transport_common.c: Skipped, because the relevant code is not there, missing 634f1a7110b4 ("vsock: support sockmap") commit 78fa0d61d97a728d306b0c23d353c0e340756437 Author: John Fastabend <john.fastabend@gmail.com> Date: Mon May 22 19:56:05 2023 -0700 bpf, sockmap: Pass skb ownership through read_skb The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to use the skb it needs to inc the ref cnt so that the consume_skb() doesn't kfree the sk_buff. This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this, skb_linearize() __pskb_pull_tail() pskb_expand_head() BUG_ON(skb_shared(skb)) Because we incremented users refcnt from sk_psock_verdict_recv() we hit the bug on with refcnt > 1 and trip it. To fix lets simply pass ownership of the sk_buff through the skb_read call. Then we can drop the consume from read_skb handlers and assume the verdict recv does any required kfree. Bug found while testing in our CI which runs in VMs that hit memory constraints rather regularly. William tested TCP read_skb handlers. [ 106.536188] ------------[ cut here ]------------ [ 106.536197] kernel BUG at net/core/skbuff.c:1693! 
[ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1 [ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014 [ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330 [ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202 [ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20 [ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8 [ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000 [ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8 [ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8 [ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0 [ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.542255] Call Trace: [ 106.542383] <IRQ> [ 106.542487] __pskb_pull_tail+0x4b/0x3e0 [ 106.542681] skb_ensure_writable+0x85/0xa0 [ 106.542882] sk_skb_pull_data+0x18/0x20 [ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9 [ 106.543536] ? migrate_disable+0x66/0x80 [ 106.543871] sk_psock_verdict_recv+0xe2/0x310 [ 106.544258] ? 
sk_psock_write_space+0x1f0/0x1f0 [ 106.544561] tcp_read_skb+0x7b/0x120 [ 106.544740] tcp_data_queue+0x904/0xee0 [ 106.544931] tcp_rcv_established+0x212/0x7c0 [ 106.545142] tcp_v4_do_rcv+0x174/0x2a0 [ 106.545326] tcp_v4_rcv+0xe70/0xf60 [ 106.545500] ip_protocol_deliver_rcu+0x48/0x290 [ 106.545744] ip_local_deliver_finish+0xa7/0x150 Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reported-by: William Findlay <will@isovalent.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2023-06-29 15:45:40 +02:00
Felix Maurer	7f95976aed	udp: Refactor udp_read_skb() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218483 Conflicts: - net/ipv4/udp.c: Code difference due to missing ec095263a965 ("net: remove noblock parameter from recvmsg() entities"): keep the existing parameters to skb_recv_udp(); and missing db39dfdc1c3b ("udp: Use WARN_ON_ONCE() in udp_read_skb()"): keep WARN_ON commit 31f1fbcb346c9342f6860c322b3f33b2acbc640b Author: Peilin Ye <peilin.ye@bytedance.com> Date: Thu Sep 22 21:59:13 2022 -0700 udp: Refactor udp_read_skb() Delete the unnecessary while loop in udp_read_skb() for readability. Additionally, since recv_actor() cannot return a value greater than skb->len (see sk_psock_verdict_recv()), remove the redundant check. Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Link: https://lore.kernel.org/r/343b5d8090a3eb764068e9f1d392939e2b423747.1663909008.git.peilin.ye@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2023-06-29 15:45:40 +02:00
Jan Stancek	dea08a5636	Merge: net: mptcp: rebase to latest net-next MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2479 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330 Upstream Status: All mainline in net.git. Tested: boot+kselftest Conflicts: see individual commits Signed-off-by: Davide Caratti <dcaratti@redhat.com> Approved-by: Paolo Abeni <pabeni@redhat.com> Approved-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com>	2023-05-19 08:29:21 +02:00
Davide Caratti	c6f30ffe1a	net: cache align tcp_memory_allocated, tcp_sockets_allocated Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330 Upstream Status: net.git commit 91b6d3256356 commit 91b6d325635617540b6a1646ddb138bb17cbd569 Author: Eric Dumazet <edumazet@google.com> Date: Mon Nov 15 11:02:39 2021 -0800 net: cache align tcp_memory_allocated, tcp_sockets_allocated tcp_memory_allocated and tcp_sockets_allocated often share a common cache line, source of false sharing. Also take care of udp_memory_allocated and mptcp_sockets_allocated. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2023-05-09 11:08:43 +02:00
Jeff Moyer	87aedebebc	net: flag sockets supporting msghdr originated zerocopy Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237 Conflicts: include/linux/net.h - upstream there was a conflict between SOCK_CUSTOM_SOCKOPT and SOCK_SUPPORT_ZC. There, it was resolved with the former getting defined as 6, and the latter as 5. However, in the RHEL backport of a5ef058dc4d9 ("net: introduce and use custom sockopt socket flag"), 5 was chosen for SOCK_CUSTOM_SOCKOPT. I could renumber it to 6 to match upstream, but that risks introducing unnecessary incompatibilities for 3rd party modules, so I opted to differ from upstream. net/ipv4/udp.c - RHEL has a backport of commit 8a3854c7b8e4 ("udp: track the forward memory release threshold in an hot cacheline") out of order with this commit. It's a simple fixup. commit e993ffe3da4bcddea0536b03be1031bf35cd8d85 Author: Pavel Begunkov <asml.silence@gmail.com> Date: Fri Oct 21 11:16:39 2022 +0100 net: flag sockets supporting msghdr originated zerocopy We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr provided ubuf_info. Add a new flag into the struct socket flags fields. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-05-05 15:24:12 -04:00
Paolo Abeni	8912bbc04a	net: Return errno in sk->sk_prot->get_port(). Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2166482 Tested: vs bz reproducer Conflicts: different context in inet_csk_get_port and udp_lib_get_port,\ as rhel-9 lacks the upstream commit 08eaef904031 ("tcp: Clean up \ some functions.") and upstream commit 919dfa0b20ae ("udp: Clean up \ some functions.") Upstream commit: commit 7a7160edf1bfde25422262fb26851cef65f695d3 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Nov 18 10:25:06 2022 -0800 net: Return errno in sk->sk_prot->get_port(). We assume the correct errno is -EADDRINUSE when sk->sk_prot->get_port() fails, so some ->get_port() functions return just 1 on failure and the callers return -EADDRINUSE instead. However, mptcp_get_port() can return -EINVAL. Let's not ignore the error. Note the only exception is inet_autobind(), all of whose callers return -EAGAIN instead. Fixes: `cec37a6e41` ("mptcp: Handle MP_CAPABLE options for outgoing connections") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 00:03:37 +01:00
Herton R. Krzesinski	ee17c5d305	Merge: bpf, xdp: update to 6.0 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1742 bpf, xdp: update to 6.0 Bugzilla: https://bugzilla.redhat.com/2137876 Signed-off-by: Artem Savkov <asavkov@redhat.com> Approved-by: Jiri Benc <jbenc@redhat.com> Approved-by: Prarit Bhargava <prarit@redhat.com> Approved-by: Jerome Marchand <jmarchan@redhat.com> Approved-by: Yauheni Kaliuta <ykaliuta@redhat.com> Approved-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Herton R. Krzesinski <herton@redhat.com>	2023-01-12 16:01:19 +00:00
Felix Maurer	6aa8ccbbdc	skmsg: Get rid of skb_clone() Bugzilla: https://bugzilla.redhat.com/2137876 commit 57452d767feaeab405de3bff0d240c3ac84bfe0d Author: Cong Wang <cong.wang@bytedance.com> Date: Wed Jun 15 09:20:13 2022 -0700 skmsg: Get rid of skb_clone() With ->read_skb() now we have an entire skb dequeued from receive queue, now we just need to grab an additional refcnt before passing its ownership to recv actors. And we should not touch them any more, particularly for skb->sk. Fortunately, skb->sk is already set for most of the protocols except UDP where skb->sk has been stolen, so we have to fix it up for UDP case. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220615162014.89193-4-xiyou.wangcong@gmail.com Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2023-01-05 15:46:53 +01:00
Felix Maurer	09faf01cb9	net: Introduce a new proto_ops ->read_skb() Bugzilla: https://bugzilla.redhat.com/2137876 Conflicts: Context difference due to not yet applied 314001f0bf927 ("af_unix: Add OOB support") and already applied 3f92a64e44e5 ("tcp: allow tls to decrypt directly from the tcp rcv queue") commit 965b57b469a589d64d81b1688b38dcb537011bb0 Author: Cong Wang <cong.wang@bytedance.com> Date: Wed Jun 15 09:20:12 2022 -0700 net: Introduce a new proto_ops ->read_skb() Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this semantic, with this we can finally pass the ownership of skb to recv actors. For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220615162014.89193-3-xiyou.wangcong@gmail.com Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2023-01-05 15:46:53 +01:00
Guillaume Nault	996e10a048	inet: rename INET_MATCH() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2149949 Upstream Status: linux.git commit eda090c31fe923ab9463b884469744ec903ab0cc Author: Eric Dumazet <edumazet@google.com> Date: Fri May 13 11:55:50 2022 -0700 inet: rename INET_MATCH() This is no longer a macro, but an inlined function. INET_MATCH() -> inet_match() Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Olivier Hartkopp <socketcan@hartkopp.net> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Guillaume Nault <gnault@redhat.com>	2022-12-22 11:37:47 +01:00
Guillaume Nault	97f5ffd267	inet: add READ_ONCE(sk->sk_bound_dev_if) in INET_MATCH() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2149949 Upstream Status: linux.git commit 4915d50e300e96929d2462041d6f6c6f061167fd Author: Eric Dumazet <edumazet@google.com> Date: Thu May 12 09:56:01 2022 -0700 inet: add READ_ONCE(sk->sk_bound_dev_if) in INET_MATCH() INET_MATCH() runs without holding a lock on the socket. We probably need to annotate most reads. This patch makes INET_MATCH() an inline function to ease our changes. v2: We remove the 32bit version of it, as modern compilers should generate the same code really, no need to try to be smarter. Also make 'struct net *net' the first argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Guillaume Nault <gnault@redhat.com>	2022-12-22 11:37:45 +01:00
Herton R. Krzesinski	09736a3a30	Merge: udp: some performance optimizations MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1541 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133057 Tested: LNST, Tier1, tput test This series improves UDP protocol RX tput, to keep it on equal footing with rhel-8 one. Patches 1,3,4 are there just to reduces the conflicts, and patch 4 is a very partial backport, to avoid pulling unrelated features. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: Florian Westphal <fwestpha@redhat.com> Signed-off-by: Herton R. Krzesinski <herton@redhat.com>	2022-12-13 17:35:03 +00:00
Frantisek Hrbata	f76fa64ae5	Merge: udp: backports from upstream MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1501 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135958 Tested: compile only Signed-off-by: Xin Long <lxin@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: Sabrina Dubroca <sdubroca@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-11-29 07:44:19 -05:00
Davide Caratti	9aac6c4346	net: add per_cpu_fw_alloc field to struct proto Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858 Upstream Status: net.git commit 0defbb0af775 Conflicts: - net/core/sock.c: context mismatch because of missing backport of upstream commit f20cfd662a62 ("net: add sanity check in proto_register()") commit 0defbb0af775ef037913786048d099bbe8b9a2c2 Author: Eric Dumazet <edumazet@google.com> Date: Wed Jun 8 23:34:08 2022 -0700 net: add per_cpu_fw_alloc field to struct proto Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use. Instead of having reserved bytes per socket, we want to have per-cpu reserves. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2022-11-08 17:10:55 +01:00
Davide Caratti	543f426b27	net: remove SK_MEM_QUANTUM and SK_MEM_QUANTUM_SHIFT Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858 Upstream Status: net.git commit 100fdd1faf50 commit 100fdd1faf50557558e2911af4be32e515cb8036 Author: Eric Dumazet <edumazet@google.com> Date: Wed Jun 8 23:34:07 2022 -0700 net: remove SK_MEM_QUANTUM and SK_MEM_QUANTUM_SHIFT Due to memcg interface, SK_MEM_QUANTUM is effectively PAGE_SIZE. This might change in the future, but it seems better to avoid the confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2022-11-08 17:10:55 +01:00
Paolo Abeni	2657483f26	udp: track the forward memory release threshold in an hot cacheline Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133057 Tested: LNST, Tier1 Conflicts: use lock_sock()/release_sock() instead of \ sockopt_lock_sock()/sockopt_release_sock as rhel-9 lacks the upstream commit 24426654ed3a ("bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf")\ Upstream commit: commit 8a3854c7b8e4532063b14bed34115079b7d0cb36 Author: Paolo Abeni <pabeni@redhat.com> Date: Thu Oct 20 19:48:52 2022 +0200 udp: track the forward memory release threshold in an hot cacheline When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf, as the latter shares the same cacheline with sk_forward_alloc, written by the BH. With this patch, UDP tracks the rcvbuf value and its update via custom SOL_SOCKET socket options, and copies the forward memory threshold value used by udp_rmem_release() in a different cacheline, already accessed by the above function and uncontended. Since the UDP socket init operation grown a bit, factor out the common code between v4 and v6 in a shared helper. Overall the above give a 10% peak throughput increase under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-10-28 09:46:32 +02:00
Frantisek Hrbata	0c3a22328a	Merge: IPv6: 9.2 P1 backport from upstream MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1488 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135319 Signed-off-by: Hangbin Liu <haliu@redhat.com> Approved-by: Davide Caratti <dcaratti@redhat.com> Approved-by: Sabrina Dubroca <sdubroca@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-10-27 08:26:02 -04:00
Frantisek Hrbata	fa843be1d1	Merge: net: add skb drop reasons MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1454 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Sync skb drop reasons with upstream to improve debuggability and visibility in the net stack. This MR helps in understanding why a given packet is being dropped. One way of retrieving the skb drop reason is to hook to the kfree_skb tracepoint: ``` # perf record -e skb:kfree_skb -a sleep 10 # perf script swapper 0 [000] 45483.977088: skb:kfree_skb: skbaddr=0xffffa04859090f00 protocol=34525 location=0xffffffff9bc92940 reason: NOT_SPECIFIED swapper 0 [000] 45485.792919: skb:kfree_skb: skbaddr=0xffffa04143757900 protocol=34525 location=0xffffffff9bbd84bb reason: TCP_INVALID_SEQUENCE ``` Signed-off-by: Antoine Tenart <atenart@redhat.com> Approved-by: Jarod Wilson <jarod@redhat.com> Approved-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-10-24 14:27:58 -04:00
Xin Long	41b7fb44a4	udp: Update reuse->has_conns under reuseport_lock. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135958 Tested: compile only commit 69421bf98482d089e50799f45e48b25ce4a8d154 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Oct 14 11:26:25 2022 -0700 udp: Update reuse->has_conns under reuseport_lock. When we call connect() for a UDP socket in a reuseport group, we have to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel could select a unconnected socket wrongly for packets sent to the connected socket. However, the current way to set has_conns is illegal and possible to trigger that problem. reuseport_has_conns() changes has_conns under rcu_read_lock(), which upgrades the RCU reader to the updater. Then, it must do the update under the updater's lock, reuseport_lock, but it doesn't for now. For this reason, there is a race below where we fail to set has_conns resulting in the wrong socket selection. To avoid the race, let's split the reader and updater with proper locking. cpu1 cpu2 +----+ +----+ __ip[46]_datagram_connect() reuseport_grow() . . |- reuseport_has_conns(sk, true) |- more_reuse = __reuseport_alloc(more_socks_size) | . | | |- rcu_read_lock() | |- reuse = rcu_dereference(sk->sk_reuseport_cb) | | | | | /* reuse->has_conns == 0 here */ | | |- more_reuse->has_conns = reuse->has_conns | |- reuse->has_conns = 1 | /* more_reuse->has_conns SHOULD BE 1 HERE */ | | | | | |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, | | | more_reuse) | `- rcu_read_unlock() `- kfree_rcu(reuse, rcu) | |- sk->sk_state = TCP_ESTABLISHED Note the likely(reuse) in reuseport_has_conns_set() is always true, but we put the test there for ease of review. [0] For the record, usually, sk_reuseport_cb is changed under lock_sock(). The only exception is reuseport_grow() & TCP reqsk migration case. 1) shutdown() TCP listener, which is moved into the latter part of reuse->socks[] to migrate reqsk. 
2) New listen() overflows reuse->socks[] and call reuseport_grow(). 3) reuse->max_socks overflows u16 with the new listener. 4) reuseport_grow() pops the old shutdown()ed listener from the array and update its sk->sk_reuseport_cb as NULL without lock_sock(). shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(), but, reuseport_has_conns_set() is called only for UDP under lock_sock(), so likely(reuse) never be false in reuseport_has_conns_set(). [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/ Fixes: `acdcecc612` ("udp: correct reuseport selection with connected sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Xin Long <lxin@redhat.com>	2022-10-18 19:20:11 -04:00
Xin Long	bcde996ce0	udp: Remove redundant __udp_sysctl_init() call from udp_init(). Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135958 Tested: compile only commit 02a7cb2866dd6e3ac7645b594289e1c308b68c4e Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Thu Jul 28 20:21:37 2022 -0700 udp: Remove redundant __udp_sysctl_init() call from udp_init(). __udp_sysctl_init() is called for init_net via udp_sysctl_ops. While at it, we can rename __udp_sysctl_init() to udp_sysctl_init(). Fixes: `1e80295158` ("udp: Move the udp sysctl to namespace.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Xin Long <lxin@redhat.com>	2022-10-18 19:20:11 -04:00
Xin Long	83ed64626f	net: udp: fix alignment problem in udp4_seq_show() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135958 Tested: compile only commit 6c25449e1a32c594d743df8e8258e8ef870b6a77 Author: yangxingwu <xingwu.yang@gmail.com> Date: Mon Dec 27 16:29:51 2021 +0800 net: udp: fix alignment problem in udp4_seq_show() $ cat /proc/net/udp before: sl local_address rem_address st tx_queue rx_queue tr tm->when 26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000 27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000 after: sl local_address rem_address st tx_queue rx_queue tr tm->when 26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000 27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000 Signed-off-by: yangxingwu <xingwu.yang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Xin Long <lxin@redhat.com>	2022-10-18 19:20:10 -04:00
Hangbin Liu	4b10ce48ea	tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135319 Upstream Status: net.git commit d38afeec26ed commit d38afeec26ed4739c640bf286c270559aab2ba5f Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Thu Oct 6 11:53:47 2022 -0700 tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit `03485f2adc` ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak: setsockopt(IPV6_ADDRFORM) sendmsg() +-----------------------+ +-------+ - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...) - sockopt_lock_sock(sk) ^._ called via udpv6_prot - lock_sock(sk) before WRITE_ONCE() - WRITE_ONCE(sk->sk_prot, &tcp_prot) - inet6_destroy_sock() - if (!corkreq) - sockopt_release_sock(sk) - ip6_make_skb(sk, ...) - release_sock(sk) ^._ lockless fast path for the non-corking case - __ip6_append_data(sk, ...) - ipv6_local_rxpmtu(sk, ...) - xchg(&np->rxpmtu, skb) ^._ rxpmtu is never freed. - goto out_no_dst; - lock_sock(sk) For now, rxpmtu is only the case, but not to miss the future change and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that we can clean up IPv6 resources finally. We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next. 
Fixes: `03485f2adc` ("udpv6: Add lockless sendmsg() support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <haliu@redhat.com>	2022-10-18 11:41:13 +08:00
Antoine Tenart	854953ce3e	net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit 08d4c0370c400fa6ef2194f9ee2e8dccc4a7ab39 Author: Menglong Dong <imagedong@tencent.com> Date: Sat Feb 5 15:47:39 2022 +0800 net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb() Replace kfree_skb() with kfree_skb_reason() in __udp_queue_rcv_skb(). Following new drop reasons are introduced: SKB_DROP_REASON_SOCKET_RCVBUFF SKB_DROP_REASON_PROTO_MEM Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:22 +02:00
Antoine Tenart	b9868e0eb2	net: udp: use kfree_skb_reason() in udp_queue_rcv_one_skb() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit 1379a92d38e31132e87d9b653e9343c7841a7348 Author: Menglong Dong <imagedong@tencent.com> Date: Sat Feb 5 15:47:38 2022 +0800 net: udp: use kfree_skb_reason() in udp_queue_rcv_one_skb() Replace kfree_skb() with kfree_skb_reason() in udp_queue_rcv_one_skb(). Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:22 +02:00
Chris von Recklinghausen	59baaa91f0	include/linux/mm.h: move nr_free_buffer_pages from swap.h to mm.h Bugzilla: https://bugzilla.redhat.com/2120352 commit a1554c002699cbc9ced2e9f44f9c1357181bead3 Author: Mianhan Liu <liumh1@shanghaitech.edu.cn> Date: Fri Nov 5 13:45:21 2021 -0700 include/linux/mm.h: move nr_free_buffer_pages from swap.h to mm.h nr_free_buffer_pages could be exposed through mm.h instead of swap.h. The advantage of this change is that it can reduce the obsolete includes. For example, net/ipv4/tcp.c wouldn't need swap.h any more since it has already included mm.h. Similarly, after checking all the other files, it comes that tcp.c, udp.c meter.c ,... follow the same rule, so these files can have swap.h removed too. Moreover, after preprocessing all the files that use nr_free_buffer_pages, it turns out that those files have already included mm.h.Thus, we can move nr_free_buffer_pages from swap.h to mm.h safely. This change will not affect the compilation of other files. Link: https://lkml.kernel.org/r/20210912133640.1624-1-liumh1@shanghaitech.edu.cn Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Cc: Jakub Kicinski <kuba@kernel.org> CC: Ulf Hansson <ulf.hansson@linaro.org> Cc: "David S . Miller" <davem@davemloft.net> Cc: Simon Horman <horms@verge.net.au> Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:30 -04:00
Felix Maurer	de20724127	net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620 commit 91a760b26926265a60c77ddf016529bcf3e17a04 Author: Menglong Dong <imagedong@tencent.com> Date: Thu Jan 6 21:20:20 2022 +0800 net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in __inet_bind() is not handled properly. While the return value is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and exit: err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk); if (err) { inet->inet_saddr = inet->inet_rcv_saddr = 0; goto out_release_sock; } Let's take UDP for example and see what will happen. For UDP socket, it will be added to 'udp_prot.h.udp_table->hash' and 'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port() called success. If 'inet->inet_rcv_saddr' is specified here, then 'sk' will be in the 'hslot2' of 'hash2' that it don't belong to (because inet_saddr is changed to 0), and UDP packet received will not be passed to this sock. If 'inet->inet_rcv_saddr' is not specified here, the sock will work fine, as it can receive packet properly, which is weird, as the 'bind()' is already failed. To undo the get_port() operation, introduce the 'put_port' field for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP proto, it is udp_lib_unhash(); For icmp proto, it is ping_unhash(). Therefore, after sys_bind() fail caused by BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), it will be unbinded, which means that it can try to be binded to another port. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220106132022.3470772-2-imagedong@tencent.com Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2022-08-24 16:53:48 +02:00
Artem Savkov	75a645a56c	add missing bpf-cgroup.h includes Bugzilla: https://bugzilla.redhat.com/2069046 Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git commit aef2feda97b840ec38e9fa53d0065188453304e8 Author: Jakub Kicinski <kuba@kernel.org> Date: Wed Dec 15 18:55:37 2021 -0800 add missing bpf-cgroup.h includes We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency, make sure those who actually need more than the definition of struct cgroup_bpf include bpf-cgroup.h explicitly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org Signed-off-by: Artem Savkov <asavkov@redhat.com>	2022-08-24 12:53:49 +02:00
Artem Savkov	070349c7ea	bpf: Add ingress_ifindex to bpf_sk_lookup Bugzilla: https://bugzilla.redhat.com/2069046 Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git commit f89315650ba34ec6c91a8bded72796980bee2a4d Author: Mark Pashmfouroush <markpash@cloudflare.com> Date: Wed Nov 10 11:10:15 2021 +0000 bpf: Add ingress_ifindex to bpf_sk_lookup It may be helpful to have access to the ifindex during bpf socket lookup. An example may be to scope certain socket lookup logic to specific interfaces, i.e. an interface may be made exempt from custom lookup code. Add the ifindex of the arriving connection to the bpf_sk_lookup API. Signed-off-by: Mark Pashmfouroush <markpash@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211110111016.5670-2-markpash@cloudflare.com Signed-off-by: Artem Savkov <asavkov@redhat.com>	2022-08-24 12:53:35 +02:00
Patrick Talbert	8c5b3f7fd9	Merge: XDP and networking eBPF rebase to v5.15 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/674 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618 Depends: !572 Tested: Using bpf selftests, everything passes. This rebases XDP and networking eBPF to upstream kernel version 5.15. Signed-off-by: Jiri Benc <jbenc@redhat.com> Approved-by: Hangbin Liu <haliu@redhat.com> Approved-by: Rafael Aquini <aquini@redhat.com> Approved-by: Toke Høiland-Jørgensen <toke@redhat.com> Approved-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: Patrick Talbert <ptalbert@redhat.com>	2022-06-03 09:26:25 +02:00

702 Commits