Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Florian Westphal	10031021a9	netfilter: nft_exthdr: fix offset with ipv4_find_option() JIRA: https://issues.redhat.com/browse/RHEL-84577 Upstream Status: commit 6edd78af9506 commit 6edd78af9506bb182518da7f6feebd75655d9a0e Author: Alexey Kashavkin <akashavkin@gmail.com> Date: Sun Mar 2 00:14:36 2025 +0300 netfilter: nft_exthdr: fix offset with ipv4_find_option() There is an incorrect calculation in the offset variable which causes the nft_skb_copy_to_reg() function to always return -EFAULT. Adding the start variable is redundant. In the __ip_options_compile() function the correct offset is specified when finding the function. There is no need to add the size of the iphdr structure to the offset. Fixes: `dbb5281a1f` ("netfilter: nf_tables: add support for matching IPv4 options") Signed-off-by: Alexey Kashavkin <akashavkin@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-03-26 11:19:46 +01:00
Florian Westphal	9ab0fa974f	netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree() JIRA: https://issues.redhat.com/browse/RHEL-84577 Upstream Status: commit d653bfeb07eb Conflicts: net/netfilter/nf_conncount.c Context only, we lack commit 0b88d1654d55 ("netfilter: nf_conncount: fix wrong variable type"). commit d653bfeb07ebb3499c403404c21ac58a16531607 Author: Kohei Enju <enjuk@amazon.com> Date: Sun Mar 9 17:07:38 2025 +0900 netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree() Since commit `b36e4523d4` ("netfilter: nf_conncount: fix garbage collection confirm race"), `cpu` and `jiffies32` were introduced to the struct nf_conncount_tuple. The commit made nf_conncount_add() initialize `conn->cpu` and `conn->jiffies32` when allocating the struct. In contrast, count_tree() was not changed to initialize them. By commit `34848d5c89` ("netfilter: nf_conncount: Split insert and traversal"), count_tree() was split and the relevant allocation code now resides in insert_tree(). Initialize `conn->cpu` and `conn->jiffies32` in insert_tree(). 
BUG: KMSAN: uninit-value in find_or_evict net/netfilter/nf_conncount.c:117 [inline] BUG: KMSAN: uninit-value in __nf_conncount_add+0xd9c/0x2850 net/netfilter/nf_conncount.c:143 find_or_evict net/netfilter/nf_conncount.c:117 [inline] __nf_conncount_add+0xd9c/0x2850 net/netfilter/nf_conncount.c:143 count_tree net/netfilter/nf_conncount.c:438 [inline] nf_conncount_count+0x82f/0x1e80 net/netfilter/nf_conncount.c:521 connlimit_mt+0x7f6/0xbd0 net/netfilter/xt_connlimit.c:72 __nft_match_eval net/netfilter/nft_compat.c:403 [inline] nft_match_eval+0x1a5/0x300 net/netfilter/nft_compat.c:433 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x426/0x2290 net/netfilter/nf_tables_core.c:288 nft_do_chain_ipv4+0x1a5/0x230 net/netfilter/nft_chain_filter.c:23 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook_slow_list+0x24d/0x860 net/netfilter/core.c:663 NF_HOOK_LIST include/linux/netfilter.h:350 [inline] ip_sublist_rcv+0x17b7/0x17f0 net/ipv4/ip_input.c:633 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:669 __netif_receive_skb_list_ptype net/core/dev.c:5936 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5983 __netif_receive_skb_list net/core/dev.c:6035 [inline] netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:6126 netif_receive_skb_list+0x5a/0x460 net/core/dev.c:6178 xdp_recv_frames net/bpf/test_run.c:280 [inline] xdp_test_run_batch net/bpf/test_run.c:361 [inline] bpf_test_run_xdp_live+0x2e86/0x3480 net/bpf/test_run.c:390 bpf_prog_test_run_xdp+0xf1d/0x1ae0 net/bpf/test_run.c:1316 bpf_prog_test_run+0x5e5/0xa30 kernel/bpf/syscall.c:4407 __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5813 __do_sys_bpf kernel/bpf/syscall.c:5902 [inline] __se_sys_bpf kernel/bpf/syscall.c:5900 [inline] __ia32_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5900 ia32_sys_call+0x394d/0x4180 arch/x86/include/generated/asm/syscalls_32.h:358 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] 
__do_fast_syscall_32+0xb0/0x110 arch/x86/entry/common.c:387 do_fast_syscall_32+0x38/0x80 arch/x86/entry/common.c:412 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:450 entry_SYSENTER_compat_after_hwframe+0x84/0x8e Uninit was created at: slab_post_alloc_hook mm/slub.c:4121 [inline] slab_alloc_node mm/slub.c:4164 [inline] kmem_cache_alloc_noprof+0x915/0xe10 mm/slub.c:4171 insert_tree net/netfilter/nf_conncount.c:372 [inline] count_tree net/netfilter/nf_conncount.c:450 [inline] nf_conncount_count+0x1415/0x1e80 net/netfilter/nf_conncount.c:521 connlimit_mt+0x7f6/0xbd0 net/netfilter/xt_connlimit.c:72 __nft_match_eval net/netfilter/nft_compat.c:403 [inline] nft_match_eval+0x1a5/0x300 net/netfilter/nft_compat.c:433 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x426/0x2290 net/netfilter/nf_tables_core.c:288 nft_do_chain_ipv4+0x1a5/0x230 net/netfilter/nft_chain_filter.c:23 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook_slow_list+0x24d/0x860 net/netfilter/core.c:663 NF_HOOK_LIST include/linux/netfilter.h:350 [inline] ip_sublist_rcv+0x17b7/0x17f0 net/ipv4/ip_input.c:633 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:669 __netif_receive_skb_list_ptype net/core/dev.c:5936 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5983 __netif_receive_skb_list net/core/dev.c:6035 [inline] netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:6126 netif_receive_skb_list+0x5a/0x460 net/core/dev.c:6178 xdp_recv_frames net/bpf/test_run.c:280 [inline] xdp_test_run_batch net/bpf/test_run.c:361 [inline] bpf_test_run_xdp_live+0x2e86/0x3480 net/bpf/test_run.c:390 bpf_prog_test_run_xdp+0xf1d/0x1ae0 net/bpf/test_run.c:1316 bpf_prog_test_run+0x5e5/0xa30 kernel/bpf/syscall.c:4407 __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5813 __do_sys_bpf kernel/bpf/syscall.c:5902 [inline] __se_sys_bpf kernel/bpf/syscall.c:5900 [inline] __ia32_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5900 
ia32_sys_call+0x394d/0x4180 arch/x86/include/generated/asm/syscalls_32.h:358 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/common.c:387 do_fast_syscall_32+0x38/0x80 arch/x86/entry/common.c:412 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:450 entry_SYSENTER_compat_after_hwframe+0x84/0x8e Reported-by: syzbot+83fed965338b573115f7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=83fed965338b573115f7 Fixes: `b36e4523d4` ("netfilter: nf_conncount: fix garbage collection confirm race") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-03-26 10:12:55 +01:00
Florian Westphal	ac4415015f	netfilter: nf_tables: make destruction work queue pernet JIRA: https://issues.redhat.com/browse/RHEL-84577 Upstream Status: commit fb8286562ecf commit fb8286562ecfb585e26b033c5e32e6fb85efb0b3 Author: Florian Westphal <fw@strlen.de> Date: Thu Mar 6 04:05:26 2025 +0100 netfilter: nf_tables: make destruction work queue pernet The call to flush_work before tearing down a table from the netlink notifier was supposed to make sure that all earlier updates (e.g. rule add) that might reference that table have been processed. Unfortunately, flush_work() waits for the last queued instance. This could be an instance that is different from the one that we must wait for. This is because transactions are protected with a pernet mutex, but the work item is global, so holding the transaction mutex doesn't prevent another netns from queueing more work. Make the work item pernet so that flush_work() will wait for all transactions queued from this netns. A welcome side effect is that we no longer need to wait for transaction objects from foreign netns. The gc work queue is still global. This seems to be ok because nft_set structures are reference counted and each container structure owns a reference on the net namespace. The destroy_list is still protected by a global spinlock rather than pernet one but the hold time is very short anyway. v2: call cancel_work_sync before reaping the remaining tables (Pablo). Fixes: 9f6958ba2e90 ("netfilter: nf_tables: unconditionally flush pending work before notifier") Reported-by: syzbot+5d8c5789c8cb076b2c25@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-03-26 10:12:49 +01:00
Florian Westphal	ac58aedac2	netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template. JIRA: https://issues.redhat.com/browse/RHEL-84577 Upstream Status: commit 5cfe5612ca95 commit 5cfe5612ca9590db69b9be29dc83041dbf001108 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Mon Feb 17 17:02:42 2025 +0100 netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template. nft_ct_pcpu_template is a per-CPU variable and relies on disabled BH for its locking. The refcounter is read and if its value is set to one then the refcounter is incremented and variable is used - otherwise it is already in use and left untouched. Without per-CPU locking in local_bh_disable() on PREEMPT_RT the read-then-increment operation is not atomic and therefore racy. This can be avoided by using unconditionally __refcount_inc() which will increment counter and return the old value as an atomic operation. In case the returned counter is not one, the variable is in use and we need to decrement counter. Otherwise we can use it. Use __refcount_inc() instead of read and a conditional increment. Fixes: `edee4f1e92` ("netfilter: nft_ct: add zone id set support") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-03-26 10:12:48 +01:00
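The increment-then-check pattern described in the entry above can be modeled in user space. This is only a rough sketch: the `Refcount` class, its lock, and the helper names are hypothetical stand-ins (the lock merely models the atomicity that the kernel's `__refcount_inc()` provides), not the kernel code.

```python
import threading


class Refcount:
    """Hypothetical user-space stand-in for a kernel refcount_t."""

    def __init__(self, value=1):
        self._value = value
        self._lock = threading.Lock()  # models atomicity of the real primitive

    def read(self):
        return self._value

    def inc_return_old(self):
        """Increment and hand back the previous value as one atomic step."""
        with self._lock:
            old = self._value
            self._value += 1
            return old

    def dec(self):
        with self._lock:
            self._value -= 1


def try_acquire_template(ref):
    """Return True if the caller now owns the per-CPU template.

    The counter is incremented unconditionally. An old value of 1 means the
    template was idle; any other old value means it is already in use, so
    the increment is undone and the caller must fall back.
    """
    old = ref.inc_return_old()
    if old == 1:
        return True
    ref.dec()
    return False


def release_template(ref):
    """Drop the reference taken by a successful try_acquire_template()."""
    ref.dec()
```

Compared with a separate read followed by a conditional increment, there is no window between the check and the update in which a second context could observe the value 1 as well.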
Florian Westphal	e86e5c7f86	netfilter: nft_flow_offload: update tcp state flags under lock JIRA: https://issues.redhat.com/browse/RHEL-84577 Upstream Status: commit 7a4b61406395 commit 7a4b61406395291ffb7220a10e8951a9a8684819 Author: Florian Westphal <fw@strlen.de> Date: Tue Jan 14 00:50:34 2025 +0100 netfilter: nft_flow_offload: update tcp state flags under lock The conntrack entry is already public, there is a small chance that another CPU is handling a packet in reply direction and racing with the tcp state update. Move this under ct spinlock. This is done once, when ct is about to be offloaded, so this should not result in a noticeable performance hit. Fixes: `8437a6209f` ("netfilter: nft_flow_offload: set liberal tracking mode for tcp") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-03-26 10:12:48 +01:00
Florian Westphal	6b1086a4cb	netfilter: nft_flow_offload: clear tcp MAXACK flag before moving to slowpath JIRA: https://issues.redhat.com/browse/RHEL-84577 Upstream Status: commit d9d7b489416d commit d9d7b489416d18ba696c32a93623ecb0176b374e Author: Florian Westphal <fw@strlen.de> Date: Tue Jan 14 00:50:33 2025 +0100 netfilter: nft_flow_offload: clear tcp MAXACK flag before moving to slowpath This state reset is racy, no locks are held here. Since commit `8437a6209f` ("netfilter: nft_flow_offload: set liberal tracking mode for tcp"), the window checks are disabled for normal data packets, but MAXACK flag is checked when validating TCP resets. Clear the flag so tcp reset validation checks are ignored. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-03-26 10:12:48 +01:00
Augusto Caringi	d5cbb3a73e	Merge: CVE-2025-21826: netfilter: nf_tables: reject mismatching sum of field_len with set key length MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6532 JIRA: https://issues.redhat.com/browse/RHEL-82489 CVE: CVE-2025-21826 ``` commit 1b9335a8000fb70742f7db10af314104b6ace220 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Tue Jan 28 12:26:33 2025 +0100 netfilter: nf_tables: reject mismatching sum of field_len with set key length The field length description provides the length of each separated key field in the concatenation, each field gets rounded up to 32-bits to calculate the pipapo rule width from pipapo_init(). The set key length provides the total size of the key aligned to 32-bits. Register-based arithmetics still allows for combining mismatching set key length and field length description, eg. set key length 10 and field description [ 5, 4 ] leading to pipapo width of 12. Cc: stable@vger.kernel.org Fixes: 3ce67e3793f4 ("netfilter: nf_tables: do not allow mismatch field size and set key length") Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>``` Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> --- <small>Created 2025-03-06 18:39 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small> Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Xin Long <lxin@redhat.com> Approved-by: CKI KWF Bot 
<cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Augusto Caringi <acaringi@redhat.com>	2025-03-20 11:20:20 -03:00
Augusto Caringi	e7e88834bf	Merge: netfilter: nfnetlink_queue: drop bogus WARN_ON MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6402 JIRA: https://issues.redhat.com/browse/RHEL-80104 Upstream Status: commit 631a4b3ddc78 Updated nft_queue.sh kselftest can trigger a WARN splat. Note that upstream "Fixes" tag is incorrect, the problem does exist in 9.x releases. Signed-off-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Hangbin Liu <haliu@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Augusto Caringi <acaringi@redhat.com>	2025-03-17 14:57:52 -03:00
CKI Backport Bot	6969a826ff	netfilter: nf_tables: reject mismatching sum of field_len with set key length JIRA: https://issues.redhat.com/browse/RHEL-82489 CVE: CVE-2025-21826 commit 1b9335a8000fb70742f7db10af314104b6ace220 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Tue Jan 28 12:26:33 2025 +0100 netfilter: nf_tables: reject mismatching sum of field_len with set key length The field length description provides the length of each separated key field in the concatenation, each field gets rounded up to 32-bits to calculate the pipapo rule width from pipapo_init(). The set key length provides the total size of the key aligned to 32-bits. Register-based arithmetics still allows for combining mismatching set key length and field length description, eg. set key length 10 and field description [ 5, 4 ] leading to pipapo width of 12. Cc: stable@vger.kernel.org Fixes: 3ce67e3793f4 ("netfilter: nf_tables: do not allow mismatch field size and set key length") Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-03-06 18:39:57 +00:00
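The arithmetic in the entry above (key length 10, field description [ 5, 4 ], pipapo width 12) can be checked with a few lines. This is a sketch under assumptions: the function names are made up for illustration, and the byte-sum comparison is one plausible reading of the fix, since the commit message describes the mismatch but not the exact check.

```python
REG_SIZE = 4  # bytes per 32-bit register


def round_up(value, multiple):
    """Round value up to the next multiple."""
    return (value + multiple - 1) // multiple * multiple


def regs_needed(nbytes):
    """Number of 32-bit registers needed to hold nbytes."""
    return round_up(nbytes, REG_SIZE) // REG_SIZE


def register_count_matches(klen, field_lens):
    """Register-based comparison: counts registers only, not bytes."""
    return regs_needed(klen) == sum(regs_needed(n) for n in field_lens)


def pipapo_width(field_lens):
    """Rule width: every field is rounded up to 32 bits, then summed."""
    return sum(round_up(n, REG_SIZE) for n in field_lens)


def byte_sum_matches(klen, field_lens):
    """Assumed stricter check: the rounded field sum must equal the key length."""
    return pipapo_width(field_lens) == klen
```

With `klen = 10` and fields `[5, 4]`, both sides of the register comparison come to three registers, so that comparison passes, while the rounded byte sum is 12 and does not equal 10.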
Augusto Caringi	19e4d875cf	Merge: CVE-2024-53680: ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init() MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6343 JIRA: https://issues.redhat.com/browse/RHEL-77915 CVE: CVE-2024-53680 ``` ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init() Under certain kernel configurations when building with Clang/LLVM, the compiler does not generate a return or jump as the terminator instruction for ip_vs_protocol_init(), triggering the following objtool warning during build time: vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6() At runtime, this either causes an oops when trying to load the ipvs module or a boot-time panic if ipvs is built-in. This same issue has been reported by the Intel kernel test robot previously. Digging deeper into both LLVM and the kernel code reveals this to be a undefined behavior problem. ip_vs_protocol_init() uses a on-stack buffer of 64 chars to store the registered protocol names and leaves it uninitialized after definition. The function calls strnlen() when concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE strnlen() performs an extra step to check whether the last byte of the input char buffer is a null character (commit 3009f891bb9f ("fortify: Allow strlen() and strnlen() to pass compile-time known lengths")). This, together with possibly other configurations, cause the following IR to be generated: define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 { %1 = alloca [64 x i8], align 16 ... 
14: ; preds = %11 %15 = getelementptr inbounds i8, ptr %1, i64 63 %16 = load i8, ptr %15, align 1 %17 = tail call i1 @llvm.is.constant.i8(i8 %16) %18 = icmp eq i8 %16, 0 %19 = select i1 %17, i1 %18, i1 false br i1 %19, label %20, label %23 20: ; preds = %14 %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23 ... 23: ; preds = %14, %11, %20 %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24 ... } The above code calculates the address of the last char in the buffer (value %15) and then loads from it (value %16). Because the buffer is never initialized, the LLVM GVN pass marks value %16 as undefined: %13 = getelementptr inbounds i8, ptr %1, i64 63 br i1 undef, label %14, label %17 This gives later passes (SCCP, in particular) more DCE opportunities by propagating the undef value further, and eventually removes everything after the load on the uninitialized stack location: define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 { %1 = alloca [64 x i8], align 16 ... 12: ; preds = %11 %13 = getelementptr inbounds i8, ptr %1, i64 63 unreachable } In this way, the generated native code will just fall through to the next function, as LLVM does not generate any code for the unreachable IR instruction and leaves the function without a terminator. Zero the on-stack buffer to avoid this possible UB. 
Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/ Co-developed-by: Ruowen Qin <ruqin@redhat.com> Signed-off-by: Ruowen Qin <ruqin@redhat.com> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit 146b6f1112eb30a19776d6c323c994e9d67790db) ``` Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> --- <small>Created 2025-02-05 14:10 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small> Approved-by: Guillaume Nault <gnault@redhat.com> Approved-by: Hangbin Liu <haliu@redhat.com> Approved-by: Andrea Claudi <aclaudi@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Augusto Caringi <acaringi@redhat.com>	2025-03-06 00:01:06 -03:00
Florian Westphal	0aeb832cac	netfilter: nfnetlink_queue: drop bogus WARN_ON JIRA: https://issues.redhat.com/browse/RHEL-80104 Upstream Status: commit 631a4b3ddc78 Conflicts: net/netfilter/nfnetlink_queue.c The function was moved upstream from nf_queue.c to nfnetlink_queue.c, the former is baked into vmlinux while the latter is part of nfnetlink_queue module. While we could pick up 3f8019688894 ("netfilter: move nf_reinject into nfnetlink_queue modules"), it doesn't apply as-is either because of other upstream changes. commit 631a4b3ddc7831b20442c59c28b0476d0704c9af Author: Florian Westphal <fw@strlen.de> Date: Tue Jul 9 02:02:26 2024 +0200 netfilter: nfnetlink_queue: drop bogus WARN_ON Happens when rules get flushed/deleted while packet is out, so remove this WARN_ON. This WARN exists in one form or another since v4.14, no need to backport this to older releases, hence use a more recent fixes tag. Fixes: 3f8019688894 ("netfilter: move nf_reinject into nfnetlink_queue modules") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202407081453.11ac0f63-lkp@intel.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-02-19 13:19:20 +01:00
CKI Backport Bot	237e6ee3cb	ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init() JIRA: https://issues.redhat.com/browse/RHEL-77915 CVE: CVE-2024-53680 commit 146b6f1112eb30a19776d6c323c994e9d67790db Author: Jinghao Jia <jinghao7@illinois.edu> Date: Sat Nov 23 03:42:56 2024 -0600 ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init() Under certain kernel configurations when building with Clang/LLVM, the compiler does not generate a return or jump as the terminator instruction for ip_vs_protocol_init(), triggering the following objtool warning during build time: vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6() At runtime, this either causes an oops when trying to load the ipvs module or a boot-time panic if ipvs is built-in. This same issue has been reported by the Intel kernel test robot previously. Digging deeper into both LLVM and the kernel code reveals this to be a undefined behavior problem. ip_vs_protocol_init() uses a on-stack buffer of 64 chars to store the registered protocol names and leaves it uninitialized after definition. The function calls strnlen() when concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE strnlen() performs an extra step to check whether the last byte of the input char buffer is a null character (commit 3009f891bb9f ("fortify: Allow strlen() and strnlen() to pass compile-time known lengths")). This, together with possibly other configurations, cause the following IR to be generated: define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 { %1 = alloca [64 x i8], align 16 ... 
14: ; preds = %11 %15 = getelementptr inbounds i8, ptr %1, i64 63 %16 = load i8, ptr %15, align 1 %17 = tail call i1 @llvm.is.constant.i8(i8 %16) %18 = icmp eq i8 %16, 0 %19 = select i1 %17, i1 %18, i1 false br i1 %19, label %20, label %23 20: ; preds = %14 %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23 ... 23: ; preds = %14, %11, %20 %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24 ... } The above code calculates the address of the last char in the buffer (value %15) and then loads from it (value %16). Because the buffer is never initialized, the LLVM GVN pass marks value %16 as undefined: %13 = getelementptr inbounds i8, ptr %1, i64 63 br i1 undef, label %14, label %17 This gives later passes (SCCP, in particular) more DCE opportunities by propagating the undef value further, and eventually removes everything after the load on the uninitialized stack location: define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 { %1 = alloca [64 x i8], align 16 ... 12: ; preds = %11 %13 = getelementptr inbounds i8, ptr %1, i64 63 unreachable } In this way, the generated native code will just fall through to the next function, as LLVM does not generate any code for the unreachable IR instruction and leaves the function without a terminator. Zero the on-stack buffer to avoid this possible UB. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/ Co-developed-by: Ruowen Qin <ruqin@redhat.com> Signed-off-by: Ruowen Qin <ruqin@redhat.com> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-02-05 14:10:28 +00:00
CKI Backport Bot	39c44c42b2	netfilter: conntrack: clamp maximum hashtable size to INT_MAX JIRA: https://issues.redhat.com/browse/RHEL-77891 CVE: CVE-2025-21648 commit b541ba7d1f5a5b7b3e2e22dc9e40e18a7d6dbc13 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Wed Jan 8 22:56:33 2025 +0100 netfilter: conntrack: clamp maximum hashtable size to INT_MAX Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when resizing hashtable because __GFP_NOWARN is unset. See: 0708a0afe291 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls") Note: hashtable resize is only possible from init_netns. Fixes: `9cc1c73ad6` ("netfilter: conntrack: avoid integer overflow when resizing") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-02-05 14:07:46 +00:00
Patrick Talbert	e3f336f694	Merge: ipvs: speed up reads from ip_vs_conn proc file MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6185 JIRA: https://issues.redhat.com/browse/RHEL-74064 Upstream Status: net-next.git Signed-off-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Guillaume Nault <gnault@redhat.com> Approved-by: Eric Garver <egarver@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Patrick Talbert <ptalbert@redhat.com>	2025-01-27 15:24:27 +01:00
Rado Vrbovsky	16f625d8bd	Merge: CVE-2024-56783: netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6151 JIRA: https://issues.redhat.com/browse/RHEL-73350 CVE: CVE-2024-56783 ``` netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level cgroup maximum depth is INT_MAX by default, there is a cgroup toggle to restrict this maximum depth to a more reasonable value not to harm performance. Remove unnecessary WARN_ON_ONCE which is reachable from userspace. Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces") Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit b7529880cb961d515642ce63f9d7570869bbbdc3) ``` Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> --- <small>Created 2025-01-13 14:42 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small> Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2025-01-23 13:14:29 +00:00
Florian Westphal	2dabc0c0ee	ipvs: speed up reads from ip_vs_conn proc file JIRA: https://issues.redhat.com/browse/RHEL-74064 Upstream Status: commit 178883fd039d commit 178883fd039d38a708cc56555489533d9a9c07df Author: Florian Westphal <fw@strlen.de> Date: Tue Dec 3 12:08:30 2024 +0100 ipvs: speed up reads from ip_vs_conn proc file Reading is very slow because ->start() performs a linear re-scan of the entire hash table until it finds the successor to the last dumped element. The current implementation uses 'pos' as the 'number of elements to skip, then does linear iteration until it has skipped 'pos' entries. Store the last bucket and the number of elements to skip in that bucket instead, so we can resume from bucket b directly. before this patch, its possible to read ~35k entries in one second, but each read() gets slower as the number of entries to skip grows: time timeout 60 cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all real 1m0.007s user 0m0.003s sys 0m59.956s 140386 /tmp/all Only ~100k more got read in remaining the remaining 59s, and did not get nowhere near the 1m entries that are stored at the time. after this patch, dump completes very quickly: time cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all real 0m2.286s user 0m0.004s sys 0m2.281s 1000001 /tmp/all Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2025-01-15 15:11:53 +01:00
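The difference between resuming by global position and resuming by (bucket, offset) in the entry above can be illustrated with a toy hash table. This is a minimal sketch: the list-of-lists table and both function names are hypothetical, and the "visited" counters only approximate the cost of walking kernel hash chains.

```python
def resume_by_position(buckets, pos):
    """Old scheme: rescan from bucket 0 and skip `pos` entries.

    Returns (entry, entries_visited); entry is None past the end.
    """
    visited = 0
    for bucket in buckets:
        for entry in bucket:
            visited += 1
            if visited > pos:
                return entry, visited
    return None, visited


def resume_by_bucket(buckets, bucket_idx, skip):
    """New scheme: jump to the saved bucket and skip `skip` entries in it.

    Returns (entry, next_bucket_idx, next_skip, entries_visited).
    """
    visited = 0
    idx = bucket_idx
    while idx < len(buckets):
        bucket = buckets[idx]
        if skip < len(bucket):
            visited += skip + 1  # walk the chain up to the wanted entry
            return bucket[skip], idx, skip + 1, visited
        visited += len(bucket)   # chain exhausted, move to the next bucket
        idx += 1
        skip = 0
    return None, idx, 0, visited
```

In the first scheme the work per call grows with the number of entries already dumped, which matches the slowdown reported in the commit message; in the second it is bounded by the length of the chains actually touched.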
CKI Backport Bot	64ef02fcd1	netfilter: nft_set_hash: skip duplicated elements pending gc run JIRA: https://issues.redhat.com/browse/RHEL-73708 commit 7ffc7481153bbabf3332c6a19b289730c7e1edf5 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Mon Dec 2 00:04:49 2024 +0100 netfilter: nft_set_hash: skip duplicated elements pending gc run rhashtable does not provide stable walk, duplicated elements are possible in case of resizing. I considered that checking for errors when calling rhashtable_walk_next() was sufficient to detect the resizing. However, rhashtable_walk_next() returns -EAGAIN only at the end of the iteration, which is too late, because a gc work containing duplicated elements could have been already scheduled for removal to the worker. Add a u32 gc worker sequence number per set, bump it on every workqueue run. Annotate gc worker sequence number on the expired element. Use it to skip those already seen in this gc workqueue run. Note that this new field is never reset in case gc transaction fails, so next gc worker run on the expired element overrides it. Wraparound of gc worker sequence number should not be an issue with stale gc worker sequence number in the element, that would just postpone the element removal in one gc run. Note that it is not possible to use flags to annotate that element is pending gc run to detect duplicates, given that gc transaction can be invalidated in case of update from the control plane, therefore, not allowing to clear such flag. On x86_64, pahole reports no changes in the size of nft_rhash_elem. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Tested-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:44 +00:00
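The per-set gc sequence number described in the entry above can be sketched as follows. The class and field names (`Elem`, `GcSet`, `gc_seq`) are hypothetical, and the walk is modeled as a plain list that may contain the same element twice, standing in for an rhashtable walk during a resize.

```python
U32_MASK = 0xFFFFFFFF


class Elem:
    """Hypothetical set element carrying the sequence of its last gc run."""

    def __init__(self, key, expired):
        self.key = key
        self.expired = expired
        self.gc_seq = 0


class GcSet:
    """Hypothetical set carrying the gc worker sequence number."""

    def __init__(self):
        self.gc_seq = 0


def gc_run(gc_set, walk):
    """Collect expired elements for one gc worker run.

    The set's sequence number is bumped once per run (wrapping at 32 bits).
    An expired element whose annotation already equals the current sequence
    was seen earlier in this run and is skipped.
    """
    gc_set.gc_seq = (gc_set.gc_seq + 1) & U32_MASK
    batch = []
    for elem in walk:
        if not elem.expired:
            continue
        if elem.gc_seq == gc_set.gc_seq:
            continue  # duplicate produced by the unstable walk
        elem.gc_seq = gc_set.gc_seq
        batch.append(elem.key)
    return batch
```

Because the annotation is simply overwritten on the next run, an element whose gc transaction failed is picked up again without any explicit reset, which is the behavior the commit message notes.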
CKI Backport Bot	f873ac0414	netfilter: nft_inner: incorrect percpu area handling under softirq JIRA: https://issues.redhat.com/browse/RHEL-73708 commit 7b1d83da254be3bf054965c8f3b1ad976f460ae5 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Wed Nov 27 12:46:54 2024 +0100 netfilter: nft_inner: incorrect percpu area handling under softirq Softirq can interrupt ongoing packet from process context that is walking over the percpu area that contains inner header offsets. Disable bh and perform three checks before restoring the percpu inner header offsets to validate that the percpu area is valid for this skbuff: 1) If the NFT_PKTINFO_INNER_FULL flag is set on, then this skbuff has already been parsed before for inner header fetching to register. 2) Validate that the percpu area refers to this skbuff using the skbuff pointer as a cookie. If there is a cookie mismatch, then this skbuff needs to be parsed again. 3) Finally, validate if the percpu area refers to this tunnel type. Only after these three checks the percpu area is restored to a on-stack copy and bh is enabled again. After inner header fetching, the on-stack copy is stored back to the percpu area. Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:43 +00:00
CKI Backport Bot	5e54e2cc17	netfilter: x_tables: fix LED ID check in led_tg_check() JIRA: https://issues.redhat.com/browse/RHEL-73708 commit 04317f4eb2aad312ad85c1a17ad81fe75f1f9bc7 Author: Dmitry Antipov <dmantipov@yandex.ru> Date: Thu Nov 21 09:55:42 2024 +0300 netfilter: x_tables: fix LED ID check in led_tg_check() Syzbot has reported the following BUG detected by KASAN: BUG: KASAN: slab-out-of-bounds in strlen+0x58/0x70 Read of size 1 at addr ffff8881022da0c8 by task repro/5879 ... Call Trace: <TASK> dump_stack_lvl+0x241/0x360 ? __pfx_dump_stack_lvl+0x10/0x10 ? __pfx__printk+0x10/0x10 ? _printk+0xd5/0x120 ? __virt_addr_valid+0x183/0x530 ? __virt_addr_valid+0x183/0x530 print_report+0x169/0x550 ? __virt_addr_valid+0x183/0x530 ? __virt_addr_valid+0x183/0x530 ? __virt_addr_valid+0x45f/0x530 ? __phys_addr+0xba/0x170 ? strlen+0x58/0x70 kasan_report+0x143/0x180 ? strlen+0x58/0x70 strlen+0x58/0x70 kstrdup+0x20/0x80 led_tg_check+0x18b/0x3c0 xt_check_target+0x3bb/0xa40 ? __pfx_xt_check_target+0x10/0x10 ? stack_depot_save_flags+0x6e4/0x830 ? nft_target_init+0x174/0xc30 nft_target_init+0x82d/0xc30 ? __pfx_nft_target_init+0x10/0x10 ? nf_tables_newrule+0x1609/0x2980 ? nf_tables_newrule+0x1609/0x2980 ? rcu_is_watching+0x15/0xb0 ? nf_tables_newrule+0x1609/0x2980 ? nf_tables_newrule+0x1609/0x2980 ? __kmalloc_noprof+0x21a/0x400 nf_tables_newrule+0x1860/0x2980 ? __pfx_nf_tables_newrule+0x10/0x10 ? __nla_parse+0x40/0x60 nfnetlink_rcv+0x14e5/0x2ab0 ? __pfx_validate_chain+0x10/0x10 ? __pfx_nfnetlink_rcv+0x10/0x10 ? __lock_acquire+0x1384/0x2050 ? netlink_deliver_tap+0x2e/0x1b0 ? __pfx_lock_release+0x10/0x10 ? netlink_deliver_tap+0x2e/0x1b0 netlink_unicast+0x7f8/0x990 ? __pfx_netlink_unicast+0x10/0x10 ? __virt_addr_valid+0x183/0x530 ? __check_object_size+0x48e/0x900 netlink_sendmsg+0x8e4/0xcb0 ? __pfx_netlink_sendmsg+0x10/0x10 ? aa_sock_msg_perm+0x91/0x160 ? __pfx_netlink_sendmsg+0x10/0x10 __sock_sendmsg+0x223/0x270 ____sys_sendmsg+0x52a/0x7e0 ? __pfx_____sys_sendmsg+0x10/0x10 __sys_sendmsg+0x292/0x380 ? __pfx___sys_sendmsg+0x10/0x10 ? lockdep_hardirqs_on_prepare+0x43d/0x780 ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10 ? exc_page_fault+0x590/0x8c0 ? do_syscall_64+0xb6/0x230 do_syscall_64+0xf3/0x230 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... </TASK> Since an invalid (without '\0' byte at all) byte sequence may be passed from userspace, add an extra check to ensure that such a sequence is rejected as possible ID and so never passed to 'kstrdup()' and further. Reported-by: syzbot+6c8215822f35fdb35667@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6c8215822f35fdb35667 Fixes: `268cb38e18` ("netfilter: x_tables: add LED trigger target") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:42 +00:00
CKI Backport Bot	2e568d0cde	netfilter: ipset: add missing range check in bitmap_ip_uadt JIRA: https://issues.redhat.com/browse/RHEL-73708 commit 35f56c554eb1b56b77b3cf197a6b00922d49033d Author: Jeongjun Park <aha310510@gmail.com> Date: Wed Nov 13 22:02:09 2024 +0900 netfilter: ipset: add missing range check in bitmap_ip_uadt When tb[IPSET_ATTR_IP_TO] is not present but tb[IPSET_ATTR_CIDR] exists, the values of ip and ip_to are slightly swapped. Therefore, the range check for ip should be done later, but this part is missing and it seems that the vulnerability occurs. So we should add missing range checks and remove unnecessary range checks. Cc: <stable@vger.kernel.org> Reported-by: syzbot+58c872f7790a4d2ac951@syzkaller.appspotmail.com Fixes: `72205fc68b` ("netfilter: ipset: bitmap:ip set type support") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:41 +00:00
CKI Backport Bot	b5a8b80ce0	netfilter: nf_tables: must hold rcu read lock while iterating object type list JIRA: https://issues.redhat.com/browse/RHEL-73708 commit cddc04275f95ca3b18da5c0fb111705ac173af89 Author: Florian Westphal <fw@strlen.de> Date: Mon Nov 4 10:41:19 2024 +0100 netfilter: nf_tables: must hold rcu read lock while iterating object type list Update of stateful object triggers: WARNING: suspicious RCU usage net/netfilter/nf_tables_api.c:7759 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by nft/3060: #0: ffff88810f0578c8 (&nft_net->commit_mutex){+.+.}-{4:4}, [..] ... but this list is not protected by the transaction mutex but the nfnl nftables subsystem mutex. Switch to nft_obj_type_get which will acquire rcu read lock, bump refcount, and returns the result. v3: Dan Carpenter points out nft_obj_type_get returns error pointer, not NULL, on error. Fixes: dad3bdeef45f ("netfilter: nf_tables: fix memory leak during stateful obj update"). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:40 +00:00
CKI Backport Bot	91ba8bb662	netfilter: nf_tables: must hold rcu read lock while iterating expression type list JIRA: https://issues.redhat.com/browse/RHEL-73708 commit ee666a541ed957937454d50afa4757924508cd74 Author: Florian Westphal <fw@strlen.de> Date: Mon Nov 4 10:41:18 2024 +0100 netfilter: nf_tables: must hold rcu read lock while iterating expression type list nft shell tests trigger: WARNING: suspicious RCU usage net/netfilter/nf_tables_api.c:3125 RCU-list traversed in non-reader section!! 1 lock held by nft/2068: #0: ffff888106c6f8c8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x3c/0xf0 But the transaction mutex doesn't protect this list, the nfnl subsystem mutex would, but we can't acquire it here without risk of ABBA deadlocks. Acquire the rcu read lock to avoid this issue. v3: add a comment that explains the ->inner_ops check implies expression is builtin and lack of a module owner reference is ok. Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:39 +00:00
CKI Backport Bot	f9140cdd7d	netfilter: ctnetlink: support CTA_FILTER for flush JIRA: https://issues.redhat.com/browse/RHEL-73708 commit 1ef7f50ccc6e8e2b5de96ad1e304684a277a3055 Author: Changliang Wu <changliang.wu@smartx.com> Date: Thu Jun 20 19:35:27 2024 +0800 netfilter: ctnetlink: support CTA_FILTER for flush From `cb8aa9a`, we can use kernel side filtering for dump, but this capability is not available for flush. This patch allows advanced filtering with CTA_FILTER for flush. Performance: 1048576 ct flows in total, delete 50,000 flows by origin src ip 3.06s -> dump all, compare and delete 584ms -> directly flush with filter Signed-off-by: Changliang Wu <changliang.wu@smartx.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:36 +00:00
CKI Backport Bot	a699b653ab	netfilter: nfnetlink: convert kfree_skb to consume_skb JIRA: https://issues.redhat.com/browse/RHEL-73708 commit e2444c1d463995477fb447be9d0c54150a5c393b Author: Donald Hunter <donald.hunter@gmail.com> Date: Tue May 28 11:37:54 2024 +0100 netfilter: nfnetlink: convert kfree_skb to consume_skb Use consume_skb in the batch code path to avoid generating spurious NOT_SPECIFIED skb drop reasons. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:35 +00:00
CKI Backport Bot	229e48953b	netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery JIRA: https://issues.redhat.com/browse/RHEL-73708 commit 4a3540a8bf3c13dc3955f0c0895332b9c653be3f Author: Linus Lüssing <linus.luessing@c0d3.blue> Date: Wed Mar 6 15:18:04 2024 +0100 netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery So far Multicast Router Advertisements and Multicast Router Solicitations from the Multicast Router Discovery protocol (RFC4286) would be marked as INVALID for IPv6, even if they are in fact intact and adhering to RFC4286. This broke MRA reception and by that multicast reception on IPv6 multicast routers in a Proxmox managed setup, where Proxmox would install a rule like "-m conntrack --ctstate INVALID -j DROP" at the top of the FORWARD chain with br-nf-call-ip6tables enabled by default. Similar to as it's done for MLDv1, MLDv2 and IPv6 Neighbor Discovery already, fix this issue by excluding MRD from connection tracking handling as MRD always uses predefined multicast destinations for its messages, too. This changes the ct-state for ICMPv6 MRD messages from INVALID to UNTRACKED. This issue was found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:34 +00:00
CKI Backport Bot	dfe93f9c5c	netfilter: nf_tables: skip transaction if update object is not implemented JIRA: https://issues.redhat.com/browse/RHEL-73708 commit 84b1a0c0140a9a92ea108576c0002210f224ce59 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Tue Mar 5 09:35:48 2024 +0100 netfilter: nf_tables: skip transaction if update object is not implemented Turn update into noop as a follow up for: `9fedd894b4` ("netfilter: nf_tables: fix unexpected EOPNOTSUPP error") instead of adding a transaction object which is simply discarded at a later stage of the commit protocol. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 15:04:33 +00:00
CKI Backport Bot	9a914a6c65	netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level JIRA: https://issues.redhat.com/browse/RHEL-73350 CVE: CVE-2024-56783 commit b7529880cb961d515642ce63f9d7570869bbbdc3 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Tue Nov 26 11:59:06 2024 +0100 netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level cgroup maximum depth is INT_MAX by default, there is a cgroup toggle to restrict this maximum depth to a more reasonable value not to harm performance. Remove unnecessary WARN_ON_ONCE which is reachable from userspace. Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces") Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2025-01-13 14:42:49 +00:00
Rado Vrbovsky	db51a70cea	Merge: netfilter: ipset: Fix for recursive locking warning MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6059 ``` JIRA: https://issues.redhat.com/browse/RHEL-35897 Upstream Status: net.git commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33 commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33 Author: Phil Sutter <phil@nwl.cc> Date: Tue Dec 17 20:56:55 2024 +0100 netfilter: ipset: Fix for recursive locking warning With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding it to a set of type list:set and populating it from iptables SET target triggers a kernel warning: \| WARNING: possible recursive locking detected \| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted \| -------------------------------------------- \| ping/4018 is trying to acquire lock: \| ffff8881094a6848 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set] \| \| but task is already holding lock: \| ffff88811034c048 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set] This is a false alarm: ipset does not allow nested list:set type, so the loop in list_set_kadd() can never encounter the outer set itself. No other set type supports embedded sets, so this is the only case to consider. To avoid the false report, create a distinct lock class for list:set type ipset locks. Fixes: `f830837f0e` ("netfilter: ipset: list:set set type support") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> ``` Signed-off-by: Phil Sutter <psutter@redhat.com> Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2025-01-06 08:26:07 +00:00
Patrick Talbert	5838c30c9f	Merge: netfilter: IDLETIMER: Fix for possible ABBA deadlock MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6021 ``` JIRA: https://issues.redhat.com/browse/RHEL-6041 Upstream Status: net.git commit f36b01994d68ffc253c8296e2228dfe6e6431c03 commit f36b01994d68ffc253c8296e2228dfe6e6431c03 Author: Phil Sutter <phil@nwl.cc> Date: Fri Dec 6 19:32:29 2024 +0100 netfilter: IDLETIMER: Fix for possible ABBA deadlock Deletion of the last rule referencing a given idletimer may happen at the same time as a read of its file in sysfs: \| ====================================================== \| WARNING: possible circular locking dependency detected \| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted \| ------------------------------------------------------ \| iptables/3303 is trying to acquire lock: \| ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20 \| \| but task is already holding lock: \| ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v] \| \| which lock already depends on the new lock. A simple reproducer is: \| #!/bin/bash \| \| while true; do \| iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" \| iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" \| done & \| while true; do \| cat /sys/class/xt_idletimer/timers/testme >/dev/null \| done Avoid this by freeing list_mutex right after deleting the element from the list, then continuing with the teardown. Fixes: `0902b469bd` ("netfilter: xtables: idletimer target implementation") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com> ``` Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Patrick Talbert <ptalbert@redhat.com>	2024-12-30 07:30:25 -05:00
Patrick Talbert	98f52f1680	Merge: CNB96: netlink/devlink: update devlink & netlink to the v6.12 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5861 JIRA: https://issues.redhat.com/browse/RHEL-57756 Depends: !5257 Depends: !5851 Signed-off-by: Petr Oros <poros@redhat.com> Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com> Approved-by: Davide Caratti <dcaratti@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Patrick Talbert <ptalbert@redhat.com>	2024-12-30 07:30:10 -05:00
Phil Sutter	385069a5f1	netfilter: ipset: Fix for recursive locking warning JIRA: https://issues.redhat.com/browse/RHEL-35897 Upstream Status: net.git commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33 commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33 Author: Phil Sutter <phil@nwl.cc> Date: Tue Dec 17 20:56:55 2024 +0100 netfilter: ipset: Fix for recursive locking warning With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding it to a set of type list:set and populating it from iptables SET target triggers a kernel warning: \| WARNING: possible recursive locking detected \| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted \| -------------------------------------------- \| ping/4018 is trying to acquire lock: \| ffff8881094a6848 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set] \| \| but task is already holding lock: \| ffff88811034c048 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set] This is a false alarm: ipset does not allow nested list:set type, so the loop in list_set_kadd() can never encounter the outer set itself. No other set type supports embedded sets, so this is the only case to consider. To avoid the false report, create a distinct lock class for list:set type ipset locks. Fixes: `f830837f0e` ("netfilter: ipset: list:set set type support") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com>	2024-12-19 13:43:25 +01:00
Phil Sutter	16c2cecd78	netfilter: IDLETIMER: Fix for possible ABBA deadlock JIRA: https://issues.redhat.com/browse/RHEL-6041 Upstream Status: net.git commit f36b01994d68ffc253c8296e2228dfe6e6431c03 commit f36b01994d68ffc253c8296e2228dfe6e6431c03 Author: Phil Sutter <phil@nwl.cc> Date: Fri Dec 6 19:32:29 2024 +0100 netfilter: IDLETIMER: Fix for possible ABBA deadlock Deletion of the last rule referencing a given idletimer may happen at the same time as a read of its file in sysfs: \| ====================================================== \| WARNING: possible circular locking dependency detected \| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted \| ------------------------------------------------------ \| iptables/3303 is trying to acquire lock: \| ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20 \| \| but task is already holding lock: \| ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v] \| \| which lock already depends on the new lock. A simple reproducer is: \| #!/bin/bash \| \| while true; do \| iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" \| iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" \| done & \| while true; do \| cat /sys/class/xt_idletimer/timers/testme >/dev/null \| done Avoid this by freeing list_mutex right after deleting the element from the list, then continuing with the teardown. Fixes: `0902b469bd` ("netfilter: xtables: idletimer target implementation") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com>	2024-12-12 15:46:12 +01:00
Petr Oros	1cd6777e53	netfilter: nfnetlink: Initialise extack before use in ACKs JIRA: https://issues.redhat.com/browse/RHEL-57756 CVE: CVE-2024-44945 Upstream commit(s): commit d1a7b382a9d3f0f3e5a80e0be2991c075fa4f618 Author: Donald Hunter <donald.hunter@gmail.com> Date: Tue Aug 6 16:43:24 2024 +0100 netfilter: nfnetlink: Initialise extack before use in ACKs Add missing extack initialisation when ACKing BATCH_BEGIN and BATCH_END. Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages") Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:56 +01:00
Petr Oros	dc1955b023	netfilter: nfnetlink: Handle ACK flags for batch messages JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit bf2ac490d28c21a349e9eef81edc45320fca4a3c Author: Donald Hunter <donald.hunter@gmail.com> Date: Thu Apr 18 11:47:37 2024 +0100 netfilter: nfnetlink: Handle ACK flags for batch messages The NLM_F_ACK flag is ignored for nfnetlink batch begin and end messages. This is a problem for ynl which wants to receive an ack for every message it sends, not just the commands in between the begin/end messages. Add processing for ACKs for begin/end messages and provide responses when requested. I have checked that iproute2, pyroute2 and systemd are unaffected by this change since none of them use NLM_F_ACK for batch begin/end. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:53 +01:00
Phil Sutter	fd462b693e	netfilter: ipset: Hold module reference while requesting a module JIRA: https://issues.redhat.com/browse/RHEL-35819 Upstream Status: net.git commit 456f010bfaefde84d3390c755eedb1b0a5857c3c commit 456f010bfaefde84d3390c755eedb1b0a5857c3c Author: Phil Sutter <phil@nwl.cc> Date: Fri Nov 29 16:30:38 2024 +0100 netfilter: ipset: Hold module reference while requesting a module User space may unload ip_set.ko while it is itself requesting a set type backend module, leading to a kernel crash. The race condition may be provoked by inserting an mdelay() right after the nfnl_unlock() call. Fixes: `a7b4f989a6` ("netfilter: ipset: IP set core support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com>	2024-12-05 12:59:39 +01:00
Rado Vrbovsky	9da3f14bc5	Merge: CVE-2024-50251: netfilter: nft_payload: sanitize offset and length before calling skb_checksum() MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5746 JIRA: https://issues.redhat.com/browse/RHEL-66855 CVE: CVE-2024-50251 ``` netfilter: nft_payload: sanitize offset and length before calling skb_checksum() If access to offset + length is larger than the skbuff length, then skb_checksum() triggers BUG_ON(). skb_checksum() internally subtracts the length parameter while iterating over skbuff, BUG_ON(len) at the end of it checks that the expected length to be included in the checksum calculation is fully consumed. Fixes: `7ec3f7b47b` ("netfilter: nft_payload: add packet mangling support") Reported-by: Slavin Liu <slavin-ayu@qq.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit d5953d680f7e96208c29ce4139a0e38de87a57fe) ``` Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> --- Created 2024-11-11 06:17 UTC by backporter Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: Xin Long <lxin@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-27 11:19:38 +00:00
Rado Vrbovsky	17ebd1b961	Merge: netfilter: bpf: must hold reference on net namespace MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5719 JIRA: https://issues.redhat.com/browse/RHEL-65877 Upstream Status: commit 1230fe7ad397 CVE: CVE-2024-50130 Signed-off-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Xin Long <lxin@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-22 09:25:30 +00:00
Antoine Tenart	f0adec3f81	ipv6: annotate data-races around cnf.hop_limit JIRA: https://issues.redhat.com/browse/RHEL-62203 Upstream Status: linux.git commit e0bb2675fea2783c45bb95d74f00c55156720863 Author: Eric Dumazet <edumazet@google.com> Date: Wed Feb 28 13:54:29 2024 +0000 ipv6: annotate data-races around cnf.hop_limit idev->cnf.hop_limit and net->ipv6.devconf_all->hop_limit might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Westphal <fw@strlen.de> # for netfilter parts Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-11-14 10:16:48 +01:00
Rado Vrbovsky	fc0c68cffc	Merge: netfilter: xtables: avoid NFPROTO_UNSPEC where needed MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5575 ``` CVE: CVE-2024-50038 JIRA: https://issues.redhat.com/browse/RHEL-63905 ``` Signed-off-by: Phil Sutter <psutter@redhat.com> Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-12 08:14:14 +00:00
Rado Vrbovsky	ab32b3c363	Merge: ipvs: properly dereference pe in ip_vs_add_service MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5412 ``` CVE: CVE-2024-42322 JIRA: https://issues.redhat.com/browse/RHEL-54908 Upstream Status: commit cbd070a4ae62f119058973f6d2c984e325bce6e7 Conflicts: - Context change due to missing commit 705dd3444081 ("ipvs: use kthreads for stats estimation"). commit cbd070a4ae62f119058973f6d2c984e325bce6e7 Author: Chen Hanxiao <chenhx.fnst@fujitsu.com> Date: Thu Jun 27 14:15:15 2024 +0800 ipvs: properly dereference pe in ip_vs_add_service Use pe directly to resolve sparse warning: net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression Fixes: `39b9722315` ("ipvs: handle connections started by real-servers") Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com> ``` Approved-by: Florian Westphal <fwestpha@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-12 08:03:30 +00:00
CKI Backport Bot	86076f0f96	netfilter: nft_payload: sanitize offset and length before calling skb_checksum() JIRA: https://issues.redhat.com/browse/RHEL-66855 CVE: CVE-2024-50251 commit d5953d680f7e96208c29ce4139a0e38de87a57fe Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Wed Oct 30 23:13:48 2024 +0100 netfilter: nft_payload: sanitize offset and length before calling skb_checksum() If access to offset + length is larger than the skbuff length, then skb_checksum() triggers BUG_ON(). skb_checksum() internally subtracts the length parameter while iterating over skbuff, BUG_ON(len) at the end of it checks that the expected length to be included in the checksum calculation is fully consumed. Fixes: `7ec3f7b47b` ("netfilter: nft_payload: add packet mangling support") Reported-by: Slavin Liu <slavin-ayu@qq.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2024-11-11 06:17:54 +00:00
Florian Westphal	012a65c2b6	netfilter: bpf: must hold reference on net namespace JIRA: https://issues.redhat.com/browse/RHEL-65877 Upstream Status: commit 1230fe7ad397 CVE: CVE-2024-50130 Conflicts: net/netfilter/nf_bpf_link.c RHEL9 lacks nf_bpf defrag support that came with commit 1721c2d02d3 ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link"), so discard/ignore bpf_nf_disable_defrag() call. commit 1230fe7ad3974f7bf6c78901473e039b34d4fb1f Author: Florian Westphal <fw@strlen.de> Date: Thu Oct 10 18:34:05 2024 +0200 netfilter: bpf: must hold reference on net namespace BUG: KASAN: slab-use-after-free in __nf_unregister_net_hook+0x640/0x6b0 Read of size 8 at addr ffff8880106fe400 by task repro/72= bpf_nf_link_release+0xda/0x1e0 bpf_link_free+0x139/0x2d0 bpf_link_release+0x68/0x80 __fput+0x414/0xb60 Eric says: It seems that bpf was able to defer the __nf_unregister_net_hook() after exit()/close() time. Perhaps a netns reference is missing, because the netns has been dismantled/freed already. bpf_nf_link_attach() does : link->net = net; But I do not see a reference being taken on net. Add such a reference and release it after hook unreg. Note that I was unable to get syzbot reproducer to work, so I do not know if this resolves this splat. Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs") Diagnosed-by: Eric Dumazet <edumazet@google.com> Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2024-11-07 21:52:35 +01:00
Rado Vrbovsky	14b4cc02eb	Merge: BPF 6.9 rebase MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5142 Rebase BPF subsystem to upstream version 6.9 JIRA: https://issues.redhat.com/browse/RHEL-23649 Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Approved-by: Viktor Malik <vmalik@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Rafael Aquini <raquini@redhat.com> Approved-by: Mark Salter <msalter@redhat.com> Approved-by: Toke Høiland-Jørgensen <toke@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-10-30 07:25:08 +00:00
Rado Vrbovsky	570a71d7db	Merge: mm: update core code to v6.6 upstream MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5252 JIRA: https://issues.redhat.com/browse/RHEL-27743 JIRA: https://issues.redhat.com/browse/RHEL-59459 CVE: CVE-2024-46787 Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4961 This MR brings RHEL9 core MM code up to upstream's v6.6 LTS level. This work follows up on the previous v6.5 update (RHEL-27742) and as such, the bulk of this changeset is comprised of refactoring and clean-ups of the internal implementation of several APIs as it further advances the conversion to FOLIOS, and follow up on the per-VMA locking changes. Also, with the rebase to v6.6 LTS, we complete the infrastructure to allow Control-flow Enforcement Technology, a.k.a. Shadow Stacks, for x86 builds, and we add a potential extra level of protection (assessment pending) to help on mitigating kernel heap exploits dubbed as "SlubStick". Follow-up fixes are omitted from this series either because they are irrelevant to the bits we support on RHEL or because they depend on bigger changesets introduced upstream more recently. A follow-up ticket (RHEL-27745) will deal with these and other cases separately. Omitted-fix: e540b8c5da04 ("mips: mm: add slab availability checking in ioremap_prot") Omitted-fix: f7875966dc0c ("tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources") Omitted-fix: df39038cd895 ("s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()") Omitted-fix: 12bbaae7635a ("mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros") Omitted-fix: fd1a745ce03e ("mm: support page_mapcount() on page_has_type() pages") Omitted-fix: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Omitted-fix: fa2690af573d ("mm: page_ref: remove folio_try_get_rcu()") Omitted-fix: f442fa614137 ("mm: gup: stop abusing try_grab_folio") Omitted-fix: cb0f01beb166 ("mm/mprotect: fix dax pud handling") Signed-off-by: Rafael Aquini <raquini@redhat.com> Approved-by: John W. Linville <linville@redhat.com> Approved-by: Mark Salter <msalter@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Steve Best <sbest@redhat.com> Approved-by: David Airlie <airlied@redhat.com> Approved-by: Michal Schmidt <mschmidt@redhat.com> Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-10-30 07:22:28 +00:00
Phil Sutter	01de117062	netfilter: xtables: fix typo causing some targets not to load on IPv6 CVE: CVE-2024-50038 JIRA: https://issues.redhat.com/browse/RHEL-63905 Upstream Status: net.git commit 306ed1728e8438caed30332e1ab46b28c25fe3d8 commit 306ed1728e8438caed30332e1ab46b28c25fe3d8 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Sun Oct 20 14:49:51 2024 +0200 netfilter: xtables: fix typo causing some targets not to load on IPv6 - There is no NFPROTO_IPV6 family for mark and NFLOG. - TRACE is also missing module autoload with NFPROTO_IPV6. This results in ip6tables failing to restore a ruleset. This issue has been reported by several users providing incomplete patches. Very similar to Ilya Katsnelson's patch including a missing chunk in the TRACE extension. Fixes: 0bfcb7b71e73 ("netfilter: xtables: avoid NFPROTO_UNSPEC where needed") Reported-by: Ignat Korchagin <ignat@cloudflare.com> Reported-by: Ilya Katsnelson <me@0upti.me> Reported-by: Krzysztof Olędzki <ole@ans.pl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com>	2024-10-23 16:19:47 +02:00
Phil Sutter	3f35e92a41	netfilter: xtables: avoid NFPROTO_UNSPEC where needed CVE: CVE-2024-50038 JIRA: https://issues.redhat.com/browse/RHEL-63905 Upstream Status: commit 0bfcb7b71e735560077a42847f69597ec7dcc326 Conflicts: Missing commit f2e3778db7e1 ("netfilter: remove xt pernet data") in RHEL9, keep the deprecation warning for NOTRACK target. commit 0bfcb7b71e735560077a42847f69597ec7dcc326 Author: Florian Westphal <fw@strlen.de> Date: Mon Oct 7 11:28:16 2024 +0200 netfilter: xtables: avoid NFPROTO_UNSPEC where needed syzbot managed to call xt_cluster match via ebtables: WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780 [..] ebt_do_table+0x174b/0x2a40 Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet processing. As this is only useful to restrict locally terminating TCP/UDP traffic, register this for ipv4 and ipv6 family only. Pablo points out that this is a general issue, direct users of the set/getsockopt interface can call into targets/matches that were only intended for use with ip(6)tables. Check all UNSPEC matches and targets for similar issues: - matches and targets are fine except if they assume skb_network_header() is valid -- this is only true when called from inet layer: ip(6) stack pulls the ip/ipv6 header into linear data area. - targets that return XT_CONTINUE or other xtables verdicts must be restricted too, they are incompatible with the ebtables traverser, e.g. EBT_CONTINUE is a completely different value than XT_CONTINUE. Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as they are provided for use by ip(6)tables. The MARK target is also used by arptables, so register for NFPROTO_ARP too. While at it, bail out if connbytes fails to enable the corresponding conntrack family. This change passes the selftests in iptables.git. 
Reported-by: syzbot+256c348558aa5cf611a9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netfilter-devel/66fec2e2.050a0220.9ec68.0047.GAE@google.com/ Fixes: `0269ea4937` ("netfilter: xtables: add cluster match") Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com>	2024-10-23 16:19:47 +02:00
Phil Sutter	667495128e	ipvs: properly dereference pe in ip_vs_add_service CVE: CVE-2024-42322 JIRA: https://issues.redhat.com/browse/RHEL-54908 Upstream Status: commit cbd070a4ae62f119058973f6d2c984e325bce6e7 Conflicts: - Context change due to missing commit 705dd3444081 ("ipvs: use kthreads for stats estimation"). commit cbd070a4ae62f119058973f6d2c984e325bce6e7 Author: Chen Hanxiao <chenhx.fnst@fujitsu.com> Date: Thu Jun 27 14:15:15 2024 +0800 ipvs: properly dereference pe in ip_vs_add_service Use pe directly to resolve sparse warning: net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression Fixes: `39b9722315` ("ipvs: handle connections started by real-servers") Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <psutter@redhat.com>	2024-10-16 16:07:34 +02:00
Jerome Marchand	563e3eb7e7	bpf: treewide: Annotate BPF kfuncs in BTF JIRA: https://issues.redhat.com/browse/RHEL-23649 Conflicts: Multiple conflicts due to missing kfuncs. All sections were switched to use the new macro except bpf_mptcp_fmodret_ids which still use BTF_SET8_* upstream. I don't know why. That might be an upstream oversight. commit 6f3189f38a3e995232e028a4c341164c4aca1b20 Author: Daniel Xu <dxu@dxuuu.xyz> Date: Sun Jan 28 18:24:08 2024 -0700 bpf: treewide: Annotate BPF kfuncs in BTF This commit marks kfuncs as such inside the .BTF_ids section. The upshot of these annotations is that we'll be able to automatically generate kfunc prototypes for downstream users. The process is as follows: 1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs 2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for each function inside BTF_KFUNCS sets 3. At runtime, vmlinux or module BTF is made available in sysfs 4. At runtime, bpftool (or similar) can look at provided BTF and generate appropriate prototypes for functions with "bpf_kfunc" tag To ensure future kfunc are similarly tagged, we now also return error inside kfunc registration for untagged kfuncs. For vmlinux kfuncs, we also WARN(), as initcall machinery does not handle errors. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>	2024-10-15 10:49:07 +02:00
Jerome Marchand	d1c16d1138	bpf: Take into account BPF token when fetching helper protos JIRA: https://issues.redhat.com/browse/RHEL-23649 Conflicts: Context change due to missing commit 9a675ba55a96 ("net, bpf: Add a warning if NAPI cb missed xdp_do_flush().") commit bbc1d24724e110b86a1a7c3c1724ce0d62cc1e2e Author: Andrii Nakryiko <andrii@kernel.org> Date: Tue Jan 23 18:21:04 2024 -0800 bpf: Take into account BPF token when fetching helper protos Instead of performing unconditional system-wide bpf_capable() and perfmon_capable() calls inside bpf_base_func_proto() function (and other similar ones) to determine eligibility of a given BPF helper for a given program, use previously recorded BPF token during BPF_PROG_LOAD command handling to inform the decision. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-8-andrii@kernel.org Signed-off-by: Jerome Marchand <jmarchan@redhat.com>	2024-10-15 10:49:03 +02:00
Rafael Aquini	19e74512fe	minmax: add in_range() macro JIRA: https://issues.redhat.com/browse/RHEL-27743 Conflicts: * fs/btrfs/misc.h: hunk dropped (unsupported FS) * arch/arm/mm/pageattr.c and include/linux/minmax.h: minor context diffs This patch is a backport of the following upstream commit: commit f9bff0e31881d03badf191d3b0005839391f5f2b Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Wed Aug 2 16:13:29 2023 +0100 minmax: add in_range() macro Patch series "New page table range API", v6. This patchset changes the API used by the MM to set up page table entries. The four APIs are: set_ptes(mm, addr, ptep, pte, nr) update_mmu_cache_range(vma, addr, ptep, nr) flush_dcache_folio(folio) flush_icache_pages(vma, page, nr) flush_dcache_folio() isn't technically new, but no architecture implemented it, so I've done that for them. The old APIs remain around but are mostly implemented by calling the new interfaces. The new APIs are based around setting up N page table entries at once. The N entries belong to the same PMD, the same folio and the same VMA, so ptep++ is a legitimate operation, and locking is taken care of for you. Some architectures can do a better job of it than just a loop, but I have hesitated to make too deep a change to architectures I don't understand well. One thing I have changed in every architecture is that PG_arch_1 is now a per-folio bit instead of a per-page bit when used for dcache clean/dirty tracking. This was something that would have to happen eventually, and it makes sense to do it now rather than iterate over every page involved in a cache flush and figure out if it needs to happen. The point of all this is better performance, and Fengwei Yin has measured improvement on x86. I suspect you'll see improvement on your architecture too. 
Try the new will-it-scale test mentioned here: https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/ You'll need to run it on an XFS filesystem and have CONFIG_TRANSPARENT_HUGEPAGE set. This patchset is the basis for much of the anonymous large folio work being done by Ryan, so it's received quite a lot of testing over the last few months. This patch (of 38): Determine if a value lies within a range more efficiently (subtraction + comparison vs two comparisons and an AND). It also has useful (under some circumstances) behaviour if the range exceeds the maximum value of the type. Convert all the conflicting definitions of in_range() within the kernel; some can use the generic definition while others need their own definition. Link: https://lkml.kernel.org/r/20230802151406.3735276-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230802151406.3735276-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:20:16 -04:00
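The range check described in the "minmax: add in_range() macro" entry above (one subtraction plus one comparison instead of two comparisons and an AND) can be illustrated with a small userspace sketch. The helper names `in_range_u32` and `in_range_naive_u32` are hypothetical and chosen for this example only; this is not the kernel's macro, just the same idea under the assumption of unsigned 32-bit operands.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper (not the kernel's in_range() macro):
 * true when val lies in [start, start + len).
 * If val < start, the unsigned subtraction wraps to a huge value,
 * so the single "< len" comparison rejects it.
 */
static inline bool in_range_u32(uint32_t val, uint32_t start, uint32_t len)
{
	return (uint32_t)(val - start) < len;
}

/*
 * The two-comparison form, for contrast. start + len can wrap past
 * UINT32_MAX, in which case the upper-bound test rejects values that
 * really are inside the range.
 */
static inline bool in_range_naive_u32(uint32_t val, uint32_t start, uint32_t len)
{
	return val >= start && val < start + len;
}
```

With `start = UINT32_MAX - 1` and `len = 4`, the subtraction form accepts `UINT32_MAX` while the naive form rejects it because `start + len` wraps; this is one reading of the commit message's remark about ranges that exceed the maximum value of the type.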

6514 Commits