6514 Commits

Author SHA1 Message Date
Florian Westphal 10031021a9 netfilter: nft_exthdr: fix offset with ipv4_find_option()
JIRA: https://issues.redhat.com/browse/RHEL-84577
Upstream Status: commit 6edd78af9506

commit 6edd78af9506bb182518da7f6feebd75655d9a0e
Author: Alexey Kashavkin <akashavkin@gmail.com>
Date:   Sun Mar 2 00:14:36 2025 +0300

    netfilter: nft_exthdr: fix offset with ipv4_find_option()

    There is an incorrect calculation in the offset variable which causes
    the nft_skb_copy_to_reg() function to always return -EFAULT. Adding the
    start variable is redundant. In the __ip_options_compile() function the
    correct offset is specified when finding the function. There is no need
    to add the size of the iphdr structure to the offset.

    Fixes: dbb5281a1f ("netfilter: nf_tables: add support for matching IPv4 options")
    Signed-off-by: Alexey Kashavkin <akashavkin@gmail.com>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-03-26 11:19:46 +01:00
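The offset problem described in the nft_exthdr commit above can be illustrated with a small userspace sketch. It assumes that the option lookup already reports an offset measured from the start of the IPv4 header, which is how the commit message describes the behavior; `find_option_offset()` and `IPHDR_LEN` are hypothetical names for illustration, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define IPHDR_LEN 20 /* base IPv4 header size, no options (illustrative constant) */

/*
 * Hypothetical helper: walk the option area of an IPv4 header and return the
 * offset of option `type`, measured from the start of the header, or -1.
 */
static int find_option_offset(const uint8_t *iph, size_t hdr_len, uint8_t type)
{
    size_t i = IPHDR_LEN;

    while (i < hdr_len) {
        uint8_t t = iph[i];

        if (t == 0)                 /* End of Option List */
            break;
        if (t == 1) {               /* No-Operation: single byte */
            i++;
            continue;
        }
        if (i + 1 >= hdr_len)       /* no room for a length byte */
            break;

        uint8_t len = iph[i + 1];

        if (len < 2 || i + len > hdr_len)
            break;                  /* malformed option */
        if (t == type)
            return (int)i;          /* already relative to the header start */
        i += len;
    }
    return -1;
}
```

Because the returned offset already includes the base header, adding `IPHDR_LEN` a second time would point past the option and, for a short header, past the end of the buffer, which is consistent with the copy failing as the commit describes.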
Florian Westphal 9ab0fa974f netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree()
JIRA: https://issues.redhat.com/browse/RHEL-84577
Upstream Status: commit d653bfeb07eb

Conflicts: net/netfilter/nf_conncount.c
Context only, we lack
commit 0b88d1654d55 ("netfilter: nf_conncount: fix wrong variable type").

commit d653bfeb07ebb3499c403404c21ac58a16531607
Author: Kohei Enju <enjuk@amazon.com>
Date:   Sun Mar 9 17:07:38 2025 +0900

    netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree()

    Since commit b36e4523d4 ("netfilter: nf_conncount: fix garbage
    collection confirm race"), `cpu` and `jiffies32` were introduced to
    the struct nf_conncount_tuple.

    The commit made nf_conncount_add() initialize `conn->cpu` and
    `conn->jiffies32` when allocating the struct.
    In contrast, count_tree() was not changed to initialize them.

    By commit 34848d5c89 ("netfilter: nf_conncount: Split insert and
    traversal"), count_tree() was split and the relevant allocation
    code now resides in insert_tree().
    Initialize `conn->cpu` and `conn->jiffies32` in insert_tree().

    BUG: KMSAN: uninit-value in find_or_evict net/netfilter/nf_conncount.c:117 [inline]
    BUG: KMSAN: uninit-value in __nf_conncount_add+0xd9c/0x2850 net/netfilter/nf_conncount.c:143
     find_or_evict net/netfilter/nf_conncount.c:117 [inline]
     __nf_conncount_add+0xd9c/0x2850 net/netfilter/nf_conncount.c:143
     count_tree net/netfilter/nf_conncount.c:438 [inline]
     nf_conncount_count+0x82f/0x1e80 net/netfilter/nf_conncount.c:521
     connlimit_mt+0x7f6/0xbd0 net/netfilter/xt_connlimit.c:72
     __nft_match_eval net/netfilter/nft_compat.c:403 [inline]
     nft_match_eval+0x1a5/0x300 net/netfilter/nft_compat.c:433
     expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
     nft_do_chain+0x426/0x2290 net/netfilter/nf_tables_core.c:288
     nft_do_chain_ipv4+0x1a5/0x230 net/netfilter/nft_chain_filter.c:23
     nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
     nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
     nf_hook_slow_list+0x24d/0x860 net/netfilter/core.c:663
     NF_HOOK_LIST include/linux/netfilter.h:350 [inline]
     ip_sublist_rcv+0x17b7/0x17f0 net/ipv4/ip_input.c:633
     ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:669
     __netif_receive_skb_list_ptype net/core/dev.c:5936 [inline]
     __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5983
     __netif_receive_skb_list net/core/dev.c:6035 [inline]
     netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:6126
     netif_receive_skb_list+0x5a/0x460 net/core/dev.c:6178
     xdp_recv_frames net/bpf/test_run.c:280 [inline]
     xdp_test_run_batch net/bpf/test_run.c:361 [inline]
     bpf_test_run_xdp_live+0x2e86/0x3480 net/bpf/test_run.c:390
     bpf_prog_test_run_xdp+0xf1d/0x1ae0 net/bpf/test_run.c:1316
     bpf_prog_test_run+0x5e5/0xa30 kernel/bpf/syscall.c:4407
     __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5813
     __do_sys_bpf kernel/bpf/syscall.c:5902 [inline]
     __se_sys_bpf kernel/bpf/syscall.c:5900 [inline]
     __ia32_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5900
     ia32_sys_call+0x394d/0x4180 arch/x86/include/generated/asm/syscalls_32.h:358
     do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
     __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/common.c:387
     do_fast_syscall_32+0x38/0x80 arch/x86/entry/common.c:412
     do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:450
     entry_SYSENTER_compat_after_hwframe+0x84/0x8e

    Uninit was created at:
     slab_post_alloc_hook mm/slub.c:4121 [inline]
     slab_alloc_node mm/slub.c:4164 [inline]
     kmem_cache_alloc_noprof+0x915/0xe10 mm/slub.c:4171
     insert_tree net/netfilter/nf_conncount.c:372 [inline]
     count_tree net/netfilter/nf_conncount.c:450 [inline]
     nf_conncount_count+0x1415/0x1e80 net/netfilter/nf_conncount.c:521
     connlimit_mt+0x7f6/0xbd0 net/netfilter/xt_connlimit.c:72
     __nft_match_eval net/netfilter/nft_compat.c:403 [inline]
     nft_match_eval+0x1a5/0x300 net/netfilter/nft_compat.c:433
     expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
     nft_do_chain+0x426/0x2290 net/netfilter/nf_tables_core.c:288
     nft_do_chain_ipv4+0x1a5/0x230 net/netfilter/nft_chain_filter.c:23
     nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
     nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
     nf_hook_slow_list+0x24d/0x860 net/netfilter/core.c:663
     NF_HOOK_LIST include/linux/netfilter.h:350 [inline]
     ip_sublist_rcv+0x17b7/0x17f0 net/ipv4/ip_input.c:633
     ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:669
     __netif_receive_skb_list_ptype net/core/dev.c:5936 [inline]
     __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5983
     __netif_receive_skb_list net/core/dev.c:6035 [inline]
     netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:6126
     netif_receive_skb_list+0x5a/0x460 net/core/dev.c:6178
     xdp_recv_frames net/bpf/test_run.c:280 [inline]
     xdp_test_run_batch net/bpf/test_run.c:361 [inline]
     bpf_test_run_xdp_live+0x2e86/0x3480 net/bpf/test_run.c:390
     bpf_prog_test_run_xdp+0xf1d/0x1ae0 net/bpf/test_run.c:1316
     bpf_prog_test_run+0x5e5/0xa30 kernel/bpf/syscall.c:4407
     __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5813
     __do_sys_bpf kernel/bpf/syscall.c:5902 [inline]
     __se_sys_bpf kernel/bpf/syscall.c:5900 [inline]
     __ia32_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5900
     ia32_sys_call+0x394d/0x4180 arch/x86/include/generated/asm/syscalls_32.h:358
     do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
     __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/common.c:387
     do_fast_syscall_32+0x38/0x80 arch/x86/entry/common.c:412
     do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:450
     entry_SYSENTER_compat_after_hwframe+0x84/0x8e

    Reported-by: syzbot+83fed965338b573115f7@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=83fed965338b573115f7
    Fixes: b36e4523d4 ("netfilter: nf_conncount: fix garbage collection confirm race")
    Signed-off-by: Kohei Enju <enjuk@amazon.com>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-03-26 10:12:55 +01:00
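The nf_conncount fix above boils down to initializing every field of a freshly allocated object before anything reads it. The following is a minimal userspace sketch of that pattern; `struct conn_tuple`, `conn_alloc()` and the `key` field are hypothetical stand-ins, and only the `cpu` and `jiffies32` field names come from the commit message.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the tuple structure named in the commit message. */
struct conn_tuple {
    uint32_t key;        /* hypothetical payload field */
    int      cpu;        /* CPU that created the entry */
    uint32_t jiffies32;  /* creation timestamp */
};

/*
 * malloc(), like a slab allocation without zeroing, returns memory with
 * indeterminate contents, so every field a later reader looks at has to be
 * set explicitly at allocation time.
 */
static struct conn_tuple *conn_alloc(uint32_t key, int cpu, uint32_t now)
{
    struct conn_tuple *conn = malloc(sizeof(*conn));

    if (!conn)
        return NULL;

    conn->key = key;
    conn->cpu = cpu;
    conn->jiffies32 = now;
    return conn;
}
```

Funnelling all allocations through one helper is one way to avoid the situation the commit describes, where one allocation site was updated and another was not.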
Florian Westphal ac4415015f netfilter: nf_tables: make destruction work queue pernet
JIRA: https://issues.redhat.com/browse/RHEL-84577
Upstream Status: commit fb8286562ecf

commit fb8286562ecfb585e26b033c5e32e6fb85efb0b3
Author: Florian Westphal <fw@strlen.de>
Date:   Thu Mar 6 04:05:26 2025 +0100

    netfilter: nf_tables: make destruction work queue pernet

    The call to flush_work before tearing down a table from the netlink
    notifier was supposed to make sure that all earlier updates (e.g. rule
    add) that might reference that table have been processed.

    Unfortunately, flush_work() waits for the last queued instance.
    This could be an instance that is different from the one that we must
    wait for.

    This is because transactions are protected with a pernet mutex, but the
    work item is global, so holding the transaction mutex doesn't prevent
    another netns from queueing more work.

    Make the work item pernet so that flush_work() will wait for all
    transactions queued from this netns.

    A welcome side effect is that we no longer need to wait for transaction
    objects from foreign netns.

    The gc work queue is still global.  This seems to be ok because nft_set
    structures are reference counted and each container structure owns a
    reference on the net namespace.

    The destroy_list is still protected by a global spinlock rather than
    pernet one but the hold time is very short anyway.

    v2: call cancel_work_sync before reaping the remaining tables (Pablo).

    Fixes: 9f6958ba2e90 ("netfilter: nf_tables: unconditionally flush pending work before notifier")
    Reported-by: syzbot+5d8c5789c8cb076b2c25@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-03-26 10:12:49 +01:00
Florian Westphal ac58aedac2 netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template.
JIRA: https://issues.redhat.com/browse/RHEL-84577
Upstream Status: commit 5cfe5612ca95

commit 5cfe5612ca9590db69b9be29dc83041dbf001108
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Mon Feb 17 17:02:42 2025 +0100

    netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template.

    nft_ct_pcpu_template is a per-CPU variable and relies on disabled BH for its
    locking. The refcounter is read and if its value is set to one then the
    refcounter is incremented and variable is used - otherwise it is already
    in use and left untouched.

    Without per-CPU locking in local_bh_disable() on PREEMPT_RT the
    read-then-increment operation is not atomic and therefore racy.

    This can be avoided by using unconditionally __refcount_inc() which will
    increment counter and return the old value as an atomic operation.
    In case the returned counter is not one, the variable is in use and we
    need to decrement counter. Otherwise we can use it.

    Use __refcount_inc() instead of read and a conditional increment.

    Fixes: edee4f1e92 ("netfilter: nft_ct: add zone id set support")
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-03-26 10:12:48 +01:00
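The increment-then-check pattern described in the nft_ct commit above can be sketched in userspace with C11 atomics. This is only an analogy: `atomic_fetch_add()` stands in for an increment that hands back the old value, and `template_refs`, `template_try_get()` and `template_put()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A count of 1 means the template is idle; anything higher means "in use". */
static atomic_int template_refs = 1;

/*
 * Increment unconditionally and inspect the value seen before the increment.
 * The read and the increment are a single atomic step, so two contexts cannot
 * both observe 1 and both claim the template.
 */
static bool template_try_get(void)
{
    int old = atomic_fetch_add(&template_refs, 1);

    if (old != 1) {
        /* Someone else holds it: undo our increment and back off. */
        atomic_fetch_sub(&template_refs, 1);
        return false;
    }
    return true;
}

/* Release a reference obtained with template_try_get(). */
static void template_put(void)
{
    atomic_fetch_sub(&template_refs, 1);
}
```

A separate read followed by a conditional increment leaves a window between the two steps, which is the race the commit message attributes to PREEMPT_RT.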
Florian Westphal e86e5c7f86 netfilter: nft_flow_offload: update tcp state flags under lock
JIRA: https://issues.redhat.com/browse/RHEL-84577
Upstream Status: commit 7a4b61406395

commit 7a4b61406395291ffb7220a10e8951a9a8684819
Author: Florian Westphal <fw@strlen.de>
Date:   Tue Jan 14 00:50:34 2025 +0100

    netfilter: nft_flow_offload: update tcp state flags under lock

    The conntrack entry is already public, there is a small chance that another
    CPU is handling a packet in reply direction and racing with the tcp state
    update.

    Move this under ct spinlock.

    This is done once, when ct is about to be offloaded, so this should
    not result in a noticeable performance hit.

    Fixes: 8437a6209f ("netfilter: nft_flow_offload: set liberal tracking mode for tcp")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-03-26 10:12:48 +01:00
Florian Westphal 6b1086a4cb netfilter: nft_flow_offload: clear tcp MAXACK flag before moving to slowpath
JIRA: https://issues.redhat.com/browse/RHEL-84577
Upstream Status: commit d9d7b489416d

commit d9d7b489416d18ba696c32a93623ecb0176b374e
Author: Florian Westphal <fw@strlen.de>
Date:   Tue Jan 14 00:50:33 2025 +0100

    netfilter: nft_flow_offload: clear tcp MAXACK flag before moving to slowpath

    This state reset is racy, no locks are held here.

    Since commit
    8437a6209f ("netfilter: nft_flow_offload: set liberal tracking mode for tcp"),
    the window checks are disabled for normal data packets, but MAXACK flag
    is checked when validating TCP resets.

    Clear the flag so tcp reset validation checks are ignored.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-03-26 10:12:48 +01:00
Augusto Caringi d5cbb3a73e Merge: CVE-2025-21826: netfilter: nf_tables: reject mismatching sum of field_len with set key length
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6532

JIRA: https://issues.redhat.com/browse/RHEL-82489
CVE: CVE-2025-21826

```
commit 1b9335a8000fb70742f7db10af314104b6ace220
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Jan 28 12:26:33 2025 +0100

    netfilter: nf_tables: reject mismatching sum of field_len with set key length

    The field length description provides the length of each separated key
    field in the concatenation, each field gets rounded up to 32-bits to
    calculate the pipapo rule width from pipapo_init(). The set key length
    provides the total size of the key aligned to 32-bits.

    Register-based arithmetics still allows for combining mismatching set
    key length and field length description, eg. set key length 10 and field
    description [ 5, 4 ] leading to pipapo width of 12.

    Cc: stable@vger.kernel.org
    Fixes: 3ce67e3793f4 ("netfilter: nf_tables: do not allow mismatch field size and set key length")
    Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>

---

<small>Created 2025-03-06 18:39 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Augusto Caringi <acaringi@redhat.com>
2025-03-20 11:20:20 -03:00
Augusto Caringi e7e88834bf Merge: netfilter: nfnetlink_queue: drop bogus WARN_ON
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6402

JIRA: https://issues.redhat.com/browse/RHEL-80104
Upstream Status: commit 631a4b3ddc78

Updated nft_queue.sh kselftest can trigger a WARN splat.
Note that upstream "Fixes" tag is incorrect, the problem
does exist in 9.x releases.

Signed-off-by: Florian Westphal <fwestpha@redhat.com>

Approved-by: Hangbin Liu <haliu@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Augusto Caringi <acaringi@redhat.com>
2025-03-17 14:57:52 -03:00
CKI Backport Bot 6969a826ff netfilter: nf_tables: reject mismatching sum of field_len with set key length
JIRA: https://issues.redhat.com/browse/RHEL-82489
CVE: CVE-2025-21826

commit 1b9335a8000fb70742f7db10af314104b6ace220
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Jan 28 12:26:33 2025 +0100

    netfilter: nf_tables: reject mismatching sum of field_len with set key length

    The field length description provides the length of each separated key
    field in the concatenation, each field gets rounded up to 32-bits to
    calculate the pipapo rule width from pipapo_init(). The set key length
    provides the total size of the key aligned to 32-bits.

    Register-based arithmetics still allows for combining mismatching set
    key length and field length description, eg. set key length 10 and field
    description [ 5, 4 ] leading to pipapo width of 12.

    Cc: stable@vger.kernel.org
    Fixes: 3ce67e3793f4 ("netfilter: nf_tables: do not allow mismatch field size and set key length")
    Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-03-06 18:39:57 +00:00
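The arithmetic in the commit message above (set key length 10, field description [ 5, 4 ], width 12) can be worked through in a short sketch. It assumes the added check amounts to comparing the raw sum of the field lengths with the key length; the helper names are hypothetical and the real kernel code may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round a byte length up to the next multiple of 4 bytes (32 bits). */
static uint32_t round_up4(uint32_t len)
{
    return (len + 3u) & ~3u;
}

/* Total width when every field is rounded up to 32 bits individually. */
static uint32_t rounded_width(const uint8_t *field_len, size_t n)
{
    uint32_t width = 0;

    for (size_t i = 0; i < n; i++)
        width += round_up4(field_len[i]);
    return width;
}

/* Raw sum of the field lengths, without any rounding. */
static uint32_t field_sum(const uint8_t *field_len, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += field_len[i];
    return sum;
}

/* Assumed form of the check: the unrounded sum must equal the key length. */
static bool key_matches_fields(uint32_t klen, const uint8_t *field_len, size_t n)
{
    return field_sum(field_len, n) == klen;
}
```

With fields [ 5, 4 ] the per-field rounding gives 8 + 4 = 12, and a key length of 10 also rounds up to 12, so a comparison on rounded values cannot tell them apart. The raw sum is 9, which does not equal 10, so the unrounded comparison rejects the combination.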
Augusto Caringi 19e4d875cf Merge: CVE-2024-53680: ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6343

JIRA: https://issues.redhat.com/browse/RHEL-77915
CVE: CVE-2024-53680

```
ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()

Under certain kernel configurations when building with Clang/LLVM, the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning during build time:

  vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()

At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.

Digging deeper into both LLVM and the kernel code reveals this to be an
undefined behavior problem. ip_vs_protocol_init() uses an on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f891bb9f ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, causes the following
IR to be generated:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 {
    %1 = alloca [64 x i8], align 16
    ...

  14:                                               ; preds = %11
    %15 = getelementptr inbounds i8, ptr %1, i64 63
    %16 = load i8, ptr %15, align 1
    %17 = tail call i1 @llvm.is.constant.i8(i8 %16)
    %18 = icmp eq i8 %16, 0
    %19 = select i1 %17, i1 %18, i1 false
    br i1 %19, label %20, label %23

  20:                                               ; preds = %14
    %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
    ...

  23:                                               ; preds = %14, %11, %20
    %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24
    ...
  }

The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:

  %13 = getelementptr inbounds i8, ptr %1, i64 63
  br i1 undef, label %14, label %17

This gives later passes (SCCP, in particular) more DCE opportunities by
propagating the undef value further, and eventually removes everything
after the load on the uninitialized stack location:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 {
    %1 = alloca [64 x i8], align 16
    ...

  12:                                               ; preds = %11
    %13 = getelementptr inbounds i8, ptr %1, i64 63
    unreachable
  }

In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.

Zero the on-stack buffer to avoid this possible UB.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/
Co-developed-by: Ruowen Qin <ruqin@redhat.com>
Signed-off-by: Ruowen Qin <ruqin@redhat.com>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 146b6f1112eb30a19776d6c323c994e9d67790db)
```

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>

---

<small>Created 2025-02-05 14:10 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Hangbin Liu <haliu@redhat.com>
Approved-by: Andrea Claudi <aclaudi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Augusto Caringi <acaringi@redhat.com>
2025-03-06 00:01:06 -03:00
Florian Westphal 0aeb832cac netfilter: nfnetlink_queue: drop bogus WARN_ON
JIRA: https://issues.redhat.com/browse/RHEL-80104
Upstream Status: commit 631a4b3ddc78

Conflicts: net/netfilter/nfnetlink_queue.c

The function was moved upstream from nf_queue.c to nfnetlink_queue.c,
the former is baked into vmlinux while the latter is part of nfnetlink_queue
module.

While we could pick up 3f8019688894 ("netfilter: move nf_reinject into
nfnetlink_queue modules"), it doesn't apply as-is either because of other
upstream changes.

commit 631a4b3ddc7831b20442c59c28b0476d0704c9af
Author: Florian Westphal <fw@strlen.de>
Date:   Tue Jul 9 02:02:26 2024 +0200

    netfilter: nfnetlink_queue: drop bogus WARN_ON

    Happens when rules get flushed/deleted while packet is out, so remove
    this WARN_ON.

    This WARN exists in one form or another since v4.14, no need to backport
    this to older releases, hence use a more recent fixes tag.

    Fixes: 3f8019688894 ("netfilter: move nf_reinject into nfnetlink_queue modules")
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202407081453.11ac0f63-lkp@intel.com
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-02-19 13:19:20 +01:00
CKI Backport Bot 237e6ee3cb ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()
JIRA: https://issues.redhat.com/browse/RHEL-77915
CVE: CVE-2024-53680

commit 146b6f1112eb30a19776d6c323c994e9d67790db
Author: Jinghao Jia <jinghao7@illinois.edu>
Date:   Sat Nov 23 03:42:56 2024 -0600

    ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()

    Under certain kernel configurations when building with Clang/LLVM, the
    compiler does not generate a return or jump as the terminator
    instruction for ip_vs_protocol_init(), triggering the following objtool
    warning during build time:

      vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()

    At runtime, this either causes an oops when trying to load the ipvs
    module or a boot-time panic if ipvs is built-in. This same issue has
    been reported by the Intel kernel test robot previously.

    Digging deeper into both LLVM and the kernel code reveals this to be an
    undefined behavior problem. ip_vs_protocol_init() uses an on-stack buffer
    of 64 chars to store the registered protocol names and leaves it
    uninitialized after definition. The function calls strnlen() when
    concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
    strnlen() performs an extra step to check whether the last byte of the
    input char buffer is a null character (commit 3009f891bb9f ("fortify:
    Allow strlen() and strnlen() to pass compile-time known lengths")).
    This, together with possibly other configurations, causes the following
    IR to be generated:

      define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 {
        %1 = alloca [64 x i8], align 16
        ...

      14:                                               ; preds = %11
        %15 = getelementptr inbounds i8, ptr %1, i64 63
        %16 = load i8, ptr %15, align 1
        %17 = tail call i1 @llvm.is.constant.i8(i8 %16)
        %18 = icmp eq i8 %16, 0
        %19 = select i1 %17, i1 %18, i1 false
        br i1 %19, label %20, label %23

      20:                                               ; preds = %14
        %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
        ...

      23:                                               ; preds = %14, %11, %20
        %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24
        ...
      }

    The above code calculates the address of the last char in the buffer
    (value %15) and then loads from it (value %16). Because the buffer is
    never initialized, the LLVM GVN pass marks value %16 as undefined:

      %13 = getelementptr inbounds i8, ptr %1, i64 63
      br i1 undef, label %14, label %17

    This gives later passes (SCCP, in particular) more DCE opportunities by
    propagating the undef value further, and eventually removes everything
    after the load on the uninitialized stack location:

      define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 {
        %1 = alloca [64 x i8], align 16
        ...

      12:                                               ; preds = %11
        %13 = getelementptr inbounds i8, ptr %1, i64 63
        unreachable
      }

    In this way, the generated native code will just fall through to the
    next function, as LLVM does not generate any code for the unreachable IR
    instruction and leaves the function without a terminator.

    Zero the on-stack buffer to avoid this possible UB.

    Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/
    Co-developed-by: Ruowen Qin <ruqin@redhat.com>
    Signed-off-by: Ruowen Qin <ruqin@redhat.com>
    Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-02-05 14:10:28 +00:00
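The ipvs commit above says the fix is to zero the on-stack buffer before any length scan reads it. A minimal userspace sketch of that idea follows; `bounded_len()` is a local stand-in for strnlen(), and `append_name()`, `list_protocols()` and the ", " separator are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for strnlen(): length of s, scanning at most max bytes. */
static size_t bounded_len(const char *s, size_t max)
{
    size_t n = 0;

    while (n < max && s[n] != '\0')
        n++;
    return n;
}

/* Append name to buf. This reads buf first, so buf must be initialized. */
static void append_name(char *buf, size_t size, const char *name)
{
    size_t used = bounded_len(buf, size);

    if (used >= size)
        return; /* no terminator found, nothing can be appended safely */
    snprintf(buf + used, size - used, "%s%s", used ? ", " : "", name);
}

/* Build a protocol list in a zeroed stack buffer and copy it to out. */
static void list_protocols(char *out, size_t size)
{
    char protocols[64] = { 0 }; /* zeroed: every later read is well defined */

    append_name(protocols, sizeof(protocols), "TCP");
    append_name(protocols, sizeof(protocols), "UDP");
    snprintf(out, size, "%s", protocols);
}
```

Without the `= { 0 }` initializer the first length scan would read indeterminate bytes, which is the undefined behavior that let the optimizer discard the rest of the function in the report above.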
CKI Backport Bot 39c44c42b2 netfilter: conntrack: clamp maximum hashtable size to INT_MAX
JIRA: https://issues.redhat.com/browse/RHEL-77891
CVE: CVE-2025-21648

commit b541ba7d1f5a5b7b3e2e22dc9e40e18a7d6dbc13
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Wed Jan 8 22:56:33 2025 +0100

    netfilter: conntrack: clamp maximum hashtable size to INT_MAX

    Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
    is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
    resizing hashtable because __GFP_NOWARN is unset. See:

      0708a0afe291 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")

    Note: hashtable resize is only possible from init_netns.

    Fixes: 9cc1c73ad6 ("netfilter: conntrack: avoid integer overflow when resizing")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-02-05 14:07:46 +00:00
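The clamp described in the conntrack commit above is simple enough to show as a standalone sketch. `clamp_hashsize()` is a hypothetical helper, not the kernel function; it only illustrates capping a requested size at INT_MAX before it is used to size an allocation.

```c
#include <limits.h>

/*
 * Cap a requested hashtable size at INT_MAX so that a later allocation
 * sized from it cannot exceed what the allocator accepts without warning.
 */
static unsigned int clamp_hashsize(unsigned long long requested)
{
    if (requested > (unsigned long long)INT_MAX)
        return (unsigned int)INT_MAX;
    return (unsigned int)requested;
}
```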
Patrick Talbert e3f336f694 Merge: ipvs: speed up reads from ip_vs_conn proc file
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6185

JIRA: https://issues.redhat.com/browse/RHEL-74064
Upstream Status: net-next.git

Signed-off-by: Florian Westphal <fwestpha@redhat.com>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Eric Garver <egarver@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
2025-01-27 15:24:27 +01:00
Rado Vrbovsky 16f625d8bd Merge: CVE-2024-56783: netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6151

JIRA: https://issues.redhat.com/browse/RHEL-73350
CVE: CVE-2024-56783

```
netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level

cgroup maximum depth is INT_MAX by default, there is a cgroup toggle to
restrict this maximum depth to a more reasonable value not to harm
performance. Remove unnecessary WARN_ON_ONCE which is reachable from
userspace.

Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit b7529880cb961d515642ce63f9d7570869bbbdc3)
```

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>

---

<small>Created 2025-01-13 14:42 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2025-01-23 13:14:29 +00:00
Florian Westphal 2dabc0c0ee ipvs: speed up reads from ip_vs_conn proc file
JIRA: https://issues.redhat.com/browse/RHEL-74064
Upstream Status: commit 178883fd039d

commit 178883fd039d38a708cc56555489533d9a9c07df
Author: Florian Westphal <fw@strlen.de>
Date:   Tue Dec 3 12:08:30 2024 +0100

    ipvs: speed up reads from ip_vs_conn proc file

    Reading is very slow because ->start() performs a linear re-scan of the
    entire hash table until it finds the successor to the last dumped
    element.  The current implementation uses 'pos' as the number of
    elements to skip, then does a linear iteration until it has skipped
    'pos' entries.

    Store the last bucket and the number of elements to skip in that
    bucket instead, so we can resume from bucket b directly.

    Before this patch, it's possible to read ~35k entries in one second, but
    each read() gets slower as the number of entries to skip grows:

    time timeout 60 cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all
    real    1m0.007s
    user    0m0.003s
    sys     0m59.956s
    140386 /tmp/all

    Only ~100k more got read in the remaining 59s, nowhere near the 1m
    entries that are stored at the time.

    After this patch, the dump completes very quickly:
    time cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all
    real    0m2.286s
    user    0m0.004s
    sys     0m2.281s
    1000001 /tmp/all

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2025-01-15 15:11:53 +01:00
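
The resume strategy described in 2dabc0c0ee above (remember the last bucket and how many entries of that bucket were already dumped) can be illustrated with a minimal user-space sketch. The `node`, `table`, `cursor` and `table_next` names are hypothetical stand-ins and are not the kernel's ip_vs_conn structures or seq_file callbacks; locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified hash table; not the kernel's ip_vs_conn table. */
struct node {
    int value;
    struct node *next;
};

struct table {
    struct node **buckets;
    size_t nbuckets;
};

/* State kept between reads: last bucket and entries already dumped from it. */
struct cursor {
    size_t bucket;
    size_t skip;
};

/*
 * Return the next element and advance the cursor, or NULL at the end.
 * Only the current bucket's chain is walked to honour 'skip'; buckets
 * before c->bucket are never rescanned.
 */
static struct node *table_next(const struct table *t, struct cursor *c)
{
    assert(t != NULL && c != NULL);

    while (c->bucket < t->nbuckets) {
        struct node *n = t->buckets[c->bucket];
        size_t i;

        for (i = 0; n != NULL && i < c->skip; i++)
            n = n->next;

        if (n != NULL) {
            c->skip++;
            return n;
        }

        /* Bucket exhausted: move on and reset the per-bucket skip count. */
        c->bucket++;
        c->skip = 0;
    }
    return NULL;
}
```

With a single global 'pos' counter every call would restart from bucket 0; here the cost per call is bounded by the length of one bucket chain.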
CKI Backport Bot 64ef02fcd1 netfilter: nft_set_hash: skip duplicated elements pending gc run
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit 7ffc7481153bbabf3332c6a19b289730c7e1edf5
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Mon Dec 2 00:04:49 2024 +0100

    netfilter: nft_set_hash: skip duplicated elements pending gc run

    rhashtable does not provide stable walk, duplicated elements are
    possible in case of resizing. I considered that checking for errors when
    calling rhashtable_walk_next() was sufficient to detect the resizing.
    However, rhashtable_walk_next() returns -EAGAIN only at the end of the
    iteration, which is too late, because a gc work containing duplicated
    elements could have been already scheduled for removal to the worker.

    Add a u32 gc worker sequence number per set, bump it on every workqueue
    run. Annotate gc worker sequence number on the expired element. Use it
    to skip those already seen in this gc workqueue run.

    Note that this new field is never reset in case the gc transaction fails,
    so the next gc worker run on the expired element overrides it. Wraparound
    of the gc worker sequence number should not be an issue: a stale sequence
    number in the element would just postpone the element's removal by one gc
    run.

    Note that it is not possible to use flags to annotate that an element is
    pending a gc run in order to detect duplicates, given that the gc
    transaction can be invalidated by an update from the control plane, which
    would leave no opportunity to clear such a flag.

    On x86_64, pahole reports no changes in the size of nft_rhash_elem.

    Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
    Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch>
    Tested-by: Laurent Fasnacht <laurent.fasnacht@proton.ch>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:44 +00:00
CKI Backport Bot f873ac0414 netfilter: nft_inner: incorrect percpu area handling under softirq
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit 7b1d83da254be3bf054965c8f3b1ad976f460ae5
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Wed Nov 27 12:46:54 2024 +0100

    netfilter: nft_inner: incorrect percpu area handling under softirq

    Softirq can interrupt ongoing packet from process context that is
    walking over the percpu area that contains inner header offsets.

    Disable bh and perform three checks before restoring the percpu inner
    header offsets to validate that the percpu area is valid for this
    skbuff:

    1) If the NFT_PKTINFO_INNER_FULL flag is set, then this skbuff
       has already been parsed before for inner header fetching to
       register.

    2) Validate that the percpu area refers to this skbuff using the
       skbuff pointer as a cookie. If there is a cookie mismatch, then
       this skbuff needs to be parsed again.

    3) Finally, validate if the percpu area refers to this tunnel type.

    Only after these three checks is the percpu area restored to an on-stack
    copy and bh enabled again.

    After inner header fetching, the on-stack copy is stored back to the
    percpu area.

    Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
    Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:43 +00:00
CKI Backport Bot 5e54e2cc17 netfilter: x_tables: fix LED ID check in led_tg_check()
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit 04317f4eb2aad312ad85c1a17ad81fe75f1f9bc7
Author: Dmitry Antipov <dmantipov@yandex.ru>
Date:   Thu Nov 21 09:55:42 2024 +0300

    netfilter: x_tables: fix LED ID check in led_tg_check()

    Syzbot has reported the following BUG detected by KASAN:

    BUG: KASAN: slab-out-of-bounds in strlen+0x58/0x70
    Read of size 1 at addr ffff8881022da0c8 by task repro/5879
    ...
    Call Trace:
     <TASK>
     dump_stack_lvl+0x241/0x360
     ? __pfx_dump_stack_lvl+0x10/0x10
     ? __pfx__printk+0x10/0x10
     ? _printk+0xd5/0x120
     ? __virt_addr_valid+0x183/0x530
     ? __virt_addr_valid+0x183/0x530
     print_report+0x169/0x550
     ? __virt_addr_valid+0x183/0x530
     ? __virt_addr_valid+0x183/0x530
     ? __virt_addr_valid+0x45f/0x530
     ? __phys_addr+0xba/0x170
     ? strlen+0x58/0x70
     kasan_report+0x143/0x180
     ? strlen+0x58/0x70
     strlen+0x58/0x70
     kstrdup+0x20/0x80
     led_tg_check+0x18b/0x3c0
     xt_check_target+0x3bb/0xa40
     ? __pfx_xt_check_target+0x10/0x10
     ? stack_depot_save_flags+0x6e4/0x830
     ? nft_target_init+0x174/0xc30
     nft_target_init+0x82d/0xc30
     ? __pfx_nft_target_init+0x10/0x10
     ? nf_tables_newrule+0x1609/0x2980
     ? nf_tables_newrule+0x1609/0x2980
     ? rcu_is_watching+0x15/0xb0
     ? nf_tables_newrule+0x1609/0x2980
     ? nf_tables_newrule+0x1609/0x2980
     ? __kmalloc_noprof+0x21a/0x400
     nf_tables_newrule+0x1860/0x2980
     ? __pfx_nf_tables_newrule+0x10/0x10
     ? __nla_parse+0x40/0x60
     nfnetlink_rcv+0x14e5/0x2ab0
     ? __pfx_validate_chain+0x10/0x10
     ? __pfx_nfnetlink_rcv+0x10/0x10
     ? __lock_acquire+0x1384/0x2050
     ? netlink_deliver_tap+0x2e/0x1b0
     ? __pfx_lock_release+0x10/0x10
     ? netlink_deliver_tap+0x2e/0x1b0
     netlink_unicast+0x7f8/0x990
     ? __pfx_netlink_unicast+0x10/0x10
     ? __virt_addr_valid+0x183/0x530
     ? __check_object_size+0x48e/0x900
     netlink_sendmsg+0x8e4/0xcb0
     ? __pfx_netlink_sendmsg+0x10/0x10
     ? aa_sock_msg_perm+0x91/0x160
     ? __pfx_netlink_sendmsg+0x10/0x10
     __sock_sendmsg+0x223/0x270
     ____sys_sendmsg+0x52a/0x7e0
     ? __pfx_____sys_sendmsg+0x10/0x10
     __sys_sendmsg+0x292/0x380
     ? __pfx___sys_sendmsg+0x10/0x10
     ? lockdep_hardirqs_on_prepare+0x43d/0x780
     ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
     ? exc_page_fault+0x590/0x8c0
     ? do_syscall_64+0xb6/0x230
     do_syscall_64+0xf3/0x230
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    ...
     </TASK>

    Since an invalid byte sequence (one without any '\0' byte) may be passed
    from userspace, add an extra check to ensure that such a sequence is
    rejected as a possible ID and so is never passed to 'kstrdup()' and beyond.

    Reported-by: syzbot+6c8215822f35fdb35667@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=6c8215822f35fdb35667
    Fixes: 268cb38e18 ("netfilter: x_tables: add LED trigger target")
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:42 +00:00
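
The kind of check described in 5e54e2cc17 above (reject a fixed-size, user-supplied buffer that contains no terminating '\0' before treating it as a string) can be sketched in plain C as follows. `ID_MAX` and `id_is_valid` are hypothetical names, the buffer size is an arbitrary assumption, and rejecting an empty ID is also an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ID_MAX 27  /* hypothetical buffer size, chosen for illustration only */

/*
 * Return true only if 'id' holds a non-empty string that is terminated
 * within its 'size' bytes, so that strlen()/strdup() cannot read past it.
 */
static bool id_is_valid(const char *id, size_t size)
{
    assert(id != NULL);

    if (size == 0 || id[0] == '\0')
        return false;

    /* memchr() never looks beyond 'size' bytes, unlike strlen(). */
    return memchr(id, '\0', size) != NULL;
}
```

Only after such a check passes is it safe to hand the buffer to functions that expect a NUL-terminated string.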
CKI Backport Bot 2e568d0cde netfilter: ipset: add missing range check in bitmap_ip_uadt
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit 35f56c554eb1b56b77b3cf197a6b00922d49033d
Author: Jeongjun Park <aha310510@gmail.com>
Date:   Wed Nov 13 22:02:09 2024 +0900

    netfilter: ipset: add missing range check in bitmap_ip_uadt

    When tb[IPSET_ATTR_IP_TO] is not present but tb[IPSET_ATTR_CIDR] exists,
    the values of ip and ip_to are slightly swapped. Therefore, the range check
    for ip should be done later, but this check is missing, which appears to be
    the source of the vulnerability.

    So we should add missing range checks and remove unnecessary range checks.

    Cc: <stable@vger.kernel.org>
    Reported-by: syzbot+58c872f7790a4d2ac951@syzkaller.appspotmail.com
    Fixes: 72205fc68b ("netfilter: ipset: bitmap:ip set type support")
    Signed-off-by: Jeongjun Park <aha310510@gmail.com>
    Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:41 +00:00
CKI Backport Bot b5a8b80ce0 netfilter: nf_tables: must hold rcu read lock while iterating object type list
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit cddc04275f95ca3b18da5c0fb111705ac173af89
Author: Florian Westphal <fw@strlen.de>
Date:   Mon Nov 4 10:41:19 2024 +0100

    netfilter: nf_tables: must hold rcu read lock while iterating object type list

    Update of stateful object triggers:
    WARNING: suspicious RCU usage
    net/netfilter/nf_tables_api.c:7759 RCU-list traversed in non-reader section!!

    other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by nft/3060:
     #0: ffff88810f0578c8 (&nft_net->commit_mutex){+.+.}-{4:4}, [..]

    ... but this list is not protected by the transaction mutex but the
    nfnl nftables subsystem mutex.

    Switch to nft_obj_type_get, which acquires the rcu read lock,
    bumps the refcount, and returns the result.

    v3: Dan Carpenter points out nft_obj_type_get returns error pointer, not
    NULL, on error.

    Fixes: dad3bdeef45f ("netfilter: nf_tables: fix memory leak during stateful obj update")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:40 +00:00
CKI Backport Bot 91ba8bb662 netfilter: nf_tables: must hold rcu read lock while iterating expression type list
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit ee666a541ed957937454d50afa4757924508cd74
Author: Florian Westphal <fw@strlen.de>
Date:   Mon Nov 4 10:41:18 2024 +0100

    netfilter: nf_tables: must hold rcu read lock while iterating expression type list

    nft shell tests trigger:
     WARNING: suspicious RCU usage
     net/netfilter/nf_tables_api.c:3125 RCU-list traversed in non-reader section!!
     1 lock held by nft/2068:
      #0: ffff888106c6f8c8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x3c/0xf0

    But the transaction mutex doesn't protect this list, the nfnl subsystem
    mutex would, but we can't acquire it here without risk of ABBA
    deadlocks.

    Acquire the rcu read lock to avoid this issue.

    v3: add a comment that explains the ->inner_ops check implies
    expression is builtin and lack of a module owner reference is ok.

    Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:39 +00:00
CKI Backport Bot f9140cdd7d netfilter: ctnetlink: support CTA_FILTER for flush
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit 1ef7f50ccc6e8e2b5de96ad1e304684a277a3055
Author: Changliang Wu <changliang.wu@smartx.com>
Date:   Thu Jun 20 19:35:27 2024 +0800

    netfilter: ctnetlink: support CTA_FILTER for flush

    Since cb8aa9a, kernel-side filtering is available for dump, but
    this capability is not available for flush.

    This patch allows advanced filtering with CTA_FILTER for flush.

    Performance:
    1048576 ct flows in total, delete 50,000 flows by origin src ip
    3.06s -> dump all, compare and delete
    584ms -> directly flush with filter

    Signed-off-by: Changliang Wu <changliang.wu@smartx.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:36 +00:00
CKI Backport Bot a699b653ab netfilter: nfnetlink: convert kfree_skb to consume_skb
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit e2444c1d463995477fb447be9d0c54150a5c393b
Author: Donald Hunter <donald.hunter@gmail.com>
Date:   Tue May 28 11:37:54 2024 +0100

    netfilter: nfnetlink: convert kfree_skb to consume_skb

    Use consume_skb in the batch code path to avoid generating spurious
    NOT_SPECIFIED skb drop reasons.

    Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:35 +00:00
CKI Backport Bot 229e48953b netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit 4a3540a8bf3c13dc3955f0c0895332b9c653be3f
Author: Linus Lüssing <linus.luessing@c0d3.blue>
Date:   Wed Mar 6 15:18:04 2024 +0100

    netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery

    So far Multicast Router Advertisements and Multicast Router
    Solicitations from the Multicast Router Discovery protocol (RFC4286)
    would be marked as INVALID for IPv6, even if they are in fact intact
    and adhering to RFC4286.

    This broke MRA reception and by that multicast reception on
    IPv6 multicast routers in a Proxmox managed setup, where Proxmox
    would install a rule like "-m conntrack --ctstate INVALID -j DROP"
    at the top of the FORWARD chain with br-nf-call-ip6tables enabled
    by default.

    As is already done for MLDv1, MLDv2 and IPv6 Neighbor Discovery,
    fix this issue by excluding MRD from connection tracking
    handling, as MRD always uses predefined multicast destinations
    for its messages, too. This changes the ct-state for ICMPv6 MRD messages
    from INVALID to UNTRACKED.

    This issue was found and fixed with the help of the mrdisc tool
    (https://github.com/troglobit/mrdisc).

    Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:34 +00:00
CKI Backport Bot dfe93f9c5c netfilter: nf_tables: skip transaction if update object is not implemented
JIRA: https://issues.redhat.com/browse/RHEL-73708

commit 84b1a0c0140a9a92ea108576c0002210f224ce59
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Mar 5 09:35:48 2024 +0100

    netfilter: nf_tables: skip transaction if update object is not implemented

    Turn update into noop as a follow up for:

      9fedd894b4 ("netfilter: nf_tables: fix unexpected EOPNOTSUPP error")

    instead of adding a transaction object which is simply discarded at a
    later stage of the commit protocol.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 15:04:33 +00:00
CKI Backport Bot 9a914a6c65 netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level
JIRA: https://issues.redhat.com/browse/RHEL-73350
CVE: CVE-2024-56783

commit b7529880cb961d515642ce63f9d7570869bbbdc3
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Nov 26 11:59:06 2024 +0100

    netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level

    The cgroup maximum depth is INT_MAX by default; there is a cgroup toggle
    to restrict this maximum depth to a more reasonable value so as not to
    harm performance. Remove the unnecessary WARN_ON_ONCE, which is reachable
    from userspace.

    Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
    Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 14:42:49 +00:00
Rado Vrbovsky db51a70cea Merge: netfilter: ipset: Fix for recursive locking warning
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6059

```
JIRA: https://issues.redhat.com/browse/RHEL-35897
Upstream Status: net.git commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33

commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33
Author: Phil Sutter <phil@nwl.cc>
Date:   Tue Dec 17 20:56:55 2024 +0100

    netfilter: ipset: Fix for recursive locking warning

    With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding
    it to a set of type list:set and populating it from iptables SET target
    triggers a kernel warning:

    | WARNING: possible recursive locking detected
    | 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
    | --------------------------------------------
    | ping/4018 is trying to acquire lock:
    | ffff8881094a6848 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
    |
    | but task is already holding lock:
    | ffff88811034c048 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]

    This is a false alarm: ipset does not allow nested list:set type, so the
    loop in list_set_kadd() can never encounter the outer set itself. No
    other set type supports embedded sets, so this is the only case to
    consider.

    To avoid the false report, create a distinct lock class for list:set
    type ipset locks.

    Fixes: f830837f0e ("netfilter: ipset: list:set set type support")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
Signed-off-by: Phil Sutter <psutter@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2025-01-06 08:26:07 +00:00
Patrick Talbert 5838c30c9f Merge: netfilter: IDLETIMER: Fix for possible ABBA deadlock
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6021

```
JIRA: https://issues.redhat.com/browse/RHEL-6041
Upstream Status: net.git commit f36b01994d68ffc253c8296e2228dfe6e6431c03

commit f36b01994d68ffc253c8296e2228dfe6e6431c03
Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Dec 6 19:32:29 2024 +0100

    netfilter: IDLETIMER: Fix for possible ABBA deadlock

    Deletion of the last rule referencing a given idletimer may happen at
    the same time as a read of its file in sysfs:

    | ======================================================
    | WARNING: possible circular locking dependency detected
    | 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
    | ------------------------------------------------------
    | iptables/3303 is trying to acquire lock:
    | ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20
    |
    | but task is already holding lock:
    | ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v]
    |
    | which lock already depends on the new lock.

    A simple reproducer is:

    | #!/bin/bash
    |
    | while true; do
    |         iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
    |         iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
    | done &
    | while true; do
    |         cat /sys/class/xt_idletimer/timers/testme >/dev/null
    | done

    Avoid this by freeing list_mutex right after deleting the element from
    the list, then continuing with the teardown.

    Fixes: 0902b469bd ("netfilter: xtables: idletimer target implementation")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
```

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
2024-12-30 07:30:25 -05:00
Patrick Talbert 98f52f1680 Merge: CNB96: netlink/devlink: update devlink & netlink to the v6.12
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5861

JIRA: https://issues.redhat.com/browse/RHEL-57756
Depends: !5257
Depends: !5851
Signed-off-by: Petr Oros <poros@redhat.com>

Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
2024-12-30 07:30:10 -05:00
Phil Sutter 385069a5f1 netfilter: ipset: Fix for recursive locking warning
JIRA: https://issues.redhat.com/browse/RHEL-35897
Upstream Status: net.git commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33

commit 70b6f46a4ed8bd56c85ffff22df91e20e8c85e33
Author: Phil Sutter <phil@nwl.cc>
Date:   Tue Dec 17 20:56:55 2024 +0100

    netfilter: ipset: Fix for recursive locking warning

    With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding
    it to a set of type list:set and populating it from iptables SET target
    triggers a kernel warning:

    | WARNING: possible recursive locking detected
    | 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
    | --------------------------------------------
    | ping/4018 is trying to acquire lock:
    | ffff8881094a6848 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
    |
    | but task is already holding lock:
    | ffff88811034c048 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]

    This is a false alarm: ipset does not allow nested list:set type, so the
    loop in list_set_kadd() can never encounter the outer set itself. No
    other set type supports embedded sets, so this is the only case to
    consider.

    To avoid the false report, create a distinct lock class for list:set
    type ipset locks.

    Fixes: f830837f0e ("netfilter: ipset: list:set set type support")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2024-12-19 13:43:25 +01:00
Phil Sutter 16c2cecd78 netfilter: IDLETIMER: Fix for possible ABBA deadlock
JIRA: https://issues.redhat.com/browse/RHEL-6041
Upstream Status: net.git commit f36b01994d68ffc253c8296e2228dfe6e6431c03

commit f36b01994d68ffc253c8296e2228dfe6e6431c03
Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Dec 6 19:32:29 2024 +0100

    netfilter: IDLETIMER: Fix for possible ABBA deadlock

    Deletion of the last rule referencing a given idletimer may happen at
    the same time as a read of its file in sysfs:

    | ======================================================
    | WARNING: possible circular locking dependency detected
    | 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
    | ------------------------------------------------------
    | iptables/3303 is trying to acquire lock:
    | ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20
    |
    | but task is already holding lock:
    | ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v]
    |
    | which lock already depends on the new lock.

    A simple reproducer is:

    | #!/bin/bash
    |
    | while true; do
    |         iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
    |         iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
    | done &
    | while true; do
    |         cat /sys/class/xt_idletimer/timers/testme >/dev/null
    | done

    Avoid this by freeing list_mutex right after deleting the element from
    the list, then continuing with the teardown.

    Fixes: 0902b469bd ("netfilter: xtables: idletimer target implementation")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2024-12-12 15:46:12 +01:00
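
The ordering change described in 16c2cecd78 above (drop list_mutex right after unlinking the element, then continue with the teardown) can be illustrated with a small pthread-based sketch. Everything here is a hypothetical user-space analogue: `struct timer`, `timer_put` and `timer_teardown` are invented names, and the sysfs removal that takes the second lock in the kernel is reduced to a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object kept on a mutex-protected list. */
struct timer {
    struct timer *next;
    int refcnt;
    bool torn_down;
};

static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct timer *timer_list;

/* Stand-in for teardown work that may block on other locks. */
static void timer_teardown(struct timer *t)
{
    t->torn_down = true;
}

/* Drop one reference; the last put unlinks under the lock, tears down outside it. */
static void timer_put(struct timer *t)
{
    struct timer **pp;

    assert(t != NULL);

    pthread_mutex_lock(&list_mutex);
    if (--t->refcnt > 0) {
        pthread_mutex_unlock(&list_mutex);
        return;
    }

    /* Unlink while holding the lock so no new user can find the element. */
    for (pp = &timer_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            break;
        }
    }
    pthread_mutex_unlock(&list_mutex);

    /* list_mutex is no longer held here, so teardown cannot create an
     * A->B versus B->A ordering with whatever lock teardown takes. */
    timer_teardown(t);
}
```

The element is already unreachable through the list when the mutex is released, which is what makes it safe to finish the teardown without the lock.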
Petr Oros 1cd6777e53 netfilter: nfnetlink: Initialise extack before use in ACKs
JIRA: https://issues.redhat.com/browse/RHEL-57756

CVE: CVE-2024-44945

Upstream commit(s):
commit d1a7b382a9d3f0f3e5a80e0be2991c075fa4f618
Author: Donald Hunter <donald.hunter@gmail.com>
Date:   Tue Aug 6 16:43:24 2024 +0100

    netfilter: nfnetlink: Initialise extack before use in ACKs

    Add missing extack initialisation when ACKing BATCH_BEGIN and BATCH_END.

    Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages")
    Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:56 +01:00
Petr Oros dc1955b023 netfilter: nfnetlink: Handle ACK flags for batch messages
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit bf2ac490d28c21a349e9eef81edc45320fca4a3c
Author: Donald Hunter <donald.hunter@gmail.com>
Date:   Thu Apr 18 11:47:37 2024 +0100

    netfilter: nfnetlink: Handle ACK flags for batch messages

    The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
    messages. This is a problem for ynl which wants to receive an ack for
    every message it sends, not just the commands in between the begin/end
    messages.

    Add processing for ACKs for begin/end messages and provide responses
    when requested.

    I have checked that iproute2, pyroute2 and systemd are unaffected by
    this change since none of them use NLM_F_ACK for batch begin/end.

    Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
    Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:53 +01:00
Phil Sutter fd462b693e netfilter: ipset: Hold module reference while requesting a module
JIRA: https://issues.redhat.com/browse/RHEL-35819
Upstream Status: net.git commit 456f010bfaefde84d3390c755eedb1b0a5857c3c

commit 456f010bfaefde84d3390c755eedb1b0a5857c3c
Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Nov 29 16:30:38 2024 +0100

    netfilter: ipset: Hold module reference while requesting a module

    User space may unload ip_set.ko while it is itself requesting a set type
    backend module, leading to a kernel crash. The race condition may be
    provoked by inserting an mdelay() right after the nfnl_unlock() call.

    Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2024-12-05 12:59:39 +01:00
Rado Vrbovsky 9da3f14bc5 Merge: CVE-2024-50251: netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5746

JIRA: https://issues.redhat.com/browse/RHEL-66855
CVE: CVE-2024-50251

```
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()

If access to offset + length is larger than the skbuff length, then
skb_checksum() triggers BUG_ON().

skb_checksum() internally subtracts the length parameter while iterating
over skbuff, BUG_ON(len) at the end of it checks that the expected
length to be included in the checksum calculation is fully consumed.

Fixes: 7ec3f7b47b ("netfilter: nft_payload: add packet mangling support")
Reported-by: Slavin Liu <slavin-ayu@qq.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit d5953d680f7e96208c29ce4139a0e38de87a57fe)
```

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>

---

<small>Created 2024-11-11 06:17 UTC by backporter</small>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-27 11:19:38 +00:00
Rado Vrbovsky 17ebd1b961 Merge: netfilter: bpf: must hold reference on net namespace
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5719

JIRA: https://issues.redhat.com/browse/RHEL-65877
Upstream Status: commit 1230fe7ad397
CVE: CVE-2024-50130

Signed-off-by: Florian Westphal <fwestpha@redhat.com>

Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-22 09:25:30 +00:00
Antoine Tenart f0adec3f81 ipv6: annotate data-races around cnf.hop_limit
JIRA: https://issues.redhat.com/browse/RHEL-62203
Upstream Status: linux.git

commit e0bb2675fea2783c45bb95d74f00c55156720863
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Feb 28 13:54:29 2024 +0000

    ipv6: annotate data-races around cnf.hop_limit

    idev->cnf.hop_limit and net->ipv6.devconf_all->hop_limit
    might be read locklessly, add appropriate READ_ONCE()
    and WRITE_ONCE() annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Florian Westphal <fw@strlen.de> # for netfilter parts
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:48 +01:00
Rado Vrbovsky fc0c68cffc Merge: netfilter: xtables: avoid NFPROTO_UNSPEC where needed
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5575

```
CVE: CVE-2024-50038
JIRA: https://issues.redhat.com/browse/RHEL-63905
```
Signed-off-by: Phil Sutter <psutter@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-12 08:14:14 +00:00
Rado Vrbovsky ab32b3c363 Merge: ipvs: properly dereference pe in ip_vs_add_service
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5412

```
CVE: CVE-2024-42322
JIRA: https://issues.redhat.com/browse/RHEL-54908
Upstream Status: commit cbd070a4ae62f119058973f6d2c984e325bce6e7
Conflicts:
- Context change due to missing commit 705dd3444081
  ("ipvs: use kthreads for stats estimation").

commit cbd070a4ae62f119058973f6d2c984e325bce6e7
Author: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Date:   Thu Jun 27 14:15:15 2024 +0800

    ipvs: properly dereference pe in ip_vs_add_service

    Use pe directly to resolve sparse warning:

      net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression

    Fixes: 39b9722315 ("ipvs: handle connections started by real-servers")
    Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
```

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-12 08:03:30 +00:00
CKI Backport Bot 86076f0f96 netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
JIRA: https://issues.redhat.com/browse/RHEL-66855
CVE: CVE-2024-50251

commit d5953d680f7e96208c29ce4139a0e38de87a57fe
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Wed Oct 30 23:13:48 2024 +0100

    netfilter: nft_payload: sanitize offset and length before calling skb_checksum()

    If access to offset + length is larger than the skbuff length, then
    skb_checksum() triggers BUG_ON().

    skb_checksum() internally subtracts the length parameter while iterating
    over skbuff, BUG_ON(len) at the end of it checks that the expected
    length to be included in the checksum calculation is fully consumed.
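
The check described above can be sketched in plain C. This is a minimal illustration, not the kernel patch itself: `range_within()` and its parameters are hypothetical names, and a flat buffer length stands in for the skbuff length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): returns true when the byte range
 * [offset, offset + len) lies entirely inside a buffer of buf_len bytes.
 * The comparison is arranged so that offset + len is never computed and
 * therefore cannot overflow.
 */
bool range_within(size_t buf_len, size_t offset, size_t len)
{
    return offset <= buf_len && len <= buf_len - offset;
}

/* Small self-check showing accepted and rejected ranges. */
void range_within_demo(void)
{
    assert(range_within(100, 90, 10));       /* ends exactly at the buffer end */
    assert(!range_within(100, 95, 10));      /* runs 5 bytes past the end */
    assert(!range_within(100, SIZE_MAX, 2)); /* offset far out of bounds */
}
```

A caller would reject the request when `range_within()` returns false, before handing the offset and length to any routine that asserts on them.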

    Fixes: 7ec3f7b47b ("netfilter: nft_payload: add packet mangling support")
    Reported-by: Slavin Liu <slavin-ayu@qq.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-11-11 06:17:54 +00:00
Florian Westphal 012a65c2b6 netfilter: bpf: must hold reference on net namespace
JIRA: https://issues.redhat.com/browse/RHEL-65877
Upstream Status: commit 1230fe7ad397
CVE: CVE-2024-50130

Conflicts: net/netfilter/nf_bpf_link.c

RHEL9 lacks nf_bpf defrag support that came with
commit 1721c2d02d3
("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link"),
so discard/ignore bpf_nf_disable_defrag() call.

commit 1230fe7ad3974f7bf6c78901473e039b34d4fb1f
Author: Florian Westphal <fw@strlen.de>
Date:   Thu Oct 10 18:34:05 2024 +0200

    netfilter: bpf: must hold reference on net namespace

    BUG: KASAN: slab-use-after-free in __nf_unregister_net_hook+0x640/0x6b0
    Read of size 8 at addr ffff8880106fe400 by task repro/72=
    bpf_nf_link_release+0xda/0x1e0
    bpf_link_free+0x139/0x2d0
    bpf_link_release+0x68/0x80
    __fput+0x414/0xb60

    Eric says:
     It seems that bpf was able to defer the __nf_unregister_net_hook()
     after exit()/close() time.
     Perhaps a netns reference is missing, because the netns has been
     dismantled/freed already.
     bpf_nf_link_attach() does :
     link->net = net;
     But I do not see a reference being taken on net.

    Add such a reference and release it after hook unreg.
    Note that I was unable to get syzbot reproducer to work, so I
    do not know if this resolves this splat.
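
The pattern of taking a reference at attach time and dropping it only after the hook is unregistered can be sketched with a toy reference count. Every name below (`toy_ns`, `toy_link`, and the helper functions) is hypothetical and merely stands in for the kernel's net namespace and bpf link objects; this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a network namespace and a link that points at it. */
struct toy_ns   { int refs; int dismantled; };
struct toy_link { struct toy_ns *net; };

/* Drop one reference; the namespace is dismantled when the last one goes. */
void toy_ns_put(struct toy_ns *ns)
{
    if (--ns->refs == 0)
        ns->dismantled = 1;
}

/* Attach: store the pointer AND take a reference, so it cannot dangle. */
void toy_link_attach(struct toy_link *link, struct toy_ns *ns)
{
    ns->refs++;
    link->net = ns;
}

/* Release: the namespace must still be alive while the hook is removed. */
void toy_link_release(struct toy_link *link)
{
    assert(!link->net->dismantled); /* hook unregistration would happen here */
    toy_ns_put(link->net);          /* drop the reference only afterwards */
    link->net = NULL;
}
```

With the reference held by the link, the original owner dropping its own reference no longer dismantles the namespace before `toy_link_release()` runs.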

    Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs")
    Diagnosed-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2024-11-07 21:52:35 +01:00
Rado Vrbovsky 14b4cc02eb Merge: BPF 6.9 rebase
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5142

Rebase BPF subsystem to upstream version 6.9

JIRA: https://issues.redhat.com/browse/RHEL-23649

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>

Approved-by: Viktor Malik <vmalik@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Toke Høiland-Jørgensen <toke@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-30 07:25:08 +00:00
Rado Vrbovsky 570a71d7db Merge: mm: update core code to v6.6 upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5252

JIRA: https://issues.redhat.com/browse/RHEL-27743  
JIRA: https://issues.redhat.com/browse/RHEL-59459    
CVE: CVE-2024-46787    
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4961  
  
This MR brings RHEL9 core MM code up to upstream's v6.6 LTS level.    
This work follows up on the previous v6.5 update (RHEL-27742) and as such,    
the bulk of this changeset is comprised of refactoring and clean-ups of     
the internal implementation of several APIs as it further advances the     
conversion to FOLIOS, and follow up on the per-VMA locking changes.

Also, with the rebase to v6.6 LTS, we complete the infrastructure to allow    
Control-flow Enforcement Technology, a.k.a. Shadow Stacks, for x86 builds,    
and we add a potential extra level of protection (assessment pending) to help    
mitigate kernel heap exploits dubbed "SlubStick".     
    
Follow-up fixes are omitted from this series either because they are irrelevant to     
the bits we support on RHEL or because they depend on bigger changesets introduced     
upstream more recently. A follow-up ticket (RHEL-27745) will deal with these and other cases separately.    

Omitted-fix: e540b8c5da04 ("mips: mm: add slab availability checking in ioremap_prot")    
Omitted-fix: f7875966dc0c ("tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources")   
Omitted-fix: df39038cd895 ("s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()")    
Omitted-fix: 12bbaae7635a ("mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros")    
Omitted-fix: fd1a745ce03e ("mm: support page_mapcount() on page_has_type() pages")    
Omitted-fix: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")    
Omitted-fix: fa2690af573d ("mm: page_ref: remove folio_try_get_rcu()")    
Omitted-fix: f442fa614137 ("mm: gup: stop abusing try_grab_folio")    
Omitted-fix: cb0f01beb166 ("mm/mprotect: fix dax pud handling")    
    
Signed-off-by: Rafael Aquini <raquini@redhat.com>

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: David Airlie <airlied@redhat.com>
Approved-by: Michal Schmidt <mschmidt@redhat.com>
Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-30 07:22:28 +00:00
Phil Sutter 01de117062 netfilter: xtables: fix typo causing some targets not to load on IPv6
CVE: CVE-2024-50038
JIRA: https://issues.redhat.com/browse/RHEL-63905
Upstream Status: net.git commit 306ed1728e8438caed30332e1ab46b28c25fe3d8

commit 306ed1728e8438caed30332e1ab46b28c25fe3d8
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Sun Oct 20 14:49:51 2024 +0200

    netfilter: xtables: fix typo causing some targets not to load on IPv6

    - There is no NFPROTO_IPV6 family for mark and NFLOG.
    - TRACE is also missing module autoload with NFPROTO_IPV6.

    This results in ip6tables failing to restore a ruleset. This issue has been
    reported by several users providing incomplete patches.

    Very similar to Ilya Katsnelson's patch including a missing chunk in the
    TRACE extension.

    Fixes: 0bfcb7b71e73 ("netfilter: xtables: avoid NFPROTO_UNSPEC where needed")
    Reported-by: Ignat Korchagin <ignat@cloudflare.com>
    Reported-by: Ilya Katsnelson <me@0upti.me>
    Reported-by: Krzysztof Olędzki <ole@ans.pl>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2024-10-23 16:19:47 +02:00
Phil Sutter 3f35e92a41 netfilter: xtables: avoid NFPROTO_UNSPEC where needed
CVE: CVE-2024-50038
JIRA: https://issues.redhat.com/browse/RHEL-63905
Upstream Status: commit 0bfcb7b71e735560077a42847f69597ec7dcc326
Conflicts: Missing commit f2e3778db7e1 ("netfilter: remove xt pernet
	   data") in RHEL9, keep the deprecation warning for NOTRACK
	   target.

commit 0bfcb7b71e735560077a42847f69597ec7dcc326
Author: Florian Westphal <fw@strlen.de>
Date:   Mon Oct 7 11:28:16 2024 +0200

    netfilter: xtables: avoid NFPROTO_UNSPEC where needed

    syzbot managed to call xt_cluster match via ebtables:

     WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780
     [..]
     ebt_do_table+0x174b/0x2a40

    Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet
    processing.  As this is only useful to restrict locally terminating
    TCP/UDP traffic, register this for ipv4 and ipv6 family only.

    Pablo points out that this is a general issue, direct users of the
    set/getsockopt interface can call into targets/matches that were only
    intended for use with ip(6)tables.

    Check all UNSPEC matches and targets for similar issues:

    - matches and targets are fine except if they assume skb_network_header()
      is valid -- this is only true when called from inet layer: ip(6) stack
      pulls the ip/ipv6 header into linear data area.
    - targets that return XT_CONTINUE or other xtables verdicts must be
      restricted too, they are incompatible with the ebtables traverser, e.g.
      EBT_CONTINUE is a completely different value than XT_CONTINUE.
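
The verdict mismatch in the second item can be made concrete with a small standalone sketch. The two constants are reproduced from the kernel UAPI headers as I understand them (`linux/netfilter/x_tables.h` and `linux/netfilter_bridge/ebtables.h`), so verify them against your tree; `seen_by_ebt_traverser()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Values reproduced here so the sketch builds without kernel headers. */
#define XT_CONTINUE  0xFFFFFFFF /* xtables: "continue with the next rule" */
#define EBT_CONTINUE (-3)       /* ebtables: "continue with the next rule" */

/*
 * Hypothetical helper: ebtables handles verdicts as signed ints, so an
 * xtables verdict reaching it is reinterpreted as an int. On common
 * two's-complement platforms 0xFFFFFFFF becomes -1, which is not
 * EBT_CONTINUE.
 */
int seen_by_ebt_traverser(unsigned int xt_verdict)
{
    return (int)xt_verdict;
}

/* Self-check: the two "continue" verdicts do not coincide. */
void verdict_demo(void)
{
    assert(seen_by_ebt_traverser(XT_CONTINUE) != EBT_CONTINUE);
}
```

This is why a target that returns xtables verdicts cannot safely be reachable from the ebtables traverser.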

    Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as
    they are provided for use by ip(6)tables.

    The MARK target is also used by arptables, so register for NFPROTO_ARP too.

    While at it, bail out if connbytes fails to enable the corresponding
    conntrack family.

    This change passes the selftests in iptables.git.

    Reported-by: syzbot+256c348558aa5cf611a9@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netfilter-devel/66fec2e2.050a0220.9ec68.0047.GAE@google.com/
    Fixes: 0269ea4937 ("netfilter: xtables: add cluster match")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Co-developed-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2024-10-23 16:19:47 +02:00
Phil Sutter 667495128e ipvs: properly dereference pe in ip_vs_add_service
CVE: CVE-2024-42322
JIRA: https://issues.redhat.com/browse/RHEL-54908
Upstream Status: commit cbd070a4ae62f119058973f6d2c984e325bce6e7
Conflicts:
- Context change due to missing commit 705dd3444081
  ("ipvs: use kthreads for stats estimation").

commit cbd070a4ae62f119058973f6d2c984e325bce6e7
Author: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Date:   Thu Jun 27 14:15:15 2024 +0800

    ipvs: properly dereference pe in ip_vs_add_service

    Use pe directly to resolve sparse warning:

      net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression

    Fixes: 39b9722315 ("ipvs: handle connections started by real-servers")
    Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2024-10-16 16:07:34 +02:00
Jerome Marchand 563e3eb7e7 bpf: treewide: Annotate BPF kfuncs in BTF
JIRA: https://issues.redhat.com/browse/RHEL-23649

Conflicts: Multiple conflicts due to missing kfuncs. All sections were
switched to use the new macro except bpf_mptcp_fmodret_ids, which still
uses BTF_SET8_* upstream. I don't know why. That might be an upstream
oversight.

commit 6f3189f38a3e995232e028a4c341164c4aca1b20
Author: Daniel Xu <dxu@dxuuu.xyz>
Date:   Sun Jan 28 18:24:08 2024 -0700

    bpf: treewide: Annotate BPF kfuncs in BTF

    This commit marks kfuncs as such inside the .BTF_ids section. The upshot
    of these annotations is that we'll be able to automatically generate
    kfunc prototypes for downstream users. The process is as follows:

    1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs
    2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for
       each function inside BTF_KFUNCS sets
    3. At runtime, vmlinux or module BTF is made available in sysfs
    4. At runtime, bpftool (or similar) can look at provided BTF and
       generate appropriate prototypes for functions with "bpf_kfunc" tag

    To ensure future kfunc are similarly tagged, we now also return error
    inside kfunc registration for untagged kfuncs. For vmlinux kfuncs,
    we also WARN(), as initcall machinery does not handle errors.

    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Acked-by: Benjamin Tissoires <bentiss@kernel.org>
    Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:07 +02:00
Jerome Marchand d1c16d1138 bpf: Take into account BPF token when fetching helper protos
JIRA: https://issues.redhat.com/browse/RHEL-23649

Conflicts: Context change due to missing commit 9a675ba55a96 ("net,
bpf: Add a warning if NAPI cb missed xdp_do_flush().")

commit bbc1d24724e110b86a1a7c3c1724ce0d62cc1e2e
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:04 2024 -0800

    bpf: Take into account BPF token when fetching helper protos

    Instead of performing unconditional system-wide bpf_capable() and
    perfmon_capable() calls inside bpf_base_func_proto() function (and other
    similar ones) to determine eligibility of a given BPF helper for a given
    program, use previously recorded BPF token during BPF_PROG_LOAD command
    handling to inform the decision.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-8-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
Rafael Aquini 19e74512fe minmax: add in_range() macro
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
  * fs/btrfs/misc.h: hunk dropped (unsupported FS)
  * arch/arm/mm/pageattr.c and include/linux/minmax.h: minor context diffs

This patch is a backport of the following upstream commit:
commit f9bff0e31881d03badf191d3b0005839391f5f2b
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Aug 2 16:13:29 2023 +0100

    minmax: add in_range() macro

    Patch series "New page table range API", v6.

    This patchset changes the API used by the MM to set up page table entries.
    The four APIs are:

        set_ptes(mm, addr, ptep, pte, nr)
        update_mmu_cache_range(vma, addr, ptep, nr)
        flush_dcache_folio(folio)
        flush_icache_pages(vma, page, nr)

    flush_dcache_folio() isn't technically new, but no architecture
    implemented it, so I've done that for them.  The old APIs remain around
    but are mostly implemented by calling the new interfaces.

    The new APIs are based around setting up N page table entries at once.
    The N entries belong to the same PMD, the same folio and the same VMA, so
    ptep++ is a legitimate operation, and locking is taken care of for you.
    Some architectures can do a better job of it than just a loop, but I have
    hesitated to make too deep a change to architectures I don't understand
    well.

    One thing I have changed in every architecture is that PG_arch_1 is now a
    per-folio bit instead of a per-page bit when used for dcache clean/dirty
    tracking.  This was something that would have to happen eventually, and it
    makes sense to do it now rather than iterate over every page involved in a
    cache flush and figure out if it needs to happen.

    The point of all this is better performance, and Fengwei Yin has measured
    improvement on x86.  I suspect you'll see improvement on your architecture
    too.  Try the new will-it-scale test mentioned here:
    https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
    You'll need to run it on an XFS filesystem and have
    CONFIG_TRANSPARENT_HUGEPAGE set.

    This patchset is the basis for much of the anonymous large folio work
    being done by Ryan, so it's received quite a lot of testing over the last
    few months.

    This patch (of 38):

    Determine if a value lies within a range more efficiently (subtraction +
    comparison vs two comparisons and an AND).  It also has useful (under some
    circumstances) behaviour if the range exceeds the maximum value of the
    type.  Convert all the conflicting definitions of in_range() within the
    kernel; some can use the generic definition while others need their own
    definition.
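
The single-comparison trick can be sketched for one fixed width. This is an illustration under assumptions, not the kernel's `in_range()` macro: `in_range_u32()` and `in_range_naive()` are hypothetical function names, and the real macro is type-generic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * One subtraction and one comparison. Unsigned subtraction wraps, so a
 * val below start produces a very large difference that fails "< len".
 */
bool in_range_u32(uint32_t val, uint32_t start, uint32_t len)
{
    return (uint32_t)(val - start) < len;
}

/* The conventional form for comparison: two comparisons and an AND. */
bool in_range_naive(uint32_t val, uint32_t start, uint32_t len)
{
    return start <= val && val < (uint32_t)(start + len);
}

/* Self-check, including a range whose end wraps past UINT32_MAX. */
void in_range_demo(void)
{
    assert(in_range_u32(5, 3, 4));  /* 3 <= 5 < 7 */
    assert(!in_range_u32(7, 3, 4)); /* 7 is one past the end */
    assert(!in_range_u32(2, 3, 4)); /* below start: difference wraps */

    /* start + len overflows: the two forms disagree near the top. */
    assert(in_range_u32(UINT32_MAX, UINT32_MAX - 15, 32));
    assert(!in_range_naive(UINT32_MAX, UINT32_MAX - 15, 32));
}
```

The last two assertions show the differing answer when the range exceeds the maximum value of the type: the subtraction form still accepts values near the top of the range, while the naive form rejects them because its upper bound has wrapped.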

    Link: https://lkml.kernel.org/r/20230802151406.3735276-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20230802151406.3735276-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:20:16 -04:00