Commit Graph

2210 Commits

Author SHA1 Message Date
Patrick Talbert 7265dc2d6b Merge: dev: Acquire netdev_rename_lock before restoring dev->name in dev_change_name().
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6329

JIRA: https://issues.redhat.com/browse/RHEL-77329

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Paolo Abeni <pabeni@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
2025-02-13 02:24:34 -05:00
Toke Høiland-Jørgensen e076798a08 dev: Acquire netdev_rename_lock before restoring dev->name in dev_change_name().
JIRA: https://issues.redhat.com/browse/RHEL-77329

commit e361560a7912958ba3059f51e7dd21612d119169
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed Jan 15 18:55:43 2025 +0900

    dev: Acquire netdev_rename_lock before restoring dev->name in dev_change_name().

    The cited commit forgot to add netdev_rename_lock in one of the
    error paths in dev_change_name().

    Let's hold netdev_rename_lock before restoring the old dev->name.

    Fixes: 0840556e5a3a ("net: Protect dev->name by seqlock.")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20250115095545.52709-2-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
2025-02-03 11:06:49 +01:00
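The dev_change_name() fix above is easiest to see in a small single-threaded model. It shows the rule the fix restores: the rollback of dev->name on an error path is itself a write to the name, so it needs the same write section as the forward rename. Everything here is hypothetical: `fake_dev`, `rename_seq`, `store_name()`, `read_name()`, and the `fail_later` flag are illustrative stand-ins, not kernel APIs. The sketch assumes a seqcount-style scheme in which an odd value means a writer is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16 /* same size as the kernel's IFNAMSIZ */

/* Hypothetical stand-in for struct net_device; only the name matters here. */
struct fake_dev {
    char name[NAME_LEN];
};

/* Hypothetical stand-in for netdev_rename_lock's sequence counter:
 * even means stable, odd means a writer is updating dev->name. */
static unsigned int rename_seq;

static void rename_write_begin(void) { rename_seq++; }
static void rename_write_end(void) { rename_seq++; }

/* Every store to dev->name must happen inside a write section. */
static void store_name(struct fake_dev *dev, const char *name)
{
    assert(rename_seq % 2 == 1);
    snprintf(dev->name, sizeof(dev->name), "%s", name);
}

/* Reader side: retry until the counter was even and unchanged across the copy. */
static void read_name(const struct fake_dev *dev, char *buf)
{
    unsigned int seq;

    do {
        seq = rename_seq;
        memcpy(buf, dev->name, NAME_LEN);
    } while (seq % 2 == 1 || seq != rename_seq);
}

/* Models the shape of dev_change_name(): publish the new name, then roll
 * back if a later step fails (simulated by fail_later). */
static int fake_change_name(struct fake_dev *dev, const char *newname,
                            bool fail_later)
{
    char oldname[NAME_LEN];

    memcpy(oldname, dev->name, NAME_LEN);

    rename_write_begin();
    store_name(dev, newname);
    rename_write_end();

    if (fail_later) {
        /* The fix: the rollback also writes dev->name, so it is
         * bracketed by the same write section as the forward path. */
        rename_write_begin();
        store_name(dev, oldname);
        rename_write_end();
        return -1; /* stands in for a negative errno */
    }
    return 0;
}
```

If the second begin/end pair is removed, the assert in `store_name()` fires. That is the single-threaded analogue of the unprotected write the fix closes. A real seqlock also needs memory barriers and a writer-side spinlock; after the "softirq safety" change further down this log, the kernel uses the `_bh` variants. This sketch omits all of that.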
Xin Long 1955673054 net: sched: refine software bypass handling in tc_run
JIRA: https://issues.redhat.com/browse/RHEL-60271
JIRA: https://issues.redhat.com/browse/RHEL-61181
Upstream Status: net-next.git
Tested: compile only

commit a12c76a03386e32413ae8eaaefa337e491880632
Author: Xin Long <lucien.xin@gmail.com>
Date:   Wed Jan 15 09:27:54 2025 -0500

    net: sched: refine software bypass handling in tc_run

    This patch addresses issues with filter counting in block (tcf_block),
    particularly for software bypass scenarios, by introducing a more
    accurate mechanism using useswcnt.

    Previously, filtercnt and skipswcnt were introduced by:

      Commit 2081fd3445fe ("net: sched: cls_api: add filter counter") and
      Commit f631ef39d819 ("net: sched: cls_api: add skip_sw counter")

      filtercnt tracked all tp (tcf_proto) objects added to a block, and
      skipswcnt counted tp objects with the skipsw attribute set.

    The problem is: a single tp can contain multiple filters, some with skipsw
    and others without. The current implementation fails in the case:

      When the first filter in a tp has skipsw, both skipswcnt and filtercnt
      are incremented, then adding a second filter without skipsw to the same
      tp does not modify these counters because tp->counted is already set.

      As a result, software bypass is decided solely by skipswcnt equaling
      filtercnt, even when the block includes filters without skipsw.
      Consequently, filters without skipsw are inadvertently bypassed.

    To address this, the patch introduces useswcnt in block to explicitly count
    tp objects containing at least one filter without skipsw. Key changes
    include:

      Whenever a filter without skipsw is added, its tp is marked with usesw
      and counted in useswcnt. tc_run() now uses useswcnt to determine software
      bypass, eliminating reliance on filtercnt and skipswcnt.

      This refined approach prevents software bypass for blocks containing
      mixed filters, ensuring correct behavior in tc_run().

    Additionally, as atomic operations on useswcnt ensure thread safety and
    tp->lock guards access to tp->usesw and tp->counted, the broader lock
    down_write(&block->cb_lock) is no longer required in tc_new_tfilter(),
    and this resolves a performance regression caused by the filter counting
    mechanism during parallel filter insertions.

      The improvement can be demonstrated using the following script:

      # cat insert_tc_rules.sh

        tc qdisc add dev ens1f0np0 ingress
        for i in $(seq 16); do
            taskset -c $i tc -b rules_$i.txt &
        done
        wait

      Each of the rules_$i.txt files above contains 100000 tc filter rules
      for the mlx5 driver NIC ens1f0np0.

      Without this patch:

      # time sh insert_tc_rules.sh

        real    0m50.780s
        user    0m23.556s
        sys     4m13.032s

      With this patch:

      # time sh insert_tc_rules.sh

        real    0m17.718s
        user    0m7.807s
        sys     3m45.050s

    Fixes: 047f340b36fc ("net: sched: make skip_sw actually skip software")
    Reported-by: Shuang Li <shuali@redhat.com>
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
    Tested-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Xin Long <lxin@redhat.com>
2025-01-20 12:01:29 -05:00
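The counting bug and the useswcnt fix described in the tc_run commit above can be reproduced with two tiny counter models. The struct and function names are hypothetical simplifications: in the kernel, tcf_proto and tcf_block carry more state, useswcnt is an atomic updated under tp->lock, and tp removal decrements the counter, which is omitted here. The bypass predicates follow the rules stated in the commit message: the old scheme bypasses when skipswcnt equals filtercnt, and the new one bypasses when no tp uses software.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for tcf_proto and tcf_block. */
struct model_tp {
    bool counted; /* old scheme: tp already accounted in filtercnt */
    bool usesw;   /* new scheme: tp has at least one filter without skip_sw */
};

struct model_block {
    unsigned int filtercnt; /* old: tps added to the block */
    unsigned int skipswcnt; /* old: tps whose first filter had skip_sw */
    unsigned int useswcnt;  /* new: tps containing a software filter */
};

/* Old scheme: only the first filter added to a tp updates the counters. */
static void old_add_filter(struct model_block *b, struct model_tp *tp,
                           bool skip_sw)
{
    if (tp->counted)
        return;
    tp->counted = true;
    b->filtercnt++;
    if (skip_sw)
        b->skipswcnt++;
}

static bool old_bypass_sw(const struct model_block *b)
{
    return b->filtercnt > 0 && b->skipswcnt == b->filtercnt;
}

/* New scheme: any filter without skip_sw marks its tp (once) as usesw. */
static void new_add_filter(struct model_block *b, struct model_tp *tp,
                           bool skip_sw)
{
    if (!skip_sw && !tp->usesw) {
        tp->usesw = true;
        b->useswcnt++; /* atomic_inc() in the real patch */
    }
}

static bool new_bypass_sw(const struct model_block *b)
{
    return b->useswcnt == 0;
}
```

Suppose one tp receives a skip_sw filter and then a software filter. The old model reports "bypass" and the new one does not, which matches the mixed-filter scenario in the commit message.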
Rado Vrbovsky 4da7c39b53 Merge: io_uring: Update to upstream v6.10 + fixes 2025-01-13 18:58:47 +00:00
Rado Vrbovsky 65ee7b65eb Merge: net: visibility patches for 9.6
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5833

JIRA: https://issues.redhat.com/browse/RHEL-68063

Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2025-01-06 08:26:06 +00:00
Petr Oros 33f401223a net: write once on dev->allmulti and dev->promiscuity
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 55a2c86c8db3d7aa2c1967efd37ed47d5ae37f43
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 3 19:20:55 2024 +0000

    net: write once on dev->allmulti and dev->promiscuity

    In the following patch we want to read dev->allmulti
    and dev->promiscuity locklessly from rtnl_fill_ifinfo().

    In this patch I change __dev_set_promiscuity() and
    __dev_set_allmulti() to write these fields (and dev->flags)
    only if they succeed, with WRITE_ONCE() annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:54 +01:00
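The "write only if they succeed" rule in the commit above can be sketched as follows. New values are computed in locals, the function bails out on overflow, and only then is each field published with a single store. The sketch assumes a userspace `WRITE_ONCE()` built on a volatile store (GCC/Clang `__typeof__`), and its overflow check is modeled on `__dev_set_promiscuity()`. `model_dev` and `model_set_promiscuity()` are hypothetical names, so treat the details as an approximation rather than the kernel code.

```c
#include <errno.h>

/* Userspace stand-in for the kernel's WRITE_ONCE(): one volatile store the
 * compiler may not tear, merge, or repeat. Lockless readers would pair it
 * with a READ_ONCE()-style volatile load. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define MODEL_IFF_PROMISC 0x100u /* IFF_PROMISC's value in <linux/if.h> */

/* Hypothetical subset of struct net_device. */
struct model_dev {
    unsigned int flags;
    unsigned int promiscuity;
};

static int model_set_promiscuity(struct model_dev *dev, int inc)
{
    unsigned int old_flags = dev->flags;
    unsigned int promiscuity = dev->promiscuity + inc; /* wraps on overflow */
    unsigned int flags;

    if (promiscuity == 0) {
        /* Reaching zero while incrementing means the counter wrapped:
         * fail without touching any field. */
        if (inc > 0)
            return -EOVERFLOW;
        flags = old_flags & ~MODEL_IFF_PROMISC;
    } else {
        flags = old_flags | MODEL_IFF_PROMISC;
    }

    /* Only reached on success: publish each field with a single store. */
    WRITE_ONCE(dev->promiscuity, promiscuity);
    if (flags != old_flags)
        WRITE_ONCE(dev->flags, flags);
    return 0;
}
```

Because the error return happens before any store, a lockless reader never sees a promiscuity count that was later rolled back.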
Petr Oros be265749aa net/core: print message for allmulticast
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 802dcbd6f30feaa7c96a1fb4ecb1db57082df9d7
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date:   Tue Feb 14 13:01:16 2023 -0800

    net/core: print message for allmulticast

    When the user sets or clears the IFF_ALLMULTI flag in the netdev, there are
    no log messages printed to the kernel log to indicate anything happened.
    This is inexplicably different from most other dev->flags changes, and
    could surprise the user.

    Typically this occurs from user-space when a user:
    ip link set eth0 allmulticast <on|off>

    However, other devices like bridge set allmulticast as well, and many
    other flows might trigger entry into allmulticast as well.

    The new message uses the standard netdev_info print and looks like:
    [  413.246110] ixgbe 0000:17:00.0 eth0: entered allmulticast mode
    [  415.977184] ixgbe 0000:17:00.0 eth0: left allmulticast mode

    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:54 +01:00
Petr Oros 602dfd4ed4 net/core: refactor promiscuous mode message
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 3ba0bf47edf955d6f52fdb16b54acd1932cb9445
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date:   Tue Feb 14 13:01:17 2023 -0800

    net/core: refactor promiscuous mode message

    The kernel stack can be more consistent by printing the IFF_PROMISC
    aka promiscuous enable/disable messages with the standard netdev_info
    message which can include bus and driver info as well as the device.

    Typical command usage from user space looks like:
    ip link set eth0 promisc <on|off>

    But lots of utilities, such as bridge, tcpdump, etc., put the interface
    into promiscuous mode.

    old message:
    [  406.034418] device eth0 entered promiscuous mode
    [  408.424703] device eth0 left promiscuous mode

    new message:
    [  406.034431] ice 0000:17:00.0 eth0: entered promiscuous mode
    [  408.424715] ice 0000:17:00.0 eth0: left promiscuous mode

    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:54 +01:00
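The old and new promiscuous-mode messages quoted in the commit above differ only in their prefix. The sketch below builds both strings. It is loosely modeled on how netdev_info() prefixes a message with driver name, bus ID, and interface name when a parent device exists, and uses just the interface name otherwise. The struct and both functions are hypothetical, and the real helper also handles details such as registration state that are left out here.

```c
#include <stdio.h>

/* Hypothetical model of the identity a netdev_*() message can show. */
struct model_netdev {
    const char *driver; /* e.g. "ice"; NULL when there is no parent device */
    const char *bus_id; /* e.g. "0000:17:00.0" */
    const char *name;   /* e.g. "eth0" */
};

/* Old style: a pr_*() call that only knows the interface name. */
static int format_old(char *buf, size_t len, const struct model_netdev *dev,
                      const char *what)
{
    return snprintf(buf, len, "device %s %s promiscuous mode", dev->name,
                    what);
}

/* New style: "driver bus-id ifname: msg" when a parent device exists,
 * falling back to "ifname: msg" otherwise. */
static int format_new(char *buf, size_t len, const struct model_netdev *dev,
                      const char *msg)
{
    if (dev->driver && dev->bus_id)
        return snprintf(buf, len, "%s %s %s: %s", dev->driver, dev->bus_id,
                        dev->name, msg);
    return snprintf(buf, len, "%s: %s", dev->name, msg);
}
```

With `{ "ice", "0000:17:00.0", "eth0" }`, `format_new()` reproduces the "new message" lines above, minus the timestamp.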
Petr Oros 96cea08865 net-core: use netdev_* calls for kernel messages
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 5b92be649605504e1019a1ad0c95b0d74a4e2be1
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date:   Tue Oct 19 09:42:28 2021 -0700

    net-core: use netdev_* calls for kernel messages

    While loading a driver and changing the number of queues, I noticed this
    message in the kernel log:

    "[253489.070080] Number of in use tx queues changed invalidating tc
    mappings. Priority traffic classification disabled!"

    But I had no idea what interface was being talked about because this
    message used pr_warn().

    After investigating, it appears we can use the netdev_* helpers already
    defined to create predictably formatted messages, and that already handle
    <unknown netdev> cases, in more of the messages in dev.c.

    After this change, this message (and others) will look like this:
    "[  170.181093] ice 0000:3b:00.0 ens785f0: Number of in use tx queues
    changed invalidating tc mappings. Priority traffic classification
    disabled!"

    One goal here was not to change the message significantly from the
    original format so as to not break users' expectations, so I just
    changed messages that used pr_* and generally started with %s ==
    dev->name.

    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:54 +01:00
Petr Oros 861345bba2 rtnetlink: do not depend on RTNL in rtnl_fill_proto_down()
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 6890ab31d1a35444741e6150db19d64797db2919
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 3 19:20:57 2024 +0000

    rtnetlink: do not depend on RTNL in rtnl_fill_proto_down()

    Change dev_change_proto_down() and dev_change_proto_down_reason()
    to write once on dev->proto_down and dev->proto_down_reason.

    Then rtnl_fill_proto_down() can use READ_ONCE() annotations
    and run locklessly.

    rtnl_proto_down_size() should assume worst case,
    because reading dev->proto_down_reason multiple
    times would be racy without RTNL in the future.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:54 +01:00
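The "assume worst case" sizing in the rtnl_proto_down_size() change can be checked with netlink attribute arithmetic: a 4-byte attribute header with payloads padded to 4 bytes. The sketch re-creates `nla_total_size()` in userspace and compares two shapes. The worst-case version always reserves room for the IFLA_PROTO_DOWN_REASON nest and its u32 value. The conditional version, which assumes the earlier code looked roughly like this, reads the reason. The function names other than `nla_total_size()` are hypothetical.

```c
#include <stddef.h>

/* Userspace re-creation of the netlink sizing helpers. */
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN 4 /* NLA_ALIGN(sizeof(struct nlattr)) */

static size_t nla_total_size(size_t payload)
{
    return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Worst case: never reads dev->proto_down_reason, so it is safe without RTNL. */
static size_t proto_down_size_worst_case(void)
{
    size_t size = nla_total_size(1); /* IFLA_PROTO_DOWN, u8 */

    size += nla_total_size(0)  /* IFLA_PROTO_DOWN_REASON nest header */
          + nla_total_size(4); /* IFLA_PROTO_DOWN_REASON_VALUE, u32 */
    return size;
}

/* Conditional shape: depends on a value that may change before the fill. */
static size_t proto_down_size_conditional(unsigned int reason)
{
    size_t size = nla_total_size(1);

    if (reason)
        size += nla_total_size(0) + nla_total_size(4);
    return size;
}
```

The arithmetic is 8 + 4 + 8 = 20 bytes in the worst case, versus 8 bytes when the reason reads as zero. If the reason becomes non-zero between the size calculation and rtnl_fill_proto_down(), a buffer sized with the conditional shape would be too small. The worst-case shape avoids that by always reserving the 20 bytes.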
Petr Oros ce54afa357 rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit ad13b5b0d1f9eb8e048394919e6393e520b14552
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 3 19:20:54 2024 +0000

    rtnetlink: do not depend on RTNL for IFLA_TXQLEN output

    rtnl_fill_ifinfo() can read dev->tx_queue_len locklessly,
    granted we add corresponding READ_ONCE()/WRITE_ONCE() annotations.

    Add missing READ_ONCE(dev->tx_queue_len) in teql_enqueue()

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:54 +01:00
Rado Vrbovsky 2a510f17cf Merge: CNB96: arp: Random clean up and RCU conversion for ioctl(SIOCGARP)
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5851

JIRA: https://issues.redhat.com/browse/RHEL-68117

Commits:
```
42033d0cfc86 ("arp: Move ATF_COM setting in arp_req_set().")
0592367424bb ("arp: Validate netmask earlier for SIOCDARP and SIOCSARP in arp_ioctl().")
f8696133f6aa ("arp: Factorise ip_route_output() call in arp_req_set() and arp_req_delete().")
51e9ba48d487 ("arp: Remove a nest in arp_req_get().")
a428bfc77a4d ("arp: Get dev after calling arp_req_(delete|set|get)().")
d0358c1a37db ("net: Remove unused declaration dev_restart()")
0840556e5a3a ("net: Protect dev->name by seqlock.")
bf4ea58874df ("arp: Convert ioctl(SIOCGARP) to RCU.")
62e58ddb1465 ("net: add softirq safety to netdev_rename_lock")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Petr Oros <poros@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-12-09 08:21:19 +00:00
CKI Backport Bot d0e1a6b1a3 net: avoid potential underflow in qdisc_pkt_len_init() with UFO
JIRA: https://issues.redhat.com/browse/RHEL-65404
CVE: CVE-2024-49949

commit c20029db28399ecc50e556964eaba75c43b1e2f1
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Sep 24 15:02:56 2024 +0000

    net: avoid potential underflow in qdisc_pkt_len_init() with UFO

    After commit 7c6d2ecbda ("net: be more gentle about silly gso
    requests coming from user") virtio_net_hdr_to_skb() had a sanity check
    to detect malicious attempts from user space to cook a bad GSO packet.

    Then commit cf9acc90c80ec ("net: virtio_net_hdr_to_skb: count
    transport header in UFO") while fixing one issue, allowed user space
    to cook a GSO packet with the following characteristics:

    IPv4 SKB_GSO_UDP, gso_size=3, skb->len = 28.

    When this packet arrives in qdisc_pkt_len_init(), we end up
    with hdr_len = 28 (IPv4 header + UDP header), matching skb->len

    Then the following sets gso_segs to 0 :

    gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
                            shinfo->gso_size);

    Then later we set qdisc_skb_cb(skb)->pkt_len back to zero :/

    qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;

    This leads to the following crash in fq_codel [1]

    qdisc_pkt_len_init() is best effort: we only want an estimation
    of the bytes sent on the wire, not to crash the kernel.

    This patch fixes this particular issue; a following one
    adds more sanity checks for another potential bug.

    [1]
    [   70.724101] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [   70.724561] #PF: supervisor read access in kernel mode
    [   70.724561] #PF: error_code(0x0000) - not-present page
    [   70.724561] PGD 10ac61067 P4D 10ac61067 PUD 107ee2067 PMD 0
    [   70.724561] Oops: Oops: 0000 [#1] SMP NOPTI
    [   70.724561] CPU: 11 UID: 0 PID: 2163 Comm: b358537762 Not tainted 6.11.0-virtme #991
    [   70.724561] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
    [   70.724561] RIP: 0010:fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel
    [ 70.724561] Code: 24 08 49 c1 e1 06 44 89 7c 24 18 45 31 ed 45 31 c0 31 ff 89 44 24 14 4c 03 8b 90 01 00 00 eb 04 39 ca 73 37 4d 8b 39 83 c7 01 <49> 8b 17 49 89 11 41 8b 57 28 45 8b 5f 34 49 c7 07 00 00 00 00 49
    All code
    ========
       0:   24 08                   and    $0x8,%al
       2:   49 c1 e1 06             shl    $0x6,%r9
       6:   44 89 7c 24 18          mov    %r15d,0x18(%rsp)
       b:   45 31 ed                xor    %r13d,%r13d
       e:   45 31 c0                xor    %r8d,%r8d
      11:   31 ff                   xor    %edi,%edi
      13:   89 44 24 14             mov    %eax,0x14(%rsp)
      17:   4c 03 8b 90 01 00 00    add    0x190(%rbx),%r9
      1e:   eb 04                   jmp    0x24
      20:   39 ca                   cmp    %ecx,%edx
      22:   73 37                   jae    0x5b
      24:   4d 8b 39                mov    (%r9),%r15
      27:   83 c7 01                add    $0x1,%edi
      2a:*  49 8b 17                mov    (%r15),%rdx              <-- trapping instruction
      2d:   49 89 11                mov    %rdx,(%r9)
      30:   41 8b 57 28             mov    0x28(%r15),%edx
      34:   45 8b 5f 34             mov    0x34(%r15),%r11d
      38:   49 c7 07 00 00 00 00    movq   $0x0,(%r15)
      3f:   49                      rex.WB

    Code starting with the faulting instruction
    ===========================================
       0:   49 8b 17                mov    (%r15),%rdx
       3:   49 89 11                mov    %rdx,(%r9)
       6:   41 8b 57 28             mov    0x28(%r15),%edx
       a:   45 8b 5f 34             mov    0x34(%r15),%r11d
       e:   49 c7 07 00 00 00 00    movq   $0x0,(%r15)
      15:   49                      rex.WB
    [   70.724561] RSP: 0018:ffff95ae85e6fb90 EFLAGS: 00000202
    [   70.724561] RAX: 0000000002000000 RBX: ffff95ae841de000 RCX: 0000000000000000
    [   70.724561] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
    [   70.724561] RBP: ffff95ae85e6fbf8 R08: 0000000000000000 R09: ffff95b710a30000
    [   70.724561] R10: 0000000000000000 R11: bdf289445ce31881 R12: ffff95ae85e6fc58
    [   70.724561] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000000
    [   70.724561] FS:  000000002c5c1380(0000) GS:ffff95bd7fcc0000(0000) knlGS:0000000000000000
    [   70.724561] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   70.724561] CR2: 0000000000000000 CR3: 000000010c568000 CR4: 00000000000006f0
    [   70.724561] Call Trace:
    [   70.724561]  <TASK>
    [   70.724561] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
    [   70.724561] ? page_fault_oops (arch/x86/mm/fault.c:715)
    [   70.724561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
    [   70.724561] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
    [   70.724561] ? fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel
    [   70.724561] dev_qdisc_enqueue (net/core/dev.c:3784)
    [   70.724561] __dev_queue_xmit (net/core/dev.c:3880 (discriminator 2) net/core/dev.c:4390 (discriminator 2))
    [   70.724561] ? irqentry_enter (kernel/entry/common.c:237)
    [   70.724561] ? sysvec_apic_timer_interrupt (./arch/x86/include/asm/hardirq.h:74 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2))
    [   70.724561] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
    [   70.724561] ? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
    [   70.724561] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/virtio_net.h:129 (discriminator 1))
    [   70.724561] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1))
    [   70.724561] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4))
    [   70.724561] ? netdev_name_node_lookup_rcu (net/core/dev.c:325 (discriminator 1))
    [   70.724561] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1))
    [   70.724561] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355)
    [   70.724561] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1))
    [   70.724561] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
    [   70.724561] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
    [   70.724561] RIP: 0033:0x41ae09

    Fixes: cf9acc90c80ec ("net: virtio_net_hdr_to_skb: count transport header in UFO")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Jonathan Davies <jonathan.davies@nutanix.com>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-12-03 13:47:31 +00:00
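The arithmetic in the qdisc_pkt_len_init() commit above can be reproduced with plain integers. With skb->len = 28, hdr_len = 28, and gso_size = 3, gso_segs becomes DIV_ROUND_UP(0, 3) = 0. The term (gso_segs - 1) * hdr_len then wraps to -28 in unsigned arithmetic, so pkt_len returns to 0. The sketch models only the "dodgy GSO" branch, with hypothetical function and parameter names. It mirrors the kernel's types (u16 gso_segs, unsigned int lengths). The early-return guard is modeled on the payload check the upstream fix adds, so treat its exact form as an assumption. gso_size must be non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Simplified model of the SKB_GSO_DODGY branch of qdisc_pkt_len_init(),
 * taking plain integers instead of an sk_buff. */
static unsigned int model_pkt_len(unsigned int skb_len, unsigned int hdr_len,
                                  unsigned int gso_size, bool check_payload)
{
    unsigned int pkt_len = skb_len; /* starts out as skb->len */
    uint16_t gso_segs;

    if (check_payload) {
        int payload = (int)skb_len - (int)hdr_len;

        /* Guard: a "GSO" packet without payload is malformed; keep the
         * plain skb->len estimate instead of scaling it. */
        if (payload <= 0)
            return pkt_len;
        gso_segs = DIV_ROUND_UP(payload, gso_size);
    } else {
        gso_segs = DIV_ROUND_UP(skb_len - hdr_len, gso_size);
    }

    /* With gso_segs == 0 this adds (unsigned)-hdr_len, wrapping to zero. */
    pkt_len += (gso_segs - 1) * hdr_len;
    return pkt_len;
}
```

For the packet from the report, the unchecked path yields 0 and the guarded path keeps 28. A well-formed packet with 9 bytes of payload, for example (37, 28, 3), yields 37 + 2 * 28 = 93 either way.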
Ivan Vecera 1094177821 net: add softirq safety to netdev_rename_lock
JIRA: https://issues.redhat.com/browse/RHEL-68117

commit 62e58ddb146502faff1dd23164a20688624eaaed
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jun 20 13:31:19 2024 +0000

    net: add softirq safety to netdev_rename_lock

    syzbot reported a lockdep violation involving the bridge driver [1]

    Make sure netdev_rename_lock is softirq safe to fix this issue.

    [1]
    WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
    6.10.0-rc2-syzkaller-00249-gbe27b8965297 #0 Not tainted
       -----------------------------------------------------
    syz-executor.2/9449 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
     ffffffff8f5de668 (netdev_rename_lock.seqcount){+.+.}-{0:0}, at: rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839

    and this task is already holding:
     ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
     ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
    which would create a new lock dependency:
     (&br->lock){+.-.}-{2:2} -> (netdev_rename_lock.seqcount){+.+.}-{0:0}

    but this new dependency connects a SOFTIRQ-irq-safe lock:
     (&br->lock){+.-.}-{2:2}

    ... which became SOFTIRQ-irq-safe at:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86
       call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
       expire_timers kernel/time/timer.c:1843 [inline]
       __run_timers kernel/time/timer.c:2417 [inline]
       __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
       run_timer_base kernel/time/timer.c:2437 [inline]
       run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
       handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
       __do_softirq kernel/softirq.c:588 [inline]
       invoke_softirq kernel/softirq.c:428 [inline]
       __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
       irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
       instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
       sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
       asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
       lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758
       fs_reclaim_acquire+0xaf/0x140 mm/page_alloc.c:3800
       might_alloc include/linux/sched/mm.h:334 [inline]
       slab_pre_alloc_hook mm/slub.c:3890 [inline]
       slab_alloc_node mm/slub.c:3980 [inline]
       kmalloc_trace_noprof+0x3d/0x2c0 mm/slub.c:4147
       kmalloc_noprof include/linux/slab.h:660 [inline]
       kzalloc_noprof include/linux/slab.h:778 [inline]
       class_dir_create_and_add drivers/base/core.c:3255 [inline]
       get_device_parent+0x2a7/0x410 drivers/base/core.c:3315
       device_add+0x325/0xbf0 drivers/base/core.c:3645
       netdev_register_kobject+0x17e/0x320 net/core/net-sysfs.c:2136
       register_netdevice+0x11d5/0x19e0 net/core/dev.c:10375
       nsim_init_netdevsim drivers/net/netdevsim/netdev.c:690 [inline]
       nsim_create+0x647/0x890 drivers/net/netdevsim/netdev.c:750
       __nsim_dev_port_add+0x6c0/0xae0 drivers/net/netdevsim/dev.c:1390
       nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1446 [inline]
       nsim_dev_reload_create drivers/net/netdevsim/dev.c:1498 [inline]
       nsim_dev_reload_up+0x69b/0x8e0 drivers/net/netdevsim/dev.c:985
       devlink_reload+0x478/0x870 net/devlink/dev.c:474
       devlink_nl_reload_doit+0xbd6/0xe50 net/devlink/dev.c:586
       genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
       genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
       genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
       netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
       netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
       netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
       netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0x221/0x270 net/socket.c:745
       ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
       ___sys_sendmsg net/socket.c:2639 [inline]
       __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
      entry_SYSCALL_64_after_hwframe+0x77/0x7f

    to a SOFTIRQ-irq-unsafe lock:
     (netdev_rename_lock.seqcount){+.+.}-{0:0}

    ... which became SOFTIRQ-irq-unsafe at:
    ...
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
       do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
       write_seqlock include/linux/seqlock.h:823 [inline]
       dev_change_name+0x184/0x920 net/core/dev.c:1229
       do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
       __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
       rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
       rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
       netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
       netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
       netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
       netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0x221/0x270 net/socket.c:745
       __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
       __do_sys_sendto net/socket.c:2204 [inline]
       __se_sys_sendto net/socket.c:2200 [inline]
       __x64_sys_sendto+0xde/0x100 net/socket.c:2200
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
      entry_SYSCALL_64_after_hwframe+0x77/0x7f

    other info that might help us debug this:

     Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(netdev_rename_lock.seqcount);
                                   local_irq_disable();
                                   lock(&br->lock);
                                   lock(netdev_rename_lock.seqcount);
      <Interrupt>
        lock(&br->lock);

     *** DEADLOCK ***

    3 locks held by syz-executor.2/9449:
      #0: ffffffff8f5e7448 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
      #0: ffffffff8f5e7448 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x1180 net/core/rtnetlink.c:6632
      #1: ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
      #1: ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
      #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
      #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
      #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: team_change_rx_flags+0x29/0x330 drivers/net/team/team_core.c:1767

    the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
    -> (&br->lock){+.-.}-{2:2} {
       HARDIRQ-ON-W at:
                         lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                         __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
                         _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
                         spin_lock_bh include/linux/spinlock.h:356 [inline]
                         br_add_if+0xb34/0xef0 net/bridge/br_if.c:682
                         do_set_master net/core/rtnetlink.c:2701 [inline]
                         do_setlink+0xe70/0x41f0 net/core/rtnetlink.c:2907
                         __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                         rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                         rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                         netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                         netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                         netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                         netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                         sock_sendmsg_nosec net/socket.c:730 [inline]
                         __sock_sendmsg+0x221/0x270 net/socket.c:745
                         __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                         __do_sys_sendto net/socket.c:2204 [inline]
                         __se_sys_sendto net/socket.c:2200 [inline]
                         __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                        entry_SYSCALL_64_after_hwframe+0x77/0x7f
       IN-SOFTIRQ-W at:
                         lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                         __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
                         _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
                         spin_lock include/linux/spinlock.h:351 [inline]
                         br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86
                         call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
                         expire_timers kernel/time/timer.c:1843 [inline]
                         __run_timers kernel/time/timer.c:2417 [inline]
                         __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
                         run_timer_base kernel/time/timer.c:2437 [inline]
                         run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
                         handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
                         __do_softirq kernel/softirq.c:588 [inline]
                         invoke_softirq kernel/softirq.c:428 [inline]
                         __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
                         irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
                         instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
                         sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
                         asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
                         lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758
                         fs_reclaim_acquire+0xaf/0x140 mm/page_alloc.c:3800
                         might_alloc include/linux/sched/mm.h:334 [inline]
                         slab_pre_alloc_hook mm/slub.c:3890 [inline]
                         slab_alloc_node mm/slub.c:3980 [inline]
                         kmalloc_trace_noprof+0x3d/0x2c0 mm/slub.c:4147
                         kmalloc_noprof include/linux/slab.h:660 [inline]
                         kzalloc_noprof include/linux/slab.h:778 [inline]
                         class_dir_create_and_add drivers/base/core.c:3255 [inline]
                         get_device_parent+0x2a7/0x410 drivers/base/core.c:3315
                         device_add+0x325/0xbf0 drivers/base/core.c:3645
                         netdev_register_kobject+0x17e/0x320 net/core/net-sysfs.c:2136
                         register_netdevice+0x11d5/0x19e0 net/core/dev.c:10375
                         nsim_init_netdevsim drivers/net/netdevsim/netdev.c:690 [inline]
                         nsim_create+0x647/0x890 drivers/net/netdevsim/netdev.c:750
                         __nsim_dev_port_add+0x6c0/0xae0 drivers/net/netdevsim/dev.c:1390
                         nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1446 [inline]
                         nsim_dev_reload_create drivers/net/netdevsim/dev.c:1498 [inline]
                         nsim_dev_reload_up+0x69b/0x8e0 drivers/net/netdevsim/dev.c:985
                         devlink_reload+0x478/0x870 net/devlink/dev.c:474
                         devlink_nl_reload_doit+0xbd6/0xe50 net/devlink/dev.c:586
                         genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
                         genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
                         genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
                         netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                         genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
                         netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                         netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                         netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                         sock_sendmsg_nosec net/socket.c:730 [inline]
                         __sock_sendmsg+0x221/0x270 net/socket.c:745
                         ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
                         ___sys_sendmsg net/socket.c:2639 [inline]
                         __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
                         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                        entry_SYSCALL_64_after_hwframe+0x77/0x7f
       INITIAL USE at:
                        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                        __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
                        _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
                        spin_lock_bh include/linux/spinlock.h:356 [inline]
                        br_add_if+0xb34/0xef0 net/bridge/br_if.c:682
                        do_set_master net/core/rtnetlink.c:2701 [inline]
                        do_setlink+0xe70/0x41f0 net/core/rtnetlink.c:2907
                        __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                        rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                        rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                        netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                        sock_sendmsg_nosec net/socket.c:730 [inline]
                        __sock_sendmsg+0x221/0x270 net/socket.c:745
                        __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                        __do_sys_sendto net/socket.c:2204 [inline]
                        __se_sys_sendto net/socket.c:2200 [inline]
                        __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                       entry_SYSCALL_64_after_hwframe+0x77/0x7f
     }
     ... key      at: [<ffffffff94b9a1a0>] br_dev_setup.__key+0x0/0x20

    the dependencies between the lock to be acquired
     and SOFTIRQ-irq-unsafe lock:
    -> (netdev_rename_lock.seqcount){+.+.}-{0:0} {
       HARDIRQ-ON-W at:
                         lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                         do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
                         do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
                         write_seqlock include/linux/seqlock.h:823 [inline]
                         dev_change_name+0x184/0x920 net/core/dev.c:1229
                         do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
                         __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                         rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                         rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                         netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                         netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                         netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                         netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                         sock_sendmsg_nosec net/socket.c:730 [inline]
                         __sock_sendmsg+0x221/0x270 net/socket.c:745
                         __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                         __do_sys_sendto net/socket.c:2204 [inline]
                         __se_sys_sendto net/socket.c:2200 [inline]
                         __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                        entry_SYSCALL_64_after_hwframe+0x77/0x7f
       SOFTIRQ-ON-W at:
                         lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                         do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
                         do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
                         write_seqlock include/linux/seqlock.h:823 [inline]
                         dev_change_name+0x184/0x920 net/core/dev.c:1229
                         do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
                         __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                         rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                         rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                         netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                         netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                         netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                         netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                         sock_sendmsg_nosec net/socket.c:730 [inline]
                         __sock_sendmsg+0x221/0x270 net/socket.c:745
                         __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                         __do_sys_sendto net/socket.c:2204 [inline]
                         __se_sys_sendto net/socket.c:2200 [inline]
                         __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                        entry_SYSCALL_64_after_hwframe+0x77/0x7f
       INITIAL USE at:
                        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                        do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
                        do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
                        write_seqlock include/linux/seqlock.h:823 [inline]
                        dev_change_name+0x184/0x920 net/core/dev.c:1229
                        do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
                        __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                        rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                        rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                        netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                        sock_sendmsg_nosec net/socket.c:730 [inline]
                        __sock_sendmsg+0x221/0x270 net/socket.c:745
                        __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                        __do_sys_sendto net/socket.c:2204 [inline]
                        __se_sys_sendto net/socket.c:2200 [inline]
                        __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                       entry_SYSCALL_64_after_hwframe+0x77/0x7f
       INITIAL READ USE at:
                             lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                             seqcount_lockdep_reader_access include/linux/seqlock.h:72 [inline]
                             read_seqbegin include/linux/seqlock.h:772 [inline]
                             netdev_copy_name+0x168/0x2c0 net/core/dev.c:949
                             rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839
                             rtmsg_ifinfo_build_skb+0x18a/0x260 net/core/rtnetlink.c:4073
                             rtmsg_ifinfo_event net/core/rtnetlink.c:4107 [inline]
                             rtmsg_ifinfo+0x91/0x1b0 net/core/rtnetlink.c:4116
                             register_netdevice+0x1665/0x19e0 net/core/dev.c:10422
                             register_netdev+0x3b/0x50 net/core/dev.c:10512
                             loopback_net_init+0x73/0x150 drivers/net/loopback.c:217
                             ops_init+0x359/0x610 net/core/net_namespace.c:139
                             __register_pernet_operations net/core/net_namespace.c:1247 [inline]
                             register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1320
                             register_pernet_device+0x33/0x80 net/core/net_namespace.c:1407
                             net_dev_init+0xfcd/0x10d0 net/core/dev.c:11956
                             do_one_initcall+0x248/0x880 init/main.c:1267
                             do_initcall_level+0x157/0x210 init/main.c:1329
                             do_initcalls+0x3f/0x80 init/main.c:1345
                             kernel_init_freeable+0x435/0x5d0 init/main.c:1578
                             kernel_init+0x1d/0x2b0 init/main.c:1467
                             ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
                             ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
     }
     ... key      at: [<ffffffff8f5de668>] netdev_rename_lock+0x8/0xa0
     ... acquired at:
        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
        seqcount_lockdep_reader_access include/linux/seqlock.h:72 [inline]
        read_seqbegin include/linux/seqlock.h:772 [inline]
        netdev_copy_name+0x168/0x2c0 net/core/dev.c:949
        rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839
        rtmsg_ifinfo_build_skb+0x18a/0x260 net/core/rtnetlink.c:4073
        rtmsg_ifinfo_event net/core/rtnetlink.c:4107 [inline]
        rtmsg_ifinfo+0x91/0x1b0 net/core/rtnetlink.c:4116
        __dev_notify_flags+0xf7/0x400 net/core/dev.c:8816
        __dev_set_promiscuity+0x152/0x5a0 net/core/dev.c:8588
        dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
        team_change_rx_flags+0x203/0x330 drivers/net/team/team_core.c:1771
        dev_change_rx_flags net/core/dev.c:8541 [inline]
        __dev_set_promiscuity+0x406/0x5a0 net/core/dev.c:8585
        dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
        br_port_clear_promisc net/bridge/br_if.c:135 [inline]
        br_manage_promisc+0x505/0x590 net/bridge/br_if.c:172
        nbp_update_port_count net/bridge/br_if.c:242 [inline]
        br_port_flags_change+0x161/0x1f0 net/bridge/br_if.c:761
        br_setport+0xcb5/0x16d0 net/bridge/br_netlink.c:1000
        br_port_slave_changelink+0x135/0x150 net/bridge/br_netlink.c:1213
        __rtnl_newlink net/core/rtnetlink.c:3689 [inline]
        rtnl_newlink+0x169f/0x20a0 net/core/rtnetlink.c:3743
        rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
        netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x221/0x270 net/socket.c:745
        ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
        ___sys_sendmsg net/socket.c:2639 [inline]
        __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

    stack backtrace:
    CPU: 0 PID: 9449 Comm: syz-executor.2 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b8965297 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
      print_bad_irq_dependency kernel/locking/lockdep.c:2626 [inline]
      check_irq_usage kernel/locking/lockdep.c:2865 [inline]
      check_prev_add kernel/locking/lockdep.c:3138 [inline]
      check_prevs_add kernel/locking/lockdep.c:3253 [inline]
      validate_chain+0x4de0/0x5900 kernel/locking/lockdep.c:3869
      __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
      lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
      seqcount_lockdep_reader_access include/linux/seqlock.h:72 [inline]
      read_seqbegin include/linux/seqlock.h:772 [inline]
      netdev_copy_name+0x168/0x2c0 net/core/dev.c:949
      rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839
      rtmsg_ifinfo_build_skb+0x18a/0x260 net/core/rtnetlink.c:4073
      rtmsg_ifinfo_event net/core/rtnetlink.c:4107 [inline]
      rtmsg_ifinfo+0x91/0x1b0 net/core/rtnetlink.c:4116
      __dev_notify_flags+0xf7/0x400 net/core/dev.c:8816
      __dev_set_promiscuity+0x152/0x5a0 net/core/dev.c:8588
      dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
      team_change_rx_flags+0x203/0x330 drivers/net/team/team_core.c:1771
      dev_change_rx_flags net/core/dev.c:8541 [inline]
      __dev_set_promiscuity+0x406/0x5a0 net/core/dev.c:8585
      dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
      br_port_clear_promisc net/bridge/br_if.c:135 [inline]
      br_manage_promisc+0x505/0x590 net/bridge/br_if.c:172
      nbp_update_port_count net/bridge/br_if.c:242 [inline]
      br_port_flags_change+0x161/0x1f0 net/bridge/br_if.c:761
      br_setport+0xcb5/0x16d0 net/bridge/br_netlink.c:1000
      br_port_slave_changelink+0x135/0x150 net/bridge/br_netlink.c:1213
      __rtnl_newlink net/core/rtnetlink.c:3689 [inline]
      rtnl_newlink+0x169f/0x20a0 net/core/rtnetlink.c:3743
      rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
      netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
      netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
      netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
      netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x221/0x270 net/socket.c:745
      ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
      ___sys_sendmsg net/socket.c:2639 [inline]
      __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f3b3047cf29
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f3b311740c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007f3b305b4050 RCX: 00007f3b3047cf29
    RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000008
    RBP: 00007f3b304ec074 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000000006e R14: 00007f3b305b4050 R15: 00007ffca2f3dc68
     </TASK>

    Fixes: 0840556e5a3a ("net: Protect dev->name by seqlock.")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-11-29 10:54:26 +01:00
Ivan Vecera cf614e3a7e net: Protect dev->name by seqlock.
JIRA: https://issues.redhat.com/browse/RHEL-68117

Conflicts:
- context due to missing dd891b5b106fa ("net: do not send a MOVE event
  when netdev changes netns")

commit 0840556e5a3a331b6932ef17dd4bc94445df3297
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Mon Apr 29 18:58:12 2024 -0700

    net: Protect dev->name by seqlock.

    We will convert ioctl(SIOCGARP) to RCU, and then we need to copy
    dev->name which is currently protected by rtnl_lock().

    This patch does the following:

      1) Add seqlock netdev_rename_lock to protect dev->name

      2) Add netdev_copy_name() that copies dev->name to buffer
         under netdev_rename_lock

      3) Use netdev_copy_name() in netdev_get_name() and drop
         devnet_rename_sem

    Suggested-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/netdev/CANn89iJEWs7AYSJqGCUABeVqOCTkErponfZdT5kV-iD=-SajnQ@mail.gmail.com/
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240430015813.71143-7-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-11-29 10:54:26 +01:00
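The read/write protocol that commit introduces can be modelled in userspace. In the kernel, the write side is `write_seqlock()`/`write_sequnlock()` and the read side is a `read_seqbegin()`/`read_seqretry()` retry loop. The lockdep report above shows `write_seqlock()` in `dev_change_name()` and `read_seqbegin()` in `netdev_copy_name()`. The sketch below is a simplified model built on C11 atomics, not the kernel's `seqlock_t`. `struct name_seqlock`, `name_write()` and `name_copy()` are hypothetical names, and writers are assumed to be serialized by the caller. The plain `memcpy()` of a buffer that a writer may be changing is formally a data race in ISO C, even though the retry loop discards any torn copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define NAME_SIZE 16 /* stands in for IFNAMSIZ in this model */

/* Hypothetical model of a seqlock guarding a device-name buffer. */
struct name_seqlock {
	atomic_uint seq;          /* odd while a writer is mid-update */
	char name[NAME_SIZE];
};

/* Writer side. Callers must already serialize writers; the kernel's
 * seqlock_t embeds a spinlock for this, which the model omits. */
static void name_write(struct name_seqlock *s, const char *newname)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1) == 0);   /* no other writer in progress */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	/* Order the odd sequence store before the data stores below. */
	atomic_thread_fence(memory_order_release);
	strncpy(s->name, newname, NAME_SIZE - 1);
	s->name[NAME_SIZE - 1] = '\0';
	/* Even again: publishes the new name to readers. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: copy the name, retrying if a writer was active before
 * or during the copy. buf must hold NAME_SIZE bytes. */
static void name_copy(struct name_seqlock *s, char *buf)
{
	unsigned int start;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		if (start & 1)
			continue; /* writer in progress: jump to the retry check */
		memcpy(buf, s->name, NAME_SIZE);
		/* Order the data loads before re-reading the sequence. */
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 atomic_load_explicit(&s->seq, memory_order_relaxed) != start);
}
```

The reader never blocks the writer. It repeats the copy until it observes the same even sequence value before and after reading.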
Jeff Moyer aa837411bc net: add napi_busy_loop_rcu()
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit b4e8ae5c8c41355791a99fdf2fcac16deace1e79
Author: Stefan Roesch <shr@devkernel.io>
Date:   Tue Feb 6 09:30:04 2024 -0700

    net: add napi_busy_loop_rcu()
    
    This adds the napi_busy_loop_rcu() function. This function assumes that
    the calling function is already holding the rcu read lock and
    napi_busy_loop() does not need to take the rcu read lock. Add a
    NAPI_F_NO_SCHED flag, which tells __napi_busy_loop() to abort if we
    need to reschedule rather than drop the RCU read lock and reschedule.
    
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Link: https://lore.kernel.org/r/20230608163839.2891748-3-shr@devkernel.io
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:39:44 -05:00
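The decision described in the napi_busy_loop_rcu() commit can be modelled as a small function. When a reschedule is pending, the loop normally drops the RCU read lock and reschedules. With NAPI_F_NO_SCHED set, it aborts and leaves the RCU section to the caller. In this sketch, the flag names come from the commit text, but the bit values, `enum poll_action` and `busy_poll_step()` are hypothetical. The real loop's timeout and budget checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; names follow the commit text. */
enum {
	NAPI_F_PREFER_BUSY_POLL = 1 << 0, /* replaces the old bool argument */
	NAPI_F_NO_SCHED         = 1 << 1, /* caller already holds the RCU read lock */
};

enum poll_action {
	POLL_CONTINUE,   /* keep busy polling */
	POLL_ABORT,      /* leave the loop; the caller owns the RCU section */
	POLL_RESCHEDULE, /* drop the RCU read lock, reschedule, retake it */
};

/* What one loop iteration does after checking for a pending reschedule. */
static enum poll_action busy_poll_step(unsigned int flags, bool need_resched)
{
	if (!need_resched)
		return POLL_CONTINUE;
	if (flags & NAPI_F_NO_SCHED)
		return POLL_ABORT;
	return POLL_RESCHEDULE;
}
```

Passing flags rather than a bool lets a caller combine options, for example `NAPI_F_PREFER_BUSY_POLL | NAPI_F_NO_SCHED`, without adding more function parameters.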
Jeff Moyer f82d532fde net: split off __napi_busy_poll from napi_busy_poll
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 13d381b440ed84ec4cc92975de035efb1a9e5f7e
Author: Stefan Roesch <shr@devkernel.io>
Date:   Tue Feb 6 09:30:03 2024 -0700

    net: split off __napi_busy_poll from napi_busy_poll
    
    This splits off the key part of the napi_busy_poll function into its own
    function, __napi_busy_poll, and changes the prefer_busy_poll bool to be
    flag based to allow passing in more flags in the future.
    
    This is done in preparation for an additional napi_busy_poll() function,
    that doesn't take the rcu_read_lock(). The new function is introduced
    in the next patch.
    
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Link: https://lore.kernel.org/r/20230608163839.2891748-2-shr@devkernel.io
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:38:44 -05:00
Jeff Moyer b301afebc8 net/core: Enable socket busy polling on -RT
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit c857946a4e262e0f798fe7625fadf85bf1190fc4
Author: Kurt Kanzenbach <kurt@linutronix.de>
Date:   Tue May 23 13:15:18 2023 +0200

    net/core: Enable socket busy polling on -RT
    
    Busy polling is currently not allowed on PREEMPT_RT, because it disables
    preemption while invoking the NAPI callback. It is not possible to acquire
    sleeping locks with disabled preemption. For details see commit
    20ab39d13e2e ("net/core: disable NET_RX_BUSY_POLL on PREEMPT_RT").
    
    However, strict cyclic and/or low-latency network applications may prefer busy
    polling (e.g. using AF_XDP) over interrupt-driven communication.
    
    The preempt_disable() is used in order to prevent the poll_owner and NAPI owner
    from being preempted while owning the resource, to ensure progress. Netpoll performs
    busy polling in order to acquire the lock. NAPI is locked by setting the
    NAPIF_STATE_SCHED flag. There is no busy polling if the flag is set and the
    "owner" is preempted. The worst case is that the task owning NAPI gets preempted and
    NAPI processing stalls. This can be prevented by properly prioritising the
    tasks within the system.
    
    Allow RX_BUSY_POLL on PREEMPT_RT if NETPOLL is disabled. Don't disable
    preemption on PREEMPT_RT within the busy poll loop.
    
    Tested on x86 hardware with v6.1-RT and v6.3-RT on Intel i225 (igc) with
    AF_XDP/ZC sockets configured to run in busy polling mode.
    
    Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:37:44 -05:00
Rado Vrbovsky f55e4a4e81 Merge: CNB96: page_pool: update to v6.12
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5432

JIRA: https://issues.redhat.com/browse/RHEL-57765

Updating page_pool to upstream v6.12 where necessary to enable driver
updates.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-27 11:19:28 +00:00
Rado Vrbovsky fb874c9815 Merge: CNB96: netlink/devlink: update devlink & netlink to the v6.9
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5257

JIRA: https://issues.redhat.com/browse/RHEL-57755
Depends: !5414
Depends: !4753
Signed-off-by: Petr Oros <poros@redhat.com>

Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-27 11:19:20 +00:00
Rado Vrbovsky 18484e6ffa Merge: CNB96: net: RTNL pressure reduction
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5605

A series of patches reducing RTNL pressure in net, namely the following upstream series and their prerequisites / fixes / related changes:  
- 3cbab89268c6 Merge branch 'inet-implement-lockless-rtm_getnetconf-ops'  
- 9f780efa6eaa Merge branch 'ipv6-devconf-lockless'  
- e96082570933 Merge branch 'inet_dump_ifaddr-no-rtnl'  
- 570c86ed60cc Merge branch 'ipv6-lockless-dump-addrs'  
  
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5414  
  
JIRA: https://issues.redhat.com/browse/RHEL-62205  
JIRA: https://issues.redhat.com/browse/RHEL-62204  
JIRA: https://issues.redhat.com/browse/RHEL-62203  
JIRA: https://issues.redhat.com/browse/RHEL-62202  
  
Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Sabrina Dubroca <sdubroca@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-22 09:20:48 +00:00
Rado Vrbovsky 033238aef9 Merge: net-core: stable backports for rhel 9.6 phase 1
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5440

JIRA: https://issues.redhat.com/browse/RHEL-62849
JIRA: https://issues.redhat.com/browse/RHEL-64328
CVE: CVE-2024-49948
Tested: LNST, Tier1

A bunch of stable backports from upstream addressing minor core issues.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-22 09:17:29 +00:00
Petr Oros f70dcaae94 net: make dev_unreg_count global
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit ffabe98cb576097b77d404d39e8b3df03caa986a
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Feb 2 10:11:06 2024 +0000

    net: make dev_unreg_count global

    We can use a global dev_unreg_count counter instead
    of a per netns one.

    As a bonus we can factorize the changes done on it
    for bulk device removals.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:42 +01:00
Petr Oros 22331632bc net: get rid of rtnl_lock_unregistering()
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit 8a4fc54b07d756f1884af6c47ec84dfa3da663df
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Feb 18 09:58:56 2022 -0800

    net: get rid of rtnl_lock_unregistering()

    After recent patches, and in particular commits
     faab39f63c1f ("net: allow out-of-order netdev unregistration") and
     e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev")
    we no longer need the barrier implemented in rtnl_lock_unregistering().

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:40 +01:00
Antoine Tenart d0a81589eb net/core: Introduce netdev_core_stats_inc()
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: linux.git
Conflicts:
- Context difference due to missing upstream commit 794c24e9921f
  ("net-core: rx_otherhost_dropped to core_stats") in c9s.

commit 5247dbf16cee4e83eb89e4d3b87bd5e79c5d1655
Author: Yajun Deng <yajun.deng@linux.dev>
Date:   Mon Oct 9 19:16:33 2023 +0800

    net/core: Introduce netdev_core_stats_inc()

    Although a kfree_skb_reason() helper function can be used to find the
    reason why an skb is dropped, most callers don't increment any of
    rx_dropped, tx_dropped, rx_nohandler or rx_otherhost_dropped.

    Users are more concerned about why the drop counters shown by ip
    are increasing.

    Introduce netdev_core_stats_inc() to trace the caller of
    dev_core_stats_*_inc().

    Also, add __cold to netdev_core_stats_alloc(), as it's called with low
    probability, and add noinline to make sure netdev_core_stats_inc() is
    never inlined.

    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 14:18:55 +01:00
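The two annotations in that commit can be illustrated in userspace with the GCC/Clang `cold` and `noinline` attributes. The rarely taken allocation path is marked cold. The increment helper is kept out of line so that tracing tools can attach to it and see who called it. `struct core_stats`, `struct device_model`, `stats_alloc()` and `stats_inc()` are hypothetical stand-ins. The kernel uses per-CPU counters and its own `__cold`/`noinline` macros, and the offset-based increment is an assumption made for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device's lazily allocated core stats. */
struct core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
};

struct device_model {
	struct core_stats *stats; /* NULL until the first drop is counted */
};

/* Rarely executed: hint the compiler to move it off the hot path. */
static __attribute__((cold)) struct core_stats *stats_alloc(struct device_model *dev)
{
	dev->stats = calloc(1, sizeof(*dev->stats));
	return dev->stats;
}

/* Never inlined, so a tracer attached here sees the real caller. */
static __attribute__((noinline)) void stats_inc(struct device_model *dev,
						 size_t offset)
{
	struct core_stats *s = dev->stats;

	if (!s) {
		s = stats_alloc(dev);
		if (!s)
			return; /* allocation failed: the count is lost */
	}
	/* offset comes from offsetof(struct core_stats, <field>) */
	++*(unsigned long *)((char *)s + offset);
}
```

A caller would count a receive drop with `stats_inc(&dev, offsetof(struct core_stats, rx_dropped))`. Because the helper has its own stack frame, a trace taken on it shows the call site responsible for the drop.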
Rado Vrbovsky f72847aeb4 Merge: CNB96: ethtool: update ethtool core to upstream v6.12
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5657

JIRA: https://issues.redhat.com/browse/RHEL-57751

Depends: !5431 

New features:
 - Extend core and ethtool APIs to support many PHYs connected to a single interface (PHY topologies).
 - Extend cable diagnostics to specify whether Time Domain Reflectometry (TDR) or Active Link Cable Diagnostic (ALCD) was used.
 - Support listing / dumping all allocated RSS contexts.

About half of the patches are simple cleanups in drivers' .get_ts_info implementations, made after the responsibility for reporting SOF_TIMESTAMPING_RX_SOFTWARE and SOF_TIMESTAMPING_SOFTWARE, and for setting phc_index to -1, was moved to the core.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Approved-by: Kamal Heib <kheib@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-15 21:15:13 +00:00
Rado Vrbovsky 0e814ddac4 Merge: bpf: backports from upstream [9.6 phase 1]
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5622

JIRA: https://issues.redhat.com/browse/RHEL-65205  
JIRA: https://issues.redhat.com/browse/RHEL-63189  
JIRA: https://issues.redhat.com/browse/RHEL-54828  
JIRA: https://issues.redhat.com/browse/RHEL-65858  
CVE: CVE-2024-47710  
CVE: CVE-2024-43834  
CVE: CVE-2024-41010  
  
Backporting stable fixes from upstream.  
  
Omitted-fix: 517125f67494 ("selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again")  
    We have a different way of skipping broken selftests in CKI  
  
Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Toke Høiland-Jørgensen <toke@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-15 21:02:44 +00:00
Paolo Abeni 889018b514 net: add more sanity checks to qdisc_pkt_len_init()
JIRA: https://issues.redhat.com/browse/RHEL-62849
JIRA: https://issues.redhat.com/browse/RHEL-64328
CVE: CVE-2024-49948
Tested: LNST, Tier1

Upstream commit:
commit ab9a9a9e9647392a19e7a885b08000e89c86b535
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Sep 24 15:02:57 2024 +0000

    net: add more sanity checks to qdisc_pkt_len_init()

    One path takes care of SKB_GSO_DODGY, assuming
    skb->len is bigger than hdr_len.

    virtio_net_hdr_to_skb() does not fully dissect TCP headers,
    it only makes sure the header is at least 20 bytes.

    It is possible for a user to provide a malicious 'GSO' packet
    with a total length of 80 bytes:

    - 20 bytes of IPv4 header
    - 60 bytes TCP header
    - a small gso_size like 8

    virtio_net_hdr_to_skb() would declare this packet as a normal
    GSO packet, because it would see 40 bytes of payload,
    bigger than gso_size.

    We need to detect this case to avoid underflowing
    qdisc_skb_cb(skb)->pkt_len.

    Fixes: 1def9238d4 ("net_sched: more precise pkt_len computation")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
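
A simplified userspace model of the accounting this check protects, assuming that pkt_len grows by one copy of the headers for every segment after the first. model_pkt_len() and its parameters are hypothetical, not kernel API. For the malicious packet above (80 bytes, all of it headers, gso_size 8), an unguarded computation would find zero segments, and (gso_segs - 1) would wrap.

```c
#include <assert.h>

/*
 * Hypothetical model of the GSO branch of qdisc_pkt_len_init().
 *   skb_len:  total packet length
 *   hdr_len:  network + transport header length (real TCP header size)
 *   gso_size: payload bytes per segment
 */
static unsigned int model_pkt_len(unsigned int skb_len, unsigned int hdr_len,
				  unsigned int gso_size)
{
	int payload = (int)skb_len - (int)hdr_len;
	unsigned int gso_segs;

	/* Sanity check: no payload past the headers means a malformed
	 * "GSO" packet, so skip the per-segment accounting entirely. */
	if (payload <= 0 || gso_size == 0)
		return skb_len;

	gso_segs = ((unsigned int)payload + gso_size - 1) / gso_size;

	/* Each segment after the first repeats the headers on the wire. */
	return skb_len + (gso_segs - 1) * hdr_len;
}
```

Here model_pkt_len(80, 80, 8) returns 80 because the guard skips the malformed packet, while a well-formed packet with 40 bytes of headers and 40 bytes of payload, model_pkt_len(80, 40, 8), returns 80 + 4 * 40 = 240.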

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-15 09:21:39 +01:00
Paolo Abeni e1662aa066 net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_size
JIRA: https://issues.redhat.com/browse/RHEL-62849
Tested: LNST, Tier1

Upstream commit:
commit e609c959a939660c7519895f853dfa5624c6827a
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Mon Sep 23 23:22:42 2024 +0200

    net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_size

    Commit 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()")
    added a dev->gso_max_size test to gso_features_check() in order to fall
    back to GSO when needed.

    This was added as it was noticed that some drivers could misbehave if TSO
    packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size
    limit. For instance, a device could be configured with BIG TCP for IPv4,
    but not IPv6.

    Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size()
    and use the helper to respect both limits before falling back to the GSO engine.

    Fixes: 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
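
A minimal sketch of selecting the limit per protocol, using trimmed-down hypothetical types (struct model_dev, enum model_proto) in place of struct net_device and skb->protocol. It is not the kernel helper itself, and it assumes a packet at or above the limit is treated as too large.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures. */
enum model_proto { MODEL_PROTO_IPV4, MODEL_PROTO_IPV6 };

struct model_dev {
	unsigned int gso_max_size;      /* limit used for IPv6 */
	unsigned int gso_ipv4_max_size; /* separate limit for IPv4 */
};

/* Pick the limit matching the packet's protocol. */
static unsigned int model_gso_max_size(const struct model_dev *dev,
				       enum model_proto proto)
{
	return proto == MODEL_PROTO_IPV6 ? dev->gso_max_size
					 : dev->gso_ipv4_max_size;
}

/* True when the stack should fall back to software GSO. */
static bool model_needs_sw_gso(const struct model_dev *dev,
			       enum model_proto proto, unsigned int skb_len)
{
	return skb_len >= model_gso_max_size(dev, proto);
}
```

With BIG TCP enabled for IPv4 only (for example gso_ipv4_max_size = 185000 and gso_max_size = 65536), a 100000-byte IPv4 packet no longer falls back to software GSO, while an IPv6 packet of the same size still does.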

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-15 09:21:38 +01:00
Paolo Abeni 1292263571 net: give more chances to rcu in netdev_wait_allrefs_any()
JIRA: https://issues.redhat.com/browse/RHEL-62849
Tested: LNST, Tier1

Upstream commit:
commit cd42ba1c8ac9deb9032add6adf491110e7442040
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Apr 26 06:42:22 2024 +0000

    net: give more chances to rcu in netdev_wait_allrefs_any()

    This came up while reviewing commit c4e86b4363ac ("net: add two more
    call_rcu_hurry()").

    Paolo asked if adding one synchronize_rcu() would help.

    While synchronize_rcu() does not help, making sure to call
    rcu_barrier() before msleep(wait) is definitely helping
    to make sure lazy call_rcu() are completed.

    Instead of waiting ~100 seconds in my tests, the ref_tracker
    splat occurs only once, and netdev_wait_allrefs_any()
    latency is reduced to the strict minimum.

    Ideally we should audit our call_rcu() users to make sure
    no refcount (or cascading call_rcu()) is held too long,
    because rcu_barrier() is quite expensive.

    Fixes: 0e4be9e57e ("net: use exponential backoff in netdev_wait_allrefs")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/all/28bbf698-befb-42f6-b561-851c67f464aa@kernel.org/T/#m76d73ed6b03cd930778ac4d20a777f22a08d6824
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-15 09:21:33 +01:00
Antoine Tenart bbb75e1453 inet: prepare inet_base_seq() to run without RTNL
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit 590e92cdc835fcf435d8611f2477fff0e16877c7
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 11:40:15 2024 +0000

    inet: prepare inet_base_seq() to run without RTNL

    In the following patch, inet_base_seq() will no longer be called
    with RTNL held.

    Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc()
    and inet_base_seq().

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
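
The annotations can be approximated in userspace with C11 relaxed atomics, which give a similar "single, untorn access the compiler may not split or fuse" guarantee to the one READ_ONCE()/WRITE_ONCE() are used for here. This is an analogy only; struct model_net and both functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-netns sequence counter. */
struct model_net {
	_Atomic uint32_t dev_base_seq;
};

/* Writer: still serialized by its caller (RTNL in the kernel), so a
 * relaxed load followed by a relaxed store is sufficient. */
static void model_dev_base_seq_inc(struct model_net *net)
{
	uint32_t cur = atomic_load_explicit(&net->dev_base_seq,
					    memory_order_relaxed);

	atomic_store_explicit(&net->dev_base_seq, cur + 1,
			      memory_order_relaxed);
}

/* Reader: may run without the lock, so it takes one relaxed snapshot. */
static uint32_t model_base_seq(struct model_net *net)
{
	return atomic_load_explicit(&net->dev_base_seq, memory_order_relaxed);
}
```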

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Felix Maurer abb7b0a74f bpf: Fix dev's rx stats for bpf_redirect_peer traffic
JIRA: https://issues.redhat.com/browse/RHEL-65205

commit 024ee930cb3c9ae49e4266aee89cfde0ebb407e1
Author: Peilin Ye <peilin.ye@bytedance.com>
Date:   Tue Nov 14 01:42:17 2023 +0100

    bpf: Fix dev's rx stats for bpf_redirect_peer traffic
    
    Traffic redirected by bpf_redirect_peer() (used by recent CNIs like Cilium)
    is not accounted for in the RX stats of supported devices (that is, veth
    and netkit), confusing user space metrics collectors such as cAdvisor [0],
    as reported by Youlun.
    
    Fix it by calling dev_sw_netstats_rx_add() in skb_do_redirect(), to update
    RX traffic counters. Devices that support ndo_get_peer_dev _must_ use the
    @tstats per-CPU counters (instead of @lstats, or @dstats).
    
    To make this more fool-proof, error out when ndo_get_peer_dev is set but
    @tstats are not selected.
    
      [0] Specifically, the "container_network_receive_{byte,packet}s_total"
          counters are affected.
    
    Fixes: 9aa1206e8f ("bpf: Add redirect_peer helper")
    Reported-by: Youlun Zhang <zhangyoulun@bytedance.com>
    Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
    Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20231114004220.6495-6-daniel@iogearbox.net
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
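
A rough userspace model of both halves of the fix: per-CPU receive counters updated on the redirect path and summed by readers, plus the registration-time check. All names are hypothetical, the CPU count is fixed for illustration, and the error value returned by model_register() is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count */

/* Hypothetical per-CPU counters, loosely modeled on @tstats. */
struct model_tstats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

struct model_dev {
	bool has_get_peer_dev; /* driver implements ndo_get_peer_dev */
	bool uses_tstats;      /* driver selected per-CPU @tstats */
	struct model_tstats tstats[MODEL_NR_CPUS];
};

/* Refuse a peer-capable device without tstats, since the redirect
 * path updates them unconditionally. */
static int model_register(const struct model_dev *dev)
{
	if (dev->has_get_peer_dev && !dev->uses_tstats)
		return -EOPNOTSUPP;
	return 0;
}

/* Redirect path: account the packet in the peer's slot for this CPU. */
static void model_rx_add(struct model_dev *peer, int cpu, unsigned int len)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	peer->tstats[cpu].rx_packets++;
	peer->tstats[cpu].rx_bytes += len;
}

/* Readers (e.g. a stats dump) sum all per-CPU slots. */
static uint64_t model_rx_bytes(const struct model_dev *dev)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += dev->tstats[cpu].rx_bytes;
	return sum;
}
```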

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:56:49 +01:00
Michal Schmidt b9bb091e5a net: phy: Introduce ethernet link topology representation
JIRA: https://issues.redhat.com/browse/RHEL-57751

commit 3849687869092094003ba009dc00e2e0237a3b8a
Author: Maxime Chevallier <maxime.chevallier@bootlin.com>
Date:   Wed Aug 21 17:09:55 2024 +0200

    net: phy: Introduce ethernet link topology representation

    Link topologies containing multiple network PHYs attached to the same
    net_device can be found when using a PHY as a media converter for use
    with an SFP connector, on which an SFP transceiver containing a PHY can
    be used.

    With the current model, the transceiver's PHY can't be used for
    operations such as cable testing, timestamping, macsec offload, etc.

    The reason is that most of the logic for these configurations, coming
    from either ethtool netlink or ioctls, tends to use netdev->phydev, which
    in multi-PHY systems will reference the PHY closest to the MAC.

    Introduce a numbering scheme that allows enumerating the PHY devices that
    belong to any netdev, which in turn allows userspace to make more
    precise decisions about each PHY's configuration.

    The numbering is maintained per-netdev, in a phy_device_list.
    The numbering works similarly to a netdevice's ifindex, with
    identifiers that are only recycled once INT_MAX has been reached.

    This prevents races that could occur between PHY listing and SFP
    transceiver removal/insertion.

    The identifiers are assigned at phy_attach time, as the numbering
    depends on the netdevice the phy is attached to. The PHY index can be
    re-used for PHYs that are persistent.

    Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: David S. Miller <davem@davemloft.net>
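
A sketch of cyclic index allocation as described above, assuming a tiny fixed-size table and a configurable wrap limit (INT_MAX in the commit, 3 in the usage below) so that wrapping is easy to observe. The structure and function names are hypothetical, and the kernel implementation may use different data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PHYS 8 /* hypothetical table capacity */

struct model_phy_list {
	int used[MODEL_MAX_PHYS]; /* indices currently assigned */
	int count;
	int next;  /* where the next search starts (initially 1) */
	int limit; /* highest index that may be handed out */
};

static bool model_in_use(const struct model_phy_list *l, int idx)
{
	for (int i = 0; i < l->count; i++)
		if (l->used[i] == idx)
			return true;
	return false;
}

/* Returns the new index, or -1 if none is free. Freed indices are only
 * reused once the search position wraps past `limit`. */
static int model_phy_attach(struct model_phy_list *l)
{
	if (l->count == MODEL_MAX_PHYS)
		return -1;
	for (int tries = 0; tries < l->limit; tries++) {
		int idx = l->next;

		l->next = (idx >= l->limit) ? 1 : idx + 1;
		if (!model_in_use(l, idx)) {
			l->used[l->count++] = idx;
			return idx;
		}
	}
	return -1;
}

static void model_phy_detach(struct model_phy_list *l, int idx)
{
	for (int i = 0; i < l->count; i++) {
		if (l->used[i] == idx) {
			l->used[i] = l->used[--l->count];
			return;
		}
	}
}
```

With limit = 3, attaching two PHYs yields 1 and 2; after detaching index 1, the next attach returns 3 rather than reusing 1, and only the attach after that wraps around to 1.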

Conflicts: Dropped a whitespace-only hunk.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2024-11-05 11:38:22 +01:00
Ivan Vecera 39c44df073 ipv6: prepare inet6_fill_ifinfo() for RCU protection
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 8afc7a78d55de726b2747d7775c54def79509ec5
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 22 10:50:10 2024 +0000

    ipv6: prepare inet6_fill_ifinfo() for RCU protection

    We want to use RCU protection instead of RTNL
    for inet6_fill_ifinfo().

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:43 +02:00
Ivan Vecera f183fb3c8a rtnetlink: prepare nla_put_iflink() to run under RCU
JIRA: https://issues.redhat.com/browse/RHEL-62123

Conflicts:
* drivers/net/netkit.c
  - hunk omitted as the driver is not present in RHEL
* net/dsa/user.c
  - the hunk was applied to dsa/slave.c instead, due to missing DSA dependencies

commit e353ea9ce471331c13edffd5977eadd602d1bb80
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 22 10:50:08 2024 +0000

    rtnetlink: prepare nla_put_iflink() to run under RCU

    We want to be able to run rtnl_fill_ifinfo() under RCU protection
    instead of RTNL in the future.

    This patch prepares dev_get_iflink() and nla_put_iflink()
    to run either with RTNL or RCU held.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:43 +02:00
Ivan Vecera 427128699e net: free altname using an RCU callback
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 723de3ebef03bc14bd72531f00f9094337654009
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Jan 26 12:14:49 2024 -0800

    net: free altname using an RCU callback

    We had to add another synchronize_rcu() in a recent fix.
    Bite the bullet and add an rcu_head to netdev_name_node,
    free from RCU.

    Note that name_node does not hold any reference on dev
    to which it points, but there must be a synchronize_rcu()
    on device removal path, so we should be fine.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
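
The pattern of embedding a callback head in the object and recovering the object inside the callback can be modeled in userspace as below. model_call_rcu() only queues callbacks and model_grace_period_end() runs them; both are hypothetical stand-ins for call_rcu() and an RCU grace period, with none of the real synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical name node with an embedded callback head. */
struct model_name_node {
	char name[16];
	struct model_rcu_head rcu;
};

static struct model_rcu_head *pending; /* callbacks awaiting a grace period */
static int freed_count;

/* Defer the free instead of calling free() immediately. */
static void model_call_rcu(struct model_rcu_head *head,
			   void (*func)(struct model_rcu_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Callback: recover the enclosing node from its embedded head, then free. */
static void model_free_name_node(struct model_rcu_head *head)
{
	struct model_name_node *node =
		model_container_of(head, struct model_name_node, rcu);

	free(node);
	freed_count++;
}

/* Simulated end of a grace period: readers are done, run the callbacks. */
static void model_grace_period_end(void)
{
	while (pending) {
		struct model_rcu_head *head = pending;

		pending = head->next; /* read before the callback frees it */
		head->func(head);
	}
}
```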

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:43 +02:00
Ivan Vecera 3676002068 net: fix removing a namespace with conflicting altnames
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit d09486a04f5da0a812c26217213b89a3b1acf836
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jan 18 16:58:59 2024 -0800

    net: fix removing a namespace with conflicting altnames

    Mark reports a BUG() when a net namespace is removed.

        kernel BUG at net/core/dev.c:11520!

    Physical interfaces moved outside of init_net get "refunded"
    to init_net when that namespace disappears. The main interface
    name may get overwritten in the process if it would have
    conflicted. We need to also discard all conflicting altnames.
    Recent fixes addressed ensuring that altnames get moved
    with the main interface, which surfaced this problem.

    Reported-by: Марк Коренберг <socketpair@gmail.com>
    Link: https://lore.kernel.org/all/CAEmTpZFZ4Sv3KwqFOY2WKDHeZYdi0O7N5H1nTvcGp=SAEavtDg@mail.gmail.com/
    Fixes: 7663d522099e ("net: check for altname conflicts when changing netdev's netns")
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:42 +02:00
Ivan Vecera 97abada084 net: move altnames together with the netdevice
JIRA: https://issues.redhat.com/browse/RHEL-62123

Conflicts:
* net/core/dev.c
  - simple context conflict due to existing backport of commit
    1b3ef46cb7f26 ("net: remove dev_base_lock")

commit 8e15aee621618a3ee3abecaf1fd8c1428098b7ef
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Oct 17 18:38:16 2023 -0700

    net: move altnames together with the netdevice

    The altname nodes are currently not moved to the new netns
    when netdevice itself moves:

      [ ~]# ip netns add test
      [ ~]# ip -netns test link add name eth0 type dummy
      [ ~]# ip -netns test link property add dev eth0 altname some-name
      [ ~]# ip -netns test link show dev some-name
      2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff
          altname some-name
      [ ~]# ip -netns test link set dev eth0 netns 1
      [ ~]# ip link
      ...
      3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
          altname some-name
      [ ~]# ip li show dev some-name
      Device "some-name" does not exist.

    Remove them from the hash table when the device is unlisted
    and add them back when it is listed again.

    Fixes: 36fbf1e52b ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 15:55:37 +02:00
Ivan Vecera 75ff0ddc8a net: avoid UAF on deleted altname
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 1a83f4a7c156fa6bbd6b530e89fa3270bf3d9d1b
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Oct 17 18:38:15 2023 -0700

    net: avoid UAF on deleted altname

    Altnames are accessed under RCU (dev_get_by_name_rcu())
    but freed by kfree() with no synchronization point.

    Each node has one or two allocations (node and a variable-size
    name, sometimes the name is netdev->name). Adding rcu_heads
    here is a bit tedious. Besides, most code which unlists the names
    already has rcu barriers - so take the simpler approach of adding
    synchronize_rcu(). Note that the one on the unregistration path
    (which matters more) is removed by the next fix.

    Fixes: ff92741270 ("net: introduce name_node struct to be used in hashlist")
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 15:55:36 +02:00
Ivan Vecera f6ec4f3e1b net: check for altname conflicts when changing netdev's netns
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 7663d522099ecc464512164e660bc771b2ff7b64
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Oct 17 18:38:14 2023 -0700

    net: check for altname conflicts when changing netdev's netns

    It's currently possible to create an altname conflicting
    with an altname or real name of another device by creating
    it in another netns and moving it over:

     [ ~]$ ip link add dev eth0 type dummy

     [ ~]$ ip netns add test
     [ ~]$ ip -netns test link add dev ethX netns test type dummy
     [ ~]$ ip -netns test link property add dev ethX altname eth0
     [ ~]$ ip -netns test link set dev ethX netns 1

     [ ~]$ ip link
     ...
     3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
     ...
     5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
         altname eth0

    Create a macro for walking the altnames; this hopefully makes
    it clearer that the list we walk contains only altnames,
    which is otherwise not entirely intuitive.

    Fixes: 36fbf1e52b ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
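
A minimal model of the altname check, assuming a namespace can be represented as a flat list of names already in use. All names are hypothetical, and the sketch checks only the altnames, leaving the existing main-name check aside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device: one main name plus a list of altnames. */
struct model_dev {
	const char *name;
	const char *const *altnames;
	size_t n_altnames;
};

static bool model_name_in_use(const char *const *taken, size_t n_taken,
			      const char *name)
{
	for (size_t i = 0; i < n_taken; i++)
		if (strcmp(taken[i], name) == 0)
			return true;
	return false;
}

/* Before moving `dev` into the target namespace, every altname must be
 * free there too. Returns true if the move may proceed. */
static bool model_altnames_free(const struct model_dev *dev,
				const char *const *taken, size_t n_taken)
{
	for (size_t i = 0; i < dev->n_altnames; i++)
		if (model_name_in_use(taken, n_taken, dev->altnames[i]))
			return false;
	return true;
}
```

Reproducing the scenario above, a device ethX carrying altname eth0 cannot move into a namespace where eth0 already exists, while one carrying altname some-name can.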

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 15:55:36 +02:00
Ivan Vecera fdaa0772ed net: fix ifname in netlink ntf during netns move
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 311cca40661f428b7aa114fb5af578cfdbe3e8b6
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Oct 17 18:38:13 2023 -0700

    net: fix ifname in netlink ntf during netns move

    dev_get_valid_name() overwrites the netdev's name on success.
    This makes it hard to use in prepare-commit-like fashion,
    where we do validation first, and "commit" to the change
    later.

    Factor out a helper which lets us save the new name to a buffer.
    Use it to fix the problem of notification on netns move having
    incorrect name:

     5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
         link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
     6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
         link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff

     [ ~]# ip link set dev eth0 netns 1 name eth1

    ip monitor inside netns:
     Deleted inet eth0
     Deleted inet6 eth0
     Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
         link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7

    The name is reported as eth1 in the old netns for ifindex 5, i.e. already renamed.

    Fixes: d90310243f ("net: device name allocation cleanups")
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
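
The prepare/commit split can be sketched as follows, assuming simplified validity rules (non-empty, shorter than a 16-byte buffer, no '/' or ':'). model_prep_name() and model_commit_name() are hypothetical and are not the helper this commit adds.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16 /* matches IFNAMSIZ on Linux */

struct model_dev {
	char name[MODEL_IFNAMSIZ];
};

/* Prepare: validate `want` and write the result into `buf`,
 * leaving the device's current name untouched. */
static int model_prep_name(const char *want, char buf[MODEL_IFNAMSIZ])
{
	size_t len = strlen(want);

	if (len == 0 || len >= MODEL_IFNAMSIZ)
		return -EINVAL;
	if (strchr(want, '/') || strchr(want, ':'))
		return -EINVAL;
	memcpy(buf, want, len + 1);
	return 0;
}

/* Commit: only now does the visible name change. */
static void model_commit_name(struct model_dev *dev,
			      const char buf[MODEL_IFNAMSIZ])
{
	strcpy(dev->name, buf); /* buf was NUL-terminated by the prepare step */
}
```

Between the two calls, dev->name still holds the old name, so a notification sent at that point reports it correctly.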

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 15:55:36 +02:00
Ivan Vecera ac8fabb776 net: move from strlcpy with unused retval to strscpy
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 70986397a15bf337d4ca3215a65e30bbe95e5d3c
Author: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date:   Thu Aug 18 23:02:15 2022 +0200

    net: move from strlcpy with unused retval to strscpy

    Follow the advice of the below link and prefer 'strscpy' in this
    subsystem. Conversion is 1:1 because the return value is not used.
    Generated by a coccinelle script.

    Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
    Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
    Link: https://lore.kernel.org/r/20220818210215.8395-1-wsa+renesas@sang-engineering.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
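
For reference, strlcpy() returns strlen(src) and therefore has to read the whole source string, whereas strscpy() stops at the destination size. The sketch below is a hypothetical userspace re-implementation of strscpy()'s documented contract (always NUL-terminates, returns the number of characters copied or -E2BIG on truncation); it is not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* model_strscpy() is illustrative only; it is not a libc function. */
static ssize_t model_strscpy(char *dest, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;
	/* Copy at most size - 1 characters, never reading past that point. */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dest[i] = src[i];
	dest[i] = '\0';
	/* If src has more characters, the copy was truncated. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Since the converted call sites ignore the return value, the only observable change is that the source is no longer read beyond what fits.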

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 15:55:36 +02:00
Ivan Vecera f01d4fe874 net: introduce a function to check if a netdev name is in use
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 75ea27d0d62281c31ee259c872dfdeb072cf5e39
Author: Antoine Tenart <atenart@kernel.org>
Date:   Thu Oct 7 18:16:50 2021 +0200

    net: introduce a function to check if a netdev name is in use

    __dev_get_by_name is currently used to either retrieve a net device
    reference using its name or to check if a name is already used by a
    registered net device (per ns). In the latter case there is no need to
    return a reference to a net device.

    Introduce a new helper, netdev_name_in_use, to check if a name is
    currently used by a registered net device without leaking a reference
    to the corresponding net device. This helper uses netdev_name_node_lookup
    instead of __dev_get_by_name as we don't need the extra logic retrieving
    a reference to the corresponding net device.

    Signed-off-by: Antoine Tenart <atenart@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 15:55:36 +02:00
Felix Maurer 2e7d822903 net: page_pool: fix recycle stats for system page_pool allocator
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- net/core/page_pool.c: context difference due to missing aaf153aecef1
  ("page_pool: halve BIAS_MAX for multiple user references of a fragment")

commit f853fa5c54e7a0364a52125074dedeaf2c7ddace
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Fri Feb 16 10:25:43 2024 +0100

    net: page_pool: fix recycle stats for system page_pool allocator

    Use global percpu page_pool_recycle_stats counter for system page_pool
    allocator instead of allocating a separate percpu variable for each
    (also percpu) page pool instance.

    Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/87f572425e98faea3da45f76c3c68815c01a20ee.1708075412.git.lorenzo@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-10-21 16:37:42 +02:00
Rado Vrbovsky f177edd8c5 Merge: CNB96: netdev_features: start cleaning netdev_features_t up
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5362

JIRA: https://issues.redhat.com/browse/RHEL-59091

Explanation from the upstream cover letter by Alexander Lobakin:

> NETDEV_FEATURE_COUNT is currently 64, which means we can't add any new
> features as netdev_features_t is u64.
> As per several discussions, instead of converting netdev_features_t to
> a bitmap, which would mean A LOT of changes, we can try cleaning up
> netdev feature bits.
> There's a bunch of bits which don't really mean features, rather device
> attributes/properties that can't be changed via Ethtool in any of the
> drivers. Such attributes can be moved to netdev private flags without
> losing any functionality.
> 
> Start converting some read-only netdev features to private flags from
> the ones that are most obvious, like lockless Tx, inability to change
> network namespace etc. I was able to reduce NETDEV_FEATURE_COUNT from
> 64 to 60, which means 4 free slots for new features. There are obviously
> more read-only features to convert, such as highDMA, "challenged VLAN",
> HSR (4 bits) - this will be done in subsequent series.
> Please note that netdev features are not uAPI/ABI by any means. Ethtool
> passes their names and bits to the userspace separately and there are no
> hardcoded names/bits in the userspace, so that new Ethtool could work
> on older kernels and vice versa. Even shell scripts most likely won't
> break since the removed bits were always read-only, meaning nobody would
> try touching them from a script.
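
To illustrate the constraint described above: each netdev feature occupies one bit of a 64-bit netdev_features_t, so the number of features is capped at 64, while read-only properties can live in separate bitfields instead. The enum, struct, and field names below are hypothetical stand-ins chosen to mirror the three properties named in this MR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature indices (the real list is much longer). */
enum model_feature_bit {
	MODEL_F_SG_BIT,
	MODEL_F_TSO_BIT,
	MODEL_FEATURE_COUNT,
};

typedef uint64_t model_features_t;

/* One bit per feature, so the count can never exceed 64. */
_Static_assert(MODEL_FEATURE_COUNT <= 8 * sizeof(model_features_t),
	       "feature bits must fit in model_features_t");

#define MODEL_F(bit) ((model_features_t)1 << (bit))

/* Read-only properties kept outside the feature mask as bitfields. */
struct model_dev {
	model_features_t features;
	unsigned int lltx:1;
	unsigned int netns_local:1;
	unsigned int fcoe_mtu:1;
};

static bool model_has_feature(const struct model_dev *dev, int bit)
{
	return dev->features & MODEL_F(bit);
}
```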

I proposed a release note in the Jira issue documenting that "tx-lockless", "netns-local", and "fcoe-mtu" will no longer appear in "ethtool -k".

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-20 09:09:03 +00:00
Rado Vrbovsky 40945cb730 Merge: CNB96: net/ethtool: rebase to v6.11
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5197

JIRA: https://issues.redhat.com/browse/RHEL-57750  
Depends: !5196

This updates the ethtool subsystem to v6.11. At the end of this series, the only remaining diffs from v6.11 are the RH_KABI_RESERVES in struct ethtool_ops, as shown by:  
`git diff v6.11 -- net/ethtool include/linux/ethtool.h include/uapi/linux/ethtool{,_netlink}.h Documentation/netlink/specs/ethtool.yaml Documentation/networking/ethtool-netlink.rst tools/net/ynl/ethtool.py`

Omitted-Fix: 9dbad38336a9 ("eth: bnxt: populate defaults in the RSS context struct")
 - bnxt has not been converted to .create_rxfh_context yet.
   This will be in a driver update later.

Omitted-Fix: cdc90f75387c ("pse-core: Conditionally set current limit during PI regulator registration")  
Omitted-Fix: 326f442784c2 ("net: pse-pd: pse_core: Fix pse regulator type")
 - All changes to pse_core.c omitted in the series.

Omitted-Fix: 2fa809b90617 ("net: pse-pd: Kconfig: Add missing Regulator API dependency")
 - Irrelevant. CONFIG_PSE_CONTROLLER is disabled.

Omitted-Fix: 93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null")
 - Contained in the merge conflict resolution backported in "net: ethtool: pse-pd: Fix possible null-deref".

Omitted-Fix: dda3529d2e84 ("net: pse-pd: Fix enabled status mismatch")
 - Irrelevant. CONFIG_PSE_CONTROLLER is disabled.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-19 08:11:42 +00:00
Rado Vrbovsky 3438e40aac Merge: net: Provide SMP threads for backlog NAPI
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4253

JIRA: https://issues.redhat.com/browse/RHEL-9145

Signed-off-by: Wander Lairson Costa <wander@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Approved-by: Eder Zulian <ezulian@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-19 08:04:53 +00:00
Rado Vrbovsky 90b989331e Merge: CNB96: net: complete dev_base_lock removal
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5196

JIRA: https://issues.redhat.com/browse/RHEL-59100  
Tested: LNST net_driver (http://dashboard.lnst.anl.lab.eng.bos.redhat.com/pipeline/5496)  
Depends: !5146  

Commits:
```
73c2e90a0edc ("net-sysfs: Convert to use sysfs_emit() APIs")
facd15dfd691 ("net: core: synchronize link-watch when carrier is queried")
bf17b36ccdd5 ("net: sysfs: fix locking in carrier read")
1c07dbb0cccf ("net: annotate data-races around dev->name_assign_type")
f694eee9e1c0 ("ip_tunnel: annotate data-races around t->parms.link")
a6473fe9b623 ("dev: annotate accesses to dev->link")
4d42b37def70 ("net: convert dev->reg_state to u8")
12692e3df2da ("net-sysfs: convert netdev_show() to RCU")
c7d52737e7eb ("net-sysfs: use dev_addr_sem to remove races in address_show()")
004d138364fd ("net-sysfs: convert dev->operstate reads to lockless ones")
e154bb7a6ebb ("net-sysfs: convert netstat_show() to RCU")
328771deab16 ("net: remove stale mentions of dev_base_lock in comments")
6a2968ee1ee2 ("net: add netdev_set_operstate() helper")
2dd4d828d648 ("net: remove dev_base_lock from do_setlink()")
e51b96243874 ("net: remove dev_base_lock from register_netdevice() and friends.")
1b3ef46cb7f2 ("net: remove dev_base_lock")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Michal Schmidt <mschmidt@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-16 12:13:53 +00:00
Rado Vrbovsky 6236cd4de2 Merge: CNB96: net: Move {l,t,d}stats allocation to core and convert veth & vrf
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5146

JIRA: https://issues.redhat.com/browse/RHEL-57740  

Commits:
```
79e0c5be8c73 ("net, vrf: Move dstats structure to core")
34d21de99cea ("net: Move {l,t,d}stats allocation to core and convert veth & vrf")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-10 12:27:50 +00:00
Rado Vrbovsky 2239f06d77 Merge: CNB96: net: create a dummy net_device allocator
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5264

JIRA: https://issues.redhat.com/browse/RHEL-59092  

Signed-off-by: Izabela Bakollari <ibakolla@redhat.com>

Approved-by: Sabrina Dubroca <sdubroca@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-10 12:04:40 +00:00