291 Commits

Author SHA1 Message Date
Antoine Tenart 8edbf53cfb net: napi: Prevent overflow of napi_defer_hard_irqs
JIRA: https://issues.redhat.com/browse/RHEL-63914
Upstream Status: linux.git
CVE: CVE-2024-50018

commit 08062af0a52107a243f7608fd972edb54ca5b7f8
Author: Joe Damato <jdamato@fastly.com>
Date:   Wed Sep 4 15:34:30 2024 +0000

    net: napi: Prevent overflow of napi_defer_hard_irqs

    In commit 6f8b12d661 ("net: napi: add hard irqs deferral feature")
    napi_defer_irqs was added to net_device and napi_defer_irqs_count was
    added to napi_struct, both as type int.

    This value never goes below zero, so there is no reason for it to be a
    signed int. Change the type for both from int to u32, and add an
    overflow check to sysfs to limit the value to S32_MAX.

    The limit of S32_MAX was chosen because the practical limit before this
    patch was S32_MAX (anything larger was an overflow) and thus there are
    no behavioral changes introduced. If the extra bit is needed in the
    future, the limit can be raised.

    Before this patch:

    $ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
    $ cat /sys/class/net/eth4/napi_defer_hard_irqs
    -2147483647

    After this patch:

    $ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
    bash: line 0: echo: write error: Numerical result out of range

    Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:

    include/linux/netdevice.h:      unsigned int            tx_queue_len;

    And has an overflow check:

    dev_change_tx_queue_len(..., unsigned long new_len):

      if (new_len != (unsigned int)new_len)
              return -ERANGE;

    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-05 10:51:34 +01:00
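
The range check described in the napi_defer_hard_irqs commit above can be
sketched in userspace C. This is only an illustration, not the kernel code:
struct fake_netdev, store_as_int() and store_defer_hard_irqs() are
hypothetical names, and INT32_MAX stands in for the kernel's S32_MAX.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device field, u32 after the patch. */
struct fake_netdev {
	uint32_t napi_defer_hard_irqs;
};

/*
 * Pre-patch behaviour: the value lands in a signed int, so anything above
 * INT32_MAX shows up as a negative number. The narrowing conversion is
 * implementation-defined in C; GCC and Clang wrap modulo 2^32.
 */
int store_as_int(unsigned long val)
{
	return (int)(uint32_t)val;
}

/* Post-patch behaviour: values above S32_MAX are rejected with -ERANGE. */
int store_defer_hard_irqs(struct fake_netdev *dev, unsigned long val)
{
	if (val > INT32_MAX)
		return -ERANGE;

	dev->napi_defer_hard_irqs = (uint32_t)val;
	return 0;
}
```

With the value from the commit message, store_as_int(2147483649UL) yields the
negative number shown in the "before" output on GCC and Clang, while
store_defer_hard_irqs() returns -ERANGE, which the shell reports as
"Numerical result out of range".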
Michal Schmidt 94a28450df ethtool: check device is present when getting link settings
JIRA: https://issues.redhat.com/browse/RHEL-57750

commit a699781c79ecf6cfe67fb00a0331b4088c7c8466
Author: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Date:   Fri Aug 23 16:26:58 2024 +1000

    ethtool: check device is present when getting link settings

    A sysfs reader can race with a device reset or removal, attempting to
    read device state when the device is not actually present, e.g.:

         [exception RIP: qed_get_current_link+17]
      #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
      #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
     #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
     #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
     #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
     #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
     #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
     #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
     #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
     #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

     crash> struct net_device.state ffff9a9d21336000
        state = 5,

    state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
    The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

    This is the same sort of panic as observed in commit 4224cfd7fb65
    ("net-sysfs: add check for netdevice being present to speed_show").

    There are many other callers of __ethtool_get_link_ksettings() which
    don't have a device presence check.

    Move this check into ethtool to protect all callers.

    Fixes: d519e17e2d ("net: export device speed and duplex via sysfs")
    Fixes: 4224cfd7fb65 ("net-sysfs: add check for netdevice being present to speed_show")
    Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
    Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2024-10-01 12:19:15 +02:00
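
The state decoding in the ethtool commit above (state = 5, PRESENT bit
missing) can be reproduced with a small userspace sketch. The bit positions
follow the commit message; struct fake_netdev, device_present() and
get_link_ksettings() are hypothetical stand-ins, and the -ENODEV return
value is an assumption rather than something stated in the log.

```c
#include <errno.h>
#include <stdbool.h>

/* Bit numbers matching the commit text: START=0b1, PRESENT=0b10,
 * NOCARRIER=0b100. Local names, not the kernel enum. */
enum {
	LINK_STATE_START     = 0,
	LINK_STATE_PRESENT   = 1,
	LINK_STATE_NOCARRIER = 2,
};

struct fake_netdev {
	unsigned long state;
};

/* True only when the PRESENT bit is set in dev->state. */
bool device_present(const struct fake_netdev *dev)
{
	return (dev->state & (1UL << LINK_STATE_PRESENT)) != 0;
}

/* Central check: every caller is protected because the test sits in the
 * common entry point rather than in each sysfs show() handler. */
int get_link_ksettings(const struct fake_netdev *dev)
{
	if (!device_present(dev))
		return -ENODEV; /* assumed error code */

	/* A real implementation would call into the driver here. */
	return 0;
}
```

For state = 5 (START | NOCARRIER) the sketch refuses the request instead of
reaching the driver, which is the situation shown in the backtrace.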
Michal Schmidt befa9a237f net: no longer acquire RTNL in threaded_show()
JIRA: https://issues.redhat.com/browse/RHEL-57750

commit c1742dcb6bda5fd535fbaa2145f0a180bc329aa6
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu May 2 17:39:26 2024 +0000

    net: no longer acquire RTNL in threaded_show()

    dev->threaded can be read locklessly, if we add
    corresponding READ_ONCE()/WRITE_ONCE() annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240502173926.2010646-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2024-10-01 12:19:13 +02:00
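
A rough userspace illustration of the READ_ONCE()/WRITE_ONCE() pairing from
the threaded_show() commit above. The macros below are simplified
volatile-cast versions that rely on the GNU __typeof__ extension; they are
not the kernel definitions, and struct fake_netdev, set_threaded() and
threaded_show() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified single-access helpers (assumption: GCC or Clang). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct fake_netdev {
	bool threaded;
};

/* Writer side: still serialized by whatever lock the writers use. */
void set_threaded(struct fake_netdev *dev, bool on)
{
	WRITE_ONCE(dev->threaded, on);
}

/* Reader side: one annotated load, no lock taken. */
int threaded_show(struct fake_netdev *dev, char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", READ_ONCE(dev->threaded));
}
```

The point of the annotations is that the reader performs exactly one load of
the field, so it no longer needs RTNL just to print a single flag.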
Ivan Vecera da2dfc4408 net-sysfs: convert netstat_show() to RCU
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit e154bb7a6ebbe5414accb5d94dc5ba80c204ea64
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 13 06:32:40 2024 +0000

    net-sysfs: convert netstat_show() to RCU

    dev_get_stats() can be called from RCU, there is no need
    to acquire dev_base_lock.

    Change dev_isalive() comment to reflect we no longer use
    dev_base_lock from net/core/net-sysfs.c

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:16 +02:00
Ivan Vecera 116bf3b894 net-sysfs: convert dev->operstate reads to lockless ones
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit 004d138364fd10dd5ff8ceb54cfdc2d792a7b338
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 13 06:32:39 2024 +0000

    net-sysfs: convert dev->operstate reads to lockless ones

    operstate_show() can omit dev_base_lock acquisition only
    to read dev->operstate.

    Annotate accesses to dev->operstate.

    Writers still acquire dev_base_lock for mutual exclusion.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:15 +02:00
Ivan Vecera ca118e46cc net-sysfs: use dev_addr_sem to remove races in address_show()
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit c7d52737e7ebd31cc5fef46380d94b58becf9479
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 13 06:32:38 2024 +0000

    net-sysfs: use dev_addr_sem to remove races in address_show()

    Using dev_base_lock does not prevent reading garbage.

    Use dev_addr_sem instead.

    v4: place dev_addr_sem extern in net/core/dev.h (Jakub Kicinski)
     Link: https://lore.kernel.org/netdev/20240212175845.10f6680a@kernel.org/

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:14 +02:00
Ivan Vecera a611074056 net-sysfs: convert netdev_show() to RCU
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit 12692e3df2dacf2993c56aa23b6d3de921a5bdff
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 13 06:32:37 2024 +0000

    net-sysfs: convert netdev_show() to RCU

    Make clear dev_isalive() can be called with RCU protection.

    Then convert netdev_show() to RCU, to remove dev_base_lock
    dependency.

    Also add RCU to broadcast_show().

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:14 +02:00
Ivan Vecera 7e1ceb8047 net: annotate data-races around dev->name_assign_type
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit 1c07dbb0cccfe85060b6eb089db3d6bfeb6aaf31
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 13 06:32:33 2024 +0000

    net: annotate data-races around dev->name_assign_type

    name_assign_type_show() runs locklessly, we should annotate
    accesses to dev->name_assign_type.

    Alternative would be to grab devnet_rename_sem semaphore
    from name_assign_type_show(), but this would not bring
    more accuracy.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:11 +02:00
Ivan Vecera d0865dbf75 net: sysfs: fix locking in carrier read
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit bf17b36ccdd5b7b9dd482d7753bcb9aff2d21d39
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Wed Dec 6 17:21:23 2023 +0100

    net: sysfs: fix locking in carrier read

    My previous patch added a call to linkwatch_sync_dev(),
    but that of course needs to be called under RTNL, which
    I missed earlier, but now saw RCU warnings from.

    Fix that by acquiring the RTNL in a similar fashion to
    how other files do it here.

    Fixes: facd15dfd691 ("net: core: synchronize link-watch when carrier is queried")
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Link: https://lore.kernel.org/r/20231206172122.859df6ba937f.I9c80608bcfbab171943ff4942b52dbd5e97fe06e@changeid
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:10 +02:00
Ivan Vecera 4863bafaf6 net: core: synchronize link-watch when carrier is queried
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit facd15dfd69122042502d99ab8c9f888b48ee994
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Mon Dec 4 21:47:07 2023 +0100

    net: core: synchronize link-watch when carrier is queried

    There are multiple ways to query for the carrier state: through
    rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch
    work before these operations so that we don't have a situation
    where userspace queries the carrier state between the driver's
    carrier off->on transition and linkwatch running and expects it
    to work, when really (at least) TX cannot work until linkwatch
    has run.

    I previously posted a longer explanation of how this applies to
    wireless [1] but with this wireless can simply query the state
    before sending data, to ensure the kernel is ready for it.

    [1] https://lore.kernel.org/all/346b21d87c69f817ea3c37caceb34f1f56255884.camel@sipsolutions.net/

    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20231204214706.303c62768415.I1caedccae72ee5a45c9085c5eb49c145ce1c0dd5@changeid
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:10 +02:00
Ivan Vecera fdf1c3ce0b net-sysfs: Convert to use sysfs_emit() APIs
JIRA: https://issues.redhat.com/browse/RHEL-59100

commit 73c2e90a0edc84751c4b95b12fc52051dd60f542
Author: Wang Yufen <wangyufen@huawei.com>
Date:   Wed Sep 28 19:49:44 2022 +0800

    net-sysfs: Convert to use sysfs_emit() APIs

    Follow the advice of Documentation/filesystems/sysfs.rst: show()
    should only use sysfs_emit() or sysfs_emit_at() when formatting the value
    to be returned to user space.

    Signed-off-by: Wang Yufen <wangyufen@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-17 12:17:09 +02:00
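
The idea behind the sysfs_emit() conversion above is that a show() handler
formats into a buffer whose size is known to be one page, so the helper can
bound the write itself. A userspace analogue, assuming a 4096-byte page;
emit() and mtu_show() are hypothetical names, not kernel functions:

```c
#include <stdarg.h>
#include <stdio.h>

#define FAKE_PAGE_SIZE 4096 /* assumed page size for the sketch */

/* Format into a page-sized buffer and return the number of bytes written,
 * never more than FAKE_PAGE_SIZE - 1. */
int emit(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, FAKE_PAGE_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	if (len >= FAKE_PAGE_SIZE)
		len = FAKE_PAGE_SIZE - 1; /* output was truncated */
	return len;
}

/* A show()-style handler: one value, newline terminated. */
int mtu_show(unsigned int mtu, char *buf)
{
	return emit(buf, "%u\n", mtu);
}
```

Compared with a bare sprintf(), the caller cannot overrun the buffer even if
the formatted value turns out longer than expected.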
Ivan Vecera 4ee448db07 net: introduce include/net/rps.h
JIRA: https://issues.redhat.com/browse/RHEL-31916

Conflicts:
* net/core/dev.c
  context conflict due to missing commit 2b0cfa6e49566 ("net: add
  generic percpu page_pool allocator")
* net/core/sysctl_net_core.c
  context conflict due to missing commit 2658b5a8a4eee ("net: introduce
  struct net_hotdata")

commit 490a79faf95e705ba0ffd9ebf04a624b379e53c9
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Mar 6 16:00:30 2024 +0000

    net: introduce include/net/rps.h

    Move RPS related structures and helpers from include/linux/netdevice.h
    and include/net/sock.h to a new include file.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-05 16:03:32 +02:00
Ivan Vecera 139012e61c net: move struct netdev_rx_queue out of netdevice.h
JIRA: https://issues.redhat.com/browse/RHEL-31916

Conflicts:
* include/linux/netdevice.h
  Adjusted due to KABI reservations made by RHEL
  commit 3b3a52715a ("net: exclude BPF/XDP from kABI")

commit 49e47a5b6145d86c30022fe0e949bbb24bae28ba
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Aug 2 18:02:29 2023 -0700

    net: move struct netdev_rx_queue out of netdevice.h

    struct netdev_rx_queue is touched in only a few places
    and having it defined in netdevice.h brings in the dependency
    on xdp.h, because struct xdp_rxq_info gets embedded in
    struct netdev_rx_queue.

    In prep for removal of xdp.h from netdevice.h move all
    the netdev_rx_queue stuff to a new header.

    We could technically break the new header up to avoid
    the sysfs.h include but it's so rarely included it
    doesn't seem to be worth it at this point.

    Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-05 16:03:26 +02:00
Sabrina Dubroca 1a10ee13b5 net: add reserved fields to rtnl_link_stats*
JIRA: https://issues.redhat.com/browse/RHEL-21356
Upstream Status: RHEL-only

rtnl_link_stats and rtnl_link_stats64 are protected by kABI, add 4
reserved fields. We need to use a custom mechanism here, because those
structures are part of uapi.

Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
2024-01-12 14:27:38 +01:00
Mark Langsdorf 2c3e3353e5 kobject: make kobject_get_ownership() take a constant kobject *
JIRA: https://issues.redhat.com/browse/RHEL-1023

commit 02a476d932287cf3096f78962ccb70d94d6203c6
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Tue, 22 Nov 2022 17:34:29 +0000

The call, kobject_get_ownership(), does not modify the kobject passed
into it, so make it const.  This propagates down into the kobj_type
function callbacks so make the kobject passed into them also const,
ensuring that nothing in the kobject is being changed here.

This helps make it more obvious what calls and callbacks do, and do not,
modify structures passed to them.

Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20221121094649.1556002-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
2023-10-23 10:35:56 -05:00
Mark Langsdorf 726882e7ad driver core: make struct class.dev_uevent() take a const *
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178302
Conflicts:
	drivers/base/firmware_loader/sysfs.h - replace the single
line version of to_fw_sysfs with the longer inline version

commit 23680f0b7d7f67a935adb38058110d2d81bbe6ea
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 23 Nov 2022 13:25:19 +0100

The dev_uevent() in struct class should not be modifying the device that
is passed into it, so mark it as a const * and propagate the function
signature changes out into all relevant subsystems that use this
callback.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Johan Hovold <johan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Raed Salem <raeds@nvidia.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Jakob Koschel <jakobkoschel@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Wang Yufen <wangyufen@huawei.com>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-pm@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20221123122523.1332370-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
2023-06-08 12:32:05 -04:00
Mark Langsdorf 52cc0e4c5f driver core: class: make namespace and get_ownership take const *
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178302

commit fa627348cfc7fb174468d88756b83c2d97890b07
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Sat, 1 Oct 2022 18:54:26 +0200

The callbacks in struct class namespace() and get_ownership() do not
modify the struct device passed to them, so mark the pointer as constant
and fix up all callbacks in the kernel to have the correct function
signature.

This helps make it more obvious what calls and callbacks do, and do not,
modify structures passed to them.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20221001165426.2690912-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
2023-06-08 12:20:28 -04:00
Jan Stancek 8e94775eed Merge: CNB: rebase/update devlink for RHEL 9.3
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2191

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273
Tested: selftests, basic devlink features on ice and mlx5
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2175249
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2175250
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2176150

Update devlink up to v6.3.

Signed-off-by: Petr Oros <poros@redhat.com>

Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Íñigo Huguet <ihuguet@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Herbert Xu <zxu@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-04-27 07:47:22 +02:00
Petr Oros 5cad4f8fa8 net: devlink: reintroduce ndo_get_devlink_port
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273
Upstream status: RHEL only

In the current upstream implementation, the devlink_port pointer is
assigned to netdevice using the SET_NETDEV_DEVLINK_PORT macro.
For most drivers, this is not a problem and everything remains as it
was in the past. The ICE driver is an exception.

In the old implementation ice_get_devlink_port returns devlink_port
only in switchdev mode. Because phys_port_name_show() and
phys_switch_id_show() use ndo_get_devlink_port, devlink_port was
invisible and not used to generate the interface name.
To preserve the interface naming scheme, we need to preserve this
behavior for the ice driver.

- re-implement ice_get_devlink_port which was deleted in commit
77df1db80da384 ("net: remove unused ndo_get_devlink_port")
- use __rh_deprecated_ndo_get_devlink_port in phys_port_name_show() and
phys_switch_id_show().

This change preserves the old behavior for phys_port_id and phys_port_name
sysfs files for the ice driver.

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:04:39 +02:00
Petr Oros 5b39143e64 net: devlink: use devlink_port pointer instead of ndo_get_devlink_port
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 8eba37f7e9bc82fac08f31d318e36f341494442d
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Nov 2 17:02:09 2022 +0100

    net: devlink: use devlink_port pointer instead of ndo_get_devlink_port

    Use newly introduced devlink_port pointer instead of getting it calling
    to ndo_get_devlink_port op.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 10:57:13 +02:00
Íñigo Huguet 3a91b473a8 net: rename reference+tracking helpers
Bugzilla: https://bugzilla.redhat.com/2175258

Conflicts:
 - Removed chunks of unsupported protocol AX.25
 - Renamed the functions also in ipvlan. Commit 40b9d1ab63f5 ("ipvlan: hold lower
   dev to avoid possible use-after-free") was backported out of order so it had
   to use the old function names.

commit d62607c3fe45911b2331fac073355a8c914bbde2
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Jun 7 21:39:55 2022 -0700

    net: rename reference+tracking helpers

    Netdev reference helpers have a dev_ prefix for historic
    reasons. Renaming the old helpers would be too much churn
    but we can rename the tracking ones which are relatively
    recent and should be the default for new code.

    Rename:
     dev_hold_track()    -> netdev_hold()
     dev_put_track()     -> netdev_put()
     dev_replace_track() -> netdev_ref_replace()

    Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
2023-03-23 16:19:21 +01:00
Paolo Abeni 31e1e15d23 net: make default_rps_mask a per netns attribute
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168875
Tested: LNST, Tier1

Upstream commit:
commit 50bcfe8df7c73ce51762f65d218b4ef0cc5da3ee
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Feb 17 13:28:49 2023 +0100

    net: make default_rps_mask a per netns attribute

    That really was meant to be a per netns attribute from the beginning.

    The idea is that once proper isolation is in place in the main
    namespace, additional demux in the child namespaces will be redundant.
    Let's make child netns default rps mask empty by default.

    To avoid bloating the netns with a possibly large cpumask, allocate
    it on-demand during the first write operation.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-03 14:15:15 +01:00
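
The "allocate it on-demand during the first write operation" approach from
the default_rps_mask commit above can be sketched as follows. This is a
simplified userspace model: struct fake_netns, MASK_WORDS,
write_default_rps_mask() and mask_is_empty() are all hypothetical, and a
plain word array stands in for a kernel cpumask.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MASK_WORDS 4 /* assumed mask size for the sketch */

struct fake_netns {
	unsigned long *rps_default_mask; /* NULL until the first write */
};

/* First write allocates the mask; later writes reuse the allocation. */
int write_default_rps_mask(struct fake_netns *ns, const unsigned long *new_mask)
{
	if (!ns->rps_default_mask) {
		ns->rps_default_mask = calloc(MASK_WORDS, sizeof(unsigned long));
		if (!ns->rps_default_mask)
			return -ENOMEM;
	}

	memcpy(ns->rps_default_mask, new_mask,
	       MASK_WORDS * sizeof(unsigned long));
	return 0;
}

/* A namespace that never wrote the mask behaves as if it were empty. */
int mask_is_empty(const struct fake_netns *ns)
{
	size_t i;

	if (!ns->rps_default_mask)
		return 1;

	for (i = 0; i < MASK_WORDS; i++)
		if (ns->rps_default_mask[i])
			return 0;
	return 1;
}
```

A freshly created namespace therefore costs one pointer, and readers treat
the missing allocation as the empty default described in the commit.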
Paolo Abeni 1bbded2685 net: introduce default_rps_mask netns attribute
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168875
Tested: LNST, Tier1

Upstream commit:
commit 605cfa1b1090b5d9e227d8a8f7d08fdd04f07724
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Feb 7 19:44:57 2023 +0100

    net: introduce default_rps_mask netns attribute

    If RPS is enabled, this allows configuring a default rps
    mask, which is effective since receive queue creation time.

    A default RPS mask allows the system admin to ensure proper
    isolation, avoiding races at network namespace or device
    creation time.

    The default RPS mask is initially empty, and can be
    modified via a newly added sysctl entry.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-03 14:15:15 +01:00
Paolo Abeni 6ac38a092f net-sysctl: factor-out rpm mask manipulation helpers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168875
Tested: LNST, Tier1

Upstream commit:
commit 370ca718fd5e1fd45ccfdf7a9d76d010f561e607
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Feb 7 19:44:56 2023 +0100

    net-sysctl: factor-out rpm mask manipulation helpers

    Will simplify the following patch. No functional change
    intended.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-03 14:15:15 +01:00
Beniamino Galvani 03fdcdd2e5 net-sysfs: add check for netdevice being present to speed_show
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2148349
Tested: by reporter

commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624
Author: suresh kumar <suresh2514@gmail.com>
Date:   Thu Feb 17 07:25:18 2022 +0530

    net-sysfs: add check for netdevice being present to speed_show

    When bringing down the netdevice or system shutdown, a panic can be
    triggered while accessing the sysfs path because the device is already
    removed.

        [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
        [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
        ...
        [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
        [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

        crash> bt
        ...
        PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
        ...
         #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
            [exception RIP: dma_pool_alloc+0x1ab]
            RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
            RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
            RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
            RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
            R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
            R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
        #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
        #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
        #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
        #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
        #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
        #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
        #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
        #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
        #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
        #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
        #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
        #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
        #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
        #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
        #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
        #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

        crash> net_device.state ffff89443b0c0000
          state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

    To prevent this scenario, we also make sure that the netdevice is present.

    Signed-off-by: suresh kumar <suresh2514@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
2023-02-14 11:10:39 +01:00
Íñigo Huguet 7141c0dcb7 net: wrap the wireless pointers in struct net_device in an ifdef
Bugzilla: https://bugzilla.redhat.com/2143376

Conflicts: netdevice.h: context conflict, missing d6c6d0bb2cb3 ("net: remove
references to CONFIG_IRDA in network header files")

commit c304eddcecfe2513ff98ce3ae97d1c196d82ba08
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu May 19 13:20:54 2022 -0700

    net: wrap the wireless pointers in struct net_device in an ifdef
    
    Most protocol-specific pointers in struct net_device are under
    a respective ifdef. Wireless is the notable exception. Since
    there's a sizable number of custom-built kernels for datacenter
    workloads which don't build wireless it seems reasonable to
    ifdefy those pointers as well.
    
    While at it move IPv4 and IPv6 pointers up, those are special
    for obvious reasons.
    
    Acked-by: Johannes Berg <johannes@sipsolutions.net>
    Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> # ieee802154
    Acked-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
2022-11-18 11:45:54 +01:00
Ivan Vecera 5a0eef8003 net: extract a few internals from netdevice.h
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2128180

Conflicts:
- slightly modified due to missing 0b5c21bbc01e ("net: ensure
  net_todo_list is processed quickly") and d07b26f5bbea ("dev_addr:
  add a modification check")

commit 6264f58ca0e54e41d63c2d00334a48bac28fbf30
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Apr 6 14:37:54 2022 -0700

    net: extract a few internals from netdevice.h

    There's a number of functions and static variables used
    under net/core/ but not from the outside. We currently
    dump most of them into netdevice.h. That's bad for many
    reasons:
     - netdevice.h is very cluttered, hard to figure out
       what the APIs are;
     - netdevice.h is very long;
     - we have to touch netdevice.h more which causes expensive
       incremental builds.

    Create a header under net/core/ and move some declarations.

    The new header is also a bit of a catch-all but that's
    fine, if we create more specific headers people will
    likely over-think where their declarations fit best
    and end up putting them in netdevice.h, again.

    More work should be done on splitting netdevice.h into more
    targeted headers, but that'd be more time consuming so small
    steps.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-10-18 10:27:16 +02:00
Ivan Vecera 616826f600 net: remove .ndo_change_proto_down
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2128180

Conflicts:
- small context conflict due to existing backport of 3b89b511ea0c ("net:
  fix IFF_TX_SKB_NO_LINEAR definition")

commit 2106efda785b55a8957efed9a52dfa28ee0d7280
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Nov 22 17:24:47 2021 -0800

    net: remove .ndo_change_proto_down

    .ndo_change_proto_down was added seemingly to enable out-of-tree
    implementations. Over 2.5yrs later we still have no real users
    upstream. Hardwire the generic implementation for now, we can
    revert once real users materialize. (rocker is a test vehicle,
    not a user.)

    We need to drop the optimization on the sysfs side, because
    unlike ndos priv_flags will be changed at runtime, so we'd
    need READ_ONCE/WRITE_ONCE everywhere..

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-10-03 17:02:55 +02:00
Herton R. Krzesinski ce943ecba6 Merge: Upgrade drivers/of to support Arm SystemReady IR
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1082

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840

While POWER systems already use Open Firmware (drivers/of), it is
not exactly the same as full DeviceTree support.  Arm SystemReady IR
platforms use DeviceTree extensively.  This patch set brings this
subsystem up to date with Linux 5.19 so that all of the DeviceTree
functionality needed for Arm SystemReady IR can be supported.

NB: this is one of a series of patch sets needed to fully support
Arm SystemReady IR in the kernel; drivers/base and drivers/firmware
are also being updated and will end up depending on this patch set.
Individual drivers for specific SystemReady IR compliant platforms
will also be needed.

v3:
* fixed incorrect commit ids in the last three commits

v2:
* Added some labels for Omitted-fixes.
* Added fixes noted in Mark's review

Signed-off-by: Al Stone <ahs3@redhat.com>

Approved-by: Jerry Snitselaar <jsnitsel@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Torez Smith <torez@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Lyude Paul <lyude@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Andrea Claudi <aclaudi@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2022-08-22 14:07:55 +00:00
Patrick Talbert 5f85d33e47 Merge: net/core: backport fixes from upstream for 9.1 P2
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1057

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101278

The latest patch depends on the second-latest patch.

Signed-off-by: Hangbin Liu <haliu@redhat.com>

Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-07-14 12:07:49 +02:00
Al Stone 1a4f49f7ab of: net: move of_net under net/
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840
Tested: This is one of a series of patch sets to enable Arm SystemReady IR
 support in the kernel for NXP i.MX8 platforms.  At this stage, this
 has been tested by ensuring we can survive the CI/CD loop -- i.e.,
 that we have not broken anything else, and a simple boot test.  When
 sufficient drivers have been brought in for i.MX8M, we will be able
 to run further tests.

Conflicts:
    drivers/net/ethernet/litex/Kconfig

    This driver is not included in this source tree.

commit e330fb14590c5c80f7195c3d8c9b4bcf79e1a5cd
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Oct 6 18:06:54 2021 -0700

    of: net: move of_net under net/

    Rob suggests to move of_net.c from under drivers/of/ somewhere
    to the networking code.

    Suggested-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Rob Herring <robh@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    (cherry picked from commit e330fb14590c5c80f7195c3d8c9b4bcf79e1a5cd)

Signed-off-by: Al Stone <ahs3@redhat.com>
2022-07-01 16:12:53 -06:00
Hangbin Liu e4c3a2b313 net: fix data-race in dev_isalive()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101278
Upstream Status: net.git commit cc26c2661fef

Conflicts: context conflicts due to missing ae68db14b616 ("net: transition
netdev reg state earlier in run_todo") and 86213f80da1b ("net: avoid quadratic
behavior in netdev_wait_allrefs_any()")

commit cc26c2661fefea215f41edb665193324a5f99021
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jun 16 00:34:34 2022 -0700

    net: fix data-race in dev_isalive()

    dev_isalive() is called under RTNL or dev_base_lock protection.

    This means that changes to dev->reg_state should be done with both locks held.

    syzbot reported:

    BUG: KCSAN: data-race in register_netdevice / type_show

    write to 0xffff888144ecf518 of 1 bytes by task 20886 on cpu 0:
    register_netdevice+0xb9f/0xdf0 net/core/dev.c:10050
    lapbeth_new_device drivers/net/wan/lapbether.c:414 [inline]
    lapbeth_device_event+0x4a0/0x6c0 drivers/net/wan/lapbether.c:456
    notifier_call_chain kernel/notifier.c:87 [inline]
    raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:455
    __dev_notify_flags+0x1d6/0x3a0
    dev_change_flags+0xa2/0xc0 net/core/dev.c:8607
    do_setlink+0x778/0x2230 net/core/rtnetlink.c:2780
    __rtnl_newlink net/core/rtnetlink.c:3546 [inline]
    rtnl_newlink+0x114c/0x16a0 net/core/rtnetlink.c:3593
    rtnetlink_rcv_msg+0x811/0x8c0 net/core/rtnetlink.c:6089
    netlink_rcv_skb+0x13e/0x240 net/netlink/af_netlink.c:2501
    rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6107
    netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    netlink_unicast+0x58a/0x660 net/netlink/af_netlink.c:1345
    netlink_sendmsg+0x661/0x750 net/netlink/af_netlink.c:1921
    sock_sendmsg_nosec net/socket.c:714 [inline]
    sock_sendmsg net/socket.c:734 [inline]
    __sys_sendto+0x21e/0x2c0 net/socket.c:2119
    __do_sys_sendto net/socket.c:2131 [inline]
    __se_sys_sendto net/socket.c:2127 [inline]
    __x64_sys_sendto+0x74/0x90 net/socket.c:2127
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x46/0xb0

    read to 0xffff888144ecf518 of 1 bytes by task 20423 on cpu 1:
    dev_isalive net/core/net-sysfs.c:38 [inline]
    netdev_show net/core/net-sysfs.c:50 [inline]
    type_show+0x24/0x90 net/core/net-sysfs.c:112
    dev_attr_show+0x35/0x90 drivers/base/core.c:2095
    sysfs_kf_seq_show+0x175/0x240 fs/sysfs/file.c:59
    kernfs_seq_show+0x75/0x80 fs/kernfs/file.c:162
    seq_read_iter+0x2c3/0x8e0 fs/seq_file.c:230
    kernfs_fop_read_iter+0xd1/0x2f0 fs/kernfs/file.c:235
    call_read_iter include/linux/fs.h:2052 [inline]
    new_sync_read fs/read_write.c:401 [inline]
    vfs_read+0x5a5/0x6a0 fs/read_write.c:482
    ksys_read+0xe8/0x1a0 fs/read_write.c:620
    __do_sys_read fs/read_write.c:630 [inline]
    __se_sys_read fs/read_write.c:628 [inline]
    __x64_sys_read+0x3e/0x50 fs/read_write.c:628
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x46/0xb0

    value changed: 0x00 -> 0x01

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 20423 Comm: udevd Tainted: G W 5.19.0-rc2-syzkaller-dirty #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2022-06-27 16:39:41 +08:00
Ivan Vecera ca7263fa56 net: add net device refcount tracker to struct netdev_queue
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377

commit 0b688f24b7d611db3a02f3d4ab562d049c78a17d
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Dec 4 20:21:59 2021 -0800

    net: add net device refcount tracker to struct netdev_queue

    This will help debugging pesky netdev reference leaks.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-13 18:36:48 +02:00
Ivan Vecera cb1500167b net: add net device refcount tracker to struct netdev_rx_queue
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377

commit 80e8921b2b72c300ca56a01729004d30bedb82cd
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Dec 4 20:21:58 2021 -0800

    net: add net device refcount tracker to struct netdev_rx_queue

    This helps debugging net device refcount leaks.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-13 18:36:48 +02:00
Patrick Talbert d46e36b09c Merge: sched/isolation: Split housekeeping cpumask per isolation features
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/671

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065222
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
Tested: Setup isolation and ran scheduler tests, checked that housekeeping
looked right (tasks offloaded from isolated cpus to HK ones etc).

Split the housekeeping flags into finer granularity in preparation
for allowing them to be configured dynamically. There should not be
much functional change.

Signed-off-by: Phil Auld <pauld@redhat.com>

Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Paolo Bonzini <bonzini@gnu.org>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-05-11 08:42:56 +02:00
Ivan Vecera 839a21abc8 net: use an atomic_long_t for queue->trans_timeout
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2073453

commit 8160fb43d55d26d64607fd32fe69185a5f5fe41f
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Nov 16 19:29:21 2021 -0800

    net: use an atomic_long_t for queue->trans_timeout

    tx_timeout_show() assumed dev_watchdog() would stop all
    the queues, to fetch queue->trans_timeout under protection
    of the queue->_xmit_lock.

    As we want to no longer disrupt transmits, we use an
    atomic_long_t instead.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: david decotigny <david.decotigny@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-04-13 14:36:32 +02:00
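
As a rough userspace analogy to 839a21abc8 above, the sketch below keeps a
per-queue timeout counter in a C11 atomic_long so that a reader can fetch it
without taking the lock the writer would otherwise need. The names struct
txq_sketch, txq_note_timeout and txq_timeout_count are hypothetical and are
not kernel APIs; the kernel change itself uses atomic_long_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a transmit queue; not struct netdev_queue. */
struct txq_sketch {
    atomic_long trans_timeout; /* number of transmit timeouts seen so far */
};

/* Writer side: record one more timeout without taking any lock. */
static void txq_note_timeout(struct txq_sketch *q)
{
    assert(q != NULL);
    atomic_fetch_add(&q->trans_timeout, 1);
}

/* Reader side: a sysfs-style "show" reads the counter without stopping the queue. */
static long txq_timeout_count(struct txq_sketch *q)
{
    assert(q != NULL);
    return atomic_load(&q->trans_timeout);
}
```

The point of the design is that neither side has to exclude the other, so
reading the counter no longer disrupts transmits.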
Phil Auld 1cf795c344 sched/isolation: Use single feature type while referring to housekeeping cpumask
Bugzilla: http://bugzilla.redhat.com/2065222

commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Mon Feb 7 16:59:06 2022 +0100

    sched/isolation: Use single feature type while referring to housekeeping cpumask

    Refer to housekeeping APIs using single feature types instead of flags.
    This prevents passing multiple isolation features at once to
    housekeeping interfaces, which soon won't be possible anymore as each
    isolation feature will have its own cpumask.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
    Reviewed-by: Phil Auld <pauld@redhat.com>
    Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-03-31 10:40:39 -04:00
Phil Auld 6f72d789ba net: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch
Bugzilla: http://bugzilla.redhat.com/2065222

commit c8fb9f22ae22dbe06a43b77717299e1c3e632d5c
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Mon Feb 7 16:59:05 2022 +0100

    net: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch

    To prepare for supporting each feature of the housekeeping cpumask
    toward cpuset, prepare each of the HK_FLAG_* entries to move to its
    own cpumask by enforcing that they are fetched individually. The new
    constraint is that multiple HK_FLAG_* entries can't be mixed together
    anymore in a single call to housekeeping_cpumask().

    This will later allow, for example, to runtime modify the cpulist passed
    through "isolcpus=", "nohz_full=" and "rcu_nocbs=" kernel boot
    parameters.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
    Reviewed-by: Phil Auld <pauld@redhat.com>
    Link: https://lore.kernel.org/r/20220207155910.527133-4-frederic@kernel.org

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-03-31 10:39:05 -04:00
Herton R. Krzesinski b8f20958b7 Merge: net: core stable backport for rhel 9.0
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/212

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1

This includes a few critical bugfixes for the core network stack.

Notably it includes 7f678def99d2 ("skb_expand_head() adjust skb->truesize incorrectly") and a whole series of pre-requisites. The bug addressed there is nasty and present even prior to skb_expand_head() introduction.

commit 719c57197010 ("net: make napi_disable() symmetric with enable") instead has been explicitly excluded, as it's not really a fix, is known to introduce problems and it's still quite new

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2022-01-14 16:53:21 +00:00
Antoine Tenart 7231970d1d net-sysfs: try not to restart the syscall if it will fail eventually
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2030634
Upstream Status: linux.git
Tested: test script in bz

commit 146e5e733310379f51924111068f08a3af0db830
Author: Antoine Tenart <atenart@kernel.org>
Date:   Thu Oct 7 16:00:51 2021 +0200

    net-sysfs: try not to restart the syscall if it will fail eventually

    Due to deadlocks in the networking subsystem spotted 12 years ago[1],
    a workaround was put in place[2] to avoid taking the rtnl lock when it
    was not available and restarting the syscall (back to VFS, letting
    userspace spin). The following construction is found a lot in the net
    sysfs and sysctl code:

      if (!rtnl_trylock())
              return restart_syscall();

    This can be problematic when multiple userspace threads use such
    interfaces in a short period, making them spin a lot. This happens
    for example when adding and moving virtual interfaces: userspace
    programs listening on events, such as systemd-udevd and NetworkManager,
    do trigger actions reading files in sysfs. It gets worse when a lot of
    virtual interfaces are created concurrently, say when creating
    containers at boot time.

    Returning early without hitting the above pattern when the syscall will
    fail eventually does make things better. While it is not a fix for the
    issue, it does ease things.

    [1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
        https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
        and https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
    [2] Rightfully, those deadlocks are *hard* to solve.

    Signed-off-by: Antoine Tenart <atenart@kernel.org>
    Reviewed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2021-12-16 14:53:37 +01:00
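
The early-return idea in 7231970d1d above can be sketched in userspace as
follows. This is a minimal illustration, not the kernel code: big_trylock
stands in for rtnl_trylock(), SKETCH_RESTART stands in for the value returned
by restart_syscall(), and struct dev_sketch and show_value are hypothetical.
Which checks are safe to perform without the lock is an assumption of the
sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "please retry" code standing in for restart_syscall(). */
#define SKETCH_RESTART 1

/* Hypothetical global lock standing in for the rtnl lock. */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

static bool big_trylock(void) { return !atomic_flag_test_and_set(&big_lock); }
static void big_unlock(void)  { atomic_flag_clear(&big_lock); }

struct dev_sketch {
    bool has_feature; /* false means the call can never succeed */
    int  value;
};

/*
 * Returns 0 and fills *out on success, -EINVAL if the call is bound to
 * fail, or SKETCH_RESTART if the lock was contended.
 */
static int show_value(const struct dev_sketch *dev, int *out)
{
    assert(dev != NULL && out != NULL);

    /* Fail early, before touching the lock, when the outcome is already known. */
    if (!dev->has_feature)
        return -EINVAL;

    /* Only callers that can succeed reach the trylock/restart pattern. */
    if (!big_trylock())
        return SKETCH_RESTART;

    *out = dev->value;
    big_unlock();
    return 0;
}
```

With the check placed first, a caller that is bound to fail gets its error
even while the lock is held elsewhere, instead of being asked to retry.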
Paolo Abeni 529ebf03fa net-sysfs: initialize uid and gid before calling net_ns_get_ownership
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1

Upstream commit:
commit f7a1e76d0f608961cc2fc681f867a834f2746bce
Author: Xin Long <lucien.xin@gmail.com>
Date:   Mon Oct 25 02:31:48 2021 -0400

    net-sysfs: initialize uid and gid before calling net_ns_get_ownership

    Currently in net_ns_get_ownership() it may not be able to set uid or gid
    if make_kuid or make_kgid returns an invalid value, and an uninit-value
    issue can be triggered by this.

    This patch is to fix it by initializing the uid and gid before calling
    net_ns_get_ownership(), as it does in kobject_get_ownership()

    Fixes: e6dee9f389 ("net-sysfs: add netdev_change_owner()")
    Reported-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2021-12-09 10:44:31 +01:00
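
The pattern fixed by 529ebf03fa above, where a callee may leave its output
parameters untouched, can be illustrated with a small userspace sketch. All
names here (ROOT_ID, get_ownership_sketch, lookup_owner) are hypothetical, and
the value 1000 is an arbitrary placeholder; the only point is that the caller
assigns defaults before the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ROOT_ID 0u /* hypothetical default owner id */

/* Writes an output only when the corresponding mapping is valid. */
static void get_ownership_sketch(bool uid_ok, bool gid_ok,
                                 unsigned int *uid, unsigned int *gid)
{
    assert(uid != NULL && gid != NULL);
    if (uid_ok)
        *uid = 1000u;
    if (gid_ok)
        *gid = 1000u;
}

/* The caller sets defaults first, so no output is ever left uninitialized. */
static void lookup_owner(bool uid_ok, bool gid_ok,
                         unsigned int *uid, unsigned int *gid)
{
    assert(uid != NULL && gid != NULL);
    *uid = ROOT_ID;
    *gid = ROOT_ID;
    get_ownership_sketch(uid_ok, gid_ok, uid, gid);
}
```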
Antoine Tenart 7f08ec6e04 net-sysfs: remove possible sleep from an RCU read-side critical section
xps_queue_show is mostly made of an RCU read-side critical section and
calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
allowed as this call may sleep and such behaviours aren't allowed in RCU
read-side critical sections. Fix this by using GFP_NOWAIT instead.

Fixes: 5478fcd0f4 ("net: embed nr_ids in the xps maps")
Reported-by: kernel test robot <oliver.sang@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-22 13:28:13 -07:00
Antoine Tenart 2db6cdaeba net-sysfs: move the xps cpus/rxqs retrieval in a common function
Most of the xps_cpus_show and xps_rxqs_show functions share the same
logic. Having it in two different functions does not help maintenance.
This patch moves their common logic into a new function, xps_queue_show,
to improve this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart d7be87a687 net-sysfs: move the rtnl unlock up in the xps show helpers
Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU
protected, we do not have the need to protect the maps in the rtnl lock.
Move the rtnl unlock up so we reduce the rtnl locking section.

We also increase the reference count on the subordinate device if any,
as we don't want this device to be freed while we use it (now that the
rtnl lock isn't protecting it in the whole function).

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart 044ab86d43 net: move the xps maps to an array
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in
net_device. That will simplify the code a lot, removing the need for lots
of if/else conditionals, as the correct map will be available using its
offset in the array.

This should not modify the xps maps behaviour in any way.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart 6f36158e05 net: remove the xps possible_mask
Remove the xps possible_mask. It was an optimization but we can just
loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That
simplifies the code a bit.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart 5478fcd0f4 net: embed nr_ids in the xps maps
Embed nr_ids (the number of CPUs for the xps cpus map, and the number of
rxqs for the xps rxqs map) in dev_maps. That will help avoid out-of-bounds
memory accesses if those values change after dev_maps was allocated.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart 255c04a87f net: embed num_tc in the xps maps
The xps cpus/rxqs map is accessed using dev->num_tc, which is used when
allocating the map. But later updates of dev->num_tc can lead to having
a mismatch between the maps and how they're accessed. In such cases the
map values do not make any sense and out of bound accesses can occur
(that can be easily seen using KASAN).

This patch aims at fixing this by embedding num_tc into the maps, using
the value at the time the map is created. This brings two improvements:
- The maps can be accessed using the embedded num_tc, so we know for
  sure we won't have out of bound accesses.
- Checks can be made before accessing the maps so we know the values
  retrieved will make sense.

We also update __netif_set_xps_queue to conditionally copy old maps from
dev_maps in the new one only if the number of traffic classes from both
maps match.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
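
The idea shared by 5478fcd0f4 and 255c04a87f above, storing the dimensions a
map was allocated with inside the map itself, can be sketched in userspace as
follows. struct maps_sketch and its helpers are hypothetical and much simpler
than the kernel's xps dev_maps; the flat id * num_tc + tc layout is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical map that remembers the dimensions it was allocated with. */
struct maps_sketch {
    unsigned int nr_ids; /* number of CPUs or rx queues at allocation time */
    int num_tc;          /* number of traffic classes at allocation time */
    int *slots;          /* nr_ids * num_tc entries */
};

static struct maps_sketch *maps_alloc(unsigned int nr_ids, int num_tc)
{
    struct maps_sketch *m;

    if (nr_ids == 0 || num_tc <= 0)
        return NULL;
    m = malloc(sizeof(*m));
    if (m == NULL)
        return NULL;
    m->slots = calloc((size_t)nr_ids * (size_t)num_tc, sizeof(*m->slots));
    if (m->slots == NULL) {
        free(m);
        return NULL;
    }
    m->nr_ids = nr_ids;
    m->num_tc = num_tc;
    return m;
}

/* Bounds are checked against the embedded values, not against live device state. */
static int *maps_slot(struct maps_sketch *m, unsigned int id, int tc)
{
    assert(m != NULL);
    if (id >= m->nr_ids || tc < 0 || tc >= m->num_tc)
        return NULL;
    return &m->slots[(size_t)id * (size_t)m->num_tc + (size_t)tc];
}

static void maps_free(struct maps_sketch *m)
{
    if (m != NULL) {
        free(m->slots);
        free(m);
    }
}
```

Because every lookup consults the dimensions recorded at allocation time, a
later change to the device's traffic-class count cannot turn a valid-looking
index into an out-of-bounds access on an old map.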
Antoine Tenart 73f5e52b15 net-sysfs: make xps_cpus_show and xps_rxqs_show consistent
Make the implementations of xps_cpus_show and xps_rxqs_show converge,
as the two share the same logic but diverged over time. This should not
modify their behaviour but will help future changes and improve
maintenance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart d9a063d207 net-sysfs: store the return of get_netdev_queue_index in an unsigned int
In net-sysfs, get_netdev_queue_index returns an unsigned int. Some of
its callers use an unsigned long to store the returned value. Update the
code to be consistent; this should only be cosmetic.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00