JIRA: https://issues.redhat.com/browse/RHEL-63914
Upstream Status: linux.git
CVE: CVE-2024-50018
commit 08062af0a52107a243f7608fd972edb54ca5b7f8
Author: Joe Damato <jdamato@fastly.com>
Date: Wed Sep 4 15:34:30 2024 +0000
net: napi: Prevent overflow of napi_defer_hard_irqs
In commit 6f8b12d661 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Before this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
$ cat /sys/class/net/eth4/napi_defer_hard_irqs
-2147483647
After this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
bash: line 0: echo: write error: Numerical result out of range
Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:
include/linux/netdevice.h: unsigned int tx_queue_len;
And has an overflow check:
dev_change_tx_queue_len(..., unsigned long new_len):
if (new_len != (unsigned int)new_len)
return -ERANGE;
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-57750
commit a699781c79ecf6cfe67fb00a0331b4088c7c8466
Author: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Date: Fri Aug 23 16:26:58 2024 +1000
ethtool: check device is present when getting link settings
A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:
[exception RIP: qed_get_current_link+17]
#8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
#9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
#10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
#11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
#12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
#13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
#14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
#15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
#16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
#17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb
crash> struct net_device.state ffff9a9d21336000
state = 5,
state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).
This is the same sort of panic as observed in commit 4224cfd7fb65
("net-sysfs: add check for netdevice being present to speed_show").
There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.
Move this check into ethtool to protect all callers.
Fixes: d519e17e2d ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd7fb65 ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-57750
commit c1742dcb6bda5fd535fbaa2145f0a180bc329aa6
Author: Eric Dumazet <edumazet@google.com>
Date: Thu May 2 17:39:26 2024 +0000
net: no longer acquire RTNL in threaded_show()
dev->threaded can be read locklessly, if we add
corresponding READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240502173926.2010646-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit e154bb7a6ebbe5414accb5d94dc5ba80c204ea64
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 13 06:32:40 2024 +0000
net-sysfs: convert netstat_show() to RCU
dev_get_stats() can be called from RCU, there is no need
to acquire dev_base_lock.
Change dev_isalive() comment to reflect we no longer use
dev_base_lock from net/core/net-sysfs.c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit 004d138364fd10dd5ff8ceb54cfdc2d792a7b338
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 13 06:32:39 2024 +0000
net-sysfs: convert dev->operstate reads to lockless ones
operstate_show() can omit dev_base_lock acquisition only
to read dev->operstate.
Annotate accesses to dev->operstate.
Writers still acquire dev_base_lock for mutual exclusion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit c7d52737e7ebd31cc5fef46380d94b58becf9479
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 13 06:32:38 2024 +0000
net-sysfs: use dev_addr_sem to remove races in address_show()
Using dev_base_lock is not preventing from reading garbage.
Use dev_addr_sem instead.
v4: place dev_addr_sem extern in net/core/dev.h (Jakub Kicinski)
Link: https://lore.kernel.org/netdev/20240212175845.10f6680a@kernel.org/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit 12692e3df2dacf2993c56aa23b6d3de921a5bdff
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 13 06:32:37 2024 +0000
net-sysfs: convert netdev_show() to RCU
Make clear dev_isalive() can be called with RCU protection.
Then convert netdev_show() to RCU, to remove dev_base_lock
dependency.
Also add RCU to broadcast_show().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit 1c07dbb0cccfe85060b6eb089db3d6bfeb6aaf31
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 13 06:32:33 2024 +0000
net: annotate data-races around dev->name_assign_type
name_assign_type_show() runs locklessly, we should annotate
accesses to dev->name_assign_type.
Alternative would be to grab devnet_rename_sem semaphore
from name_assign_type_show(), but this would not bring
more accuracy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit bf17b36ccdd5b7b9dd482d7753bcb9aff2d21d39
Author: Johannes Berg <johannes.berg@intel.com>
Date: Wed Dec 6 17:21:23 2023 +0100
net: sysfs: fix locking in carrier read
My previous patch added a call to linkwatch_sync_dev(),
but that of course needs to be called under RTNL, which
I missed earlier, but now saw RCU warnings from.
Fix that by acquiring the RTNL in a similar fashion to
how other files do it here.
Fixes: facd15dfd691 ("net: core: synchronize link-watch when carrier is queried")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20231206172122.859df6ba937f.I9c80608bcfbab171943ff4942b52dbd5e97fe06e@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit facd15dfd69122042502d99ab8c9f888b48ee994
Author: Johannes Berg <johannes.berg@intel.com>
Date: Mon Dec 4 21:47:07 2023 +0100
net: core: synchronize link-watch when carrier is queried
There are multiple ways to query for the carrier state: through
rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch
work before these operations so that we don't have a situation
where userspace queries the carrier state between the driver's
carrier off->on transition and linkwatch running and expects it
to work, when really (at least) TX cannot work until linkwatch
has run.
I previously posted a longer explanation of how this applies to
wireless [1] but with this wireless can simply query the state
before sending data, to ensure the kernel is ready for it.
[1] https://lore.kernel.org/all/346b21d87c69f817ea3c37caceb34f1f56255884.camel@sipsolutions.net/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231204214706.303c62768415.I1caedccae72ee5a45c9085c5eb49c145ce1c0dd5@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59100
commit 73c2e90a0edc84751c4b95b12fc52051dd60f542
Author: Wang Yufen <wangyufen@huawei.com>
Date: Wed Sep 28 19:49:44 2022 +0800
net-sysfs: Convert to use sysfs_emit() APIs
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31916
Conflicts:
* net/core/dev.c
context conflict due to missing commit 2b0cfa6e49566 ("net: add
generic percpu page_pool allocator")
* net/core/sysctl_net_core.c
context conflict due to missing commit 2658b5a8a4eee ("net: introduce
struct net_hotdata")
commit 490a79faf95e705ba0ffd9ebf04a624b379e53c9
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Mar 6 16:00:30 2024 +0000
net: introduce include/net/rps.h
Move RPS related structures and helpers from include/linux/netdevice.h
and include/net/sock.h to a new include file.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31916
Conflicts:
* include/linux/netdevice.h
Adjusted due to KABI reservations made by RHEL
commit 3b3a52715a ("net: exclude BPF/XDP from kABI")
commit 49e47a5b6145d86c30022fe0e949bbb24bae28ba
Author: Jakub Kicinski <kuba@kernel.org>
Date: Wed Aug 2 18:02:29 2023 -0700
net: move struct netdev_rx_queue out of netdevice.h
struct netdev_rx_queue is touched in only a few places
and having it defined in netdevice.h brings in the dependency
on xdp.h, because struct xdp_rxq_info gets embedded in
struct netdev_rx_queue.
In prep for removal of xdp.h from netdevice.h move all
the netdev_rx_queue stuff to a new header.
We could technically break the new header up to avoid
the sysfs.h include but it's so rarely included it
doesn't seem to be worth it at this point.
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-21356
Upstream Status: RHEL-only
rtnl_link_stats and rtnl_link_stats64 are protected by kABI, add 4
reserved fields. We need to use a custom mechanism here, because those
structures are part of uapi.
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-1023
commit 02a476d932287cf3096f78962ccb70d94d6203c6
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Tue, 22 Nov 2022 17:34:29 +0000
The call, kobject_get_ownership(), does not modify the kobject passed
into it, so make it const. This propagates down into the kobj_type
function callbacks so make the kobject passed into them also const,
ensuring that nothing in the kobject is being changed here.
This helps make it more obvious what calls and callbacks do, and do not,
modify structures passed to them.
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20221121094649.1556002-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178302
Conflicts:
drivers/base/firmware_loader/sysfs.h - replace the single
line version of to_fw_sysfs with the longer inline version
commit 23680f0b7d7f67a935adb38058110d2d81bbe6ea
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 23 Nov 2022 13:25:19 +0100
The dev_uevent() in struct class should not be modifying the device that
is passed into it, so mark it as a const * and propagate the function
signature changes out into all relevant subsystems that use this
callback.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Johan Hovold <johan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Raed Salem <raeds@nvidia.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Jakob Koschel <jakobkoschel@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Wang Yufen <wangyufen@huawei.com>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-pm@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20221123122523.1332370-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178302
commit fa627348cfc7fb174468d88756b83c2d97890b07
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Sat, 1 Oct 2022 18:54:26 +0200
The callbacks in struct class namespace() and get_ownership() do not
modify the struct device passed to them, so mark the pointer as constant
and fix up all callbacks in the kernel to have the correct function
signature.
This helps make it more obvious what calls and callbacks do, and do not,
modify structures passed to them.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20221001165426.2690912-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273
Upstream status: RHEL only
In the current upstream implementation, the devlink_port pointer is
assigned to netdevice using the SET_NETDEV_DEVLINK_PORT macro.
For most drivers, this is not a problem and everything remains as it
was in the past. The ICE driver is an exception.
In the old implementation ice_get_devlink_port returns devlink_port
only in switchdev mode. Because phys_port_name_show() and
phys_switch_id_show() use ndo_get_devlink_port. devlink_port was
invisible and not used to generate the interface name.
To preserve the interface naming scheme, we need to preserve this
behavior for the ice driver.
- re-implement ice_get_devlink_port which was deleted in commit
77df1db80da384 ("net: remove unused ndo_get_devlink_port")
- use __rh_deprecated_ndo_get_devlink_port in phys_port_name_show() a
phys_switch_id_show.
This change preserves the old behavior for phys_port_id and phys_port_name
sysfs files for the ice driver.
Signed-off-by: Petr Oros <poros@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2175258
Conflicts:
- Removed chunks of unsupported protocol AX.25
- Renamed the funtions also in ipvlan. Commit 40b9d1ab63f5 ("ipvlan: hold lower
dev to avoid possible use-after-free") was backported out of order so it had
to use the old functions names.
commit d62607c3fe45911b2331fac073355a8c914bbde2
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Jun 7 21:39:55 2022 -0700
net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.
Rename:
dev_hold_track() -> netdev_hold()
dev_put_track() -> netdev_put()
dev_replace_track() -> netdev_ref_replace()
Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168875
Tested: LNST, Tier1
Upstream commit:
commit 50bcfe8df7c73ce51762f65d218b4ef0cc5da3ee
Author: Paolo Abeni <pabeni@redhat.com>
Date: Fri Feb 17 13:28:49 2023 +0100
net: make default_rps_mask a per netns attribute
That really was meant to be a per netns attribute from the beginning.
The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.
To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168875
Tested: LNST, Tier1
Upstream commit:
commit 605cfa1b1090b5d9e227d8a8f7d08fdd04f07724
Author: Paolo Abeni <pabeni@redhat.com>
Date: Tue Feb 7 19:44:57 2023 +0100
net: introduce default_rps_mask netns attribute
If RPS is enabled, this allows configuring a default rps
mask, which is effective since receive queue creation time.
A default RPS mask allows the system admin to ensure proper
isolation, avoiding races at network namespace or device
creation time.
The default RPS mask is initially empty, and can be
modified via a newly added sysctl entry.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168875
Tested: LNST, Tier1
Upstream commit:
commit 370ca718fd5e1fd45ccfdf7a9d76d010f561e607
Author: Paolo Abeni <pabeni@redhat.com>
Date: Tue Feb 7 19:44:56 2023 +0100
net-sysctl: factor-out rpm mask manipulation helpers
Will simplify the following patch. No functional change
intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2143376
Conflicts: netdevice.h: context conflict, missing d6c6d0bb2cb3 ("net: remove
references to CONFIG_IRDA in network header files")
commit c304eddcecfe2513ff98ce3ae97d1c196d82ba08
Author: Jakub Kicinski <kuba@kernel.org>
Date: Thu May 19 13:20:54 2022 -0700
net: wrap the wireless pointers in struct net_device in an ifdef
Most protocol-specific pointers in struct net_device are under
a respective ifdef. Wireless is the notable exception. Since
there's a sizable number of custom-built kernels for datacenter
workloads which don't build wireless it seems reasonable to
ifdefy those pointers as well.
While at it move IPv4 and IPv6 pointers up, those are special
for obvious reasons.
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> # ieee802154
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2128180
Conflicts:
- slightly modified due to missing 0b5c21bbc01e ("net: ensure
net_todo_list is processed quickly") and d07b26f5bbea ("dev_addr:
add a modification check")
commit 6264f58ca0e54e41d63c2d00334a48bac28fbf30
Author: Jakub Kicinski <kuba@kernel.org>
Date: Wed Apr 6 14:37:54 2022 -0700
net: extract a few internals from netdevice.h
There's a number of functions and static variables used
under net/core/ but not from the outside. We currently
dump most of them into netdevice.h. That bad for many
reasons:
- netdevice.h is very cluttered, hard to figure out
what the APIs are;
- netdevice.h is very long;
- we have to touch netdevice.h more which causes expensive
incremental builds.
Create a header under net/core/ and move some declarations.
The new header is also a bit of a catch-all but that's
fine, if we create more specific headers people will
likely over-think where their declaration fit best.
And end up putting them in netdevice.h, again.
More work should be done on splitting netdevice.h into more
targeted headers, but that'd be more time consuming so small
steps.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2128180
Conflicts:
- small context conflict due to existing backport of 3b89b511ea0c ("net:
fix IFF_TX_SKB_NO_LINEAR definition")
commit 2106efda785b55a8957efed9a52dfa28ee0d7280
Author: Jakub Kicinski <kuba@kernel.org>
Date: Mon Nov 22 17:24:47 2021 -0800
net: remove .ndo_change_proto_down
.ndo_change_proto_down was added seemingly to enable out-of-tree
implementations. Over 2.5yrs later we still have no real users
upstream. Hardwire the generic implementation for now, we can
revert once real users materialize. (rocker is a test vehicle,
not a user.)
We need to drop the optimization on the sysfs side, because
unlike ndos priv_flags will be changed at runtime, so we'd
need READ_ONCE/WRITE_ONCE everywhere..
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1082
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840
While POWER systems already use Open Firmware (drivers/of), it is
not exactly the same as full DeviceTree support. Arm SystemReady IR
platforms use DeviceTree extensively. This patch set brings this
subsystem up to date with Linux 5.19 so that all of the DeviceTree
functionality needed for Arm SystemReady IR can be supported.
NB: this is one of a series of patch sets needed to fully support
Arm SystemReady IR in the kernel; drivers/base and drivers/firmware
are also being updated and will end up depending on this patch set.
Individual drivers for specific SystemReady IR compliant platforms
will also be needed.
v3:
* fixed incorrent commit ids in the last three commits
v2:
* Added some labels for Omitted-fixes.
* Added fixes noted in Mark's review
Signed-off-by: Al Stone <ahs3@redhat.com>
Approved-by: Jerry Snitselaar <jsnitsel@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Torez Smith <torez@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Lyude Paul <lyude@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840
Tested: This is one of a series of patch sets to enable Arm SystemReady IR
support in the kernel for NXP i.MX8 platforms. At this stage, this
has been tested by ensuring we can survive the CI/CD loop -- i.e.,
that we have not broken anything else, and a simple boot test. When
sufficient drivers have been brought in for i.MX8M, we will be able
to run further tests.
Conflicts:
drivers/net/ethernet/litex/Kconfig
This driver is not included in this source tree.
commit e330fb14590c5c80f7195c3d8c9b4bcf79e1a5cd
Author: Jakub Kicinski <kuba@kernel.org>
Date: Wed Oct 6 18:06:54 2021 -0700
of: net: move of_net under net/
Rob suggests to move of_net.c from under drivers/of/ somewhere
to the networking code.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e330fb14590c5c80f7195c3d8c9b4bcf79e1a5cd)
Signed-off-by: Al Stone <ahs3@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377
commit 0b688f24b7d611db3a02f3d4ab562d049c78a17d
Author: Eric Dumazet <edumazet@google.com>
Date: Sat Dec 4 20:21:59 2021 -0800
net: add net device refcount tracker to struct netdev_queue
This will help debugging pesky netdev reference leaks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377
commit 80e8921b2b72c300ca56a01729004d30bedb82cd
Author: Eric Dumazet <edumazet@google.com>
Date: Sat Dec 4 20:21:58 2021 -0800
net: add net device refcount tracker to struct netdev_rx_queue
This helps debugging net device refcount leaks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/671
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065222
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
Tested: Setup isolation and ran scheduler tests, checked that housekeeping
looked right (tasks offloaded from isolated cpus to HK ones etc).
Split the housekeeping flags into finer granularity in preparation
for allowing them to be configured dynamically. There should not be
much functional change.
Signed-off-by: Phil Auld <pauld@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Paolo Bonzini <bonzini@gnu.org>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2073453
commit 8160fb43d55d26d64607fd32fe69185a5f5fe41f
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Nov 16 19:29:21 2021 -0800
net: use an atomic_long_t for queue->trans_timeout
tx_timeout_show() assumed dev_watchdog() would stop all
the queues, to fetch queue->trans_timeout under protection
of the queue->_xmit_lock.
As we want to no longer disrupt transmits, we use an
atomic_long_t instead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: david decotigny <david.decotigny@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2065222
commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Mon Feb 7 16:59:06 2022 +0100
sched/isolation: Use single feature type while referring to housekeeping cpumask
Refer to housekeeping APIs using single feature types instead of flags.
This prevents from passing multiple isolation features at once to
housekeeping interfaces, which soon won't be possible anymore as each
isolation features will have their own cpumask.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2065222
commit c8fb9f22ae22dbe06a43b77717299e1c3e632d5c
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Mon Feb 7 16:59:05 2022 +0100
net: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch
To prepare for supporting each feature of the housekeeping cpumask
toward cpuset, prepare each of the HK_FLAG_* entries to move to their
own cpumask with enforcing to fetch them individually. The new
constraint is that multiple HK_FLAG_* entries can't be mixed together
anymore in a single call to housekeeping cpumask().
This will later allow, for example, to runtime modify the cpulist passed
through "isolcpus=", "nohz_full=" and "rcu_nocbs=" kernel boot
parameters.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-4-frederic@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/212
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1
This includes a few critical bugfixes for the core network stack.
Notably it includes 7f678def99d2 ("skb_expand_head() adjust skb->truesize incorrectly") and a whole series of pre-requisites. The bug addressed there is nasty and present even prior to skb_expand_head() introduction.
commit 719c57197010 ("net: make napi_disable() symmetric with enable") instead has been explicitly excluded, as it's not really a fix, is known to introduce problems and it's still quite new
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2030634
Upstream Status: linux.git
Tested: test script in bz
commit 146e5e733310379f51924111068f08a3af0db830
Author: Antoine Tenart <atenart@kernel.org>
Date: Thu Oct 7 16:00:51 2021 +0200
net-sysfs: try not to restart the syscall if it will fail eventually
Due to deadlocks in the networking subsystem spotted 12 years ago[1],
a workaround was put in place[2] to avoid taking the rtnl lock when it
was not available and restarting the syscall (back to VFS, letting
userspace spin). The following construction is found a lot in the net
sysfs and sysctl code:
if (!rtnl_trylock())
return restart_syscall();
This can be problematic when multiple userspace threads use such
interfaces in a short period, making them to spin a lot. This happens
for example when adding and moving virtual interfaces: userspace
programs listening on events, such as systemd-udevd and NetworkManager,
do trigger actions reading files in sysfs. It gets worse when a lot of
virtual interfaces are created concurrently, say when creating
containers at boot time.
Returning early without hitting the above pattern when the syscall will
fail eventually does make things better. While it is not a fix for the
issue, it does ease things.
[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
and https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[2] Rightfully, those deadlocks are *hard* to solve.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1
Upstream commit:
commit f7a1e76d0f608961cc2fc681f867a834f2746bce
Author: Xin Long <lucien.xin@gmail.com>
Date: Mon Oct 25 02:31:48 2021 -0400
net-sysfs: initialize uid and gid before calling net_ns_get_ownership
Currently in net_ns_get_ownership() it may not be able to set uid or gid
if make_kuid or make_kgid returns an invalid value, and an uninit-value
issue can be triggered by this.
This patch is to fix it by initializing the uid and gid before calling
net_ns_get_ownership(), as it does in kobject_get_ownership()
Fixes: e6dee9f389 ("net-sysfs: add netdev_change_owner()")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
xps_queue_show is mostly made of an RCU read-side critical section and
calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
allowed as this call may sleep and such behaviours aren't allowed in RCU
read-side critical sections. Fix this by using GFP_NOWAIT instead.
Fixes: 5478fcd0f4 ("net: embed nr_ids in the xps maps")
Reported-by: kernel test robot <oliver.sang@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the xps_cpus_show and xps_rxqs_show functions share the same
logic. Having it in two different functions does not help maintenance.
This patch moves their common logic into a new function, xps_queue_show,
to improve this.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU
protected, we do not have the need to protect the maps in the rtnl lock.
Move the rtnl unlock up so we reduce the rtnl locking section.
We also increase the reference count on the subordinate device if any,
as we don't want this device to be freed while we use it (now that the
rtnl lock isn't protecting it in the whole function).
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in
net_device. That will simplify a lot the code removing the need for lots
of if/else conditionals as the correct map will be available using its
offset in the array.
This should not modify the xps maps behaviour in any way.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the xps possible_mask. It was an optimization but we can just
loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That
simplifies the code a bit.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Embed nr_ids (the number of cpu for the xps cpus map, and the number of
rxqs for the xps cpus map) in dev_maps. That will help not accessing out
of bound memory if those values change after dev_maps was allocated.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xps cpus/rxqs map is accessed using dev->num_tc, which is used when
allocating the map. But later updates of dev->num_tc can lead to having
a mismatch between the maps and how they're accessed. In such cases the
map values do not make any sense and out of bound accesses can occur
(that can be easily seen using KASAN).
This patch aims at fixing this by embedding num_tc into the maps, using
the value at the time the map is created. This brings two improvements:
- The maps can be accessed using the embedded num_tc, so we know for
sure we won't have out of bound accesses.
- Checks can be made before accessing the maps so we know the values
retrieved will make sense.
We also update __netif_set_xps_queue to conditionally copy old maps from
dev_maps in the new one only if the number of traffic classes from both
maps match.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the implementations of xps_cpus_show and xps_rxqs_show to converge,
as the two share the same logic but diverted over time. This should not
modify their behaviour but will help future changes and improve
maintenance.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In net-sysfs, get_netdev_queue_index returns an unsigned int. Some of
its callers use an unsigned long to store the returned value. Update the
code to be consistent, this should only be cosmetic.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>