JIRA: https://issues.redhat.com/browse/RHEL-73391
Upstream Status: linux.git
commit 22600596b6756b166fd052d5facb66287e6f0bad
Author: Xin Long <lucien.xin@gmail.com>
Date: Wed Oct 9 14:47:13 2024 -0400
ipv4: give an IPv4 dev to blackhole_netdev
After commit 8d7017fd62 ("blackhole_netdev: use blackhole_netdev to
invalidate dst entries"), blackhole_netdev was introduced to invalidate
dst cache entries on the TX path whenever the cache times out or is
flushed.
When two UDP sockets (sk1 and sk2) send messages to the same destination
simultaneously, they are using the same dst cache. If the dst cache is
invalidated on one path (sk2) while the other (sk1) is still transmitting,
sk1 may try to use the invalid dst entry.
CPU1 CPU2
udp_sendmsg(sk1) udp_sendmsg(sk2)
udp_send_skb()
ip_output()
<--- dst timeout or flushed
dst_dev_put()
ip_finish_output2()
ip_neigh_for_gw()
This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
arp_constructor(). This error is then propagated back to userspace,
breaking the UDP application.
The patch fixes this issue by assigning an in_dev to blackhole_dev for
IPv4, similar to what was done for IPv6 in commit e5f80fcf869a ("ipv6:
give an IPv6 dev to blackhole_netdev"). This ensures that even when the
dst entry is invalidated with blackhole_dev, it will not fail to create
the neigh entry.
As devinet_init() is called ealier than blackhole_netdev_init() in system
booting, it can not assign the in_dev to blackhole_dev in devinet_init().
As Paolo suggested, add a separate late_initcall() in devinet.c to ensure
inet_blackhole_dev_init() is called after blackhole_netdev_init().
Fixes: 8d7017fd62 ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/3000792d45ca44e16c785ebe2b092e610e5b3df1.1728499633.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-57756
Upstream commit(s):
commit 02e24903e5a46b7a7fca44bcfe0cd6fa5b240c34
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Mar 6 10:24:26 2024 +0000
netlink: let core handle error cases in dump operations
After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors
in the core"), we can remove some code that was not 100 % correct
anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306102426.245689-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Petr Oros <poros@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
Conflicts:\
- Context difference due to missing upstream commit 47f0bd503210 ("net:
Add new protocol attribute to IP addresses") in c9s.
commit 1af7f88af269c4e06a4dc3bc920ff6cdf7471124
Author: Eric Dumazet <edumazet@google.com>
Date: Fri May 10 07:29:32 2024 +0000
inet: fix inet_fill_ifaddr() flags truncation
I missed that (struct ifaddrmsg)->ifa_flags was only 8bits,
while (struct in_ifaddr)->ifa_flags is 32bits.
Use a temporary 32bit variable as I did in set_ifa_lifetime()
and check_lifetime().
Fixes: 3ddc2231c810 ("inet: annotate data-races around ifa->ifa_flags")
Reported-by: Yu Watanabe <watanabe.yu@gmail.com>
Dianosed-by: Yu Watanabe <watanabe.yu@gmail.com>
Closes: https://github.com/systemd/systemd/pull/32666#issuecomment-2103977928
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240510072932.2678952-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit 5b4b62a169e10401cca34a6e7ac39161986f5605
Author: Jakub Kicinski <kuba@kernel.org>
Date: Mon Jun 3 11:48:26 2024 -0700
rtnetlink: make the "split" NLM_DONE handling generic
Jaroslav reports Dell's OMSA Systems Management Data Engine
expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo()
and inet_dump_ifaddr(). We already added a similar fix previously in
commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again")
Instead of modifying all the dump handlers, and making them look
different than modern for_each_netdev_dump()-based dump handlers -
put the workaround in rtnetlink code. This will also help us move
the custom rtnl-locking from af_netlink in the future (in net-next).
Note that this change is not touching rtnl_dump_all(). rtnl_dump_all()
is different kettle of fish and a potential problem. We now mix families
in a single recvmsg(), but NLM_DONE is not coalesced.
Tested:
./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \
--dump getaddr --json '{"ifa-family": 2}'
./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
--dump getroute --json '{"rtm-family": 2}'
./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \
--dump getlink
Fixes: 3e41af90767d ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()")
Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit b8c8abefc07b47f0dc9342530b7618237df96724
Author: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Date: Tue May 28 22:30:30 2024 +0200
ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
A recent change to inet_dump_ifaddr had the function incorrectly iterate
over net rather than tgt_net, resulting in the data coming for the
incorrect network namespace.
Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://github.com/lxc/incus/issues/892
Bisected-by: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Stéphane Graber <stgraber@stgraber.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240528203030.10839-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit 7b05ab85e28f615e70520d24c075249b4512044e
Author: Ido Schimmel <idosch@nvidia.com>
Date: Thu May 23 14:02:57 2024 +0300
ipv4: Fix address dump when IPv4 is disabled on an interface
Cited commit started returning an error when user space requests to dump
the interface's IPv4 addresses and IPv4 is disabled on the interface.
Restore the previous behavior and do not return an error.
Before cited commit:
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff
inet6 fe80::e040:68ff:fe98:d018/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 67
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff
After cited commit:
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 32:2d:69:f2:9c:99 brd ff:ff:ff:ff:ff:ff
inet6 fe80::302d:69ff:fef2:9c99/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 67
# ip address show dev dummy1
RTNETLINK answers: No such device
Dump terminated
With this patch:
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff
inet6 fe80::dc17:56ff:febb:57c0/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 67
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff
I fixed the exact same issue for IPv6 in commit c04f7dfe6ec2 ("ipv6: Fix
address dump when IPv6 is disabled on an interface"), but noted [1] that
I am not doing the change for IPv4 because I am not aware of a way to
disable IPv4 on an interface other than unregistering it. I clearly
missed the above case.
[1] https://lore.kernel.org/netdev/20240321173042.2151756-1-idosch@nvidia.com/
Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Carolina Jubran <cjubran@nvidia.com>
Reported-by: Yamen Safadi <ysafadi@nvidia.com>
Tested-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240523110257.334315-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit cdb2f80f1c10654efc66c1624f66df2b87eabf06
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 29 11:40:16 2024 +0000
inet: use xa_array iterator to implement inet_dump_ifaddr()
1) inet_dump_ifaddr() can can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding an
an extra recvmsg() system call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit 590e92cdc835fcf435d8611f2477fff0e16877c7
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 29 11:40:15 2024 +0000
inet: prepare inet_base_seq() to run without RTNL
In the following patch, inet_base_seq() will no longer be called
with RTNL held.
Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc()
and inet_base_seq().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
Conflicts:\
- Context diff due to missing upstream commit 47f0bd503210 ("net: Add
new protocol attribute to IP addresses") in c9s.
commit 3ddc2231c8108302a8229d3c5849ee792a63230d
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 29 11:40:14 2024 +0000
inet: annotate data-races around ifa->ifa_flags
ifa->ifa_flags can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit 9f6fa3c4e722cbb9a007c3b85797bebfcdee84e9
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 29 11:40:13 2024 +0000
inet: annotate data-races around ifa->ifa_preferred_lft
ifa->ifa_preferred_lft can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit a5fcf74d80bec9948701ff0f7529ae96a0c4a41c
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 29 11:40:12 2024 +0000
inet: annotate data-races around ifa->ifa_valid_lft
ifa->ifa_valid_lft can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
commit 3cd3e72ccb3aade0e8fe037ef07a44b341ab577c
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 29 11:40:11 2024 +0000
inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
ifa->ifa_tstamp can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Do the same for ifa->ifa_cstamp to prepare upcoming changes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git
commit 167487070d644a285ed863516c80b3c35ec929d6
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 27 09:24:11 2024 +0000
inet: use xa_array iterator to implement inet_netconf_dump_devconf()
1) inet_netconf_dump_devconf() can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding an
an extra recvmsg() system call.
3) Do not use inet_base_seq() anymore, for_each_netdev_dump()
has nice properties. Restarting a GETDEVCONF dump if a device has
been added/removed or if net->ipv4.dev_addr_genid has changed is moot.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240227092411.2315725-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git
commit bbcf91053bb622c4c26a9bfc998d3b0c59227f10
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 27 09:24:10 2024 +0000
inet: do not use RTNL in inet_netconf_get_devconf()
"ip -4 netconf show dev XXXX" no longer acquires RTNL.
Return -ENODEV instead of -EINVAL if no netdev or idev can be found.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240227092411.2315725-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git
commit 0598f8f3bb77893a13105d47bb7dfe42f1dc1f4e
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 27 09:24:09 2024 +0000
inet: annotate devconf data-races
Add READ_ONCE() in ipv4_devconf_get() and corresponding
WRITE_ONCE() in ipv4_devconf_set()
Add IPV4_DEVCONF_RO() and IPV4_DEVCONF_ALL_RO() macros,
and use them when reading devconf fields.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240227092411.2315725-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31492
Upstream Status: linux.git
commit 081a0e3b0d4c061419d3f4679dec9f68725b17e4
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 15 17:21:06 2024 +0000
ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid
net->dev_base_seq and ipv4.dev_addr_genid are monotonically increasing.
If we XOR their values, we could miss to detect if both values
were changed with the same amount.
Fixes: 0465277f6b ("ipv4: provide addr and netconf dump consistency info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-21356
Upstream Status: RHEL-only
ipv4_devconf is protected by kABI and embedded in struct net. Add 16
reserved fields using a custom mechanism.
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git
commit b4672c733713f3bc9029c83efa7a2f1ef42ddf5b
Author: Hangbin Liu <liuhangbin@gmail.com>
Date: Fri Aug 18 16:25:23 2023 +0800
IPv4: add extack info for IPv4 address add/delete
Add extack info for IPv4 address add/delete, which would be useful for
users to understand the problem without having to read kernel code.
No extack message for the ifa_local checking in __inet_insert_ifa() as
it has been checked in find_matching_ifa().
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14295
Upstream Status: linux.git
commit ac28b1ec6135649b5d78b028e47264cb3ebca5ea
Author: Liu Jian <liujian56@huawei.com>
Date: Thu Sep 7 10:57:09 2023 +0800
net: ipv4: fix one memleak in __inet_del_ifa()
I got the below warning when do fuzzing test:
unregister_netdevice: waiting for bond0 to become free. Usage count = 2
It can be repoduced via:
ip link add bond0 type bond
sysctl -w net.ipv4.conf.bond0.promote_secondaries=1
ip addr add 4.117.174.103/0 scope 0x40 dev bond0
ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0
ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0
ip addr del 4.117.174.103/0 scope 0x40 dev bond0
ip link delete bond0 type bond
In this reproduction test case, an incorrect 'last_prim' is found in
__inet_del_ifa(), as a result, the secondary address(0.0.0.4/0 scope 0x40)
is lost. The memory of the secondary address is leaked and the reference of
in_device and net_device is leaked.
Fix this problem:
Look for 'last_prim' starting at location of the deleted IP and inserting
the promoted IP into the location of 'last_prim'.
Fixes: 0ff60a4567 ("[IPV4]: Fix secondary IP addresses after promotion")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-3646
commit d4150779e60fb6c49be25572596b2cdfc5d46a09
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date: Wed May 11 16:11:29 2022 +0200
random32: use real rng for non-deterministic randomness
random32.c has two random number generators in it: one that is meant to
be used deterministically, with some predefined seed, and one that does
the same exact thing as random.c, except does it poorly. The first one
has some use cases. The second one no longer does and can be replaced
with calls to random.c's proper random number generator.
The relatively recent siphash-based bad random32.c code was added in
response to concerns that the prior random32.c was too deterministic.
Out of fears that random.c was (at the time) too slow, this code was
anonymously contributed. Then out of that emerged a kind of shadow
entropy gathering system, with its own tentacles throughout various net
code, added willy nilly.
Stop👏making👏bespoke👏random👏number👏generators👏.
Fortunately, recent advances in random.c mean that we can stop playing
with this sketchiness, and just use get_random_u32(), which is now fast
enough. In micro benchmarks using RDPMC, I'm seeing the same median
cycle count between the two functions, with the mean being _slightly_
higher due to batches refilling (which we can optimize further need be).
However, when doing *real* benchmarks of the net functions that actually
use these random numbers, the mean cycles actually *decreased* slightly
(with the median still staying the same), likely because the additional
prandom code means icache misses and complexity, whereas random.c is
generally already being used by something else nearby.
The biggest benefit of this is that there are many users of prandom who
probably should be using cryptographically secure random numbers. This
makes all of those accidental cases become secure by just flipping a
switch. Later on, we can do a tree-wide cleanup to remove the static
inline wrapper functions that this commit adds.
There are also some low-ish hanging fruits for making this even faster
in the future: a get_random_u16() function for use in the networking
stack will give a 2x performance boost there, using SIMD for ChaCha20
will let us compute 4 or 8 or 16 blocks of output in parallel, instead
of just one, giving us large buffers for cheap, and introducing a
get_random_*_bh() function that assumes irqs are already disabled will
shave off a few cycles for ordinary calls. These are things we can chip
away at down the road.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2175258
Conflicts:
- Removed chunks of unsupported protocol AX.25
- Renamed the funtions also in ipvlan. Commit 40b9d1ab63f5 ("ipvlan: hold lower
dev to avoid possible use-after-free") was backported out of order so it had
to use the old functions names.
commit d62607c3fe45911b2331fac073355a8c914bbde2
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Jun 7 21:39:55 2022 -0700
net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.
Rename:
dev_hold_track() -> netdev_hold()
dev_put_track() -> netdev_put()
dev_replace_track() -> netdev_ref_replace()
Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139501
commit 8679c31e0284aa3aaba038035e443180b5bacb99
Author: Rocco Yue <rocco.yue@mediatek.com>
Date: Tue Aug 3 20:02:50 2021 +0800
net: add extack arg for link ops
Pass extack arg to validate_linkmsg and validate_link_af callbacks.
If a netlink attribute has a reject_message, use the extended ack
mechanism to carry the message back to user space.
Signed-off-by: Rocco Yue <rocco.yue@mediatek.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Merge conflicts:
-----------------
Conflicts with !1142(merged) "io_uring: update to v5.15"
fs/io-wq.c
- static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
!1142 already contains backport of 3146cba99aa2 ("io-wq: make worker creation resilient against signals")
along with other commits which are not present in !1370. Resolved in favor of HEAD(!1142)
- static int io_wqe_worker(void *data)
!1370 does not contain 767a65e9f317 ("io-wq: fix potential race of acct->nr_workers")
Resolved in favor of HEAD(!1142)
- static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
HEAD(!1142) does not contain e32cf5dfbe22 ("kthread: Generalize pf_io_worker so it can point to struct kthread")
Resolved in favor of !1370
- static void create_worker_cont(struct callback_head *cb)
!1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
Resolved in favor of HEAD(!1142)
- static void io_workqueue_create(struct work_struct *work)
!1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
Resolved in favor of HEAD(!1142)
- static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
!1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
Resolved in favor of HEAD(!1142)
- static bool io_wq_work_match_item(struct io_wq_work *work, void *data)
!1370 does not contain 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure")
Resolved in favor of HEAD(!1142)
- static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
!1370 is missing 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure")
removed wrongly merged run_cancel label
Resolved in favor of HEAD(!1142)
- static bool io_task_work_match(struct callback_head *cb, void *data)
!1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
Resolved in favor of HEAD(!1142)
- static void io_wq_exit_workers(struct io_wq *wq)
!1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
Resolved in favor of HEAD(!1142)
- int io_wq_max_workers(struct io_wq *wq, int *new_count)
!1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
fs/io_uring.c
- static int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
!1370 is missing bunch of commits after 2e480058ddc2 ("io-wq: provide a way to limit max number of workers")
Resolved in favor of HEAD(!1142)
include/uapi/linux/io_uring.h
- !1370 is missing dd47c104533d ("io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items")
just a comment conflict
Resolved in favor of HEAD(!1142)
kernel/exit.c
- void __noreturn do_exit(long code)
- !1370 contains bunch of commits after f552a27afe67 ("io_uring: remove files pointer in cancellation functions")
Resolved in favor of !1370
Conflicts with !1357(merged) "NFS refresh for RHEL-9.2"
fs/nfs/callback.c
- nfs4_callback_svc(void *vrqstp)
!1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") where the module_put_and_kthread_exit() was removed
Resolved in favor of HEAD(!1357)
fs/nfs/file.c
!1357 is missing 187c82cb0380 ("fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio")
Resolved in favor of HEAD(!1370)
fs/nfsd/nfssvc.c
- nfsd(void *vrqstp)
!1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module")
Resolved in favor of HEAD(!1357)
-----------------
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1370
Bugzilla: https://bugzilla.redhat.com/2120352
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2099722
Patches 1-9 are changes to selftests
Patches 10-31 are reverts of RHEL-only patches to address COR CVE
Patches 32-320 are the machine dependent mm changes ported by Rafael
Patch 321 reverts the backport of 6692c98c7df5. See below.
Patches 322-981 are the machine independent mm changes
Patches 982-1016 are David Hildebrand's upstream changes to address the COR CVE
RHEL commit b23c298982 fork: Stop protecting back_fork_cleanup_cgroup_lock with CONFIG_NUMA
which is a backport of upstream 6692c98c7df5 and is reverted early in this series. 6692c98c7df5
is a fix for upstream 40966e316f86 which was not in RHEL until this series. 6692c98c7df5 is re-added
after 40966e316f86.
Omitted-fix: 310d1344e3c5 ("Revert "powerpc: Remove unused FW_FEATURE_NATIVE references"")
to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716
Omitted-fix: 465d0eb0dc31 ("Docs/admin-guide/mm/damon/usage: fix the example code snip")
to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716
Omitted-fix: 317314527d17 ("mm/hugetlb: correct demote page offset logic")
to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716
Omitted-fix: 37dcc673d065 ("frontswap: don't call ->init if no ops are registered")
to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716
Omitted-fix: 30c19366636f ("mm: fix BUG splat with kvmalloc + GFP_ATOMIC")
to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716
Omitted: fix: fa84693b3c89 io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: 009ad9f0c6ee io_uring: drop ctx->uring_lock before acquiring sqd->lock
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: bc369921d670 io-wq: max_worker fixes
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743
Omitted-fix: e139a1ec92f8 io_uring: apply max_workers limit to all future users
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743
Omitted-fix: 71c9ce27bb57 io-wq: fix max-workers not correctly set on multi-node system
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743
Omitted-fix: 41d3a6bd1d37 io_uring: pin SQPOLL data before unlocking ring lock
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: bad119b9a000 io_uring: honour zeroes as io-wq worker limits
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743
Omitted-fix: 08bdbd39b584 io-wq: ensure that hash wait lock is IRQ disabling
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: 713b9825a4c4 io-wq: fix cancellation on create-worker failure
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: 3b33e3f4a6c0 io-wq: fix silly logic error in io_task_work_match()
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: 71e1cef2d794 io-wq: Remove duplicate code in io_workqueue_create()
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=210774
Omitted-fix: a226abcd5d42 io-wq: don't retry task_work creation failure on fatal conditions
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743
Omitted-fix: fa84693b3c89 io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: dd47c104533d io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items
fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Omitted-fix: 4f0712ccec09 hexagon: Fix function name in die()
unsupported arch
Omitted-fix: 751971af2e36 csky: Fix function name in csky_alignment() and die()
unsupported arch
Omitted-fix: dcbc65aac283 ptrace: Remove duplicated include in ptrace.c
unsupported arch
Omitted-fix: eb48d4219879 drm/i915: Fix oops due to missing stack depot
fixed in RHEL commit 105d2d4832 Merge DRM changes from upstream v5.16..v5.17
Omitted-fix: 751a9d69b197 drm/i915: Fix oops due to missing stack depot
fixed in RHEL commit 99fc716fc4 Merge DRM changes from upstream v5.17..v5.18
Omitted-fix: eb48d4219879 drm/i915: Fix oops due to missing stack depot
fixed in RHEL commit 105d2d4832 Merge DRM changes from upstream v5.16..v5.17
Omitted-fix: 751a9d69b197 drm/i915: Fix oops due to missing stack depot
fixed in RHEL commit 99fc716fc4 Merge DRM changes from upstream v5.17..v5.18
Omitted-fix: b95dc06af3e6 drm/amdgpu: disable runpm if we are the primary adapter
reverted later
Omitted-fix: 5a90c24ad028 Revert "drm/amdgpu: disable runpm if we are the primary adapter"
revert of above omitted fix
Omitted-fix: 724bbe49c5e4 fs/ntfs3: provide block_invalidate_folio to fix memory leak
unsupported fs
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Lyude Paul <lyude@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134161
Tested: LNST, Tier1
Upstream commit:
commit a5612ca10d1aa05624ebe72633e0c8c792970833
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Tue Aug 23 10:46:57 2022 -0700
net: Fix data-races around sysctl_devconf_inherit_init_net.
While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 856c395cfa ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit 6126891c6d4f6f4ef50323d2020635ee255a796e
Author: Vasily Averin <vvs@virtuozzo.com>
Date: Mon Jul 19 13:44:31 2021 +0300
memcg: enable accounting for IP address and routing-related objects
An netadmin inside container can use 'ip a a' and 'ip r a'
to assign a large number of ipv4/ipv6 addresses and routing entries
and force kernel to allocate megabytes of unaccounted memory
for long-lived per-netdevice related kernel objects:
'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node',
'struct rt6_info', 'struct fib_rules' and ip_fib caches.
These objects can be manually removed, though usually they lives
in memory till destroy of its net namespace.
It makes sense to account for them to restrict the host's memory
consumption from inside the memcg-limited container.
One of such objects is the 'struct fib6_node' mostly allocated in
net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
write_lock_bh(&table->tb6_lock);
err = fib6_add(&table->tb6_root, rt, info, mxc);
write_unlock_bh(&table->tb6_lock);
In this case it is not enough to simply add SLAB_ACCOUNT to corresponding
kmem cache. The proper memory cgroup still cannot be found due to the
incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
Obsoleted in_interrupt() does not describe real execution context properly.
>From include/linux/preempt.h:
The following macros are deprecated and should not be used in new code:
in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled
To verify the current execution context new macro should be used instead:
in_task() - We're in task context
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377
commit c04438f58d140723e58050fcb9d33d84cb39e9e9
Author: Eric Dumazet <edumazet@google.com>
Date: Sat Dec 4 20:22:12 2021 -0800
ipv4: add net device refcount tracker to struct in_device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008927
commit b0e99d03778b2418aec20db99d97d19d25d198b6
Author: Arnd Bergmann <arnd@arndb.de>
Date: Thu Jul 22 16:29:01 2021 +0200
net: socket: remove register_gifconf
Since dynamic registration of the gifconf() helper is only used for
IPv4, and this can not be in a loadable module, this can be simplified
noticeably by turning it into a direct function call as a preparation
for cleaning up the compat handling.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When 'nla_parse_nested_deprecated' failed, it's no need to
BUG() here, return -EINVAL is ok.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we call af_ops->set_link_af() we hold a RCU read lock
as we retrieve af_ops from the RCU protected list, but this
is unnecessary because we already hold RTNL lock, which is
the writer lock for protecting rtnl_af_ops, so it is safer
than RCU read lock. Similar for af_ops->validate_link_af().
This was not a problem until we begin to take mutex lock
down the path of ->set_link_af() in __ipv6_dev_mc_dec()
recently. We can just drop the RCU read lock there and
assert RTNL lock.
Reported-and-tested-by: syzbot+7d941e89dd48bcf42573@syzkaller.appspotmail.com
Fixes: 63ed8de4be ("mld: add mc_lock for protecting per-interface mld data")
Tested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting iftoken can fail for several different reasons but there
and there was no report to user as to the cause. Add netlink
extended errors to the processing of the request.
This requires adding additional argument through rtnl_af_ops
set_link_af callback.
Reported-by: Hongren Zheng <li@zenithal.me>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The initialization for 'err' with '-EINVAL' is redundant and
can be removed, as it is updated soon.
Changes since v1:
- Remove redundant empty line
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20201108010541.12432-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.
The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.
Signed-off-by: David S. Miller <davem@davemloft.net>
When devinet_sysctl_register() failed, the memory allocated
in neigh_parms_alloc() should be freed.
Fixes: 20e61da7ff ("ipv4: fail early when creating netdev named all or default")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal is to be able to inherit the initial devconf parameters from the
current netns, ie the netns where this new netns has been created.
This is useful in a containers environment where /proc/sys is read only.
For example, if a pod is created with specifics devconf parameters and has
the capability to create netns, the user expects to get the same parameters
than his 'init_net', which is not the real init_net in this case.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull in Christoph Hellwig's series that changes the sysctl's ->proc_handler
methods to take kernel pointers instead. It gets rid of the set_fs address
space overrides used by BPF. As per discussion, pull in the feature branch
into bpf-next as it relates to BPF sysctl progs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200427071508.GV23230@ZenIV.linux.org.uk/T/
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.
As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When CONFIG_IP_MULTICAST is not set and multicast ip is added to the device
with autojoin flag or when multicast ip is deleted kernel will crash.
steps to reproduce:
ip addr add 224.0.0.0/32 dev eth0
ip addr del 224.0.0.0/32 dev eth0
or
ip addr add 224.0.0.0/32 dev eth0 autojoin
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
pc : _raw_write_lock_irqsave+0x1e0/0x2ac
lr : lock_sock_nested+0x1c/0x60
Call trace:
_raw_write_lock_irqsave+0x1e0/0x2ac
lock_sock_nested+0x1c/0x60
ip_mc_config.isra.28+0x50/0xe0
inet_rtm_deladdr+0x1a8/0x1f0
rtnetlink_rcv_msg+0x120/0x350
netlink_rcv_skb+0x58/0x120
rtnetlink_rcv+0x14/0x20
netlink_unicast+0x1b8/0x270
netlink_sendmsg+0x1a0/0x3b0
____sys_sendmsg+0x248/0x290
___sys_sendmsg+0x80/0xc0
__sys_sendmsg+0x68/0xc0
__arm64_sys_sendmsg+0x20/0x30
el0_svc_common.constprop.2+0x88/0x150
do_el0_svc+0x20/0x80
el0_sync_handler+0x118/0x190
el0_sync+0x140/0x180
Fixes: 93a714d6b5 ("multicast: Extend ip address command to enable multicast group join/leave on")
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the various uses of fallthrough comments to fallthrough;
Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/
And by hand:
net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
that causes gcc to emit a warning if converted in-place.
So move the new fallthrough; inside the containing #ifdef/#endif too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid the situation where an IPV6 only flag is applied to an IPv4 address:
# ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute
# ip -4 addr show dev dummy0
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 192.0.2.1/24 scope global noprefixroute dummy0
valid_lft forever preferred_lft forever
Or worse, by sending a malicious netlink command:
# ip -4 addr show dev dummy0
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0
valid_lft forever preferred_lft forever
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
secondary address promotion causes infinite loop -- it arranges
for ifa->ifa_next to point back to itself.
Problem is that 'prev_prom' and 'last_prim' might point at the same entry,
so 'last_sec' pointer must be obtained after prev_prom->next update.
Fixes: 2638eb8b50 ("net: ipv4: provide __rcu annotation for ifa_list")
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When arp_ignore=3, the NIC won't reply for scope host addresses, but
if enable route_locanet, we need to reply ip address with head 127 and
scope RT_SCOPE_HOST.
Fixes: d0daebc3d6 ("ipv4: Add interface option to enable routing of 127.0.0.0/8")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppose we have two interfaces eth0 and eth1 in two hosts, follow
the same steps in the two hosts:
# sysctl -w net.ipv4.conf.eth1.route_localnet=1
# sysctl -w net.ipv4.conf.eth1.arp_announce=2
# ip route del 127.0.0.0/8 dev lo table local
and then set ip to eth1 in host1 like:
# ifconfig eth1 127.25.3.4/24
set ip to eth2 in host2 and ping host1:
# ifconfig eth1 127.25.3.14/24
# ping -I eth1 127.25.3.4
Well, host2 cannot connect to host1.
When set a ip address with head 127, the scope of the address defaults
to RT_SCOPE_HOST. In this situation, host2 will use arp_solicit() to
send a arp request for the mac address of host1 with ip
address 127.25.3.14. When arp_announce=2, inet_select_addr() cannot
select a correct saddr with condition ifa->ifa_scope > scope, because
ifa_scope is RT_SCOPE_HOST and scope is RT_SCOPE_LINK. Then,
inet_select_addr() will go to no_in_dev to lookup all interfaces to find
a primary ip and finally get the primary ip of eth0.
Here I add a localnet_scope defaults to RT_SCOPE_HOST, and when
route_localnet is enabled, this value changes to RT_SCOPE_LINK to make
inet_select_addr() find a correct primary ip as saddr of arp request.
Fixes: d0daebc3d6 ("ipv4: Add interface option to enable routing of 127.0.0.0/8")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Causes crash when lifetime expires on an adress as garbage is
dereferenced soon after.
This used to look like this:
for (ifap = &ifa->ifa_dev->ifa_list;
*ifap != NULL; ifap = &(*ifap)->ifa_next) {
if (*ifap == ifa) ...
but this was changed to:
struct in_ifaddr *tmp;
ifap = &ifa->ifa_dev->ifa_list;
tmp = rtnl_dereference(*ifap);
while (tmp) {
tmp = rtnl_dereference(tmp->ifa_next); // Bogus
if (rtnl_dereference(*ifap) == ifa) {
...
ifap = &tmp->ifa_next; // Can be NULL
tmp = rtnl_dereference(*ifap); // Dereference
}
}
Remove the bogus assigment/list entry skip.
Fixes: 2638eb8b50 ("net: ipv4: provide __rcu annotation for ifa_list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot triggered following splat when strict netlink
validation is enabled:
net/ipv4/devinet.c:1766 suspicious rcu_dereference_check() usage!
This occurs because we hold RTNL mutex, but no rcu read lock.
The second call site holds both, so just switch to the _rtnl variant.
Reported-by: syzbot+bad6e32808a3a97b1515@syzkaller.appspotmail.com
Fixes: 2638eb8b50 ("net: ipv4: provide __rcu annotation for ifa_list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>