328 Commits

Author SHA1 Message Date
Guillaume Nault 4b871f4434 ipv4: give an IPv4 dev to blackhole_netdev
JIRA: https://issues.redhat.com/browse/RHEL-73391
Upstream Status: linux.git

commit 22600596b6756b166fd052d5facb66287e6f0bad
Author: Xin Long <lucien.xin@gmail.com>
Date:   Wed Oct 9 14:47:13 2024 -0400

    ipv4: give an IPv4 dev to blackhole_netdev

    After commit 8d7017fd62 ("blackhole_netdev: use blackhole_netdev to
    invalidate dst entries"), blackhole_netdev was introduced to invalidate
    dst cache entries on the TX path whenever the cache times out or is
    flushed.

    When two UDP sockets (sk1 and sk2) send messages to the same destination
    simultaneously, they are using the same dst cache. If the dst cache is
    invalidated on one path (sk2) while the other (sk1) is still transmitting,
    sk1 may try to use the invalid dst entry.

             CPU1                   CPU2

          udp_sendmsg(sk1)       udp_sendmsg(sk2)
          udp_send_skb()
          ip_output()
                                                 <--- dst timeout or flushed
                                 dst_dev_put()
          ip_finish_output2()
          ip_neigh_for_gw()

    This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
    blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
    arp_constructor(). This error is then propagated back to userspace,
    breaking the UDP application.

    The patch fixes this issue by assigning an in_dev to blackhole_dev for
    IPv4, similar to what was done for IPv6 in commit e5f80fcf869a ("ipv6:
    give an IPv6 dev to blackhole_netdev"). This ensures that even when the
    dst entry is invalidated with blackhole_dev, it will not fail to create
    the neigh entry.

    As devinet_init() is called earlier than blackhole_netdev_init() during
    system boot, it cannot assign the in_dev to blackhole_dev in devinet_init().
    As Paolo suggested, add a separate late_initcall() in devinet.c to ensure
    inet_blackhole_dev_init() is called after blackhole_netdev_init().

    Fixes: 8d7017fd62 ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/3000792d45ca44e16c785ebe2b092e610e5b3df1.1728499633.git.lucien.xin@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2025-01-09 18:01:35 +01:00
Petr Oros b5e456d0ed netlink: let core handle error cases in dump operations
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 02e24903e5a46b7a7fca44bcfe0cd6fa5b240c34
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Mar 6 10:24:26 2024 +0000

    netlink: let core handle error cases in dump operations

    After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors
    in the core"), we can remove some code that was not 100 % correct
    anyway.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240306102426.245689-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:52 +01:00
Antoine Tenart ef09c13fd1 inet: fix inet_fill_ifaddr() flags truncation
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
Conflicts:
- Context difference due to missing upstream commit 47f0bd503210 ("net:
  Add new protocol attribute to IP addresses") in c9s.

commit 1af7f88af269c4e06a4dc3bc920ff6cdf7471124
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 10 07:29:32 2024 +0000

    inet: fix inet_fill_ifaddr() flags truncation

    I missed that (struct ifaddrmsg)->ifa_flags was only 8bits,
    while (struct in_ifaddr)->ifa_flags is 32bits.

    Use a temporary 32bit variable as I did in set_ifa_lifetime()
    and check_lifetime().

    Fixes: 3ddc2231c810 ("inet: annotate data-races around ifa->ifa_flags")
    Reported-by: Yu Watanabe <watanabe.yu@gmail.com>
    Diagnosed-by: Yu Watanabe <watanabe.yu@gmail.com>
    Closes: https://github.com/systemd/systemd/pull/32666#issuecomment-2103977928
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240510072932.2678952-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:50 +01:00
Antoine Tenart ba114b046d rtnetlink: make the "split" NLM_DONE handling generic
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit 5b4b62a169e10401cca34a6e7ac39161986f5605
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Jun 3 11:48:26 2024 -0700

    rtnetlink: make the "split" NLM_DONE handling generic

    Jaroslav reports Dell's OMSA Systems Management Data Engine
    expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo()
    and inet_dump_ifaddr(). We already added a similar fix previously in
    commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again")

    Instead of modifying all the dump handlers, and making them look
    different than modern for_each_netdev_dump()-based dump handlers -
    put the workaround in rtnetlink code. This will also help us move
    the custom rtnl-locking from af_netlink in the future (in net-next).

    Note that this change is not touching rtnl_dump_all(). rtnl_dump_all()
    is a different kettle of fish and a potential problem. We now mix families
    in a single recvmsg(), but NLM_DONE is not coalesced.

    Tested:

      ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \
               --dump getaddr --json '{"ifa-family": 2}'

      ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
               --dump getroute --json '{"rtm-family": 2}'

      ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \
               --dump getlink

    Fixes: 3e41af90767d ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()")
    Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
    Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
    Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart aaac41e579 ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit b8c8abefc07b47f0dc9342530b7618237df96724
Author: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Date:   Tue May 28 22:30:30 2024 +0200

    ipv4: correctly iterate over the target netns in inet_dump_ifaddr()

    A recent change to inet_dump_ifaddr had the function incorrectly iterate
    over net rather than tgt_net, resulting in the data coming for the
    incorrect network namespace.

    Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
    Reported-by: Stéphane Graber <stgraber@stgraber.org>
    Closes: https://github.com/lxc/incus/issues/892
    Bisected-by: Stéphane Graber <stgraber@stgraber.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Tested-by: Stéphane Graber <stgraber@stgraber.org>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240528203030.10839-1-aleksandr.mikhalitsyn@canonical.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart 936c59bc6e ipv4: Fix address dump when IPv4 is disabled on an interface
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit 7b05ab85e28f615e70520d24c075249b4512044e
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu May 23 14:02:57 2024 +0300

    ipv4: Fix address dump when IPv4 is disabled on an interface

    Cited commit started returning an error when user space requests to dump
    the interface's IPv4 addresses and IPv4 is disabled on the interface.
    Restore the previous behavior and do not return an error.

    Before cited commit:

     # ip address show dev dummy1
     10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
         link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff
         inet6 fe80::e040:68ff:fe98:d018/64 scope link proto kernel_ll
            valid_lft forever preferred_lft forever
     # ip link set dev dummy1 mtu 67
     # ip address show dev dummy1
     10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
         link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff

    After cited commit:

     # ip address show dev dummy1
     10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
         link/ether 32:2d:69:f2:9c:99 brd ff:ff:ff:ff:ff:ff
         inet6 fe80::302d:69ff:fef2:9c99/64 scope link proto kernel_ll
            valid_lft forever preferred_lft forever
     # ip link set dev dummy1 mtu 67
     # ip address show dev dummy1
     RTNETLINK answers: No such device
     Dump terminated

    With this patch:

     # ip address show dev dummy1
     10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
         link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff
         inet6 fe80::dc17:56ff:febb:57c0/64 scope link proto kernel_ll
            valid_lft forever preferred_lft forever
     # ip link set dev dummy1 mtu 67
     # ip address show dev dummy1
     10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
         link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff

    I fixed the exact same issue for IPv6 in commit c04f7dfe6ec2 ("ipv6: Fix
    address dump when IPv6 is disabled on an interface"), but noted [1] that
    I am not doing the change for IPv4 because I am not aware of a way to
    disable IPv4 on an interface other than unregistering it. I clearly
    missed the above case.

    [1] https://lore.kernel.org/netdev/20240321173042.2151756-1-idosch@nvidia.com/

    Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
    Reported-by: Carolina Jubran <cjubran@nvidia.com>
    Reported-by: Yamen Safadi <ysafadi@nvidia.com>
    Tested-by: Carolina Jubran <cjubran@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240523110257.334315-1-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart 88bb83cb48 inet: use xa_array iterator to implement inet_dump_ifaddr()
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit cdb2f80f1c10654efc66c1624f66df2b87eabf06
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 11:40:16 2024 +0000

    inet: use xa_array iterator to implement inet_dump_ifaddr()

    1) inet_dump_ifaddr() can run under RCU protection
       instead of RTNL.

    2) properly return 0 at the end of a dump, avoiding
       an extra recvmsg() system call.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart bbb75e1453 inet: prepare inet_base_seq() to run without RTNL
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit 590e92cdc835fcf435d8611f2477fff0e16877c7
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 11:40:15 2024 +0000

    inet: prepare inet_base_seq() to run without RTNL

    In the following patch, inet_base_seq() will no longer be called
    with RTNL held.

    Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc()
    and inet_base_seq().

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart ca2d487981 inet: annotate data-races around ifa->ifa_flags
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git
Conflicts:
- Context diff due to missing upstream commit 47f0bd503210 ("net: Add
  new protocol attribute to IP addresses") in c9s.

commit 3ddc2231c8108302a8229d3c5849ee792a63230d
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 11:40:14 2024 +0000

    inet: annotate data-races around ifa->ifa_flags

    ifa->ifa_flags can be read locklessly.

    Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart 9441825bbc inet: annotate data-races around ifa->ifa_preferred_lft
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit 9f6fa3c4e722cbb9a007c3b85797bebfcdee84e9
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 11:40:13 2024 +0000

    inet: annotate data-races around ifa->ifa_preferred_lft

    ifa->ifa_preferred_lft can be read locklessly.

    Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart 99032797d0 inet: annotate data-races around ifa->ifa_valid_lft
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit a5fcf74d80bec9948701ff0f7529ae96a0c4a41c
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 11:40:12 2024 +0000

    inet: annotate data-races around ifa->ifa_valid_lft

    ifa->ifa_valid_lft can be read locklessly.

    Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart c38ed3f9c6 inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
JIRA: https://issues.redhat.com/browse/RHEL-62204
Upstream Status: linux.git

commit 3cd3e72ccb3aade0e8fe037ef07a44b341ab577c
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 11:40:11 2024 +0000

    inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp

    ifa->ifa_tstamp can be read locklessly.

    Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

    Do the same for ifa->ifa_cstamp to prepare upcoming changes.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:49 +01:00
Antoine Tenart 08c6589af9 inet: use xa_array iterator to implement inet_netconf_dump_devconf()
JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git

commit 167487070d644a285ed863516c80b3c35ec929d6
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 27 09:24:11 2024 +0000

    inet: use xa_array iterator to implement inet_netconf_dump_devconf()

    1) inet_netconf_dump_devconf() can run under RCU protection
       instead of RTNL.

    2) properly return 0 at the end of a dump, avoiding
       an extra recvmsg() system call.

    3) Do not use inet_base_seq() anymore, for_each_netdev_dump()
       has nice properties. Restarting a GETDEVCONF dump if a device has
       been added/removed or if net->ipv4.dev_addr_genid has changed is moot.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240227092411.2315725-4-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:48 +01:00
Antoine Tenart 2664dac3a7 inet: do not use RTNL in inet_netconf_get_devconf()
JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git

commit bbcf91053bb622c4c26a9bfc998d3b0c59227f10
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 27 09:24:10 2024 +0000

    inet: do not use RTNL in inet_netconf_get_devconf()

    "ip -4 netconf show dev XXXX" no longer acquires RTNL.

    Return -ENODEV instead of -EINVAL if no netdev or idev can be found.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240227092411.2315725-3-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:48 +01:00
Antoine Tenart cff87c7d6f inet: annotate devconf data-races
JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git

commit 0598f8f3bb77893a13105d47bb7dfe42f1dc1f4e
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 27 09:24:09 2024 +0000

    inet: annotate devconf data-races

    Add READ_ONCE() in ipv4_devconf_get() and corresponding
    WRITE_ONCE() in ipv4_devconf_set()

    Add IPV4_DEVCONF_RO() and IPV4_DEVCONF_ALL_RO() macros,
    and use them when reading devconf fields.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240227092411.2315725-2-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:47 +01:00
Guillaume Nault 735e61e257 ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid
JIRA: https://issues.redhat.com/browse/RHEL-31492
Upstream Status: linux.git

commit 081a0e3b0d4c061419d3f4679dec9f68725b17e4
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 15 17:21:06 2024 +0000

    ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid

    net->dev_base_seq and ipv4.dev_addr_genid are monotonically increasing.

    If we XOR their values, we could fail to detect a change when both
    values were changed by the same amount.

    Fixes: 0465277f6b ("ipv4: provide addr and netconf dump consistency info")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2024-04-03 16:47:27 +02:00
Sabrina Dubroca 445dc1f44f net: add reserved fields to ipv4_devconf
JIRA: https://issues.redhat.com/browse/RHEL-21356
Upstream Status: RHEL-only

ipv4_devconf is protected by kABI and embedded in struct net. Add 16
reserved fields using a custom mechanism.

Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
2024-01-12 14:27:38 +01:00
Antoine Tenart c0002f8e89 IPv4: add extack info for IPv4 address add/delete
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git

commit b4672c733713f3bc9029c83efa7a2f1ef42ddf5b
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Fri Aug 18 16:25:23 2023 +0800

    IPv4: add extack info for IPv4 address add/delete

    Add extack info for IPv4 address add/delete, which would be useful for
    users to understand the problem without having to read kernel code.

    No extack message for the ifa_local checking in __inet_insert_ifa() as
    it has been checked in find_matching_ifa().

    Suggested-by: Ido Schimmel <idosch@idosch.org>
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-12-11 11:15:48 +01:00
Guillaume Nault f9e0bd3ec5 net: ipv4: fix one memleak in __inet_del_ifa()
JIRA: https://issues.redhat.com/browse/RHEL-14295
Upstream Status: linux.git

commit ac28b1ec6135649b5d78b028e47264cb3ebca5ea
Author: Liu Jian <liujian56@huawei.com>
Date:   Thu Sep 7 10:57:09 2023 +0800

    net: ipv4: fix one memleak in __inet_del_ifa()

    I got the below warning when doing fuzz testing:
    unregister_netdevice: waiting for bond0 to become free. Usage count = 2

    It can be reproduced via:

    ip link add bond0 type bond
    sysctl -w net.ipv4.conf.bond0.promote_secondaries=1
    ip addr add 4.117.174.103/0 scope 0x40 dev bond0
    ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0
    ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0
    ip addr del 4.117.174.103/0 scope 0x40 dev bond0
    ip link delete bond0 type bond

    In this reproduction test case, an incorrect 'last_prim' is found in
    __inet_del_ifa(); as a result, the secondary address (0.0.0.4/0 scope 0x40)
    is lost. The memory of the secondary address is leaked, and the references
    to in_device and net_device are leaked as well.

    Fix this problem:
    Look for 'last_prim' starting at the location of the deleted IP and insert
    the promoted IP at the location of 'last_prim'.

    Fixes: 0ff60a4567 ("[IPV4]: Fix secondary IP addresses after promotion")
    Signed-off-by: Liu Jian <liujian56@huawei.com>
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-10-19 18:55:19 +02:00
Ivan Vecera 3cc9e8b28b random32: use real rng for non-deterministic randomness
JIRA: https://issues.redhat.com/browse/RHEL-3646

commit d4150779e60fb6c49be25572596b2cdfc5d46a09
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Wed May 11 16:11:29 2022 +0200

    random32: use real rng for non-deterministic randomness

    random32.c has two random number generators in it: one that is meant to
    be used deterministically, with some predefined seed, and one that does
    the same exact thing as random.c, except does it poorly. The first one
    has some use cases. The second one no longer does and can be replaced
    with calls to random.c's proper random number generator.

    The relatively recent siphash-based bad random32.c code was added in
    response to concerns that the prior random32.c was too deterministic.
    Out of fears that random.c was (at the time) too slow, this code was
    anonymously contributed. Then out of that emerged a kind of shadow
    entropy gathering system, with its own tentacles throughout various net
    code, added willy nilly.

    Stop👏making👏bespoke👏random👏number👏generators👏.

    Fortunately, recent advances in random.c mean that we can stop playing
    with this sketchiness, and just use get_random_u32(), which is now fast
    enough. In micro benchmarks using RDPMC, I'm seeing the same median
    cycle count between the two functions, with the mean being _slightly_
    higher due to batches refilling (which we can optimize further if need be).
    However, when doing *real* benchmarks of the net functions that actually
    use these random numbers, the mean cycles actually *decreased* slightly
    (with the median still staying the same), likely because the additional
    prandom code means icache misses and complexity, whereas random.c is
    generally already being used by something else nearby.

    The biggest benefit of this is that there are many users of prandom who
    probably should be using cryptographically secure random numbers. This
    makes all of those accidental cases become secure by just flipping a
    switch. Later on, we can do a tree-wide cleanup to remove the static
    inline wrapper functions that this commit adds.

    There are also some low-ish hanging fruits for making this even faster
    in the future: a get_random_u16() function for use in the networking
    stack will give a 2x performance boost there, using SIMD for ChaCha20
    will let us compute 4 or 8 or 16 blocks of output in parallel, instead
    of just one, giving us large buffers for cheap, and introducing a
    get_random_*_bh() function that assumes irqs are already disabled will
    shave off a few cycles for ordinary calls. These are things we can chip
    away at down the road.

    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-09-13 18:39:29 +02:00
Íñigo Huguet 3a91b473a8 net: rename reference+tracking helpers
Bugzilla: https://bugzilla.redhat.com/2175258

Conflicts:
 - Removed chunks of unsupported protocol AX.25
 - Renamed the functions also in ipvlan. Commit 40b9d1ab63f5 ("ipvlan: hold lower
   dev to avoid possible use-after-free") was backported out of order so it had
   to use the old function names.

commit d62607c3fe45911b2331fac073355a8c914bbde2
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Jun 7 21:39:55 2022 -0700

    net: rename reference+tracking helpers

    Netdev reference helpers have a dev_ prefix for historic
    reasons. Renaming the old helpers would be too much churn
    but we can rename the tracking ones which are relatively
    recent and should be the default for new code.

    Rename:
     dev_hold_track()    -> netdev_hold()
     dev_put_track()     -> netdev_put()
     dev_replace_track() -> netdev_ref_replace()

    Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
2023-03-23 16:19:21 +01:00
Frantisek Hrbata 27a89b8946 Merge: tcp: BIG TCP implementation
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1560

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139501
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2128180
Tested: Using netperf and veth driver. Results meet the assumptions. See https://bugzilla.redhat.com/show_bug.cgi?id=2139501#c1

The series introduces support for BIG TCP.

- Patch 1-2: Preliminary dependencies
- Patch 3-14: Commits from upstream series 7fa2e481ff2f ("Merge branch 'big-tcp'", 2022-05-16)
- Patch 15-19: Follow-ups

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-15 07:30:55 -05:00
Ivan Vecera 997a94df8f net: add extack arg for link ops
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139501

commit 8679c31e0284aa3aaba038035e443180b5bacb99
Author: Rocco Yue <rocco.yue@mediatek.com>
Date:   Tue Aug 3 20:02:50 2021 +0800

    net: add extack arg for link ops

    Pass extack arg to validate_linkmsg and validate_link_af callbacks.
    If a netlink attribute has a reject_message, use the extended ack
    mechanism to carry the message back to user space.

    Signed-off-by: Rocco Yue <rocco.yue@mediatek.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-11-02 18:55:30 +01:00
Frantisek Hrbata e9e9bc8da2 Merge: mm changes through v5.18 for 9.2
Merge conflicts:
-----------------
Conflicts with !1142(merged) "io_uring: update to v5.15"

fs/io-wq.c
        - static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
          !1142 already contains backport of 3146cba99aa2 ("io-wq: make worker creation resilient against signals")
          along with other commits which are not present in !1370. Resolved in favor of HEAD(!1142)
        - static int io_wqe_worker(void *data)
          !1370 does not contain 767a65e9f317 ("io-wq: fix potential race of acct->nr_workers")
          Resolved in favor of HEAD(!1142)
        - static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
          HEAD(!1142) does not contain e32cf5dfbe22 ("kthread: Generalize pf_io_worker so it can point to struct kthread")
          Resolved in favor of !1370
        - static void create_worker_cont(struct callback_head *cb)
          !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
          Resolved in favor of HEAD(!1142)
        - static void io_workqueue_create(struct work_struct *work)
          !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
          Resolved in favor of HEAD(!1142)
        - static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
          !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
          Resolved in favor of HEAD(!1142)
        - static bool io_wq_work_match_item(struct io_wq_work *work, void *data)
          !1370 does not contain 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure")
          Resolved in favor of HEAD(!1142)
        - static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
          !1370 is missing 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure")
          removed wrongly merged run_cancel label
          Resolved in favor of HEAD(!1142)
        - static bool io_task_work_match(struct callback_head *cb, void *data)
          !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
          Resolved in favor of HEAD(!1142)
        - static void io_wq_exit_workers(struct io_wq *wq)
          !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
          Resolved in favor of HEAD(!1142)
        - int io_wq_max_workers(struct io_wq *wq, int *new_count)
          !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
fs/io_uring.c
        - static int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
          !1370 is missing a bunch of commits after 2e480058ddc2 ("io-wq: provide a way to limit max number of workers")
          Resolved in favor of HEAD(!1142)
include/uapi/linux/io_uring.h
        - !1370 is missing dd47c104533d ("io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items")
          just a comment conflict
          Resolved in favor of HEAD(!1142)
kernel/exit.c
        - void __noreturn do_exit(long code)
        - !1370 contains a bunch of commits after f552a27afe67 ("io_uring: remove files pointer in cancellation functions")
          Resolved in favor of !1370

Conflicts with !1357(merged) "NFS refresh for RHEL-9.2"

fs/nfs/callback.c
        - nfs4_callback_svc(void *vrqstp)
          !1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") where the module_put_and_kthread_exit() was removed
          Resolved in favor of HEAD(!1357)
fs/nfs/file.c
          !1357 is missing 187c82cb0380 ("fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio")
          Resolved in favor of HEAD(!1370)
fs/nfsd/nfssvc.c
        - nfsd(void *vrqstp)
          !1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module")
          Resolved in favor of HEAD(!1357)
-----------------

MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1370

Bugzilla: https://bugzilla.redhat.com/2120352

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2099722

Patches 1-9 are changes to selftests
Patches 10-31 are reverts of RHEL-only patches to address COR CVE
Patches 32-320 are the machine dependent mm changes ported by Rafael
Patch 321 reverts the backport of 6692c98c7df5. See below.
Patches 322-981 are the machine independent mm changes
Patches 982-1016 are David Hildenbrand's upstream changes to address the COR CVE

RHEL commit b23c298982 ("fork: Stop protecting back_fork_cleanup_cgroup_lock with CONFIG_NUMA")
is a backport of upstream 6692c98c7df5 and is reverted early in this series. 6692c98c7df5
is a fix for upstream 40966e316f86 which was not in RHEL until this series. 6692c98c7df5 is re-added
after 40966e316f86.

Omitted-fix: 310d1344e3c5 ("Revert "powerpc: Remove unused FW_FEATURE_NATIVE references"")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 465d0eb0dc31 ("Docs/admin-guide/mm/damon/usage: fix the example code snip")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 317314527d17 ("mm/hugetlb: correct demote page offset logic")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 37dcc673d065 ("frontswap: don't call ->init if no ops are registered")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 30c19366636f ("mm: fix BUG splat with kvmalloc + GFP_ATOMIC")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: fa84693b3c89 io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 009ad9f0c6ee io_uring: drop ctx->uring_lock before acquiring sqd->lock
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: bc369921d670 io-wq: max_worker fixes
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: e139a1ec92f8 io_uring: apply max_workers limit to all future users
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: 71c9ce27bb57 io-wq: fix max-workers not correctly set on multi-node system
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: 41d3a6bd1d37 io_uring: pin SQPOLL data before unlocking ring lock
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: bad119b9a000 io_uring: honour zeroes as io-wq worker limits
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: 08bdbd39b584 io-wq: ensure that hash wait lock is IRQ disabling
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 713b9825a4c4 io-wq: fix cancellation on create-worker failure
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 3b33e3f4a6c0 io-wq: fix silly logic error in io_task_work_match()
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 71e1cef2d794 io-wq: Remove duplicate code in io_workqueue_create()
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=210774

Omitted-fix: a226abcd5d42 io-wq: don't retry task_work creation failure on fatal conditions
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: dd47c104533d io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items
        fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 4f0712ccec09 hexagon: Fix function name in die()
	unsupported arch

Omitted-fix: 751971af2e36 csky: Fix function name in csky_alignment() and die()
	unsupported arch

Omitted-fix: dcbc65aac283 ptrace: Remove duplicated include in ptrace.c
        unsupported arch

Omitted-fix: eb48d4219879 drm/i915: Fix oops due to missing stack depot
	fixed in RHEL commit 105d2d4832 Merge DRM changes from upstream v5.16..v5.17

Omitted-fix: 751a9d69b197 drm/i915: Fix oops due to missing stack depot
	fixed in RHEL commit 99fc716fc4 Merge DRM changes from upstream v5.17..v5.18

Omitted-fix: b95dc06af3e6 drm/amdgpu: disable runpm if we are the primary adapter
        reverted later

Omitted-fix: 5a90c24ad028 Revert "drm/amdgpu: disable runpm if we are the primary adapter"
        revert of above omitted fix

Omitted-fix: 724bbe49c5e4 fs/ntfs3: provide block_invalidate_folio to fix memory leak
	unsupported fs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Lyude Paul <lyude@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-10-23 19:49:41 +02:00
Paolo Abeni 49f9c276ff net: Fix data-races around sysctl_devconf_inherit_init_net.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134161
Tested: LNST, Tier1

Upstream commit:
commit a5612ca10d1aa05624ebe72633e0c8c792970833
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Tue Aug 23 10:46:57 2022 -0700

    net: Fix data-races around sysctl_devconf_inherit_init_net.

    While reading sysctl_devconf_inherit_init_net, it can be changed
    concurrently.  Thus, we need to add READ_ONCE() to its readers.
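A minimal userspace sketch of the reader-side pattern described above, not the kernel's implementation. The `READ_ONCE()` macro here is a simplified stand-in that relies on the GCC/Clang `__typeof__` extension, and `sysctl_knob` and `read_inherit_mode()` are hypothetical names chosen for illustration:

```c
/* Simplified stand-in for the kernel macro: the volatile cast makes the
 * compiler emit exactly one load of x instead of re-reading it later. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical sysctl-backed variable; a writer may change it at any time. */
static int sysctl_knob;

/* Hypothetical reader: take one snapshot, then validate and use only the
 * snapshot, so the range check and the returned value always agree. */
static int read_inherit_mode(void)
{
	int mode = READ_ONCE(sysctl_knob);

	if (mode < 0 || mode > 3)
		return -1;	/* treat out-of-range values as invalid */
	return mode;
}
```

Without the snapshot, a plain read could in principle be performed twice by the compiler, once for the check and once for the return, and the two loads might observe different values.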

    Fixes: 856c395cfa ("net: introduce a knob to control whether to inherit devconf config")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-13 13:00:04 +02:00
Chris von Recklinghausen 422fcee0d6 memcg: enable accounting for IP address and routing-related objects
Bugzilla: https://bugzilla.redhat.com/2120352

commit 6126891c6d4f6f4ef50323d2020635ee255a796e
Author: Vasily Averin <vvs@virtuozzo.com>
Date:   Mon Jul 19 13:44:31 2021 +0300

    memcg: enable accounting for IP address and routing-related objects

    A netadmin inside a container can use 'ip a a' and 'ip r a'
    to assign a large number of ipv4/ipv6 addresses and routing entries
    and force kernel to allocate megabytes of unaccounted memory
    for long-lived per-netdevice related kernel objects:
    'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node',
    'struct rt6_info', 'struct fib_rules' and ip_fib caches.

    These objects can be manually removed, though usually they live
    in memory until their net namespace is destroyed.

    It makes sense to account for them to restrict the host's memory
    consumption from inside the memcg-limited container.

    One of such objects is the 'struct fib6_node' mostly allocated in
    net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:

     write_lock_bh(&table->tb6_lock);
     err = fib6_add(&table->tb6_root, rt, info, mxc);
     write_unlock_bh(&table->tb6_lock);

    In this case it is not enough to simply add SLAB_ACCOUNT to corresponding
    kmem cache. The proper memory cgroup still cannot be found due to the
    incorrect 'in_interrupt()' check used in memcg_kmem_bypass().

    The obsolete in_interrupt() does not describe the real execution context properly.
    From include/linux/preempt.h:

     The following macros are deprecated and should not be used in new code:
     in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled

    To verify the current execution context, the new macro should be used instead:
     in_task()      - We're in task context

    Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:22 -04:00
Ivan Vecera 37103509d6 ipv4: add net device refcount tracker to struct in_device
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377

commit c04438f58d140723e58050fcb9d33d84cb39e9e9
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Dec 4 20:22:12 2021 -0800

    ipv4: add net device refcount tracker to struct in_device

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-13 18:38:19 +02:00
Ivan Vecera 76c5515643 net: socket: remove register_gifconf
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008927

commit b0e99d03778b2418aec20db99d97d19d25d198b6
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Jul 22 16:29:01 2021 +0200

    net: socket: remove register_gifconf

    Since dynamic registration of the gifconf() helper is only used for
    IPv4, and this can not be in a loadable module, this can be simplified
    noticeably by turning it into a direct function call as a preparation
    for cleaning up the compat handling.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2021-10-11 15:43:40 +02:00
Jakub Kicinski adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Zheng Yongjun 5ac6b198d7 net: ipv4: Remove unneed BUG() function
When 'nla_parse_nested_deprecated' fails, there is no need to
BUG() here; returning -EINVAL is enough.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-08 11:36:48 -07:00
Cong Wang a100243d95 rtnetlink: avoid RCU read lock when holding RTNL
When we call af_ops->set_link_af() we hold an RCU read lock
as we retrieve af_ops from the RCU protected list, but this
is unnecessary because we already hold RTNL lock, which is
the writer lock for protecting rtnl_af_ops, so it is safer
than RCU read lock. Similar for af_ops->validate_link_af().

This was not a problem until we began to take a mutex lock
down the path of ->set_link_af() in __ipv6_dev_mc_dec()
recently. We can just drop the RCU read lock there and
assert RTNL lock.

Reported-and-tested-by: syzbot+7d941e89dd48bcf42573@syzkaller.appspotmail.com
Fixes: 63ed8de4be ("mld: add mc_lock for protecting per-interface mld data")
Tested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-10 14:33:10 -07:00
Stephen Hemminger 3583a4e8d7 ipv6: report errors for iftoken via netlink extack
Setting iftoken can fail for several different reasons, but there
was no report to the user as to the cause. Add netlink
extended errors to the processing of the request.

This requires adding additional argument through rtnl_af_ops
set_link_af callback.

Reported-by: Hongren Zheng <li@zenithal.me>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:52:36 -07:00
Francis Laniel 872f690341 treewide: rename nla_strlcpy to nla_strscpy.
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:08:54 -08:00
Menglong Dong 30e2379e82 net: ipv4: remove redundant initialization in inet_rtm_deladdr
The initialization for 'err' with '-EINVAL' is redundant and
can be removed, as it is updated soon.

Changes since v1:
- Remove redundant empty line

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20201108010541.12432-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-10 15:22:05 -08:00
David S. Miller 1806c13dc2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.

The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31 17:48:46 -07:00
Yang Yingliang 1b49cd71b5 devinet: fix memleak in inetdev_init()
When devinet_sysctl_register() failed, the memory allocated
in neigh_parms_alloc() should be freed.

Fixes: 20e61da7ff ("ipv4: fail early when creating netdev named all or default")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-30 17:48:56 -07:00
Nicolas Dichtel 9efd6a3cec netns: enable to inherit devconf from current netns
The goal is to be able to inherit the initial devconf parameters from the
current netns, i.e. the netns where this new netns has been created.

This is useful in a containers environment where /proc/sys is read only.
For example, if a pod is created with specific devconf parameters and has
the capability to create netns, the user expects to get the same parameters
as his 'init_net', which is not the real init_net in this case.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:46:37 -07:00
Daniel Borkmann 0b54142e4b Merge branch 'work.sysctl' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull in Christoph Hellwig's series that changes the sysctl's ->proc_handler
methods to take kernel pointers instead. It gets rid of the set_fs address
space overrides used by BPF. As per discussion, pull in the feature branch
into bpf-next as it relates to BPF sysctl progs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200427071508.GV23230@ZenIV.linux.org.uk/T/
2020-04-28 21:23:38 +02:00
Christoph Hellwig 32927393dc sysctl: pass kernel pointers to ->proc_handler
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code.  This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handlers just pass through the data to one of the common handlers
a lot of the changes are mechanical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:07:40 -04:00
Taras Chornyi 690cc86321 net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin
When CONFIG_IP_MULTICAST is not set and a multicast IP is added to the device
with the autojoin flag, or when a multicast IP is deleted, the kernel will crash.

Steps to reproduce:

ip addr add 224.0.0.0/32 dev eth0
ip addr del 224.0.0.0/32 dev eth0

or

ip addr add 224.0.0.0/32 dev eth0 autojoin

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
 pc : _raw_write_lock_irqsave+0x1e0/0x2ac
 lr : lock_sock_nested+0x1c/0x60
 Call trace:
  _raw_write_lock_irqsave+0x1e0/0x2ac
  lock_sock_nested+0x1c/0x60
  ip_mc_config.isra.28+0x50/0xe0
  inet_rtm_deladdr+0x1a8/0x1f0
  rtnetlink_rcv_msg+0x120/0x350
  netlink_rcv_skb+0x58/0x120
  rtnetlink_rcv+0x14/0x20
  netlink_unicast+0x1b8/0x270
  netlink_sendmsg+0x1a0/0x3b0
  ____sys_sendmsg+0x248/0x290
  ___sys_sendmsg+0x80/0xc0
  __sys_sendmsg+0x68/0xc0
  __arm64_sys_sendmsg+0x20/0x30
  el0_svc_common.constprop.2+0x88/0x150
  do_el0_svc+0x20/0x80
 el0_sync_handler+0x118/0x190
  el0_sync+0x140/0x180

Fixes: 93a714d6b5 ("multicast: Extend ip address command to enable multicast group join/leave on")
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-09 10:27:23 -07:00
Joe Perches a8eceea84a inet: Use fallthrough;
Convert the various uses of fallthrough comments to fallthrough;

Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

And by hand:

net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
that causes gcc to emit a warning if converted in-place.

So move the new fallthrough; inside the containing #ifdef/#endif too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:55:00 -07:00
Eric Dumazet 501a90c945 inet: protect against too small mtu values.
syzbot was once again able to crash a host by setting a very small mtu
on loopback device.

Let's make inetdev_valid_mtu() available in include/net/ip.h,
and use it in ip_setup_cork(), so that we protect both ip_append_page()
and __ip_append_data()

Also add a READ_ONCE() when the device mtu is read.

Pair this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
even if other code paths might write over this field.

Add a big comment in include/linux/netdevice.h about dev->mtu
needing READ_ONCE()/WRITE_ONCE() annotations.

Hopefully we will add the missing ones in followup patches.
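The check described above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the kernel code: `mtu_is_valid()` and `setup_cork()` are hypothetical stand-ins for inetdev_valid_mtu() and ip_setup_cork(), the minimum of 68 bytes is the IPv4 minimum MTU from RFC 791, and the -ENETUNREACH return value is an assumption about how the caller reports the failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Smallest MTU an IPv4 link must support (RFC 791). */
#define IPV4_MIN_MTU 68u

/* Hypothetical stand-in for inetdev_valid_mtu(). */
static bool mtu_is_valid(unsigned int mtu)
{
	return mtu >= IPV4_MIN_MTU;
}

/* Hypothetical stand-in for the cork setup step: reject a too-small MTU
 * before any fragment-size arithmetic can be done with it. */
static int setup_cork(unsigned int dev_mtu, unsigned int *fragsize)
{
	if (!mtu_is_valid(dev_mtu))
		return -ENETUNREACH;	/* assumed error code */

	*fragsize = dev_mtu;
	return 0;
}
```

Doing the validation once in the shared setup step is what lets both append paths benefit without duplicating the check.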

[1]

refcount_t: saturated; leaking memory.
WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 panic+0x2e3/0x75c kernel/panic.c:221
 __warn.cold+0x2f/0x3e kernel/panic.c:582
 report_bug+0x289/0x300 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:174 [inline]
 fixup_bug arch/x86/kernel/traps.c:169 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
RSP: 0018:ffff88809689f550 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
 refcount_add include/linux/refcount.h:193 [inline]
 skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
 sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
 ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
 udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
 inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
 kernel_sendpage+0x92/0xf0 net/socket.c:3794
 sock_sendpage+0x8b/0xc0 net/socket.c:936
 pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
 splice_from_pipe_feed fs/splice.c:512 [inline]
 __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
 splice_from_pipe+0x108/0x170 fs/splice.c:671
 generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
 do_splice_from fs/splice.c:861 [inline]
 direct_splice_actor+0x123/0x190 fs/splice.c:1035
 splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
 do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
 do_sendfile+0x597/0xd00 fs/read_write.c:1464
 __do_sys_sendfile64 fs/read_write.c:1525 [inline]
 __se_sys_sendfile64 fs/read_write.c:1511 [inline]
 __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441409
Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: 1470ddf7f8 ("inet: Remove explicit write references to sk/inet in ip_append_data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-07 11:55:11 -08:00
David S. Miller af144a9834 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two cases of overlapping changes, nothing fancy.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 19:48:57 -07:00
Matteo Croce 2e60546368 ipv4: don't set IPv6 only flags to IPv4 addresses
Avoid the situation where an IPV6 only flag is applied to an IPv4 address:

    # ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute
    # ip -4 addr show dev dummy0
    2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        inet 192.0.2.1/24 scope global noprefixroute dummy0
           valid_lft forever preferred_lft forever

Or worse, by sending a malicious netlink command:

    # ip -4 addr show dev dummy0
    2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0
           valid_lft forever preferred_lft forever

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 11:31:38 -07:00
Florian Westphal 6a9e9cea4c net: ipv4: fix infinite loop on secondary addr promotion
secondary address promotion causes infinite loop -- it arranges
for ifa->ifa_next to point back to itself.

Problem is that 'prev_prom' and 'last_prim' might point at the same entry,
so 'last_sec' pointer must be obtained after prev_prom->next update.

Fixes: 2638eb8b50 ("net: ipv4: provide __rcu annotation for ifa_list")
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 09:54:34 -07:00
Shijie Luo 650638a7c6 ipv4: fix confirm_addr_indev() when enable route_localnet
When arp_ignore=3, the NIC won't reply for scope host addresses, but
if route_localnet is enabled, we need to reply for IP addresses that start
with 127 and have scope RT_SCOPE_HOST.

Fixes: d0daebc3d6 ("ipv4: Add interface option to enable routing of 127.0.0.0/8")

Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24 09:02:47 -07:00
Shijie Luo d8c444d540 ipv4: fix inet_select_addr() when enable route_localnet
Suppose we have two interfaces eth0 and eth1 in two hosts, follow
the same steps in the two hosts:
 # sysctl -w net.ipv4.conf.eth1.route_localnet=1
 # sysctl -w net.ipv4.conf.eth1.arp_announce=2
 # ip route del 127.0.0.0/8 dev lo table local
and then set ip to eth1 in host1 like:
 # ifconfig eth1 127.25.3.4/24
set ip to eth1 in host2 and ping host1:
 # ifconfig eth1 127.25.3.14/24
 # ping -I eth1 127.25.3.4
Well, host2 cannot connect to host1.

When an IP address starting with 127 is set, the scope of the address defaults
to RT_SCOPE_HOST. In this situation, host2 will use arp_solicit() to
send an ARP request for the MAC address of host1 with ip
address 127.25.3.14. When arp_announce=2, inet_select_addr() cannot
select a correct saddr with condition ifa->ifa_scope > scope, because
ifa_scope is RT_SCOPE_HOST and scope is RT_SCOPE_LINK. Then,
inet_select_addr() will go to no_in_dev to lookup all interfaces to find
a primary ip and finally get the primary ip of eth0.

Here I add a localnet_scope that defaults to RT_SCOPE_HOST, and when
route_localnet is enabled, this value changes to RT_SCOPE_LINK to make
inet_select_addr() find a correct primary ip as saddr of arp request.
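A minimal userspace sketch of the selection rule described above. It is an illustration, not the kernel function: `struct addr` and `select_primary()` are hypothetical, and clamping each address scope to `localnet_scope` with a minimum is an assumption about how the new variable is applied. The scope constants match the rtnetlink values (smaller numbers mean wider scope).

```c
#include <stddef.h>

/* rtnetlink scope values: a smaller number is a wider scope. */
enum { RT_SCOPE_LINK = 253, RT_SCOPE_HOST = 254 };

/* Hypothetical, simplified address record. */
struct addr {
	unsigned int ip;	/* IPv4 address, host byte order */
	int scope;		/* RT_SCOPE_* */
	int secondary;		/* non-zero for secondary addresses */
};

/* Return the first primary address usable at the requested scope, or 0. */
static unsigned int select_primary(const struct addr *addrs, size_t n,
				   int scope, int route_localnet)
{
	/* With route_localnet enabled, host-scope addresses count as link scope. */
	int localnet_scope = route_localnet ? RT_SCOPE_LINK : RT_SCOPE_HOST;
	size_t i;

	for (i = 0; i < n; i++) {
		int eff = addrs[i].scope < localnet_scope ?
			  addrs[i].scope : localnet_scope;

		if (addrs[i].secondary)
			continue;
		if (eff > scope)	/* too narrow for the requested scope */
			continue;
		return addrs[i].ip;
	}
	return 0;
}
```

With route_localnet disabled, a 127.x address keeps its host scope and is skipped for a link-scope request, which is the failure the commit message describes.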

Fixes: d0daebc3d6 ("ipv4: Add interface option to enable routing of 127.0.0.0/8")

Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24 09:01:42 -07:00
Florian Westphal 40008e9211 net: ipv4: remove erroneous advancement of list pointer
Causes a crash when the lifetime expires on an address, as garbage is
dereferenced soon after.

This used to look like this:

 for (ifap = &ifa->ifa_dev->ifa_list;
      *ifap != NULL; ifap = &(*ifap)->ifa_next) {
          if (*ifap == ifa) ...

but this was changed to:

struct in_ifaddr *tmp;

ifap = &ifa->ifa_dev->ifa_list;
tmp = rtnl_dereference(*ifap);
while (tmp) {
   tmp = rtnl_dereference(tmp->ifa_next); // Bogus
   if (rtnl_dereference(*ifap) == ifa) {
     ...
   ifap = &tmp->ifa_next;		// Can be NULL
   tmp = rtnl_dereference(*ifap);	// Dereference
   }
}

Remove the bogus assignment/list entry skip.
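For comparison, a walk that advances exactly once per iteration can be sketched in plain userspace C. This is only an illustration of the pointer-to-pointer idiom; `struct node` and `unlink_node()` are hypothetical, and the RCU/RTNL accessors of the real code are left out.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
	int id;
	struct node *next;
};

/* Unlink target from the list at *head. Returns 1 if it was found.
 * pp always points at the link that leads to cur, so the list is
 * advanced in one place only and no entry is skipped. */
static int unlink_node(struct node **head, struct node *target)
{
	struct node **pp = head;
	struct node *cur = *pp;

	while (cur) {
		if (cur == target) {
			*pp = cur->next;	/* bypass the entry */
			cur->next = NULL;
			return 1;
		}
		pp = &cur->next;	/* the single advancement step */
		cur = *pp;
	}
	return 0;
}
```

Because `cur` is re-read from `*pp` after every step, it is checked against NULL by the loop condition before it is ever dereferenced.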

Fixes: 2638eb8b50 ("net: ipv4: provide __rcu annotation for ifa_list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 16:27:42 -07:00
David S. Miller a6cdeeb16b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07 11:00:14 -07:00
Florian Westphal d3e6e285ff net: ipv4: fix rcu lockdep splat due to wrong annotation
syzbot triggered the following splat when strict netlink
validation is enabled:

net/ipv4/devinet.c:1766 suspicious rcu_dereference_check() usage!

This occurs because we hold RTNL mutex, but no rcu read lock.
The second call site holds both, so just switch to the _rtnl variant.

Reported-by: syzbot+bad6e32808a3a97b1515@syzkaller.appspotmail.com
Fixes: 2638eb8b50 ("net: ipv4: provide __rcu annotation for ifa_list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 14:24:10 -07:00