297 Commits

Author SHA1 Message Date
Guillaume Nault b7133dffdb ipv4: fix source address selection with route leak
JIRA: https://issues.redhat.com/browse/RHEL-61380
Upstream Status: linux.git

commit 6807352353561187a718e87204458999dbcbba1b
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date:   Wed Jul 10 10:14:27 2024 +0200

    ipv4: fix source address selection with route leak

    By default, an address assigned to the output interface is selected when
    the source address is not specified. This is problematic when a route,
    configured in a vrf, uses an interface from another vrf (aka route leak).
    The original vrf does not own the selected source address.

    Let's add a check against the output interface and call the appropriate
    function to select the source address.
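A minimal userspace sketch of the check described above. It assumes a simplified model in which every device records the VRF it belongs to and owns a single address. The names `struct sketch_dev`, `vrf_owned_addr()` and `select_saddr()` are hypothetical. The kernel fix works on routing results and net devices and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: vrf_id 0 means "default VRF"; one address per device.
 * None of these names come from the kernel. */
struct sketch_dev {
    int vrf_id;
    uint32_t addr;
};

/* Hypothetical stand-in for "pick an address owned by the VRF":
 * here simply the address configured on the VRF's own device. */
static uint32_t vrf_owned_addr(const struct sketch_dev *vrf_dev)
{
    assert(vrf_dev != NULL);
    return vrf_dev->addr;
}

/* Select a source address when the caller did not specify one.
 * flow_vrf: VRF the lookup was done in (0 = none).
 * out_dev:  output device chosen by the route.
 * vrf_dev:  device representing flow_vrf (may be NULL when flow_vrf == 0). */
static uint32_t select_saddr(int flow_vrf, const struct sketch_dev *out_dev,
                             const struct sketch_dev *vrf_dev)
{
    /* No VRF, or output device in the same VRF: its address is fine. */
    if (flow_vrf == 0 || out_dev->vrf_id == flow_vrf)
        return out_dev->addr;
    /* Route leak: the output device belongs to another VRF, so use an
     * address owned by the original VRF instead. */
    return vrf_owned_addr(vrf_dev);
}
```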

    CC: stable@vger.kernel.org
    Fixes: 8cbb512c92 ("net: Add source address lookup op for VRF")
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240710081521.3809742-2-nicolas.dichtel@6wind.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2024-10-02 21:02:45 +02:00
Guillaume Nault 1a8c9c1f1e ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
JIRA: https://issues.redhat.com/browse/RHEL-61380
Upstream Status: linux.git

commit 195374d893681da43a39796e53b30ac4f20400c4
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 17 19:23:04 2023 +0000

    ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr

    syzbot reported a data-race while accessing nh->nh_saddr_genid [1]

    Add annotations, but leave the code lazy as intended.
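The lazy cache being annotated can be modelled in userspace with C11 relaxed atomics. Here they play roughly the role of READ_ONCE()/WRITE_ONCE(): no locking and no ordering, only untorn accesses that a race detector treats as intentional. The structure and `compute_saddr()` are hypothetical stand-ins, not the kernel's nexthop fields.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical cache: a source address plus the generation it was computed at. */
struct nh_cache_sketch {
    _Atomic uint32_t saddr_genid;
    _Atomic uint32_t saddr;
};

/* Hypothetical stand-in for the expensive source-address selection. */
static uint32_t compute_saddr(uint32_t genid)
{
    return 0x0a000000u + genid;
}

static uint32_t cached_saddr(struct nh_cache_sketch *nh, uint32_t cur_genid)
{
    /* Fast path: the cached value is current, reuse it without a lock. */
    if (atomic_load_explicit(&nh->saddr_genid, memory_order_relaxed) == cur_genid)
        return atomic_load_explicit(&nh->saddr, memory_order_relaxed);

    /* Slow path: recompute and publish lazily. Relaxed accesses give no
     * ordering, so a concurrent reader may briefly pair the new genid with
     * a stale saddr; the sketch, like the lazy code it models, accepts that. */
    uint32_t saddr = compute_saddr(cur_genid);
    atomic_store_explicit(&nh->saddr, saddr, memory_order_relaxed);
    atomic_store_explicit(&nh->saddr_genid, cur_genid, memory_order_relaxed);
    return saddr;
}
```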

    [1]
    BUG: KCSAN: data-race in fib_select_path / fib_select_path

    write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1:
    fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline]
    fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline]
    fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269
    ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
    ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
    __ip_route_output_key include/net/route.h:134 [inline]
    ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
    send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
    wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
    wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
    wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
    wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
    process_one_work kernel/workqueue.c:2630 [inline]
    process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
    worker_thread+0x525/0x730 kernel/workqueue.c:2784
    kthread+0x1d7/0x210 kernel/kthread.c:388
    ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
    ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

    read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0:
    fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline]
    fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269
    ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
    ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
    __ip_route_output_key include/net/route.h:134 [inline]
    ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
    send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
    wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
    wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
    wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
    wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
    process_one_work kernel/workqueue.c:2630 [inline]
    process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
    worker_thread+0x525/0x730 kernel/workqueue.c:2784
    kthread+0x1d7/0x210 kernel/kthread.c:388
    ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
    ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

    value changed: 0x959d3217 -> 0x959d3218

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
    Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker

    Fixes: 436c3b66ec ("ipv4: Invalidate nexthop cache nh_saddr more correctly.")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20231017192304.82626-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2024-10-02 21:02:33 +02:00
Guillaume Nault 75d61b7226 ipv4: annotate data-races around fi->fib_dead
JIRA: https://issues.redhat.com/browse/RHEL-61380
Upstream Status: linux.git

commit fce92af1c29d90184dfec638b5738831097d66e9
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Aug 30 09:55:20 2023 +0000

    ipv4: annotate data-races around fi->fib_dead

    syzbot complained about a data-race in fib_table_lookup() [1]

    Add appropriate annotations to document it.

    [1]
    BUG: KCSAN: data-race in fib_release_info / fib_table_lookup

    write to 0xffff888150f31744 of 1 bytes by task 1189 on cpu 0:
    fib_release_info+0x3a0/0x460 net/ipv4/fib_semantics.c:281
    fib_table_delete+0x8d2/0x900 net/ipv4/fib_trie.c:1777
    fib_magic+0x1c1/0x1f0 net/ipv4/fib_frontend.c:1106
    fib_del_ifaddr+0x8cf/0xa60 net/ipv4/fib_frontend.c:1317
    fib_inetaddr_event+0x77/0x200 net/ipv4/fib_frontend.c:1448
    notifier_call_chain kernel/notifier.c:93 [inline]
    blocking_notifier_call_chain+0x90/0x200 kernel/notifier.c:388
    __inet_del_ifa+0x4df/0x800 net/ipv4/devinet.c:432
    inet_del_ifa net/ipv4/devinet.c:469 [inline]
    inetdev_destroy net/ipv4/devinet.c:322 [inline]
    inetdev_event+0x553/0xaf0 net/ipv4/devinet.c:1606
    notifier_call_chain kernel/notifier.c:93 [inline]
    raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461
    call_netdevice_notifiers_info net/core/dev.c:1962 [inline]
    call_netdevice_notifiers_mtu+0xd2/0x130 net/core/dev.c:2037
    dev_set_mtu_ext+0x30b/0x3e0 net/core/dev.c:8673
    do_setlink+0x5be/0x2430 net/core/rtnetlink.c:2837
    rtnl_setlink+0x255/0x300 net/core/rtnetlink.c:3177
    rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6445
    netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2549
    rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6463
    netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
    netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
    netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1914
    sock_sendmsg_nosec net/socket.c:725 [inline]
    sock_sendmsg net/socket.c:748 [inline]
    sock_write_iter+0x1aa/0x230 net/socket.c:1129
    do_iter_write+0x4b4/0x7b0 fs/read_write.c:860
    vfs_writev+0x1a8/0x320 fs/read_write.c:933
    do_writev+0xf8/0x220 fs/read_write.c:976
    __do_sys_writev fs/read_write.c:1049 [inline]
    __se_sys_writev fs/read_write.c:1046 [inline]
    __x64_sys_writev+0x45/0x50 fs/read_write.c:1046
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    read to 0xffff888150f31744 of 1 bytes by task 21839 on cpu 1:
    fib_table_lookup+0x2bf/0xd50 net/ipv4/fib_trie.c:1585
    fib_lookup include/net/ip_fib.h:383 [inline]
    ip_route_output_key_hash_rcu+0x38c/0x12c0 net/ipv4/route.c:2751
    ip_route_output_key_hash net/ipv4/route.c:2641 [inline]
    __ip_route_output_key include/net/route.h:134 [inline]
    ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2869
    send4+0x1e7/0x500 drivers/net/wireguard/socket.c:61
    wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
    wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
    wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
    wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
    process_one_work+0x434/0x860 kernel/workqueue.c:2600
    worker_thread+0x5f2/0xa10 kernel/workqueue.c:2751
    kthread+0x1d7/0x210 kernel/kthread.c:389
    ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
    ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

    value changed: 0x00 -> 0x01

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 21839 Comm: kworker/u4:18 Tainted: G W 6.5.0-syzkaller #0

    Fixes: dccd9ecc37 ("ipv4: Do not use dead fib_info entries.")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230830095520.1046984-1-edumazet@google.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2024-10-02 21:02:23 +02:00
Hangbin Liu 0994388236 ipv4/fib: send notify when delete source address routes
JIRA: https://issues.redhat.com/browse/RHEL-6012
Upstream Status: net.git commit 4b2b606075e5

commit 4b2b606075e50cdae62ab2356b0a1e206947c354
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Fri Sep 22 15:55:08 2023 +0800

    ipv4/fib: send notify when delete source address routes

    After deleting an interface address in fib_del_ifaddr(), the function
    scans the fib_info list for stray entries and calls fib_flush() and
    fib_table_flush(). Then the stray entries will be deleted silently and no
    RTM_DELROUTE notification will be sent.

    This lack of notification can make routing daemons, or monitors like
    `ip monitor route`, miss routing changes, e.g.:

    + ip link add dummy1 type dummy
    + ip link add dummy2 type dummy
    + ip link set dummy1 up
    + ip link set dummy2 up
    + ip addr add 192.168.5.5/24 dev dummy1
    + ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5
    + ip -4 route
    7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
    192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
    + ip monitor route
    + ip addr del 192.168.5.5/24 dev dummy1
    Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
    Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
    Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5

    As Ido pointed out, fib_table_flush() isn't only called when an address is
    deleted, but also when an interface is deleted or put down. The lack of
    notification in these cases is deliberate. And commit 7c6bb7d2fa
    ("net/ipv6: Add knob to skip DELROUTE message on device down") introduced
    a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send
    the route deletion notification blindly in fib_table_flush().

    To fix this issue, let's add a new flag in "struct fib_info" to track
    routes deleted because their preferred source address was removed, and
    only send notifications for them.
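A reduced model of this approach, assuming routes are kept in a plain array. Routes whose preferred source is the deleted address are marked. All dead routes are then flushed, but a notification is counted only for the marked ones. `struct route_sketch` and its `prefsrc_removed` flag are illustrative names, not necessarily the field added to struct fib_info.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route record. */
struct route_sketch {
    uint32_t prefsrc;      /* preferred source address */
    bool prefsrc_removed;  /* died because its prefsrc was deleted */
    bool dead;             /* scheduled for flushing */
    bool in_table;         /* still present in the table */
};

/* On address deletion: mark routes whose preferred source was addr. */
static void mark_stray_routes(struct route_sketch *rt, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (rt[i].prefsrc == addr) {
            rt[i].dead = true;
            rt[i].prefsrc_removed = true;
        }
    }
}

/* Flush dead routes and return how many deletion notifications would be
 * sent. Routes that died for other reasons (e.g. link down) are still
 * flushed silently, preserving the deliberate existing behaviour. */
static size_t flush_and_count_notifications(struct route_sketch *rt, size_t n)
{
    size_t notified = 0;

    for (size_t i = 0; i < n; i++) {
        if (!rt[i].in_table || !rt[i].dead)
            continue;
        if (rt[i].prefsrc_removed)
            notified++;
        rt[i].in_table = false;
    }
    return notified;
}
```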

    After update:
    + ip monitor route
    + ip addr del 192.168.5.5/24 dev dummy1
    Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
    Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
    Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
    Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5

    Suggested-by: Thomas Haller <thaller@redhat.com>
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230922075508.848925-1-liuhangbin@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2023-12-21 09:23:08 +08:00
Ivan Vecera b167c4e0b9 neighbour: annotate lockless accesses to n->nud_state
JIRA: https://issues.redhat.com/browse/RHEL-16999

commit b071af523579df7341cabf0f16fc661125e9a13f
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Mar 13 20:17:31 2023 +0000

    neighbour: annotate lockless accesses to n->nud_state

    We have many lockless accesses to n->nud_state.

    Before adding another one in the following patch,
    add annotations to readers and writers.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-11-20 19:28:55 +01:00
Guillaume Nault 1e9afbf527 ipv4: prevent potential spectre v1 gadget in fib_metrics_match()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186795
Upstream Status: linux.git

commit 5e9398a26a92fc402d82ce1f97cc67d832527da0
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Jan 20 13:31:40 2023 +0000

    ipv4: prevent potential spectre v1 gadget in fib_metrics_match()

        if (!type)
            continue;
        if (type > RTAX_MAX)
            return false;
        ...
        fi_val = fi->fib_metrics->metrics[type - 1];

    Since @type is used as an array index, we need to prevent
    CPU speculation or risk leaking kernel memory content.
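The usual mitigation is to clamp the index with a branch-free mask after the bounds check, which is what the kernel's array_index_nospec() helper provides. The sketch below reimplements the generic mask formula in userspace under stated assumptions. It assumes an arithmetic right shift of negative `long` values, it has no OPTIMIZER_HIDE_VAR()-style compiler barrier, and it uses a 0-based index for simplicity. `SKETCH_METRICS_MAX` is an assumed stand-in for RTAX_MAX.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define SKETCH_METRICS_MAX 17 /* stand-in for RTAX_MAX; value assumed */

/* Generic mask formula: ~0UL if index < size, 0 otherwise, with no branch.
 * Assumes arithmetic right shift of negative longs (GCC/Clang behaviour;
 * implementation-defined in ISO C) and size <= LONG_MAX. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (SKETCH_BITS_PER_LONG - 1));
}

static unsigned int metrics[SKETCH_METRICS_MAX];

/* type is 1-based, as in the quoted snippet. */
static int metric_lookup(unsigned long type, unsigned int *out)
{
    if (type == 0 || type > SKETCH_METRICS_MAX)
        return -1;
    unsigned long idx = type - 1;
    /* If the bounds check is mispredicted, the mask forces idx to 0,
     * so a speculative load cannot reach an attacker-chosen address. */
    idx &= index_mask_nospec(idx, SKETCH_METRICS_MAX);
    *out = metrics[idx];
    return 0;
}
```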

    Fixes: 5f9ae3d9e7 ("ipv4: do metrics match when looking up and deleting a route")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-04-14 16:19:45 +02:00
Guillaume Nault 9d65a7dc2f ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186795
Upstream Status: linux.git

commit 79eb15da3cd68f04b06edf73f9bbafa70a06871f
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Jan 19 02:04:13 2022 -0800

    ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys

    net/ipv4/fib_semantics.c uses a hash table (fib_info_laddrhash)
    in which fib_sync_down_addr() can locate fib_info
    based on IPv4 local address.

    This hash table is resized based on total number of
    hashed fib_info, but the hash function is only
    using the local address.

    For hosts having many active network namespaces,
    all fib_info for loopback devices (IPv4 address 127.0.0.1)
    are hashed into a single bucket, making netns dismantles
    very slow.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-04-14 16:19:32 +02:00
Guillaume Nault 9655b576ad ipv4: avoid quadratic behavior in netns dismantle
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186795
Upstream Status: linux.git

commit d07418afea8f1d9896aaf9dc5ae47ac4f45b220c
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Jan 19 02:04:12 2022 -0800

    ipv4: avoid quadratic behavior in netns dismantle

    net/ipv4/fib_semantics.c uses a hash table of 256 slots,
    keyed by device ifindexes: fib_info_devhash[DEVINDEX_HASHSIZE]

    The problem is that with network namespaces, devices tend
    to use the same ifindex.

    lo device for instance has a fixed ifindex of one,
    for all network namespaces.

    This means that hosts with thousands of netns spend
    a lot of time looking at some hash buckets with thousands
    of elements, notably at netns dismantle.

    Simply add a per netns perturbation (net_hash_mix())
    to spread elements more uniformly.

    Also change fib_devindex_hashfn() to use more entropy.
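A sketch of the per-netns perturbation, using a multiplicative hash in the style of the kernel's hash_32() with the same golden-ratio constant. `netns_mix` stands in for net_hash_mix(), which derives its value from the namespace. Here it is just a parameter, and `devhash_bucket()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GOLDEN_RATIO_32 0x61C88647u /* constant used by hash_32() */
#define SKETCH_DEVINDEX_HASHBITS 8          /* 256 slots, as in the text */

/* Multiplicative hash in the style of hash_32(): keep the top bits. */
static uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
    assert(bits > 0 && bits <= 32);
    return (uint32_t)(val * SKETCH_GOLDEN_RATIO_32) >> (32 - bits);
}

/* XOR a per-namespace value into the key so that equal ifindexes in
 * different namespaces usually land in different buckets. */
static uint32_t devhash_bucket(uint32_t netns_mix, int ifindex)
{
    return sketch_hash_32(netns_mix ^ (uint32_t)ifindex,
                          SKETCH_DEVINDEX_HASHBITS);
}
```

With `netns_mix` values 0 and 1, ifindex 1 (lo) hashes to buckets 0x61 and 0 respectively, instead of every namespace sharing one bucket.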

    Fixes: aa79e66eee ("net: Make ifindex generation per-net namespace")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-04-14 16:19:26 +02:00
Íñigo Huguet 3a91b473a8 net: rename reference+tracking helpers
Bugzilla: https://bugzilla.redhat.com/2175258

Conflicts:
 - Removed chunks of unsupported protocol AX.25
 - Renamed the functions also in ipvlan. Commit 40b9d1ab63f5 ("ipvlan: hold lower
   dev to avoid possible use-after-free") was backported out of order so it had
   to use the old function names.

commit d62607c3fe45911b2331fac073355a8c914bbde2
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Jun 7 21:39:55 2022 -0700

    net: rename reference+tracking helpers

    Netdev reference helpers have a dev_ prefix for historic
    reasons. Renaming the old helpers would be too much churn
    but we can rename the tracking ones which are relatively
    recent and should be the default for new code.

    Rename:
     dev_hold_track()    -> netdev_hold()
     dev_put_track()     -> netdev_put()
     dev_replace_track() -> netdev_ref_replace()

    Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
2023-03-23 16:19:21 +01:00
Herton R. Krzesinski 13be0a21a9 Merge: ipv4: Second round of upstream fixes for RHEL 9.2.
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1908

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2162116
Upstream Status: linux.git

Signed-off-by: Guillaume Nault <gnault@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2023-02-01 16:08:59 +00:00
Guillaume Nault d99a27877b ipv4: Fix incorrect route flushing when source address is deleted
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2162116
Upstream Status: linux.git

commit f96a3d74554df537b6db5c99c27c80e7afadc8d1
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Sun Dec 4 09:50:44 2022 +0200

    ipv4: Fix incorrect route flushing when source address is deleted

    Cited commit added the table ID to the FIB info structure, but did not
    prevent structures with different table IDs from being consolidated.
    This can lead to routes being flushed from a VRF when an address is
    deleted from a different VRF.

    Fix by taking the table ID into account when looking for a matching FIB
    info. This is already done for FIB info structures backed by a nexthop
    object in fib_find_info_nh().
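The consolidation check can be pictured as a field-by-field comparison in which the table ID is one more key. `struct fib_info_sketch` keeps only a handful of hypothetical fields. The real structure and its lookup compare many more.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced view of the fields compared. */
struct fib_info_sketch {
    uint32_t table_id;
    uint32_t prefsrc;
    int protocol;
    int scope;
    int oif;
};

/* Entries may be shared only if every field matches, including the table
 * ID. Without it, a VRF route and an otherwise identical main-table route
 * would share one entry, so a source-address flush could hit both. */
static bool fib_info_same(const struct fib_info_sketch *a,
                          const struct fib_info_sketch *b)
{
    return a->table_id == b->table_id &&
           a->prefsrc == b->prefsrc &&
           a->protocol == b->protocol &&
           a->scope == b->scope &&
           a->oif == b->oif;
}
```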

    Add test cases that fail before the fix:

     # ./fib_tests.sh -t ipv4_del_addr

     IPv4 delete address route tests
         Regular FIB info
         TEST: Route removed from VRF when source address deleted            [ OK ]
         TEST: Route in default VRF not removed                              [ OK ]
         TEST: Route removed in default VRF when source address deleted      [ OK ]
         TEST: Route in VRF is not removed by address delete                 [ OK ]
         Identical FIB info with different table ID
         TEST: Route removed from VRF when source address deleted            [FAIL]
         TEST: Route in default VRF not removed                              [ OK ]
     RTNETLINK answers: File exists
         TEST: Route removed in default VRF when source address deleted      [ OK ]
         TEST: Route in VRF is not removed by address delete                 [FAIL]

     Tests passed:   6
     Tests failed:   2

    And pass after:

     # ./fib_tests.sh -t ipv4_del_addr

     IPv4 delete address route tests
         Regular FIB info
         TEST: Route removed from VRF when source address deleted            [ OK ]
         TEST: Route in default VRF not removed                              [ OK ]
         TEST: Route removed in default VRF when source address deleted      [ OK ]
         TEST: Route in VRF is not removed by address delete                 [ OK ]
         Identical FIB info with different table ID
         TEST: Route removed from VRF when source address deleted            [ OK ]
         TEST: Route in default VRF not removed                              [ OK ]
         TEST: Route removed in default VRF when source address deleted      [ OK ]
         TEST: Route in VRF is not removed by address delete                 [ OK ]

     Tests passed:   8
     Tests failed:   0

    Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-01-18 20:48:18 +01:00
Guillaume Nault 70d423451e ipv4: Fix route deletion when nexthop info is not specified
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2162116
Upstream Status: linux.git

commit d5082d386eee7e8ec46fa8581932c81a4961dcef
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Nov 24 23:09:32 2022 +0200

    ipv4: Fix route deletion when nexthop info is not specified

    When the kernel receives a route deletion request from user space it
    tries to delete a route that matches the route attributes specified in
    the request.

    If only prefix information is specified in the request, the kernel
    should delete the first matching FIB alias regardless of its associated
    FIB info. However, an error is currently returned when the FIB info is
    backed by a nexthop object:

     # ip nexthop add id 1 via 192.0.2.2 dev dummy10
     # ip route add 198.51.100.0/24 nhid 1
     # ip route del 198.51.100.0/24
     RTNETLINK answers: No such process

    Fix by matching on such a FIB info when legacy nexthop attributes are
    not specified in the request. An earlier check already covers the case
    where a nexthop ID is specified in the request.

    Add tests that cover these flows. Before the fix:

     # ./fib_nexthops.sh -t ipv4_fcnal
     ...
     TEST: Delete route when not specifying nexthop attributes           [FAIL]

     Tests passed:  11
     Tests failed:   1

    After the fix:

     # ./fib_nexthops.sh -t ipv4_fcnal
     ...
     TEST: Delete route when not specifying nexthop attributes           [ OK ]

     Tests passed:  12
     Tests failed:   0

    No regressions in other tests:

     # ./fib_nexthops.sh
     ...
     Tests passed: 228
     Tests failed:   0

     # ./fib_tests.sh
     ...
     Tests passed: 186
     Tests failed:   0

    Cc: stable@vger.kernel.org
    Reported-by: Jonas Gorski <jonas.gorski@gmail.com>
    Tested-by: Jonas Gorski <jonas.gorski@gmail.com>
    Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
    Fixes: 6bf92d70e690 ("net: ipv4: fix route with nexthop object delete warning")
    Fixes: 61b91eb33a69 ("ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20221124210932.2470010-1-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-01-18 20:48:11 +01:00
Guillaume Nault 4695d82ed4 ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160073
Upstream Status: linux.git

commit 87507bcb4f5de16bb419e9509d874f4db6c0ad0f
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Mon Jul 18 10:26:39 2022 -0700

    ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.

    While reading sysctl_fib_multipath_use_neigh, it can be changed
    concurrently.  Thus, we need to add READ_ONCE() to its reader.

    Fixes: a6db4494d2 ("net: ipv4: Consider failed nexthops in multipath routes")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-01-17 12:25:13 +01:00
Guillaume Nault d4b33fa289 nexthop: Fix data-races around nexthop_compat_mode.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2149949
Upstream Status: linux.git

commit bdf00bf24bef9be1ca641a6390fd5487873e0d2e
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Mon Jul 11 17:15:33 2022 -0700

    nexthop: Fix data-races around nexthop_compat_mode.

    While reading nexthop_compat_mode, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    Fixes: 4f80116d3d ("net: ipv4: add sysctl for nexthop api compatibility mode")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-12-22 11:37:53 +01:00
Frantisek Hrbata a992b2c2a7 Merge: netfilter: nft_fib: Fix for rpath check with VRF devices
Merge conflicts:
-----------------
net/ipv4/netfilter/nft_fib_ipv4.c
        - nft_fib4_eval()
          HEAD(!1475) is missing upstream acc641ab95b6 ("netfilter: rpfilter/fib: Populate flowic_l3mdev field")
          Resolved in favor of !1548

net/ipv6/netfilter/nft_fib_ipv6.c
        - nft_fib6_flowi_init()
          HEAD(!1475) is missing upstream acc641ab95b6 ("netfilter: rpfilter/fib: Populate flowic_l3mdev field")
          Resolved in favor of !1548

MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1548

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2129093

Signed-off-by: Phil Sutter <psutter@redhat.com>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-11 09:20:29 +01:00
Frantisek Hrbata 0eb58055ef Merge: CNB: inet: Separate DSCP from ECN bits and use dscp_t for TOS fields
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1583

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2140160
Tested: Using backported fib self-tests (see https://bugzilla.redhat.com/show_bug.cgi?id=2140160#c1)

Commits:
```
a410a0cf9885 ("ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules")
563f8e97e054 ("ipv4: Stop taking ECN bits into account in fib4-rules")
f55fbb6afb8d ("ipv4: Reject routes specifying ECN bits in rtm_tos")
32ccf1107980 ("ipv4: Use dscp_t in struct fib_alias")
888ade8f90d7 ("ipv4: Use dscp_t in struct fib_rt_info")
568a3f33b427 ("ipv4: Use dscp_t in struct fib_entry_notifier_info")
20bbf32efe1e ("netdevsim: Use dscp_t in struct nsim_fib4_rt")
046eabbf1991 ("mlxsw: Use dscp_t in struct mlxsw_sp_fib4_entry")
dc513a405cad ("ipv4: Reject again rules with high DSCP values")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Petr Oros <poros@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-08 09:08:23 -05:00
Ivan Vecera 7feda53aba ipv4: Use dscp_t in struct fib_rt_info
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2140160

Conflicts:
- removed n/a hunk for unsupported prestera driver

commit 888ade8f90d7dbbdc8552ae9b23d311f9e61ab0e
Author: Guillaume Nault <gnault@redhat.com>
Date:   Fri Apr 8 22:08:37 2022 +0200

    ipv4: Use dscp_t in struct fib_rt_info

    Use the new dscp_t type to replace the tos field of struct fib_rt_info.
    This ensures ECN bits are ignored and makes it compatible with the
    fa_dscp field of struct fib_alias.

    This also allows sparse to flag potential incorrect uses of DSCP and
    ECN bits.

    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-11-04 18:01:45 +01:00
Ivan Vecera 0954ecd449 ipv4: Use dscp_t in struct fib_alias
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2140160

commit 32ccf1107980e8ed5c62cf6666da7a47a4fc7ecf
Author: Guillaume Nault <gnault@redhat.com>
Date:   Fri Feb 4 14:58:19 2022 +0100

    ipv4: Use dscp_t in struct fib_alias

    Use the new dscp_t type to replace the fa_tos field of fib_alias. This
    ensures ECN bits are ignored and makes the field compatible with the
    fc_dscp field of struct fib_config.

    Converting old *tos variables and fields to dscp_t allows sparse to
    flag incorrect uses of DSCP and ECN bits. This patch is entirely about
    type annotation and shouldn't change any existing behaviour.
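The DS field carries DSCP in its upper six bits and ECN in its lower two, so converting a TOS byte to a DSCP value is a mask with 0xfc. The sketch imitates the type separation in plain C with a one-member struct, because sparse's `__bitwise` annotation is not available in ordinary C. `dscp_sketch_t` and its helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DSCP_MASK 0xfcu /* upper six bits of the DS field */

/* A distinct type: a raw TOS byte cannot be passed where a DSCP value is
 * expected, loosely imitating what __bitwise on dscp_t lets sparse check. */
typedef struct {
    uint8_t v;
} dscp_sketch_t;

/* Drop the two ECN bits when converting a DS field byte to DSCP. */
static dscp_sketch_t dsfield_to_dscp(uint8_t dsfield)
{
    dscp_sketch_t d = { (uint8_t)(dsfield & SKETCH_DSCP_MASK) };
    return d;
}

static bool dscp_equal(dscp_sketch_t a, dscp_sketch_t b)
{
    return a.v == b.v;
}
```

For example, a packet marked EF with ECT(1) (DS field 0xb9) compares equal to a route configured with DSCP EF (0xb8), since the ECN bits never reach the comparison.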

    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Acked-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-11-04 18:01:15 +01:00
Guillaume Nault 1e20066efa ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134815
Upstream Status: net.git

commit 61b91eb33a69c3be11b259c5ea484505cd79f883
Author: David Ahern <dsahern@kernel.org>
Date:   Thu Oct 6 10:48:49 2022 -0600

    ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference

    Gwangun Jung reported a slab-out-of-bounds access in fib_nh_match:
        fib_nh_match+0xf98/0x1130 linux-6.0-rc7/net/ipv4/fib_semantics.c:961
        fib_table_delete+0x5f3/0xa40 linux-6.0-rc7/net/ipv4/fib_trie.c:1753
        inet_rtm_delroute+0x2b3/0x380 linux-6.0-rc7/net/ipv4/fib_frontend.c:874

    Separate nexthop objects are mutually exclusive with the legacy
    multipath spec. Fix fib_nh_match to return if the config for the
    to-be-deleted route contains a multipath spec while the fib_info
    is using a nexthop object.

    Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
    Fixes: 6bf92d70e690 ("net: ipv4: fix route with nexthop object delete warning")
    Reported-by: Gwangun Jung <exsociety@gmail.com>
    Signed-off-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Tested-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-11-02 09:25:07 +01:00
Phil Sutter ace54a48e5 net: Add l3mdev index to flow struct and avoid oif reset for port devices
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2129093
Upstream Status: commit 40867d74c374b

commit 40867d74c374b235e14d839f3a77f26684feefe5
Author: David Ahern <dsahern@kernel.org>
Date:   Mon Mar 14 14:45:51 2022 -0600

    net: Add l3mdev index to flow struct and avoid oif reset for port devices

    The fundamental premise of VRF and l3mdev core code is binding a socket
    to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
    Legacy code resets flowi_oif to the l3mdev losing any original port
    device binding. Ben (among others) has demonstrated use cases where the
    original port device binding is important and needs to be retained.
    This patch handles that by adding a new entry to the common flow struct
    that can indicate the l3mdev index for later rule and table matching
    avoiding the need to reset flowi_oif.

    In addition to allowing more use cases that require port device binds,
    this patch brings a few datapath simplications:

    1. l3mdev_fib_rule_match is only called when walking fib rules and
       always after l3mdev_update_flow. That allows an optimization to bail
       early for non-VRF type uses cases when flowi_l3mdev is not set. Also,
       only that index needs to be checked for the FIB table id.

    2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
       (e.g., VRF) device. By resetting flowi_oif only for this case the
       FLOWI_FLAG_SKIP_NH_OIF flag is no longer needed and can be removed,
       removing several checks in the datapath. The flowi_iif path can be
       simplified to only be called if it is not loopback (loopback can
       not be assigned to an L3 domain) and the l3mdev index is not already
       set.

    3. Avoid another device lookup in the output path when the fib lookup
       returns a reject failure.
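A minimal sketch of point 1, with loudly stated assumptions: struct flow_sketch, struct vrf_sketch and l3mdev_rule_match_sketch() are hypothetical stand-ins, not the kernel's struct flowi or l3mdev code. It only illustrates the idea that the port binding stays in oif while a separate l3mdev index alone selects the FIB table, with an early return for non-VRF lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flow key: only the two fields discussed. */
struct flow_sketch {
	int oif;     /* port device binding (e.g. eth1); no longer reset */
	int l3mdev;  /* ifindex of the VRF device, 0 for non-VRF lookups */
};

/* Hypothetical VRF description: device ifindex and its FIB table. */
struct vrf_sketch {
	int ifindex;
	uint32_t table;
};

/* Rule-match sketch: bail out early when no l3mdev is set, otherwise
 * resolve the table from the l3mdev index only (oif is not consulted). */
static bool l3mdev_rule_match_sketch(const struct flow_sketch *fl,
				     const struct vrf_sketch *vrfs, int n,
				     uint32_t *table)
{
	if (!fl->l3mdev)
		return false;	/* fast path for non-VRF traffic */

	for (int i = 0; i < n; i++) {
		if (vrfs[i].ifindex == fl->l3mdev) {
			*table = vrfs[i].table;
			return true;
		}
	}
	return false;
}
```

Because oif is never rewritten, a socket bound to eth1 inside a VRF keeps that binding while still matching the VRF's table.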

    Note: 2 functional tests for local traffic with reject fib rules are
    updated to reflect the new direct failure at FIB lookup time for ping
    rather than the failure on the packet path. The current code fails like this:

        HINT: Fails since address on vrf device is out of device scope
        COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
        ping: Warning: source address might be selected on device other than: eth1
        PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.

        --- 172.16.3.1 ping statistics ---
        1 packets transmitted, 0 received, 100% packet loss, time 0ms

    where the test now directly fails:

        HINT: Fails since address on vrf device is out of device scope
        COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
        ping: connect: No route to host

    Signed-off-by: David Ahern <dsahern@kernel.org>
    Tested-by: Ben Greear <greearb@candelatech.com>
    Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2022-10-28 22:35:32 +02:00
Guillaume Nault a4139a0aa4 net: ipv4: fix route with nexthop object delete warning
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2104124
Upstream Status: linux.git

commit 6bf92d70e690b7ff12b24f4bfff5e5434d019b82
Author: Nikolay Aleksandrov <razor@blackwall.org>
Date:   Fri Apr 1 10:33:42 2022 +0300

    net: ipv4: fix route with nexthop object delete warning

    FRR folks have hit a kernel warning[1] while deleting routes[2] which is
    caused by trying to delete a route pointing to a nexthop id without
    specifying nhid but matching on an interface. That is, a route is found
    but we hit a warning while matching it. The warning is from
    fib_info_nh() in include/net/nexthop.h because we run it on a fib_info
    with nexthop object. The call chain is:
     inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
    nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
    the fib_info and triggering the warning). The fix is to not do any
    matching in that branch if the fi has a nexthop object because those are
    managed separately. I.e. we should match when deleting without nh spec and
    should fail when deleting a nexthop route with old-style nh spec because
    nexthop objects are managed separately, e.g.:
     $ ip r show 1.2.3.4/32
     1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0

     $ ip r del 1.2.3.4/32
     $ ip r del 1.2.3.4/32 nhid 12
     <both should work>

     $ ip r del 1.2.3.4/32 dev dummy0
     <should fail with ESRCH>
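The matching rule described above can be sketched as follows. This is a simplified model, not the kernel's fib_nh_match(): struct del_req_sketch and struct route_sketch are hypothetical cut-down views of the delete request and the stored route. It keeps the convention that 0 means "match" and 1 means "mismatch".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down view of the delete request. */
struct del_req_sketch {
	unsigned int nh_id; /* "nhid N", 0 if absent */
	int oif;            /* "dev X" ifindex, 0 if absent */
	bool gw;            /* "via ..." present */
	bool mp;            /* multipath spec present */
};

/* Hypothetical, cut-down view of the stored route. */
struct route_sketch {
	unsigned int nh_id; /* nonzero if the route uses a nexthop object */
	int oif;            /* legacy nexthop device, unused with nh_id */
};

/* Returns 0 on match and 1 on mismatch. */
static int nh_match_sketch(const struct del_req_sketch *req,
			   const struct route_sketch *rt)
{
	if (req->nh_id)
		return (rt->nh_id && rt->nh_id == req->nh_id) ? 0 : 1;

	if (rt->nh_id) {
		/* Nexthop objects are managed separately: any old-style nh
		 * spec is a mismatch, and no legacy nh is dereferenced. */
		return (req->oif || req->gw || req->mp) ? 1 : 0;
	}

	return (req->oif && req->oif != rt->oif) ? 1 : 0;
}
```

With a route modeling `1.2.3.4 nhid 12`, a request with no nh spec or with `nhid 12` matches, while one carrying only `dev dummy0` does not, which corresponds to the ESRCH case above.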

    [1]
     [  523.462226] ------------[ cut here ]------------
     [  523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460
     [  523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd
     [  523.462274]  videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse
     [  523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P           OE     5.16.18-200.fc35.x86_64 #1
     [  523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020
     [  523.462303] RIP: 0010:fib_nh_match+0x210/0x460
     [  523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00
     [  523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286
     [  523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0
     [  523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380
     [  523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000
     [  523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031
     [  523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0
     [  523.462311] FS:  00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000
     [  523.462313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     [  523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0
     [  523.462315] Call Trace:
     [  523.462316]  <TASK>
     [  523.462320]  fib_table_delete+0x1a9/0x310
     [  523.462323]  inet_rtm_delroute+0x93/0x110
     [  523.462325]  rtnetlink_rcv_msg+0x133/0x370
     [  523.462327]  ? _copy_to_iter+0xb5/0x6f0
     [  523.462330]  ? rtnl_calcit.isra.0+0x110/0x110
     [  523.462331]  netlink_rcv_skb+0x50/0xf0
     [  523.462334]  netlink_unicast+0x211/0x330
     [  523.462336]  netlink_sendmsg+0x23f/0x480
     [  523.462338]  sock_sendmsg+0x5e/0x60
     [  523.462340]  ____sys_sendmsg+0x22c/0x270
     [  523.462341]  ? import_iovec+0x17/0x20
     [  523.462343]  ? sendmsg_copy_msghdr+0x59/0x90
     [  523.462344]  ? __mod_lruvec_page_state+0x85/0x110
     [  523.462348]  ___sys_sendmsg+0x81/0xc0
     [  523.462350]  ? netlink_seq_start+0x70/0x70
     [  523.462352]  ? __dentry_kill+0x13a/0x180
     [  523.462354]  ? __fput+0xff/0x250
     [  523.462356]  __sys_sendmsg+0x49/0x80
     [  523.462358]  do_syscall_64+0x3b/0x90
     [  523.462361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
     [  523.462364] RIP: 0033:0x7f24552aa337
     [  523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
     [  523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
     [  523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337
     [  523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003
     [  523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
     [  523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
     [  523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040
     [  523.462373]  </TASK>
     [  523.462374] ---[ end trace ba537bc16f6bf4ed ]---

    [2] https://github.com/FRRouting/frr/issues/6412

    Fixes: 4c7e8084fd ("ipv4: Plumb support for nexthop object in a fib_info")
    Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-07-05 17:22:57 +02:00
Guillaume Nault 8584ef1c1d ipv4: Check attribute length for RTA_FLOW in multipath route
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2104124
Upstream Status: linux.git

commit 664b9c4b7392ce723b013201843264bf95481ce5
Author: David Ahern <dsahern@kernel.org>
Date:   Thu Dec 30 17:36:32 2021 -0700

    ipv4: Check attribute length for RTA_FLOW in multipath route

    Make sure RTA_FLOW is at least 4B before using.
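A user-space sketch of this kind of check, assuming a local struct that mirrors the layout of struct nlattr (16-bit total length including the 4-byte header, then a 16-bit type). It does not use the kernel's netlink helpers, and attr_get_u32_checked() is a hypothetical name; the kernel performs the equivalent test with its own API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the netlink attribute header (struct nlattr). */
struct nla_hdr_sketch {
	uint16_t nla_len;   /* header + payload, in bytes */
	uint16_t nla_type;
};

#define SKETCH_NLA_HDRLEN ((unsigned int)sizeof(struct nla_hdr_sketch))

/* Copy a u32 payload (such as the RTA_FLOW realm) out of a raw attribute,
 * refusing attributes whose payload is shorter than 4 bytes. */
static int attr_get_u32_checked(const unsigned char *attr, uint32_t *out)
{
	struct nla_hdr_sketch hdr;

	memcpy(&hdr, attr, sizeof(hdr));
	if (hdr.nla_len < SKETCH_NLA_HDRLEN + sizeof(*out))
		return -EINVAL;	/* too short: never read past the payload */

	memcpy(out, attr + SKETCH_NLA_HDRLEN, sizeof(*out));
	return 0;
}
```

Reading the header with memcpy() keeps the sketch free of alignment and aliasing assumptions about the input buffer.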

    Fixes: 4e902c5741 ("[IPv4]: FIB configuration using struct fib_config")
    Signed-off-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-07-05 17:22:57 +02:00
Guillaume Nault 49529e8914 ipv4: Check attribute length for RTA_GATEWAY in multipath route
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2104124
Upstream Status: linux.git

commit 7a3429bace0e08d94c39245631ea6bc109dafa49
Author: David Ahern <dsahern@kernel.org>
Date:   Thu Dec 30 17:36:31 2021 -0700

    ipv4: Check attribute length for RTA_GATEWAY in multipath route

    syzbot reported uninit-value:
    ============================================================
      BUG: KMSAN: uninit-value in fib_get_nhs+0xac4/0x1f80
      net/ipv4/fib_semantics.c:708
       fib_get_nhs+0xac4/0x1f80 net/ipv4/fib_semantics.c:708
       fib_create_info+0x2411/0x4870 net/ipv4/fib_semantics.c:1453
       fib_table_insert+0x45c/0x3a10 net/ipv4/fib_trie.c:1224
       inet_rtm_newroute+0x289/0x420 net/ipv4/fib_frontend.c:886

    Add helper to validate RTA_GATEWAY length before using the attribute.

    Fixes: 4e902c5741 ("[IPv4]: FIB configuration using struct fib_config")
    Reported-by: syzbot+d4b9a2851cc3ce998741@syzkaller.appspotmail.com
    Signed-off-by: David Ahern <dsahern@kernel.org>
    Cc: Thomas Graf <tgraf@suug.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-07-05 17:22:57 +02:00
Ivan Vecera e152f0dd49 inet: add net device refcount tracker to struct fib_nh_common
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377

commit e44b14ebae1025cff3bef2d78a2e2f6869cefca0
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Dec 6 17:30:32 2021 -0800

    inet: add net device refcount tracker to struct fib_nh_common

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-13 18:38:36 +02:00
Patrick Talbert f311aab772 Merge: net: backport core fixes from upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/832

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081920

A bunch of fixes for net core path.

Signed-off-by: Hangbin Liu <haliu@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-05-18 10:58:56 +02:00
Hangbin Liu 601c199bd0 lwtunnel: Validate RTA_ENCAP_TYPE attribute length
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081920
Upstream Status: net.git commit 8bda81a4d400

commit 8bda81a4d400cf8a72e554012f0d8c45e07a3904
Author: David Ahern <dsahern@kernel.org>
Date:   Thu Dec 30 17:36:35 2021 -0700

    lwtunnel: Validate RTA_ENCAP_TYPE attribute length

    lwtunnel_valid_encap_type_attr is used to validate encap attributes
    within a multipath route. Add length validation checking to the type.

    lwtunnel_valid_encap_type_attr is called when converting attributes to
    the fib{6,}_config struct, which means it is used before fib_get_nhs,
    ip6_route_multipath_add, and ip6_route_multipath_del - other
    locations that use rtnh_ok and then nla_get_u16 on RTA_ENCAP_TYPE
    attribute.
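The ordering argument is easier to see with a sketch that walks an RTA_MULTIPATH payload once and validates every nested attribute of a given type before anything reads it. Assumptions: struct rtnh_sk and struct nla_sk are local mirrors of struct rtnexthop and struct nlattr, the per-entry bounds test follows the idea of rtnh_ok(), validate_mp_u16_attrs() is hypothetical, and the attribute type is a parameter so no RTA_* constant is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uapi layouts of struct rtnexthop and struct nlattr. */
struct rtnh_sk {
	uint16_t rtnh_len;
	uint8_t  rtnh_flags;
	uint8_t  rtnh_hops;
	int32_t  rtnh_ifindex;
};

struct nla_sk {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define ALIGN4(len) ((int)(((unsigned int)(len) + 3u) & ~3u))

/* For every nexthop in a multipath payload, check that each nested
 * attribute of type 'type' can hold a u16 (as RTA_ENCAP_TYPE must).
 * Offsets are used instead of pointers so nothing steps past the end. */
static int validate_mp_u16_attrs(const unsigned char *mp, int len,
				 uint16_t type)
{
	int off = 0;

	while (len - off >= (int)sizeof(struct rtnh_sk)) {
		struct rtnh_sk nh;

		memcpy(&nh, mp + off, sizeof(nh));
		/* same idea as rtnh_ok(): entry not too short and it fits */
		if (nh.rtnh_len < (int)sizeof(nh) || nh.rtnh_len > len - off)
			return -EINVAL;

		int aoff = off + (int)sizeof(nh);
		int aend = off + nh.rtnh_len;

		while (aend - aoff >= (int)sizeof(struct nla_sk)) {
			struct nla_sk a;

			memcpy(&a, mp + aoff, sizeof(a));
			if (a.nla_len < (int)sizeof(a) || a.nla_len > aend - aoff)
				return -EINVAL;
			/* the payload must hold at least a u16 */
			if (a.nla_type == type &&
			    a.nla_len < (int)(sizeof(a) + sizeof(uint16_t)))
				return -EINVAL;
			aoff += ALIGN4(a.nla_len);
		}
		off += ALIGN4(nh.rtnh_len);
	}
	return 0;
}
```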

    Fixes: 9ed59592e3 ("lwtunnel: fix autoload of lwt modules")

    Signed-off-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2022-05-05 12:26:41 +08:00
Guillaume Nault b0f81b7517 ipv4: fix data races in fib_alias_hw_flags_set
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081383
Upstream Status: linux.git

commit 9fcf986cc4bc6a3a39f23fbcbbc3a9e52d3c24fd
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Feb 16 09:32:16 2022 -0800

    ipv4: fix data races in fib_alias_hw_flags_set

    fib_alias_hw_flags_set() can be used by concurrent threads,
    and is only RCU protected.

    We need to annotate accesses to the following fields of struct fib_alias:

        offload, trap, offload_failed

    Because READ_ONCE()/WRITE_ONCE() cannot be applied to bit-fields,
    make these fields u8.
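Why bit-fields get in the way can be sketched in user space. READ_ONCE_SK()/WRITE_ONCE_SK() below are simplified stand-ins for the kernel macros (the real ones add type checks). Both need the field's address and typeof, which C does not allow for a bit-field. struct alias_flags_sk and hw_flags_set_sk() are hypothetical. Volatile accesses only stop the compiler from tearing, fusing or re-reading each access; they do not make the three stores atomic as a group.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified user-space versions of the kernel macros. */
#define READ_ONCE_SK(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SK(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical cut-down struct fib_alias after the change: whole u8
 * fields, so '&' and __typeof__ are legal on each of them. */
struct alias_flags_sk {
	uint8_t offload;
	uint8_t trap;
	uint8_t offload_failed;
};

/* Lockless check-then-update with every shared access annotated.
 * Returns 1 if the flags changed (a caller would then notify). */
static int hw_flags_set_sk(struct alias_flags_sk *fa, uint8_t offload,
			   uint8_t trap, uint8_t failed)
{
	if (READ_ONCE_SK(fa->offload) == offload &&
	    READ_ONCE_SK(fa->trap) == trap &&
	    READ_ONCE_SK(fa->offload_failed) == failed)
		return 0;

	WRITE_ONCE_SK(fa->offload, offload);
	WRITE_ONCE_SK(fa->trap, trap);
	WRITE_ONCE_SK(fa->offload_failed, failed);
	return 1;
}
```

Declaring the fields as `uint8_t offload:1;` instead would make these macros fail to compile.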

    BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set

    read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
     fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
     nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
     nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
     nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
     nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
     nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
     nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
     process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
     process_scheduled_works kernel/workqueue.c:2370 [inline]
     worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
     kthread+0x1bf/0x1e0 kernel/kthread.c:377
     ret_from_fork+0x1f/0x30

    write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
     fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
     nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
     nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
     nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
     nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
     nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
     nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
     process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
     process_scheduled_works kernel/workqueue.c:2370 [inline]
     worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
     kthread+0x1bf/0x1e0 kernel/kthread.c:377
     ret_from_fork+0x1f/0x30

    value changed: 0x00 -> 0x02

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: events nsim_fib_event_work

    Fixes: 90b93f1b31 ("ipv4: Add "offload" and "trap" indications to routes")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-05-03 17:05:24 +02:00
Guillaume Nault 965233bbc8 ipv4: update fib_info_cnt under spinlock protection
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081383
Upstream Status: linux.git

commit 0a6e6b3c7db6c34e3d149f09cd714972f8753e3f
Author: Eric Dumazet <edumazet@google.com>
Date:   Sun Jan 16 01:02:20 2022 -0800

    ipv4: update fib_info_cnt under spinlock protection

    In the past, free_fib_info() was supposed to be called
    under RTNL protection.

    This eventually was no longer the case.

    Instead of enforcing RTNL, it seems we can simply
    move fib_info_cnt changes to occur when fib_info_lock
    is held.

    v2: David Laight suggested updating fib_info_cnt
    only when an entry is added to or deleted from the hash table,
    as fib_info_cnt is used to make sure the hash table size
    is optimal.
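The v2 rule (touch the counter only where an entry enters or leaves the hash table, and only under the table's lock) can be sketched with a toy spinlock built from C11 atomic_flag standing in for fib_info_lock. This is not how kernel spinlocks are implemented, and struct info_sk, the single hash chain and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy spinlock standing in for fib_info_lock. */
static atomic_flag info_lock = ATOMIC_FLAG_INIT;

static void info_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&info_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void info_lock_release(void)
{
	atomic_flag_clear_explicit(&info_lock, memory_order_release);
}

/* Hypothetical entry; a single hash chain keeps the sketch short. */
struct info_sk {
	struct info_sk *next;
	bool hashed;
};

static struct info_sk *info_chain;
static unsigned int info_cnt;	/* sizes the table; guarded by info_lock */

static void info_hash_insert(struct info_sk *fi)
{
	info_lock_acquire();
	fi->next = info_chain;
	info_chain = fi;
	fi->hashed = true;
	info_cnt++;		/* counted only when linked */
	info_lock_release();
}

static void info_free(struct info_sk *fi)
{
	info_lock_acquire();
	if (fi->hashed) {
		struct info_sk **pp = &info_chain;

		while (*pp != fi)	/* fi is on the chain while hashed */
			pp = &(*pp)->next;
		*pp = fi->next;
		fi->hashed = false;
		info_cnt--;	/* uncounted only when unlinked */
	}
	info_lock_release();
}
```

An entry that was never hashed can be freed without touching the counter, which keeps the count equal to the number of entries the table actually holds.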

    BUG: KCSAN: data-race in fib_create_info / free_fib_info

    write to 0xffffffff86e243a0 of 4 bytes by task 26429 on cpu 0:
     fib_create_info+0xe78/0x3440 net/ipv4/fib_semantics.c:1428
     fib_table_insert+0x148/0x10c0 net/ipv4/fib_trie.c:1224
     fib_magic+0x195/0x1e0 net/ipv4/fib_frontend.c:1087
     fib_add_ifaddr+0xd0/0x2e0 net/ipv4/fib_frontend.c:1109
     fib_netdev_event+0x178/0x510 net/ipv4/fib_frontend.c:1466
     notifier_call_chain kernel/notifier.c:83 [inline]
     raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:391
     __dev_notify_flags+0x1d3/0x3b0
     dev_change_flags+0xa2/0xc0 net/core/dev.c:8872
     do_setlink+0x810/0x2410 net/core/rtnetlink.c:2719
     rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
     __rtnl_newlink net/core/rtnetlink.c:3396 [inline]
     rtnl_newlink+0xb10/0x13b0 net/core/rtnetlink.c:3506
     rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
     netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2496
     rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
     netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
     netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
     netlink_sendmsg+0x726/0x840 net/netlink/af_netlink.c:1921
     sock_sendmsg_nosec net/socket.c:704 [inline]
     sock_sendmsg net/socket.c:724 [inline]
     ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
     ___sys_sendmsg net/socket.c:2463 [inline]
     __sys_sendmsg+0x195/0x230 net/socket.c:2492
     __do_sys_sendmsg net/socket.c:2501 [inline]
     __se_sys_sendmsg net/socket.c:2499 [inline]
     __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

    read to 0xffffffff86e243a0 of 4 bytes by task 31505 on cpu 1:
     free_fib_info+0x35/0x80 net/ipv4/fib_semantics.c:252
     fib_info_put include/net/ip_fib.h:575 [inline]
     nsim_fib4_rt_destroy drivers/net/netdevsim/fib.c:294 [inline]
     nsim_fib4_rt_replace drivers/net/netdevsim/fib.c:403 [inline]
     nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:431 [inline]
     nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
     nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
     nsim_fib_event_work+0x15ca/0x2cf0 drivers/net/netdevsim/fib.c:1477
     process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
     process_scheduled_works kernel/workqueue.c:2361 [inline]
     worker_thread+0x7df/0xa70 kernel/workqueue.c:2447
     kthread+0x2c7/0x2e0 kernel/kthread.c:327
     ret_from_fork+0x1f/0x30

    value changed: 0x00000d2d -> 0x00000d2e

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 31505 Comm: kworker/1:21 Not tainted 5.16.0-rc6-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: events nsim_fib_event_work

    Fixes: 48bb9eb47b ("netdevsim: fib: Add dummy implementation for FIB offload")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Cc: David Laight <David.Laight@ACULAB.COM>
    Cc: Ido Schimmel <idosch@mellanox.com>
    Cc: Jiri Pirko <jiri@mellanox.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-05-03 17:05:12 +02:00
Guillaume Nault 44329f0bf4 ipv4: convert fib_num_tclassid_users to atomic_t
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2047202
Upstream Status: linux.git

commit 213f5f8f31f10aa1e83187ae20fb7fa4e626b724
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Dec 1 18:26:35 2021 -0800

    ipv4: convert fib_num_tclassid_users to atomic_t

    Before commit faa041a40b ("ipv4: Create cleanup helper for fib_nh")
    changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.

    After the change, this is no longer the case, as free_fib_info_rcu()
    runs after an RCU grace period, without RTNL being held.
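A minimal sketch of the reasoning, assuming C11 atomics as a user-space analogue of the kernel's atomic_t (atomic_inc()/atomic_dec()/atomic_read()). tclassid_users and the three helpers are hypothetical names. Once the increment and the decrement can run in different contexts without a common lock, plain ++/-- on an int could lose updates.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for net->ipv4.fib_num_tclassid_users. */
static atomic_int tclassid_users;

/* Called when a nexthop with a realm is created (under RTNL). */
static void tclassid_user_add(void)
{
	atomic_fetch_add(&tclassid_users, 1);
}

/* Called when it is freed after an RCU grace period (no RTNL). */
static void tclassid_user_del(void)
{
	atomic_fetch_sub(&tclassid_users, 1);
}

/* Readers only need to know whether any user exists. */
static int tclassid_in_use(void)
{
	return atomic_load(&tclassid_users) > 0;
}
```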

    Fixes: faa041a40b ("ipv4: Create cleanup helper for fib_nh")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: David Ahern <dsahern@kernel.org>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-01-27 12:51:25 +01:00
Petr Oros ea6b084bc4 net: Remove redundant if statements
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2037315

Upstream commit(s):
commit 1160dfa178eb848327e9dec39960a735f4dc1685
Author: Yajun Deng <yajun.deng@linux.dev>
Date:   Thu Aug 5 19:55:27 2021 +0800

    net: Remove redundant if statements

    The 'if (dev)' check has already been moved into dev_{put,hold}(), so
    remove the redundant if statements.
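The pattern can be sketched with hypothetical NULL-tolerant helpers. struct netdev_sk and its plain integer refcount are assumptions for illustration only; the kernel's device refcounting is more elaborate. What matters is that the NULL check lives inside the helpers, so callers drop their own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with a plain counter. */
struct netdev_sk {
	int refcnt;
};

static void dev_hold_sk(struct netdev_sk *dev)
{
	if (dev)
		dev->refcnt++;
}

static void dev_put_sk(struct netdev_sk *dev)
{
	if (dev)
		dev->refcnt--;
}

/* Caller before: if (nhc_dev) dev_put_sk(nhc_dev);
 * Caller after:  dev_put_sk(nhc_dev); */
static void release_nexthop_dev_sk(struct netdev_sk *nhc_dev)
{
	dev_put_sk(nhc_dev);
}
```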

    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2022-01-10 16:20:08 +01:00
Guillaume Nault 878dade5f0 net: ipv4: Fix rtnexthop len when RTA_FLOW is present
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2024572
Upstream Status: linux.git

commit 597aa16c782496bf74c5dc3b45ff472ade6cee64
Author: Xiao Liang <shaw.leon@gmail.com>
Date:   Thu Sep 23 23:03:19 2021 +0800

    net: ipv4: Fix rtnexthop len when RTA_FLOW is present

    In a multipath route, RTA_FLOW is embedded in the nexthop. Dump it in
    fib_add_nexthop() so that the length of rtnexthop is correct.
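Why the placement matters can be sketched with a toy message builder. Assumptions: struct rtnh_sk2 and struct nla_sk2 mirror struct rtnexthop and struct nlattr, the RTA_GATEWAY/RTA_FLOW numbers are taken from my reading of <linux/rtnetlink.h>, values are stored in host byte order, and the buffer has no overflow checks. add_nexthop_sk() is hypothetical: it writes every nested attribute before computing rtnh_len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rtnh_sk2 {
	uint16_t rtnh_len;
	uint8_t  rtnh_flags;
	uint8_t  rtnh_hops;
	int32_t  rtnh_ifindex;
};

struct nla_sk2 {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Attribute type numbers assumed from <linux/rtnetlink.h>. */
enum { SK_RTA_GATEWAY = 5, SK_RTA_FLOW = 11 };

struct msg_sk {
	unsigned char buf[128];	/* no overflow checks in this sketch */
	size_t len;
};

static void put_u32_attr(struct msg_sk *m, uint16_t type, uint32_t v)
{
	struct nla_sk2 a = {
		.nla_len = (uint16_t)(sizeof(a) + sizeof(v)),
		.nla_type = type,
	};

	memcpy(m->buf + m->len, &a, sizeof(a));
	memcpy(m->buf + m->len + sizeof(a), &v, sizeof(v));
	m->len += (a.nla_len + 3u) & ~3u;	/* 4-byte netlink alignment */
}

/* Emit one rtnexthop; rtnh_len is computed after all nested attributes,
 * including RTA_FLOW, so it covers every one of them. */
static uint16_t add_nexthop_sk(struct msg_sk *m, int ifindex, uint32_t gw,
			       uint32_t flow)
{
	size_t start = m->len;
	struct rtnh_sk2 nh = { .rtnh_ifindex = ifindex };

	m->len += sizeof(nh);
	put_u32_attr(m, SK_RTA_GATEWAY, gw);
	if (flow)
		put_u32_attr(m, SK_RTA_FLOW, flow);

	nh.rtnh_len = (uint16_t)(m->len - start);
	memcpy(m->buf + start, &nh, sizeof(nh));
	return nh.rtnh_len;
}
```

With a flow realm the entry is 24 bytes instead of 16; appending RTA_FLOW after rtnh_len had been fixed at 16 would produce the kind of length mismatch the fix addresses.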

    Fixes: b0f6019363 ("ipv4: Refactor nexthop attributes in fib_dump_info")
    Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2021-11-18 14:50:53 +01:00
Gustavo A. R. Silva 79121184f8 ipv4: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.
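A hypothetical switch showing the pattern. As I understand it, Clang's -Wimplicit-fallthrough also reports falling into a label whose body is only `break;`, a case GCC does not warn about, so the fix is an explicit `break` rather than an annotation.

```c
#include <assert.h>

/* Hypothetical switch. Without the marked 'break;', control would fall
 * from case 2 into 'default', which only breaks: harmless at run time,
 * but reported by Clang. */
static int route_weight_sk(int kind)
{
	int weight = 0;

	switch (kind) {
	case 1:
		weight = 10;
		break;
	case 2:
		weight = 20;
		break;	/* added: explicit instead of falling into default */
	default:
		break;
	}
	return weight;
}
```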

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-05-17 19:29:10 -05:00
Amit Cohen 36c5100e85 IPv4: Add "offload failed" indication to routes
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, a previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever the RTM_F_OFFLOAD/RTM_F_TRAP flags
change; this behavior is controlled by a sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv4 routes, so that
users will have better visibility into the offload process.

'struct fib_alias' and 'struct fib_rt_info' are extended with a new field
that indicates whether route offload failed. Note that the new field uses
a previously unused bit, so there is no need to increase the structs' size.
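From the routing daemon's side, the three flags let it tell "advertise", "keep waiting" and "give up" apart. The helper and enum below are hypothetical. The flag values are copied from <linux/rtnetlink.h> as far as I know them, under local names so the sketch compiles on its own; check them against your headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match RTM_F_OFFLOAD, RTM_F_TRAP, RTM_F_OFFLOAD_FAILED. */
#define SK_RTM_F_OFFLOAD        0x4000u
#define SK_RTM_F_TRAP           0x8000u
#define SK_RTM_F_OFFLOAD_FAILED 0x20000000u

enum route_hw_state { HW_PENDING, HW_INSTALLED, HW_FAILED };

/* Hypothetical daemon helper: classify the rtm_flags of an RTM_NEWROUTE
 * notification for a route the daemon installed. */
static enum route_hw_state classify_route_sk(uint32_t rtm_flags)
{
	if (rtm_flags & SK_RTM_F_OFFLOAD_FAILED)
		return HW_FAILED;	/* no further notification will come */
	if (rtm_flags & (SK_RTM_F_OFFLOAD | SK_RTM_F_TRAP))
		return HW_INSTALLED;	/* route is in hardware: advertise */
	return HW_PENDING;		/* keep waiting */
}
```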

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen 1e7bdec6bb net: ipv4: Publish fib_nlmsg_size()
Publish fib_nlmsg_size() to allow it to be used later on from
fib_alias_hw_flags_set().

Remove the inline keyword since it shouldn't be used inside C files.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:58 -08:00
Amit Cohen 085547891d net: ipv4: Pass fib_rt_info as const to fib_dump_info()
fib_dump_info() does not change 'fri', so pass it as 'const'.
It will later allow us to invoke fib_dump_info() from
fib_alias_hw_flags_set().

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:58 -08:00
Francis Laniel 872f690341 treewide: rename nla_strlcpy to nla_strscpy.
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:08:54 -08:00
Ido Schimmel ca787e0b93 ipv4: Set nexthop flags in a more consistent way
Be more consistent about the way in which the nexthop flags are set and
set them in one go.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201110102553.1924232-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-11 17:45:55 -08:00
Ido Schimmel 968a83f8cf rtnetlink: Add RTNH_F_TRAP flag
The flag indicates to user space that the nexthop is not programmed to
forward packets in hardware, but rather to trap them to the CPU. This is
needed, for example, when the MAC of the nexthop neighbour is not
resolved and packets should reach the CPU to trigger neighbour
resolution.

The flag will be used in subsequent patches by netdevsim to test nexthop
objects programming to device drivers and in the future by mlxsw as
well.

Changes since RFC:
* Reword commit message

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:49 -08:00
guodeqing 5eea3a63ff net: Fix the arp error in some cases
For example:
$ ifconfig eth0 6.6.6.6 netmask 255.255.255.0

$ ip rule add from 6.6.6.6 table 6666

$ ip route add 9.9.9.9 via 6.6.6.6

$ ping -I 6.6.6.6 9.9.9.9
PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.

3 packets transmitted, 0 received, 100% packet loss, time 2079ms

$ arp
Address     HWtype  HWaddress           Flags Mask            Iface
6.6.6.6             (incomplete)                              eth0

The ARP request is sent for the wrong address. This is because
fib_table_lookup() in fib_check_nh() looks up the nexthop for the
destination 9.9.9.9 and the scope of the fib result is RT_SCOPE_LINK,
whereas the correct scope is RT_SCOPE_HOST. Here I add a check of
whether this is RT_TABLE_MAIN to solve this problem.

Fixes: 3bfd847203 ("net: Use passed in table for nexthop lookups")
Signed-off-by: guodeqing <geffrey.guo@huawei.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-18 20:21:51 -07:00
Roopa Prabhu 4f80116d3d net: ipv4: add sysctl for nexthop api compatibility mode
The current route nexthop API maintains user space compatibility
with the old route API by default. Dumps and netlink notifications
support both the new and old API formats. In systems which have
moved to the new API, this compatibility mode cancels some
of the performance benefits provided by the new nexthop API.

This patch adds a new sysctl, nexthop_compat_mode, which is on
by default but provides the ability to turn off compatibility
mode, allowing systems to run entirely with the new routing
API. Old route API behaviour and support is not modified by this
sysctl.

Uses a single sysctl to cover both ipv4 and ipv6 following
other sysctls. Covers dumps and delete notifications as
suggested by David Ahern.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 12:50:37 -07:00
David Ahern 7c74b0bec9 ipv4: Update fib_select_default to handle nexthop objects
A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.
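The difference between the two accessors can be sketched with a simplified, hypothetical model: a route either embeds a legacy nexthop or points at a shared nexthop object, and both expose a common part (the role struct fib_nh_common plays in the kernel). The real fib_info_nh()/fib_info_nhc() also take a nexthop index; that is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common part shared by both nexthop kinds. */
struct nh_common_sk {
	int ifindex;
	int scope;
};

struct nh_object_sk {
	struct nh_common_sk nhc;
};

struct route_info_sk {
	struct nh_object_sk *nh;	/* non-NULL: uses a nexthop object */
	struct nh_common_sk legacy;	/* meaningful only when nh == NULL */
};

/* Like fib_info_nh(): legacy-only. The kernel WARNs when handed a route
 * with a nexthop object; this sketch returns NULL instead. */
static struct nh_common_sk *route_nh_legacy_sk(struct route_info_sk *fi)
{
	return fi->nh ? NULL : &fi->legacy;
}

/* Like fib_info_nhc(): valid for both kinds, so default-route selection
 * can inspect the device and scope without caring which one it has. */
static struct nh_common_sk *route_nhc_sk(struct route_info_sk *fi)
{
	return fi->nh ? &fi->nh->nhc : &fi->legacy;
}
```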

Add a test case that covers the affected code path.

[0] https://github.com/FRRouting/frr/issues/6089

Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:57:39 -07:00
Alexander Aring faee676944 net: add net available in build_state
The build_state callback of lwtunnel does not receive the net namespace
structure yet. This patch adds it so we can check specific address
configuration when RPL source routes are created.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Joe Perches a8eceea84a inet: Use fallthrough;
Convert the various uses of fallthrough comments to fallthrough;

Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

And by hand:

net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
that causes gcc to emit a warning if converted in-place.

So move the new fallthrough; inside the containing #ifdef/#endif too.
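Roughly how the `fallthrough` pseudo-keyword can be defined and used. The definition mirrors the shape of the one in include/linux/compiler_attributes.h, with an extra guard for compilers lacking __has_attribute; levels_from_sk() is a hypothetical example of the converted comment style.

```c
#include <assert.h>

/* Use the attribute when the compiler supports it, else a no-op
 * statement so that 'fallthrough;' still parses. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

/* Hypothetical: count the levels from 'start' down to 0. */
static int levels_from_sk(int start)
{
	int n = 0;

	switch (start) {
	case 2:
		n++;
		fallthrough;	/* replaces a fall-through comment */
	case 1:
		n++;
		fallthrough;
	case 0:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

Unlike a comment, the statement form is seen by the compiler itself, which is why its placement relative to an #ifdef block matters.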

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:55:00 -07:00
Ido Schimmel 90b93f1b31 ipv4: Add "offload" and "trap" indications to routes
When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.

While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.

This patch adds "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.

'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6-byte hole and therefore the struct still fits in a single
cache line [1].

Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.

The indications are dumped to user space via new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.

v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()

[1]
struct fib_alias {
        struct hlist_node  fa_list;                      /*     0    16 */
        struct fib_info *          fa_info;              /*    16     8 */
        u8                         fa_tos;               /*    24     1 */
        u8                         fa_type;              /*    25     1 */
        u8                         fa_state;             /*    26     1 */
        u8                         fa_slen;              /*    27     1 */
        u32                        tb_id;                /*    28     4 */
        s16                        fa_default;           /*    32     2 */
        u8                         offload:1;            /*    34: 0  1 */
        u8                         trap:1;               /*    34: 1  1 */
        u8                         unused:6;             /*    34: 2  1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 18:53:35 -08:00
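For the "offload" and "trap" change above, a userspace mock of the `fib_alias` layout shown in the pahole output, assuming an LP64 target (8-byte pointers), where `hlist_node` and `callback_head` are each two pointers. It checks that the two bitfields occupy byte 34 of the former hole, so `rcu` stays at offset 40 and the struct remains 56 bytes. The `mock_` names and `MOCK_RTM_F_*` values are hypothetical; real code uses the uapi `RTM_F_OFFLOAD` and `RTM_F_TRAP` constants and `fib_alias_hw_flags_set()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define MOCK_RTM_F_OFFLOAD 0x1u
#define MOCK_RTM_F_TRAP    0x2u

/* Same size and alignment as hlist_node / callback_head on LP64. */
struct mock_hlist_node { void *next; void **pprev; };
struct mock_callback_head { void *next; void (*func)(void *); };

struct mock_fib_alias {
	struct mock_hlist_node fa_list;             /* 0..15 */
	void *fa_info;                              /* 16..23 */
	uint8_t fa_tos, fa_type, fa_state, fa_slen; /* 24..27 */
	uint32_t tb_id;                             /* 28..31 */
	int16_t fa_default;                         /* 32..33 */
	uint8_t offload:1;                          /* byte 34 */
	uint8_t trap:1;
	uint8_t unused:6;
	struct mock_callback_head rcu;              /* 40..55 after padding */
};

/* Roughly what a driver-facing setter reduces to once the alias is found. */
static void mock_hw_flags_set(struct mock_fib_alias *fa, bool offload, bool trap)
{
	fa->offload = offload;
	fa->trap = trap;
}

/* Translate the two bits into rtm_flags bits for a route dump. */
static unsigned int mock_rtm_flags(const struct mock_fib_alias *fa)
{
	unsigned int flags = 0;

	if (fa->offload)
		flags |= MOCK_RTM_F_OFFLOAD;
	if (fa->trap)
		flags |= MOCK_RTM_F_TRAP;
	return flags;
}
```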
Ido Schimmel 1e301fd04e ipv4: Encapsulate function arguments in a struct
fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
using the passed arguments. Currently, the function takes 11 arguments,
6 of which are attributes of the route being dumped (e.g., prefix, TOS).

The next patch will need the function to also dump to user space an
indication if the route is present in hardware or not. Instead of
passing yet another argument, change the function to take a struct
containing the different route attributes.

v2:
* Name last argument of fib_dump_info()
* Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could
  later be passed to fib_alias_hw_flags_set()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 18:53:35 -08:00
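For the "Encapsulate function arguments in a struct" refactor above, a small sketch of the pattern: route attributes are grouped in one struct, so a later field (here an offload bit) does not change the dump function's signature. `struct mock_rt_info` and `mock_dump_info()` are hypothetical simplifications and do not reproduce the fields of `struct fib_rt_info` or the netlink output of `fib_dump_info()`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed analogue of a route-info struct: the attributes
 * travel together instead of as separate parameters. */
struct mock_rt_info {
	uint32_t tb_id;
	uint32_t dst;      /* host byte order, to keep the sketch simple */
	int dst_len;
	uint8_t tos;
	uint8_t type;
	uint8_t offload:1; /* adding a field later needs no signature change */
};

/* One pointer instead of a long list of per-attribute arguments. */
static int mock_dump_info(char *buf, size_t len, const struct mock_rt_info *ri)
{
	return snprintf(buf, len, "table %u %u.%u.%u.%u/%d tos %u type %u%s",
			(unsigned)ri->tb_id,
			(unsigned)(ri->dst >> 24) & 0xffu,
			(unsigned)(ri->dst >> 16) & 0xffu,
			(unsigned)(ri->dst >> 8) & 0xffu,
			(unsigned)ri->dst & 0xffu,
			ri->dst_len, (unsigned)ri->tos, (unsigned)ri->type,
			ri->offload ? " offload" : "");
}
```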
David Ahern e0a312629f ipv4: Fix table id reference in fib_sync_down_addr
Hendrik reported that routes in the main table using a source address are
not removed when the address is removed. The problem is that fib_sync_down_addr
does not account for devices in the default VRF, which are associated
with the main table. Fix by updating the table id reference.

Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Reported-by: Hendrik Donner <hd@os-cillation.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07 16:14:36 -08:00
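For the `fib_sync_down_addr` fix above, a sketch of the table-id comparison under one assumption: a device outside any VRF reports table id 0, and that value must be mapped to the main table (`RT_TABLE_MAIN`, 254) before being compared with a route's table. Comparing against the raw 0 would never match a main-table route, which is consistent with the reported symptom. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_RT_TABLE_MAIN 254u /* value of RT_TABLE_MAIN */

/* l3mdev_table: hypothetical stand-in for the device's VRF table id,
 * 0 when the device is in the default VRF. */
static uint32_t mock_effective_tb_id(uint32_t l3mdev_table)
{
	return l3mdev_table ? l3mdev_table : MOCK_RT_TABLE_MAIN;
}

/* Should a route in table fi_tb_id, using this device's address as its
 * source, be removed when the address goes away? */
static bool mock_should_flush(uint32_t fi_tb_id, uint32_t l3mdev_table)
{
	return fi_tb_id == mock_effective_tb_id(l3mdev_table);
}
```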
Donald Sharp 7bdf4de126 net: Properly update v4 routes with v6 nexthop
When creating a v4 route that uses a v6 nexthop from a nexthop group,
allow the kernel to properly send the nexthop as v6 via the RTA_VIA
attribute.

Broken behavior:

$ ip nexthop add via fe80::9 dev eth0
$ ip nexthop show
id 1 via fe80::9 dev eth0 scope link
$ ip route add 4.5.6.7/32 nhid 1
$ ip route show
default via 10.0.2.2 dev eth0
4.5.6.7 nhid 1 via 254.128.0.0 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
$

Fixed behavior:

$ ip nexthop add via fe80::9 dev eth0
$ ip nexthop show
id 1 via fe80::9 dev eth0 scope link
$ ip route add 4.5.6.7/32 nhid 1
$ ip route show
default via 10.0.2.2 dev eth0
4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
$

v2, v3: Addresses code review comments from David Ahern

Fixes: dcb1ecb50e ("ipv4: Prepare for fib6_nh from a nexthop object")
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05 12:35:58 +02:00
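The broken output above (`via 254.128.0.0` for the gateway `fe80::9`) matches what happens when the leading four bytes of the IPv6 address (`fe 80 00 00`) are printed as IPv4. This userspace illustration uses the standard `inet_pton()`/`inet_ntop()` calls; it is not the kernel's dump path, and the `mock_` function names are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Broken: treat the gateway as IPv4 although it is an in6_addr, so only
 * its first four bytes are printed. */
static const char *mock_fmt_gw_as_v4(const struct in6_addr *gw,
				     char *buf, socklen_t len)
{
	struct in_addr v4;

	memcpy(&v4, gw->s6_addr, sizeof(v4));
	return inet_ntop(AF_INET, &v4, buf, len);
}

/* Fixed: honour the gateway's family and print the full IPv6 address. */
static const char *mock_fmt_gw_as_v6(const struct in6_addr *gw,
				     char *buf, socklen_t len)
{
	return inet_ntop(AF_INET6, gw, buf, len);
}
```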
David S. Miller 13091aa305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 20:20:36 -07:00
David Ahern 6c48ea5fe6 ipv4: Optimization for fib_info lookup with nexthops
Be optimistic about re-using a fib_info when a nexthop id is given and
the route does not use metrics. This avoids a memory allocation which in
most cases is expected to be freed anyway.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10 10:44:57 -07:00
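For the fib_info lookup optimization above, a sketch of the "look before you allocate" idea: when a route has a nexthop id and no metrics, an existing shared entry is searched first and its reference count bumped, and allocation happens only on a miss. The fixed-size cache and the `mock_` names are hypothetical, and only the nexthop id is compared here, whereas a real match would also compare other route attributes.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified fib_info shared by routes with the same
 * nexthop id when they carry no metrics. */
struct mock_fib_info {
	unsigned int nh_id;
	int refcnt;
};

#define MOCK_CACHE_SIZE 8
static struct mock_fib_info *mock_cache[MOCK_CACHE_SIZE];

static struct mock_fib_info *mock_find(unsigned int nh_id)
{
	for (int i = 0; i < MOCK_CACHE_SIZE; i++)
		if (mock_cache[i] && mock_cache[i]->nh_id == nh_id)
			return mock_cache[i];
	return NULL;
}

static struct mock_fib_info *mock_fib_info_get(unsigned int nh_id, bool has_metrics)
{
	struct mock_fib_info *fi;

	/* Optimistic path: reuse an existing entry, no allocation needed. */
	if (!has_metrics) {
		fi = mock_find(nh_id);
		if (fi) {
			fi->refcnt++;
			return fi;
		}
	}

	/* Slow path: allocate a new entry and, if shareable, remember it. */
	fi = calloc(1, sizeof(*fi));
	if (!fi)
		return NULL;
	fi->nh_id = nh_id;
	fi->refcnt = 1;
	if (!has_metrics) {
		for (int i = 0; i < MOCK_CACHE_SIZE; i++) {
			if (!mock_cache[i]) {
				mock_cache[i] = fi;
				break;
			}
		}
	}
	return fi;
}
```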
David Ahern 493ced1ac4 ipv4: Allow routes to use nexthop objects
Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes cannot
be set.

Update fib_nh_match to check ids on a route delete.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10 10:44:56 -07:00
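For the RTA_NH_ID support above, a sketch of the stated validation rule: once a route references a nexthop object by id, the gateway, device, encap and multipath attributes must be absent, because the nexthop object supplies them. `fc_nh_id` comes from the commit message; the other fields of `struct mock_fib_config` are hypothetical simplifications, not the kernel's `fib_config` layout.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a route config. */
struct mock_fib_config {
	uint32_t fc_nh_id; /* from RTA_NH_ID, 0 if absent */
	uint32_t gw;       /* gateway, 0 if absent */
	int oif;           /* output device index, 0 if absent */
	bool has_encap;
	bool has_mp;       /* multipath */
};

static int mock_validate(const struct mock_fib_config *cfg)
{
	if (!cfg->fc_nh_id)
		return 0; /* route without a nexthop object: not checked here */

	/* The nexthop object already carries gateway, device and encap data. */
	if (cfg->gw || cfg->oif || cfg->has_encap || cfg->has_mp)
		return -EINVAL;
	return 0;
}
```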