522 Commits

Author SHA1 Message Date
Hangbin Liu 5598424870 ipv6: fix source address selection with route leak
JIRA: https://issues.redhat.com/browse/RHEL-73281
Upstream Status: net.git commit 252442f2ae31

Conflicts: context conflicts due to missing fa17a6d8a5bd
("ipv6: lockless IPV6_ADDR_PREFERENCES implementation").

commit 252442f2ae317d109ef0b4b39ce0608c09563042
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date:   Wed Jul 10 10:14:28 2024 +0200

    ipv6: fix source address selection with route leak

    By default, an address assigned to the output interface is selected when
    the source address is not specified. This is problematic when a route,
    configured in a vrf, uses an interface from another vrf (aka route leak).
    The original vrf does not own the selected source address.

    Let's add a check against the output interface and call the appropriate
    function to select the source address.

    CC: stable@vger.kernel.org
    Fixes: 0d240e7811 ("net: vrf: Implement get_saddr for IPv6")
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Link: https://patch.msgid.link/20240710081521.3809742-3-nicolas.dichtel@6wind.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2025-01-13 10:18:18 +08:00
Antoine Tenart 0b703569b7 ipv6: annotate data-races around devconf->disable_policy
JIRA: https://issues.redhat.com/browse/RHEL-62203
Upstream Status: linux.git

commit 624d5aec487cf8c2955d9c5880685714f7fe8e6f
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Feb 28 13:54:35 2024 +0000

    ipv6: annotate data-races around devconf->disable_policy

    idev->cnf.disable_policy and net->ipv6.devconf_all->disable_policy
    can be read locklessly. Add appropriate annotations on reads
    and writes.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:48 +01:00
Antoine Tenart ff1884900e ipv6: annotate data-races around devconf->proxy_ndp
JIRA: https://issues.redhat.com/browse/RHEL-62203
Upstream Status: linux.git

commit a8fbd4d90720b6c930661ed593d54aba77cec3c2
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Feb 28 13:54:34 2024 +0000

    ipv6: annotate data-races around devconf->proxy_ndp

    devconf->proxy_ndp can be read and written locklessly,
    add appropriate annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:48 +01:00
Antoine Tenart a9543e015a ipv6: annotate data-races around cnf.forwarding
JIRA: https://issues.redhat.com/browse/RHEL-62203
Upstream Status: linux.git
Conflicts:\
- One missing chunk as upstream commit f9a2fb73318e ("net/ipv6:
  Introduce accept_unsolicited_na knob to implement router-side changes
  for RFC9131") is not in c9s.

commit 32f754176e889cdfe989ef08ece19859427755df
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Feb 28 13:54:30 2024 +0000

    ipv6: annotate data-races around cnf.forwarding

    idev->cnf.forwarding and net->ipv6.devconf_all->forwarding
    might be read locklessly, add appropriate READ_ONCE()
    and WRITE_ONCE() annotations.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:48 +01:00
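
The three "annotate data-races" backports above share one pattern: a field that may be read without a lock is loaded exactly once into a local on the reader side and stored with a matching annotated write on the writer side. A minimal userspace sketch of that pattern follows. The macros are simplified stand-ins (assumption: a single volatile access, not the kernel's full definitions), and `struct devconf_sketch`, `set_forwarding()` and `forwarding_enabled()` are hypothetical names that do not exist in the kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * Assumption: reduced to one volatile access. __typeof__ is a GCC/Clang
 * extension.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, trimmed-down stand-in for the per-device IPv6 config. */
struct devconf_sketch {
    int forwarding;
};

/* Writer side, e.g. a sysctl handler updating the knob. */
static void set_forwarding(struct devconf_sketch *cnf, int val)
{
    assert(cnf != NULL);
    WRITE_ONCE(cnf->forwarding, val);
}

/* Reader side: load once into a local, then only use the local. */
static int forwarding_enabled(struct devconf_sketch *cnf)
{
    int fwd;

    assert(cnf != NULL);
    fwd = READ_ONCE(cnf->forwarding);
    return fwd != 0;
}
```

Reading into a local matters because the compiler is otherwise free to reload the field between two uses, so a single decision could be made against two different values.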
CKI Backport Bot 30b315f5ee ipv6: prevent possible UAF in ip6_xmit()
JIRA: https://issues.redhat.com/browse/RHEL-60232

commit 2d5ff7e339d04622d8282661df36151906d0e1c7
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 20 16:08:59 2024 +0000

    ipv6: prevent possible UAF in ip6_xmit()

    If skb_expand_head() returns NULL, skb has been freed
    and the associated dst/idev could also have been freed.

    We must use rcu_read_lock() to prevent a possible UAF.

    Fixes: 0c9f227bee11 ("ipv6: use skb_expand_head in ip6_xmit")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Vasily Averin <vasily.averin@linux.dev>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240820160859.3786976-4-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-09-26 07:12:35 +00:00
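
The fix above relies on a read-side critical section keeping the idev alive after the skb has been freed. The single-threaded toy model below only illustrates that ordering. It is not RCU: every `toy_*` function, `struct idev_sketch` and `xmit_sketch()` is hypothetical, and the "grace period" is just a nesting counter with one deferred-free slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device state touched on the error path. */
struct idev_sketch {
    int out_discards;
};

static int readers;                  /* toy read-side nesting depth */
static struct idev_sketch *pending;  /* object waiting for its toy grace period */

static void toy_rcu_read_lock(void)
{
    readers++;
}

static void toy_rcu_read_unlock(void)
{
    assert(readers > 0);
    if (--readers == 0 && pending != NULL) {
        free(pending);               /* grace period over: now it may go away */
        pending = NULL;
    }
}

/* Like a deferred free: immediate only when no reader is active. */
static void toy_deferred_free(struct idev_sketch *p)
{
    if (readers == 0) {
        free(p);
    } else {
        assert(pending == NULL);     /* the toy has a single slot */
        pending = p;
    }
}

/* Stand-in for a failing head expansion that drops the last reference. */
static int toy_expand_head_fails(struct idev_sketch *idev)
{
    toy_deferred_free(idev);
    return -1;
}

/* The error path may still touch idev because the free is deferred. */
static int xmit_sketch(struct idev_sketch *idev)
{
    int discards = 0;

    toy_rcu_read_lock();
    if (toy_expand_head_fails(idev) < 0) {
        idev->out_discards++;
        discards = idev->out_discards;
    }
    toy_rcu_read_unlock();           /* idev is released here at the earliest */
    return discards;
}
```

Without the lock/unlock pair in `xmit_sketch()`, the toy frees `idev` inside `toy_expand_head_fails()` and the counter update becomes a use-after-free, which is the shape of the bug the commit describes.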
CKI Backport Bot ddb848fdae ipv6: fix possible UAF in ip6_finish_output2()
JIRA: https://issues.redhat.com/browse/RHEL-60232

commit da273b377ae0d9bd255281ed3c2adb228321687b
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 20 16:08:58 2024 +0000

    ipv6: fix possible UAF in ip6_finish_output2()

    If skb_expand_head() returns NULL, skb has been freed
    and associated dst/idev could also have been freed.

    We need to hold rcu_read_lock() to make sure the dst and
    associated idev are alive.

    Fixes: 5796015fa9 ("ipv6: allocate enough headroom in ip6_finish_output2()")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Vasily Averin <vasily.averin@linux.dev>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240820160859.3786976-3-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-09-26 07:12:34 +00:00
CKI Backport Bot e6285bfd47 ipv6: prevent UAF in ip6_send_skb()
JIRA: https://issues.redhat.com/browse/RHEL-60232

commit faa389b2fbaaec7fd27a390b4896139f9da662e3
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 20 16:08:57 2024 +0000

    ipv6: prevent UAF in ip6_send_skb()

    syzbot reported an UAF in ip6_send_skb() [1]

    After ip6_local_out() has returned, we no longer can safely
    dereference rt, unless we hold rcu_read_lock().

    A similar issue has been fixed in commit
    a688caa34b ("ipv6: take rcu lock in rawv6_send_hdrinc()")

    Another potential issue in ip6_finish_output2() is handled in a
    separate patch.

    [1]
     BUG: KASAN: slab-use-after-free in ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964
    Read of size 8 at addr ffff88806dde4858 by task syz.1.380/6530

    CPU: 1 UID: 0 PID: 6530 Comm: syz.1.380 Not tainted 6.11.0-rc3-syzkaller-00306-gdf6cbc62cc9b #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:93 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:488
      kasan_report+0x143/0x180 mm/kasan/report.c:601
      ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964
      rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588
      rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x1a6/0x270 net/socket.c:745
      sock_write_iter+0x2dd/0x400 net/socket.c:1160
     do_iter_readv_writev+0x60a/0x890
      vfs_writev+0x37c/0xbb0 fs/read_write.c:971
      do_writev+0x1b1/0x350 fs/read_write.c:1018
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f936bf79e79
    Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f936cd7f038 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 00007f936c115f80 RCX: 00007f936bf79e79
    RDX: 0000000000000001 RSI: 0000000020000040 RDI: 0000000000000004
    RBP: 00007f936bfe7916 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000000 R14: 00007f936c115f80 R15: 00007fff2860a7a8
     </TASK>

    Allocated by task 6530:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      unpoison_slab_object mm/kasan/common.c:312 [inline]
      __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
      kasan_slab_alloc include/linux/kasan.h:201 [inline]
      slab_post_alloc_hook mm/slub.c:3988 [inline]
      slab_alloc_node mm/slub.c:4037 [inline]
      kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4044
      dst_alloc+0x12b/0x190 net/core/dst.c:89
      ip6_blackhole_route+0x59/0x340 net/ipv6/route.c:2670
      make_blackhole net/xfrm/xfrm_policy.c:3120 [inline]
      xfrm_lookup_route+0xd1/0x1c0 net/xfrm/xfrm_policy.c:3313
      ip6_dst_lookup_flow+0x13e/0x180 net/ipv6/ip6_output.c:1257
      rawv6_sendmsg+0x1283/0x23c0 net/ipv6/raw.c:898
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x1a6/0x270 net/socket.c:745
      ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
      ___sys_sendmsg net/socket.c:2651 [inline]
      __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f

    Freed by task 45:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
      poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
      __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
      kasan_slab_free include/linux/kasan.h:184 [inline]
      slab_free_hook mm/slub.c:2252 [inline]
      slab_free mm/slub.c:4473 [inline]
      kmem_cache_free+0x145/0x350 mm/slub.c:4548
      dst_destroy+0x2ac/0x460 net/core/dst.c:124
      rcu_do_batch kernel/rcu/tree.c:2569 [inline]
      rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2843
      handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
      __do_softirq kernel/softirq.c:588 [inline]
      invoke_softirq kernel/softirq.c:428 [inline]
      __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
      irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
      instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
      sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
      asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702

    Last potentially related work creation:
      kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
      __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
      __call_rcu_common kernel/rcu/tree.c:3106 [inline]
      call_rcu+0x167/0xa70 kernel/rcu/tree.c:3210
      refdst_drop include/net/dst.h:263 [inline]
      skb_dst_drop include/net/dst.h:275 [inline]
      nf_ct_frag6_queue net/ipv6/netfilter/nf_conntrack_reasm.c:306 [inline]
      nf_ct_frag6_gather+0xb9a/0x2080 net/ipv6/netfilter/nf_conntrack_reasm.c:485
      ipv6_defrag+0x2c8/0x3c0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:67
      nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
      nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
      nf_hook include/linux/netfilter.h:269 [inline]
      __ip6_local_out+0x6fa/0x800 net/ipv6/output_core.c:143
      ip6_local_out+0x26/0x70 net/ipv6/output_core.c:153
      ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1959
      rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588
      rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x1a6/0x270 net/socket.c:745
      sock_write_iter+0x2dd/0x400 net/socket.c:1160
     do_iter_readv_writev+0x60a/0x890

    Fixes: 0625491493 ("ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240820160859.3786976-2-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-09-26 07:12:33 +00:00
Hangbin Liu 469de90207 ipv6: prevent NULL dereference in ip6_output()
JIRA: https://issues.redhat.com/browse/RHEL-45826
Upstream Status: net.git commit 4db783d68b9b

Conflicts: no READ_ONCE due to missing commit d289ab65b89c
("ipv6: annotate data-races around cnf.disable_ipv6").

commit 4db783d68b9b39a411a96096c10828ff5dfada7a
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue May 7 16:18:42 2024 +0000

    ipv6: prevent NULL dereference in ip6_output()

    According to syzbot, there is a chance that ip6_dst_idev()
    returns NULL in ip6_output(). Most places in IPv6 stack
    deal with a NULL idev just fine, but not here.

    syzbot reported:

    general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI
    KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7]
    CPU: 0 PID: 9775 Comm: syz-executor.4 Not tainted 6.9.0-rc5-syzkaller-00157-g6a30653b604a #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
     RIP: 0010:ip6_output+0x231/0x3f0 net/ipv6/ip6_output.c:237
    Code: 3c 1e 00 49 89 df 74 08 4c 89 ef e8 19 58 db f7 48 8b 44 24 20 49 89 45 00 49 89 c5 48 8d 9d e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 4c 8b 74 24 28 0f 85 61 01 00 00 8b 1b 31 ff
    RSP: 0018:ffffc9000927f0d8 EFLAGS: 00010202
    RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000040000
    RDX: ffffc900131f9000 RSI: 0000000000004f47 RDI: 0000000000004f48
    RBP: 0000000000000000 R08: ffffffff8a1f0b9a R09: 1ffffffff1f51fad
    R10: dffffc0000000000 R11: fffffbfff1f51fae R12: ffff8880293ec8c0
    R13: ffff88805d7fc000 R14: 1ffff1100527d91a R15: dffffc0000000000
    FS:  00007f135c6856c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000080 CR3: 0000000064096000 CR4: 00000000003506f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
      NF_HOOK include/linux/netfilter.h:314 [inline]
      ip6_xmit+0xefe/0x17f0 net/ipv6/ip6_output.c:358
      sctp_v6_xmit+0x9f2/0x13f0 net/sctp/ipv6.c:248
      sctp_packet_transmit+0x26ad/0x2ca0 net/sctp/output.c:653
      sctp_packet_singleton+0x22c/0x320 net/sctp/outqueue.c:783
      sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline]
      sctp_outq_flush+0x6d5/0x3e20 net/sctp/outqueue.c:1212
      sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline]
      sctp_do_sm+0x59cc/0x60c0 net/sctp/sm_sideeffect.c:1169
      sctp_primitive_ASSOCIATE+0x95/0xc0 net/sctp/primitive.c:73
      __sctp_connect+0x9cd/0xe30 net/sctp/socket.c:1234
      sctp_connect net/sctp/socket.c:4819 [inline]
      sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834
      __sys_connect_file net/socket.c:2048 [inline]
      __sys_connect+0x2df/0x310 net/socket.c:2065
      __do_sys_connect net/socket.c:2075 [inline]
      __se_sys_connect net/socket.c:2072 [inline]
      __x64_sys_connect+0x7a/0x90 net/socket.c:2072
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f

    Fixes: 778d80be52 ("ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Link: https://lore.kernel.org/r/20240507161842.773961-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2024-07-02 13:51:10 +08:00
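
The crash above comes from dereferencing an idev that can be NULL. A minimal sketch of a NULL-tolerant check, under stated assumptions: `struct inet6_dev_sketch`, `enum out_verdict` and `output_check()` are hypothetical names, and treating a missing idev as "drop and count a discard" is a choice made for this sketch rather than something the message spells out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the per-device IPv6 state. */
struct inet6_dev_sketch {
    int disable_ipv6;
};

enum out_verdict { OUT_DROP, OUT_CONTINUE };

/*
 * Test the pointer before the field. A NULL idev is handled like a
 * disabled interface (assumption of this sketch).
 */
static enum out_verdict output_check(const struct inet6_dev_sketch *idev,
                                     unsigned int *discards)
{
    assert(discards != NULL);
    if (idev == NULL || idev->disable_ipv6) {
        (*discards)++;
        return OUT_DROP;
    }
    return OUT_CONTINUE;
}
```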
Lucas Zampieri df4c7fc3d2 Merge: CNB95: convert tunnel metadata flags
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4465

JIRA: https://issues.redhat.com/browse/RHEL-40130  
Tested: Using routing and tunneling self-tests  
Depends: !4435 

Commits:
```
537dd2d9fb9f ("net: Add helper function to parse netlink msg of ip_tunnel_encap")
b86fca800a6a ("net: Add helper function to parse netlink msg of ip_tunnel_parm")
63c15822b8dd ("lib/bitmap: add bitmap_{read,write}()")
117aef12a7b1 ("ip_tunnel: use a separate struct to store tunnel params in the kernel")
020e8f60aa8b ("ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX")
b11ebf2ca2c1 ("ip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX")
45490ce2ff83 ("nfp: flower: add support for tunnel offload without key ID")
bf3fcbf7e7a0 ("ipv4: rename and move ip_route_output_tunnel()")
78f3655adcb5 ("ipv4: remove "proto" argument from udp_tunnel_dst_lookup()")
72fc68c6356b ("ipv4: add new arguments to udp_tunnel_dst_lookup()")
3ae983a603a4 ("ipv4: use tunnel flow flags for tunnel route lookups")
60a77d11cd5d ("geneve: add dsfield helper function")
daa2ba7ed1d1 ("geneve: use generic function for tunnel IPv4 route lookup")
6f19b2c136d9 ("vxlan: use generic function for tunnel IPv4 route lookup")
fc47e86dbfb7 ("ipv6: rename and move ip6_dst_lookup_tunnel()")
7e937dcf96d0 ("ipv6: remove "proto" argument from udp_tunnel6_dst_lookup()")
946fcfdbc5b9 ("ipv6: add new arguments to udp_tunnel6_dst_lookup()")
69d72587c17b ("geneve: use generic function for tunnel IPv6 route lookup")
f25e621f5d4c ("ipv6: mark address parameters of udp_tunnel6_xmit_skb() as const")
2aceb896ee18 ("vxlan: use generic function for tunnel IPv6 route lookup")
3e7e5baaaba7 ("bitmap: don't assume compiler evaluates small mem*() builtins calls")
c1023f5634b9 ("s390/cio: rename bitmap_size() -> idset_bitmap_size()")
10a04ff09bcc ("tools: move alignment-related macros to new <linux/align.h>")
a37fbe666c01 ("bitmap: introduce generic optimized bitmap_size()")
5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
5a66cda52d7d ("ip_tunnel: harden copying IP tunnel params to userspace")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Petr Oros <poros@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-07-01 12:48:47 +00:00
Antoine Tenart 2c58b3a9d7 ipv6: Fix potential uninit-value access in __ip6_make_skb()
JIRA: https://issues.redhat.com/browse/RHEL-39786
Upstream Status: linux.git
Conflicts:\
- Removed code differs due to missing upstream commit cafbe182a467
("inet: move inet->hdrincl to inet->inet_flags") in c9s.

commit 4e13d3a9c25b7080f8a619f961e943fe08c2672c
Author: Shigeru Yoshida <syoshida@redhat.com>
Date:   Mon May 6 23:11:29 2024 +0900

    ipv6: Fix potential uninit-value access in __ip6_make_skb()

    As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in
    __ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags
    instead of testing HDRINCL on the socket to avoid a race condition which
    causes uninit-value access.

    Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()")
    Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-06-14 15:21:39 +02:00
Ivan Vecera d34e7320e7 ipv6: rename and move ip6_dst_lookup_tunnel()
JIRA: https://issues.redhat.com/browse/RHEL-40130

commit fc47e86dbfb75a864c0c9dd8e78affb6506296bb
Author: Beniamino Galvani <b.galvani@gmail.com>
Date:   Fri Oct 20 13:55:25 2023 +0200

    ipv6: rename and move ip6_dst_lookup_tunnel()

    At the moment ip6_dst_lookup_tunnel() is used only by bareudp.
    Ideally, other UDP tunnel implementations should use it, but to do so
    the function needs to accept new parameters that are specific for UDP
    tunnels, such as the ports.

    Prepare for these changes by renaming the function to
    udp_tunnel6_dst_lookup() and move it to file
    net/ipv6/ip6_udp_tunnel.c.

    This is similar to what was already done for IPv4 in commit bf3fcbf7e7a0
    ("ipv4: rename and move ip_route_output_tunnel()").

    Suggested-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-11 11:22:48 +02:00
Hangbin Liu f8264e5586 ipv6: avoid atomic fragment on GSO packets
JIRA: https://issues.redhat.com/browse/RHEL-21152
Upstream Status: net.git commit 03d6c848bfb4

commit 03d6c848bfb406e9ef6d9846d759e97beaeea113
Author: Yan Zhai <yan@cloudflare.com>
Date:   Tue Oct 24 07:26:40 2023 -0700

    ipv6: avoid atomic fragment on GSO packets

    When the ipv6 stack outputs a GSO packet, if its gso_size is larger than
    dst MTU, then all segments would be fragmented. However, it is possible
    for a GSO packet to have a trailing segment with smaller actual size
    than both gso_size as well as the MTU, which leads to an "atomic
    fragment". Atomic fragments are considered harmful in RFC-8021. An
    existing report from APNIC also shows that atomic fragments are more
    likely to be dropped even though they are equivalent to a no-op [1].

    Add an extra check in the GSO slow output path. For each segment from
    the original over-sized packet, if it fits with the path MTU, then avoid
    generating an atomic fragment.

    Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1]
    Fixes: b210de4f8c ("net: ipv6: Validate GSO SKB before finish IPv6 processing")
    Reported-by: David Wragg <dwragg@cloudflare.com>
    Signed-off-by: Yan Zhai <yan@cloudflare.com>
    Link: https://lore.kernel.org/r/90912e3503a242dca0bc36958b11ed03a2696e5e.1698156966.git.yan@cloudflare.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2024-01-10 14:19:31 +08:00
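
The extra check described above is a per-segment decision: a segment that already fits the path MTU is sent unchanged, and only oversized segments go through fragmentation. A minimal sketch, assuming plain integer lengths in place of skbs; `classify_segment()` and `count_fragmented()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

enum seg_action { SEG_SEND_AS_IS, SEG_FRAGMENT };

/* A segment that fits the MTU must not get a fragment header at all. */
static enum seg_action classify_segment(unsigned int seg_len, unsigned int mtu)
{
    return seg_len <= mtu ? SEG_SEND_AS_IS : SEG_FRAGMENT;
}

/* Count how many segments of one GSO packet still need fragmentation. */
static size_t count_fragmented(const unsigned int *seg_lens, size_t n,
                               unsigned int mtu)
{
    size_t i, fragmented = 0;

    assert(seg_lens != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (classify_segment(seg_lens[i], mtu) == SEG_FRAGMENT)
            fragmented++;
    }
    return fragmented;
}
```

With segment lengths of 1400, 1400 and 300 bytes and an MTU of 1280, the two full-size segments are fragmented and the short trailing one is sent as is, so no atomic fragment is produced.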
Antoine Tenart 791e96333e net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git
Conflicts:\
- Context diff due to missing upstream commit 09eed1192cec ("neighbour:
  switch to standard rcu, instead of rcu_bh") in c9s.
- Context diff due to missing upstream commit cd3c74807736 ("ipv6:
  optimise dst refcounting on skb init") in c9s.

commit b4a11b2033b7d3dfdd46592f7036a775b18cecd1
Author: Heng Guo <heng.guo@windriver.com>
Date:   Thu Oct 19 09:20:53 2023 +0800

    net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.

    Reproduce environment:
    network with 3 VM linuxs is connected as below:
    VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
    VM1: eth0 ip: 192.168.122.207 MTU 1500
    VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
    VM3: eth0 ip: 192.168.123.240 MTU 1500

    Reproduce:
    VM1 send 1400 bytes UDP data to VM3 using tools scapy with flags=0.
    scapy command:
    send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1,
    inter=1.000000)

    Result:
    Before IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates
    Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0
    ......
    ----------------------------------------------------------------------
    After IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates
    Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0
    ......
    ----------------------------------------------------------------------
    "ForwDatagrams" increase from 4 to 5 and "OutRequests" also increase
    from 7 to 8.

    Issue description and patch:
    IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS
    ("OutOctets") in ip_finish_output2().
    According to RFC 4293, it is "OutOctets" counted with "OutTransmits" but
    not "OutRequests". "OutRequests" does not include any datagrams counted
    in "ForwDatagrams".
    ipSystemStatsOutOctets OBJECT-TYPE
        DESCRIPTION
               "The total number of octets in IP datagrams delivered to the
                lower layers for transmission.  Octets from datagrams
                counted in ipIfStatsOutTransmits MUST be counted here.
    ipSystemStatsOutRequests OBJECT-TYPE
        DESCRIPTION
               "The total number of IP datagrams that local IP user-
                protocols (including ICMP) supplied to IP in requests for
                transmission.  Note that this counter does not include any
                datagrams counted in ipSystemStatsOutForwDatagrams.
    So do patch to define IPSTATS_MIB_OUTPKTS to "OutTransmits" and add
    IPSTATS_MIB_OUTREQUESTS for "OutRequests".
    Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add
    IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6.

    Test result with patch:
    Before IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates OutTransmits
    Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4
    ......
    root@qemux86-64:~# cat /proc/net/netstat
    ......
    IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
      OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
      InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
      InECT0Pkts InCEPkts ReasmOverlaps
    IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0
    ----------------------------------------------------------------------
    After IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates OutTransmits
    Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5
    ......
    root@qemux86-64:~# cat /proc/net/netstat
    ......
    IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
      OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
      InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
      InECT0Pkts InCEPkts ReasmOverlaps
    IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0
    ----------------------------------------------------------------------
    "ForwDatagrams" increase from 1 to 2 and "OutRequests" is keeping 3.
    "OutTransmits" increase from 4 to 5 and "OutOctets" increase 1428.

    Signed-off-by: Heng Guo <heng.guo@windriver.com>
    Reviewed-by: Kun Song <Kun.Song@windriver.com>
    Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-12-11 11:15:48 +01:00
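
The counter split described above can be sketched as three hooks: locally generated traffic bumps OutRequests, forwarded traffic bumps ForwDatagrams, and everything handed to the lower layer bumps OutTransmits together with OutOctets. The struct and function names below are hypothetical and only model the accounting, not the kernel's per-CPU MIB machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant IP MIB counters. */
struct ipstats_sketch {
    unsigned long out_requests;    /* "OutRequests": local user-protocols only */
    unsigned long forw_datagrams;  /* "ForwDatagrams" */
    unsigned long out_transmits;   /* "OutTransmits": handed to lower layers */
    unsigned long out_octets;      /* "OutOctets": counted with OutTransmits */
};

/* Local output path: a local protocol asked IP to send a datagram. */
static void local_out(struct ipstats_sketch *s)
{
    assert(s != NULL);
    s->out_requests++;
}

/* Forwarding path: the datagram came from another host. */
static void forward(struct ipstats_sketch *s)
{
    assert(s != NULL);
    s->forw_datagrams++;
}

/* Common final step for both paths, after any fragmentation. */
static void finish_output(struct ipstats_sketch *s, unsigned long len)
{
    assert(s != NULL);
    s->out_transmits++;
    s->out_octets += len;
}
```

Forwarding one 1428-byte datagram through `forward()` and `finish_output()` leaves `out_requests` untouched, which mirrors the "Test result with patch" numbers quoted in the message.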
Antoine Tenart b7ff6e2ea3 net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git

commit e4da8c78973c1e307c0431e0b99a969ffb8aa3f1
Author: Heng Guo <heng.guo@windriver.com>
Date:   Fri Aug 25 15:55:05 2023 +0800

    net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated

    commit edf391ff17 ("snmp: add missing counters for RFC 4293") had
    already added OutOctets for RFC 4293. In commit 2d8dbb04c6 ("snmp: fix
    OutOctets counter to include forwarded datagrams"), OutOctets was
    counted again, but not removed from ip_output().

    According to RFC 4293 "3.2.3. IP Statistics Tables",
    ipIfStatsOutTransmits is not equal to ipIfStatsOutForwDatagrams. So
    "IPSTATS_MIB_OUTOCTETS must be incremented when incrementing" is not
    accurate. And IPSTATS_MIB_OUTOCTETS should be counted after fragment.

    This patch reverts commit 2d8dbb04c6 ("snmp: fix OutOctets counter to
    include forwarded datagrams") and move IPSTATS_MIB_OUTOCTETS to
    ip_finish_output2 for ipv4.

    Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
    Signed-off-by: Heng Guo <heng.guo@windriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-12-11 11:15:48 +01:00
Ivan Vecera b167c4e0b9 neighbour: annotate lockless accesses to n->nud_state
JIRA: https://issues.redhat.com/browse/RHEL-16999

commit b071af523579df7341cabf0f16fc661125e9a13f
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Mar 13 20:17:31 2023 +0000

    neighbour: annotate lockless accesses to n->nud_state

    We have many lockless accesses to n->nud_state.

    Before adding another one in the following patch,
    add annotations to readers and writers.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-11-20 19:28:55 +01:00
Scott Weaver 14c9e1feb0 Merge: tunnels: First round of upstream fixes for RHEL 9.4.
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3250

JIRA: https://issues.redhat.com/browse/RHEL-14360
Upstream Status: linux.git

Signed-off-by: Guillaume Nault <gnault@redhat.com>

Approved-by: Hangbin Liu <haliu@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>

Signed-off-by: Scott Weaver <scweaver@redhat.com>
2023-10-25 11:39:20 -04:00
Guillaume Nault 30e4857ce5 lwt: Check LWTUNNEL_XMIT_CONTINUE strictly
JIRA: https://issues.redhat.com/browse/RHEL-14360
Upstream Status: linux.git

commit a171fbec88a2c730b108c7147ac5e7b2f5a02b47
Author: Yan Zhai <yan@cloudflare.com>
Date:   Thu Aug 17 19:58:14 2023 -0700

    lwt: Check LWTUNNEL_XMIT_CONTINUE strictly

    LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2,
    such that any positive return value from a xmit hook could cause
    unexpected continue behavior, despite that related skb may have been
    freed. This could be error-prone for future xmit hook ops. One of the
    possible errors is to return statuses of dst_output directly.

    To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to
    distinguish from dst_output statuses and check the continue
    condition explicitly.

    Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
    Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
    Signed-off-by: Yan Zhai <yan@cloudflare.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2023-10-20 13:25:18 +02:00
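
The difference between the implicit and the strict check can be sketched as follows. All `SKETCH_*` constants and both functions are hypothetical; in particular the value 0x100 is only an illustrative choice that cannot collide with small positive transmit statuses.

```c
#include <assert.h>
#include <stddef.h>

/* Small positive status of the kind a transmit path may return. */
#define SKETCH_NET_XMIT_DROP 1

enum {
    SKETCH_LWT_XMIT_DONE = 0,
    SKETCH_LWT_XMIT_CONTINUE = 0x100,  /* illustrative, distinct from the above */
};

/* Implicit check: anything that is neither an error nor DONE continues. */
static int continues_implicit(int ret)
{
    if (ret < 0)
        return 0;
    if (ret == SKETCH_LWT_XMIT_DONE)
        return 0;
    return 1;
}

/*
 * Strict check: only an explicit CONTINUE lets the caller keep using the
 * skb; every other value is passed straight back.
 */
static int finish_output_sketch(int hook_ret, int *transmitted)
{
    assert(transmitted != NULL);
    *transmitted = 0;
    if (hook_ret != SKETCH_LWT_XMIT_CONTINUE)
        return hook_ret;  /* done, error, or a leaked transmit status */
    *transmitted = 1;     /* caller goes on to transmit the skb itself */
    return 0;
}
```

A hook that returns `SKETCH_NET_XMIT_DROP` directly would make the implicit check continue with an skb that may already be freed, while the strict variant returns immediately.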
Ivan Vecera 497f645693 net: move gso declarations and functions to their own files
JIRA: https://issues.redhat.com/browse/RHEL-12679

commit d457a0e329b0bfd3a1450e0b1a18cd2b47a25a08
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jun 8 19:17:37 2023 +0000

    net: move gso declarations and functions to their own files

    Move declarations into include/net/gso.h and code into net/core/gso.c

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Stanislav Fomichev <sdf@google.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-11 13:35:27 +02:00
Jeff Moyer 5fbf8901c6 net: shrink struct ubuf_info
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit e7d2b510165fff6bedc9cca88c071ad846850c74
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Fri Sep 23 17:39:04 2022 +0100

    net: shrink struct ubuf_info
    
    We can benefit from a smaller struct ubuf_info, so leave only mandatory
    fields and let users decide how they want to extend it. Convert
    MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields.
    This reduces the size from 48 bytes to just 16.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-05-05 15:25:02 -04:00
Jeff Moyer 59af9301fc ipv6/udp: support externally provided ubufs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 1fd3ae8c906c0f521238d436566323af3f0282e8
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Tue Jul 12 21:52:34 2022 +0100

    ipv6/udp: support externally provided ubufs
    
    Teach ipv6/udp how to use external ubuf_info provided in msghdr and
    also prepare it for managed frags by sprinkling
    skb_zcopy_downgrade_managed() when it could mix managed and not managed
    frags.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 08:09:02 -04:00
Jeff Moyer f39d223559 ipv6: avoid partial copy for zc
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 773ba4fe9104a64a54d1c00f0fb6ffb95def2b03
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Tue Jul 12 21:52:26 2022 +0100

    ipv6: avoid partial copy for zc
    
    Even when zerocopy transmission is requested and possible,
    __ip_append_data() will still copy a small chunk of data just because it
    allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles
    on copy and iter manipulations and also misaligns potentially aligned
    data. Avoid such copies. And as a bonus we can allocate smaller skb.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 08:00:02 -04:00
Jeff Moyer c2c5210b7b ipv6: refactor ip6_finish_output2()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 58f71be58b8713e41f8568938a0199190f723d1d
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Thu Apr 28 11:58:48 2022 +0100

    ipv6: refactor ip6_finish_output2()
    
    Throw neigh checks in ip6_finish_output2() under a single slow path if,
    so we don't have the overhead in the hot path.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 07:58:02 -04:00
Jeff Moyer cf36341bc5 ipv6: help __ip6_finish_output() inlining
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 4b143ed7dde59d8a4f94c39aa7c4e92842c3ecc1
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Thu Apr 28 11:58:47 2022 +0100

    ipv6: help __ip6_finish_output() inlining
    
    There are two callers of __ip6_finish_output(), both are in
    ip6_finish_output(). We can combine the call sites into one and handle
    return code after, that will inline __ip6_finish_output().
    
    Note, error handling under NET_XMIT_CN will only return 0 if
    __ip6_finish_output() succeeded, and in this case it returns 0.
    Considering that NET_XMIT_SUCCESS is 0, it'll be returning exactly the
    same result for it as before.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
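
A minimal sketch of why merging the two call sites keeps the return value unchanged. The function names are hypothetical, and `inner` stands in for the result of __ip6_finish_output(); this is an illustration of the argument in the message above, not the kernel code:

```c
#include <assert.h>

/* Mirror the values of NET_XMIT_SUCCESS and NET_XMIT_CN. */
enum { XMIT_SUCCESS = 0, XMIT_CN = 2 };

static_assert(XMIT_SUCCESS == 0, "the equivalence relies on success being 0");

/* Before: separate call sites for the two accepted hook results. */
static int finish_before(int hook_ret, int inner)
{
	switch (hook_ret) {
	case XMIT_SUCCESS:
		return inner;
	case XMIT_CN:
		return inner ? inner : hook_ret;
	default:
		return hook_ret;
	}
}

/* After: one shared call site, result handled afterwards. */
static int finish_after(int hook_ret, int inner)
{
	switch (hook_ret) {
	case XMIT_SUCCESS:
	case XMIT_CN:
		return inner ? inner : hook_ret;
	default:
		return hook_ret;
	}
}
```

For the success case, a zero `inner` falls back to `hook_ret`, which is also zero, so both versions return the same value for every input.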

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 07:57:02 -04:00
Hangbin Liu 9501bbbd4f ipv6: Fix an uninit variable access bug in __ip6_make_skb()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186064
Upstream Status: net.git commit ea30388baebc

commit ea30388baebcce37fd594d425a65037ca35e59e8
Author: Ziyang Xuan <william.xuanziyang@huawei.com>
Date:   Mon Apr 3 15:34:17 2023 +0800

    ipv6: Fix an uninit variable access bug in __ip6_make_skb()

    Syzbot reported the following bug:

    =====================================================
    BUG: KMSAN: uninit-value in arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
    BUG: KMSAN: uninit-value in arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
    BUG: KMSAN: uninit-value in atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
    BUG: KMSAN: uninit-value in __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
     arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
     arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
     atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
     __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
     ip6_finish_skb include/net/ipv6.h:1122 [inline]
     ip6_push_pending_frames+0x10e/0x550 net/ipv6/ip6_output.c:1987
     rawv6_push_pending_frames+0xb12/0xb90 net/ipv6/raw.c:579
     rawv6_sendmsg+0x297e/0x2e60 net/ipv6/raw.c:922
     inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg net/socket.c:734 [inline]
     ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
     ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
     __sys_sendmsg net/socket.c:2559 [inline]
     __do_sys_sendmsg net/socket.c:2568 [inline]
     __se_sys_sendmsg net/socket.c:2566 [inline]
     __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Uninit was created at:
     slab_post_alloc_hook mm/slab.h:766 [inline]
     slab_alloc_node mm/slub.c:3452 [inline]
     __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
     __do_kmalloc_node mm/slab_common.c:967 [inline]
     __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
     kmalloc_reserve net/core/skbuff.c:492 [inline]
     __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
     alloc_skb include/linux/skbuff.h:1270 [inline]
     __ip6_append_data+0x51c1/0x6bb0 net/ipv6/ip6_output.c:1684
     ip6_append_data+0x411/0x580 net/ipv6/ip6_output.c:1854
     rawv6_sendmsg+0x2882/0x2e60 net/ipv6/raw.c:915
     inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg net/socket.c:734 [inline]
     ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
     ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
     __sys_sendmsg net/socket.c:2559 [inline]
     __do_sys_sendmsg net/socket.c:2568 [inline]
     __se_sys_sendmsg net/socket.c:2566 [inline]
     __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    This is because icmp6hdr is not in the skb linear region in the
    SOCK_RAW socket scenario. Accessing icmp6_hdr(skb)->icmp6_type directly
    triggers the uninit variable access bug.

    Use a local variable icmp6_type to carry the correct value in different
    scenarios.

    Fixes: 14878f75ab ("[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2]")
    Reported-by: syzbot+8257f4dcef79de670baf@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?id=3d605ec1d0a7f2a269a1a6936ac7f2b85975ee9c
    Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2023-04-27 10:04:56 +08:00
Hangbin Liu 08dd280139 xfrm: fix MTU regression
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161893
Upstream Status: net.git commit 6596a0229541

commit 6596a0229541270fb8d38d989f91b78838e5e9da
Author: Jiri Bohac <jbohac@suse.cz>
Date:   Wed Jan 19 10:22:53 2022 +0100

    xfrm: fix MTU regression

    Commit 749439bfac ("ipv6: fix udpv6
    sendmsg crash caused by too small MTU") breaks PMTU for xfrm.

    A Packet Too Big ICMPv6 message received in response to an ESP
    packet will prevent all further communication through the tunnel
    if the reported MTU minus the ESP overhead is smaller than 1280.

    E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead
    is 92 bytes. Receiving a PTB with MTU of 1371 or less will result
    in all further packets in the tunnel dropped. A ping through the
    tunnel fails with "ping: sendmsg: Invalid argument".

    Apparently the MTU on the xfrm route is smaller than 1280 and
    fails the check inside ip6_setup_cork() added by 749439bf.

    We found this by debugging USGv6/ipv6ready failures. Failing
    tests are: "Phase-2 Interoperability Test Scenario IPsec" /
    5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).

    Commit b515d26372 ("xfrm:
    xfrm_state_mtu should return at least 1280 for ipv6") attempted
    to fix this but caused another regression in TCP MSS calculations
    and had to be reverted.

    The patch below fixes the situation by dropping the MTU
    check and instead checking for the underflows described in the
    749439bf commit message.

    Signed-off-by: Jiri Bohac <jbohac@suse.cz>
    Fixes: 749439bfac ("ipv6: fix udpv6 sendmsg crash caused by too small MTU")
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
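
The arithmetic in the example above can be sketched as follows. The helper names are hypothetical; the 92-byte overhead and the 1371-byte PTB value are taken from the message, and 1280 is the IPv6 minimum MTU:

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_MIN_MTU 1280	/* minimum IPv6 link MTU */

/* Effective MTU on the xfrm route: reported path MTU minus ESP overhead. */
static int effective_mtu(int reported_mtu, int esp_overhead)
{
	assert(reported_mtu >= esp_overhead);
	return reported_mtu - esp_overhead;
}

/* Models the removed check: sends fail when the effective MTU is below 1280. */
static bool rejected_by_min_mtu_check(int reported_mtu, int esp_overhead)
{
	return effective_mtu(reported_mtu, esp_overhead) < IPV6_MIN_MTU;
}
```

With 92 bytes of overhead, a reported MTU of 1371 leaves 1279 bytes and is rejected, while 1372 leaves exactly 1280 and passes.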

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2023-01-31 09:37:13 +08:00
Hangbin Liu 893e69ec88 ipv6: fix reachability confirmation with proxy_ndp
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161893
Upstream Status: net.git commit 9f535c870e49

commit 9f535c870e493841ac7be390610ff2edec755762
Author: Gergely Risko <gergely.risko@gmail.com>
Date:   Thu Jan 19 14:40:41 2023 +0100

    ipv6: fix reachability confirmation with proxy_ndp

    When proxying IPv6 NDP requests, the adverts to the initial multicast
    solicits are correct and working.  On the other hand, when later a
    reachability confirmation is requested (on unicast), no reply is sent.

    This causes the neighbor entry to expire on the sending node, which is
    mostly a non-issue, as a new multicast request is sent.  There are
    routers, where the multicast requests are intentionally delayed, and in
    these environments the current implementation causes periodic packet
    loss for the proxied endpoints.

    The root cause is the erroneous decrease of the hop limit, as this
    is checked in ndisc.c and no answer is generated when it's 254 instead
    of the correct 255.

    Cc: stable@vger.kernel.org
    Fixes: 46c7655f0b ("ipv6: decrease hop limit counter in ip6_forward()")
    Signed-off-by: Gergely Risko <gergely.risko@gmail.com>
    Tested-by: Gergely Risko <gergely.risko@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
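
A minimal sketch of the root cause described above, assuming the receive-side check is a plain equality test against 255. EXPECTED_ND_HOP_LIMIT and both helpers are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXPECTED_ND_HOP_LIMIT 255	/* hypothetical name for the required value */

/* Neighbour discovery only answers packets that arrive with hop limit 255. */
static bool nd_hop_limit_ok(uint8_t hop_limit)
{
	return hop_limit == EXPECTED_ND_HOP_LIMIT;
}

/* Forwarding decrements the hop limit; proxied local delivery must not. */
static uint8_t hop_limit_after(uint8_t hop_limit, bool forwarded)
{
	assert(!forwarded || hop_limit > 0);
	return forwarded ? (uint8_t)(hop_limit - 1) : hop_limit;
}
```

A proxied solicitation that is decremented as if it were forwarded arrives with 254 and gets no reply; left unchanged, it passes the check.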

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2023-01-31 09:34:45 +08:00
Hangbin Liu 0244cfd2ed ipv6: avoid use-after-free in ip6_fragment()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161893
Upstream Status: net.git commit 803e84867de5

commit 803e84867de59a1e5d126666d25eb4860cfd2ebe
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Dec 6 10:13:51 2022 +0000

    ipv6: avoid use-after-free in ip6_fragment()

    Blamed commit claimed rcu_read_lock() was held by ip6_fragment() callers.

    It seems to not be always true, at least for UDP stack.

    syzbot reported:

    BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:245 [inline]
    BUG: KASAN: use-after-free in ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951
    Read of size 8 at addr ffff88801d403e80 by task syz-executor.3/7618

    CPU: 1 PID: 7618 Comm: syz-executor.3 Not tainted 6.1.0-rc6-syzkaller-00012-g4312098baf37 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
     print_address_description mm/kasan/report.c:284 [inline]
     print_report+0x15e/0x45d mm/kasan/report.c:395
     kasan_report+0xbf/0x1f0 mm/kasan/report.c:495
     ip6_dst_idev include/net/ip6_fib.h:245 [inline]
     ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951
     __ip6_finish_output net/ipv6/ip6_output.c:193 [inline]
     ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206
     NF_HOOK_COND include/linux/netfilter.h:291 [inline]
     ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227
     dst_output include/net/dst.h:445 [inline]
     ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161
     ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966
     udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286
     udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313
     udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606
     inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg+0xd3/0x120 net/socket.c:734
     sock_write_iter+0x295/0x3d0 net/socket.c:1108
     call_write_iter include/linux/fs.h:2191 [inline]
     new_sync_write fs/read_write.c:491 [inline]
     vfs_write+0x9ed/0xdd0 fs/read_write.c:584
     ksys_write+0x1ec/0x250 fs/read_write.c:637
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    RIP: 0033:0x7fde3588c0d9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fde365b6168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 00007fde359ac050 RCX: 00007fde3588c0d9
    RDX: 000000000000ffdc RSI: 00000000200000c0 RDI: 000000000000000a
    RBP: 00007fde358e7ae9 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 00007fde35acfb1f R14: 00007fde365b6300 R15: 0000000000022000
     </TASK>

    Allocated by task 7618:
     kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
     kasan_set_track+0x25/0x30 mm/kasan/common.c:52
     __kasan_slab_alloc+0x82/0x90 mm/kasan/common.c:325
     kasan_slab_alloc include/linux/kasan.h:201 [inline]
     slab_post_alloc_hook mm/slab.h:737 [inline]
     slab_alloc_node mm/slub.c:3398 [inline]
     slab_alloc mm/slub.c:3406 [inline]
     __kmem_cache_alloc_lru mm/slub.c:3413 [inline]
     kmem_cache_alloc+0x2b4/0x3d0 mm/slub.c:3422
     dst_alloc+0x14a/0x1f0 net/core/dst.c:92
     ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344
     ip6_rt_pcpu_alloc net/ipv6/route.c:1369 [inline]
     rt6_make_pcpu_route net/ipv6/route.c:1417 [inline]
     ip6_pol_route+0x901/0x1190 net/ipv6/route.c:2254
     pol_lookup_func include/net/ip6_fib.h:582 [inline]
     fib6_rule_lookup+0x52e/0x6f0 net/ipv6/fib6_rules.c:121
     ip6_route_output_flags_noref+0x2e6/0x380 net/ipv6/route.c:2625
     ip6_route_output_flags+0x76/0x320 net/ipv6/route.c:2638
     ip6_route_output include/net/ip6_route.h:98 [inline]
     ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1092
     ip6_dst_lookup_flow+0x90/0x1d0 net/ipv6/ip6_output.c:1222
     ip6_sk_dst_lookup_flow+0x553/0x980 net/ipv6/ip6_output.c:1260
     udpv6_sendmsg+0x151d/0x2c80 net/ipv6/udp.c:1554
     inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg+0xd3/0x120 net/socket.c:734
     __sys_sendto+0x23a/0x340 net/socket.c:2117
     __do_sys_sendto net/socket.c:2129 [inline]
     __se_sys_sendto net/socket.c:2125 [inline]
     __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Freed by task 7599:
     kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
     kasan_set_track+0x25/0x30 mm/kasan/common.c:52
     kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:511
     ____kasan_slab_free mm/kasan/common.c:236 [inline]
     ____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200
     kasan_slab_free include/linux/kasan.h:177 [inline]
     slab_free_hook mm/slub.c:1724 [inline]
     slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1750
     slab_free mm/slub.c:3661 [inline]
     kmem_cache_free+0xee/0x5c0 mm/slub.c:3683
     dst_destroy+0x2ea/0x400 net/core/dst.c:127
     rcu_do_batch kernel/rcu/tree.c:2250 [inline]
     rcu_core+0x81f/0x1980 kernel/rcu/tree.c:2510
     __do_softirq+0x1fb/0xadc kernel/softirq.c:571

    Last potentially related work creation:
     kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
     __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481
     call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798
     dst_release net/core/dst.c:177 [inline]
     dst_release+0x7d/0xe0 net/core/dst.c:167
     refdst_drop include/net/dst.h:256 [inline]
     skb_dst_drop include/net/dst.h:268 [inline]
     skb_release_head_state+0x250/0x2a0 net/core/skbuff.c:838
     skb_release_all net/core/skbuff.c:852 [inline]
     __kfree_skb net/core/skbuff.c:868 [inline]
     kfree_skb_reason+0x151/0x4b0 net/core/skbuff.c:891
     kfree_skb_list_reason+0x4b/0x70 net/core/skbuff.c:901
     kfree_skb_list include/linux/skbuff.h:1227 [inline]
     ip6_fragment+0x2026/0x2770 net/ipv6/ip6_output.c:949
     __ip6_finish_output net/ipv6/ip6_output.c:193 [inline]
     ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206
     NF_HOOK_COND include/linux/netfilter.h:291 [inline]
     ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227
     dst_output include/net/dst.h:445 [inline]
     ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161
     ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966
     udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286
     udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313
     udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606
     inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg+0xd3/0x120 net/socket.c:734
     sock_write_iter+0x295/0x3d0 net/socket.c:1108
     call_write_iter include/linux/fs.h:2191 [inline]
     new_sync_write fs/read_write.c:491 [inline]
     vfs_write+0x9ed/0xdd0 fs/read_write.c:584
     ksys_write+0x1ec/0x250 fs/read_write.c:637
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Second to last potentially related work creation:
     kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
     __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481
     call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798
     dst_release net/core/dst.c:177 [inline]
     dst_release+0x7d/0xe0 net/core/dst.c:167
     refdst_drop include/net/dst.h:256 [inline]
     skb_dst_drop include/net/dst.h:268 [inline]
     __dev_queue_xmit+0x1b9d/0x3ba0 net/core/dev.c:4211
     dev_queue_xmit include/linux/netdevice.h:3008 [inline]
     neigh_resolve_output net/core/neighbour.c:1552 [inline]
     neigh_resolve_output+0x51b/0x840 net/core/neighbour.c:1532
     neigh_output include/net/neighbour.h:546 [inline]
     ip6_finish_output2+0x56c/0x1530 net/ipv6/ip6_output.c:134
     __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
     ip6_finish_output+0x694/0x1170 net/ipv6/ip6_output.c:206
     NF_HOOK_COND include/linux/netfilter.h:291 [inline]
     ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227
     dst_output include/net/dst.h:445 [inline]
     NF_HOOK include/linux/netfilter.h:302 [inline]
     NF_HOOK include/linux/netfilter.h:296 [inline]
     mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
     mld_send_cr net/ipv6/mcast.c:2121 [inline]
     mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653
     process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
     worker_thread+0x669/0x1090 kernel/workqueue.c:2436
     kthread+0x2e8/0x3a0 kernel/kthread.c:376
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

    The buggy address belongs to the object at ffff88801d403dc0
     which belongs to the cache ip6_dst_cache of size 240
    The buggy address is located 192 bytes inside of
     240-byte region [ffff88801d403dc0, ffff88801d403eb0)

    The buggy address belongs to the physical page:
    page:ffffea00007500c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d403
    memcg:ffff888022f49c81
    flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
    raw: 00fff00000000200 ffffea0001ef6580 dead000000000002 ffff88814addf640
    raw: 0000000000000000 00000000800c000c 00000001ffffffff ffff888022f49c81
    page dumped because: kasan: bad access detected
    page_owner tracks the page as allocated
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 3719, tgid 3719 (kworker/0:6), ts 136223432244, free_ts 136222971441
     prep_new_page mm/page_alloc.c:2539 [inline]
     get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4288
     __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5555
     alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285
     alloc_slab_page mm/slub.c:1794 [inline]
     allocate_slab+0x213/0x300 mm/slub.c:1939
     new_slab mm/slub.c:1992 [inline]
     ___slab_alloc+0xa91/0x1400 mm/slub.c:3180
     __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3279
     slab_alloc_node mm/slub.c:3364 [inline]
     slab_alloc mm/slub.c:3406 [inline]
     __kmem_cache_alloc_lru mm/slub.c:3413 [inline]
     kmem_cache_alloc+0x31a/0x3d0 mm/slub.c:3422
     dst_alloc+0x14a/0x1f0 net/core/dst.c:92
     ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344
     icmp6_dst_alloc+0x71/0x680 net/ipv6/route.c:3261
     mld_sendpack+0x5de/0xe70 net/ipv6/mcast.c:1809
     mld_send_cr net/ipv6/mcast.c:2121 [inline]
     mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653
     process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
     worker_thread+0x669/0x1090 kernel/workqueue.c:2436
     kthread+0x2e8/0x3a0 kernel/kthread.c:376
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
    page last free stack trace:
     reset_page_owner include/linux/page_owner.h:24 [inline]
     free_pages_prepare mm/page_alloc.c:1459 [inline]
     free_pcp_prepare+0x65c/0xd90 mm/page_alloc.c:1509
     free_unref_page_prepare mm/page_alloc.c:3387 [inline]
     free_unref_page+0x1d/0x4d0 mm/page_alloc.c:3483
     __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2586
     qlink_free mm/kasan/quarantine.c:168 [inline]
     qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
     kasan_quarantine_reduce+0x184/0x210 mm/kasan/quarantine.c:294
     __kasan_slab_alloc+0x66/0x90 mm/kasan/common.c:302
     kasan_slab_alloc include/linux/kasan.h:201 [inline]
     slab_post_alloc_hook mm/slab.h:737 [inline]
     slab_alloc_node mm/slub.c:3398 [inline]
     kmem_cache_alloc_node+0x304/0x410 mm/slub.c:3443
     __alloc_skb+0x214/0x300 net/core/skbuff.c:497
     alloc_skb include/linux/skbuff.h:1267 [inline]
     netlink_alloc_large_skb net/netlink/af_netlink.c:1191 [inline]
     netlink_sendmsg+0x9a6/0xe10 net/netlink/af_netlink.c:1896
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg+0xd3/0x120 net/socket.c:734
     __sys_sendto+0x23a/0x340 net/socket.c:2117
     __do_sys_sendto net/socket.c:2129 [inline]
     __se_sys_sendto net/socket.c:2125 [inline]
     __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Fixes: 1758fd4688 ("ipv6: remove unnecessary dst_hold() in ip6_fragment()")
    Reported-by: syzbot+8c0ac31aa9681abb9e2d@syzkaller.appspotmail.com
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Wei Wang <weiwan@google.com>
    Cc: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/r/20221206101351.2037285-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2023-01-30 15:00:30 +08:00
Frantisek Hrbata 1269719102 Merge: BPF and XDP rebase to v5.18
Merge conflicts:
-----------------
arch/x86/net/bpf_jit_comp.c
        - bpf_arch_text_poke()
          HEAD(!1464) contains b73b002f7f ("x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline")
          Resolved in favour of !1464, but keep the return statement from !1477

MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1477

Bugzilla: https://bugzilla.redhat.com/2120966

Rebase BPF and XDP to the upstream kernel version 5.18

Patch applied, then reverted:
```
544356 selftests/bpf: switch to new libbpf XDP APIs
0bfb95 selftests, bpf: Do not yet switch to new libbpf XDP APIs
```
Taken in the perf rebase:
```
23fcfc perf: use generic bpf_program__set_type() to set BPF prog type
```
Unsupported arches:
```
5c1011 libbpf: Fix riscv register names
cf0b5b libbpf: Fix accessing syscall arguments on riscv
```
Depends on changes of other subsystems:
```
7fc8c3 s390/bpf: encode register within extable entry
aebfd1 x86/ibt,ftrace: Search for __fentry__ location
589127 x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline
```
Broken selftest:
```
edae34 selftests net: add UDP GRO fraglist + bpf self-tests
cf6783 selftests net: fix bpf build error
7b92aa selftests net: fix kselftest net fatal error
```
Out of scope:
```
baebdf net: dev: Makes sure netif_rx() can be invoked in any context.
5c8166 kbuild: replace $(if A,A,B) with $(or A,B)
1a97ce perf maps: Use a pointer for kmaps
967747 uaccess: remove CONFIG_SET_FS
42b01a s390: always use the packed stack layout
bf0882 flow_dissector: Add support for HSR
d09a30 s390/extable: move EX_TABLE define to asm-extable.h
3d6671 s390/extable: convert to relative table with data
4efd41 s390: raise minimum supported machine generation to z10
f65e58 flow_dissector: Add support for HSRv0
1a6d7a netdevsim: Introduce support for L3 offload xstats
9b1894 selftests: netdevsim: hw_stats_l3: Add a new test
84005b perf ftrace latency: Add -n/--use-nsec option
36c4a7 kasan, arm64: don't tag executable vmalloc allocations
8df013 docs: netdev: move the netdev-FAQ to the process pages
4d4d00 perf tools: Update copy of libbpf's hashmap.c
0df6ad perf evlist: Rename cpus to user_requested_cpus
1b8089 flow_dissector: fix false-positive __read_overflow2_field() warning
0ae065 perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
8994e9 perf test bpf: Skip test if clang is not present
735346 perf build: Fix btf__load_from_kernel_by_id() feature check
f037ac s390/stack: merge empty stack frame slots
335220 docs: netdev: update maintainer-netdev.rst reference
a0b098 s390/nospec: remove unneeded header includes
34513a netdevsim: Fix hwstats debugfs file permissions
```

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Torez Smith <torez@redhat.com>
Approved-by: Jan Stancek <jstancek@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Felix Maurer <fmaurer@redhat.com>
Approved-by: Viktor Malik <vmalik@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-21 05:30:47 -05:00
Frantisek Hrbata 27a89b8946 Merge: tcp: BIG TCP implementation
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1560

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139501
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2128180
Tested: Using netperf and veth driver. Results meet the assumptions. See https://bugzilla.redhat.com/show_bug.cgi?id=2139501#c1

The series introduces support for BIG TCP.

- Patch 1-2: Preliminary dependencies
- Patch 3-14: Commits from upstream series 7fa2e481ff2f ("Merge branch 'big-tcp'", 2022-05-16)
- Patch 15-19: Follow-ups

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-15 07:30:55 -05:00
Ivan Vecera f8e686beec ipv6: Add hop-by-hop header to jumbograms in ip6_output
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139501

Conflicts:
- context conflict due to missing e41294408c56 ("icmp: ICMPV6: Examine
  invoking packet for Segment Route Headers.")

commit 80e425b613421911f89664663a7060216abcaed2
Author: Coco Li <lixiaoyan@google.com>
Date:   Fri May 13 11:34:04 2022 -0700

    ipv6: Add hop-by-hop header to jumbograms in ip6_output

    Instead of simply forcing a 0 payload_len in IPv6 header,
    implement RFC 2675 and insert a custom extension header.

    Note that only TCP stack is currently potentially generating
    jumbograms, and that this extension header is purely local,
    it won't be sent on a physical link.

    This is needed so that packet capture (tcpdump and friends)
    can properly dissect these large packets.

    Signed-off-by: Coco Li <lixiaoyan@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Alexander Duyck <alexanderduyck@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
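
A minimal sketch of the RFC 2675 layout referred to above: an 8-byte hop-by-hop header carrying the Jumbo Payload option (type 0xC2, data length 4, 32-bit length in network byte order). The constant and function names are hypothetical and this is not the kernel's implementation:

```c
#include <assert.h>
#include <stdint.h>

enum { JUMBO_OPT_TYPE = 0xC2, JUMBO_OPT_LEN = 4, JUMBO_HBH_SIZE = 8 };

/* Write a hop-by-hop header that carries only the Jumbo Payload option. */
static void build_jumbo_hbh(uint8_t out[JUMBO_HBH_SIZE], uint8_t next_header,
			    uint32_t payload_len)
{
	/* RFC 2675: the option is only valid for payloads above 65,535 bytes. */
	assert(payload_len > 65535);

	out[0] = next_header;		/* protocol that follows this header */
	out[1] = 0;			/* Hdr Ext Len: no 8-octet units beyond the first */
	out[2] = JUMBO_OPT_TYPE;
	out[3] = JUMBO_OPT_LEN;
	out[4] = (uint8_t)(payload_len >> 24);	/* big-endian length */
	out[5] = (uint8_t)(payload_len >> 16);
	out[6] = (uint8_t)(payload_len >> 8);
	out[7] = (uint8_t)payload_len;
}
```

The fixed IPv6 header's payload length field is set to 0 in this case, and the real length is read from the option, which is what lets capture tools dissect the packet.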

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-11-02 18:56:17 +01:00
Phil Sutter ace54a48e5 net: Add l3mdev index to flow struct and avoid oif reset for port devices
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2129093
Upstream Status: commit 40867d74c374b

commit 40867d74c374b235e14d839f3a77f26684feefe5
Author: David Ahern <dsahern@kernel.org>
Date:   Mon Mar 14 14:45:51 2022 -0600

    net: Add l3mdev index to flow struct and avoid oif reset for port devices

    The fundamental premise of VRF and l3mdev core code is binding a socket
    to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
    Legacy code resets flowi_oif to the l3mdev losing any original port
    device binding. Ben (among others) has demonstrated use cases where the
    original port device binding is important and needs to be retained.
    This patch handles that by adding a new entry to the common flow struct
    that can indicate the l3mdev index for later rule and table matching
    avoiding the need to reset flowi_oif.

    In addition to allowing more use cases that require port device binds,
    this patch brings a few datapath simplifications:

    1. l3mdev_fib_rule_match is only called when walking fib rules and
       always after l3mdev_update_flow. That allows an optimization to bail
       early for non-VRF type use cases when flowi_l3mdev is not set. Also,
       only that index needs to be checked for the FIB table id.

    2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
       (e.g., VRF) device. By resetting flowi_oif only for this case the
       FLOWI_FLAG_SKIP_NH_OIF flag is no longer needed and can be removed,
       removing several checks in the datapath. The flowi_iif path can be
       simplified to only be called if it is not loopback (loopback can
       not be assigned to an L3 domain) and the l3mdev index is not already
       set.

    3. Avoid another device lookup in the output path when the fib lookup
       returns a reject failure.
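The flow handling described in points 1 and 2 can be sketched as a small user-space model. This is not the kernel code: `Device`, `Flow`, `l3mdev_ifindex`, `update_flow` and `rule_may_match` are hypothetical names chosen for illustration, and the ifindex values in the usage note are made up. The sketch only shows the idea that the L3 domain is recorded in a separate field, so the output interface is reset only when it names the l3mdev itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LOOPBACK_IFINDEX = 1  # ifindex conventionally held by "lo"


@dataclass
class Device:
    ifindex: int
    is_l3mdev: bool = False        # True for an l3mdev (e.g. VRF) device
    master: Optional[int] = None   # ifindex of the l3mdev this port belongs to


@dataclass
class Flow:
    oif: int = 0                   # models flowi_oif
    iif: int = LOOPBACK_IFINDEX    # models flowi_iif
    l3mdev: int = 0                # models the new flowi_l3mdev entry


def l3mdev_ifindex(dev: Device) -> int:
    """Return the L3 domain index for dev, or 0 if it has none."""
    if dev.is_l3mdev:
        return dev.ifindex
    return dev.master or 0


def update_flow(devs: Dict[int, Device], fl: Flow) -> None:
    """Fill in fl.l3mdev; reset fl.oif only when it names an l3mdev."""
    dev = devs.get(fl.oif) if fl.oif else None
    if dev is not None:
        if not fl.l3mdev:
            fl.l3mdev = l3mdev_ifindex(dev)
        if dev.is_l3mdev:
            fl.oif = 0             # a port device binding is left untouched
        return
    # Input path: skip loopback and flows whose index is already set.
    if fl.iif > LOOPBACK_IFINDEX and not fl.l3mdev:
        dev = devs.get(fl.iif)
        if dev is not None:
            fl.l3mdev = l3mdev_ifindex(dev)


def rule_may_match(fl: Flow) -> bool:
    """Early bail-out for l3mdev FIB rules: no index, no match."""
    return fl.l3mdev != 0
```

With a hypothetical VRF at ifindex 10 and a port at ifindex 11 enslaved to it, a flow bound to the port keeps `oif == 11` and gains `l3mdev == 10`, while a flow bound to the VRF ends up with `oif == 0` and `l3mdev == 10`.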

    Note: 2 functional tests for local traffic with reject fib rules are
    updated to reflect the new direct failure at FIB lookup time for ping
    rather than the failure on packet path. The current code fails like this:

        HINT: Fails since address on vrf device is out of device scope
        COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
        ping: Warning: source address might be selected on device other than: eth1
        PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.

        --- 172.16.3.1 ping statistics ---
        1 packets transmitted, 0 received, 100% packet loss, time 0ms

    where the test now directly fails:

        HINT: Fails since address on vrf device is out of device scope
        COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
        ping: connect: No route to host

    Signed-off-by: David Ahern <dsahern@kernel.org>
    Tested-by: Ben Greear <greearb@candelatech.com>
    Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Phil Sutter <psutter@redhat.com>
2022-10-28 22:35:32 +02:00
Frantisek Hrbata 0c3a22328a Merge: IPv6: 9.2 P1 backport from upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1488

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135319

Signed-off-by: Hangbin Liu <haliu@redhat.com>

Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: Sabrina Dubroca <sdubroca@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-10-27 08:26:02 -04:00
Jiri Benc 2e725d3634 net: Add skb_clear_tstamp() to keep the mono delivery_time
Bugzilla: https://bugzilla.redhat.com/2120966

commit de799101519aad23c6096041ba2744d7b5517e6a
Author: Martin KaFai Lau <kafai@fb.com>
Date:   Wed Mar 2 11:55:31 2022 -0800

    net: Add skb_clear_tstamp() to keep the mono delivery_time

    Right now, skb->tstamp is reset to 0 whenever the skb is forwarded.

    If skb->tstamp has the mono delivery_time, clearing it can hurt
    the performance when it finally transmits out to fq@phy-dev.

    The earlier patch added a skb->mono_delivery_time bit to
    flag the skb->tstamp carrying the mono delivery_time.

    This patch adds skb_clear_tstamp() helper which keeps
    the mono delivery_time and clears everything else.

    The delivery_time clearing will be postponed until the stack knows the
    skb will be delivered locally.  It will be done in a later patch.
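The behaviour of the helper can be modelled in user space as follows. This is a sketch, not the kernel implementation: the `Skb` class is a hypothetical stand-in for `struct sk_buff` that carries only the two fields named in the message, and the timestamp values in the usage note are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    tstamp: int = 0                   # meaning depends on the flag below
    mono_delivery_time: bool = False  # True: tstamp holds a mono delivery_time


def skb_clear_tstamp(skb: Skb) -> None:
    """Keep a mono delivery_time, clear any other kind of timestamp."""
    if skb.mono_delivery_time:
        return
    skb.tstamp = 0
```

A forwarded skb carrying a receive timestamp, for example `Skb(tstamp=1234)`, ends up with `tstamp == 0`, while `Skb(tstamp=1234, mono_delivery_time=True)` is left unchanged so that a later fq qdisc can still use the value.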

    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Jiri Benc 6619cf0a37 net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp
Bugzilla: https://bugzilla.redhat.com/2120966

Conflicts:
- [minor] different context in tcp_fragment() due to missing
  a52fe46ef160 ("tcp: factorize ip_summed setting")

commit a1ac9c8acec1605c6b43af418f79facafdced680
Author: Martin KaFai Lau <kafai@fb.com>
Date:   Wed Mar 2 11:55:25 2022 -0800

    net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp

    skb->tstamp was first used as the (rcv) timestamp.
    The major usage is to report it to the user (e.g. SO_TIMESTAMP).

    Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP)
    during egress and used by the qdisc (e.g. sch_fq) to make decision on when
    the skb can be passed to the dev.

    Currently, there is no way to tell whether skb->tstamp holds the (rcv)
    timestamp or the delivery_time, so it is always reset to 0 whenever forwarded
    between egress and ingress.

    While it makes sense to always clear the (rcv) timestamp in skb->tstamp
    to avoid confusing sch_fq that expects the delivery_time, it is a
    performance issue [0] to clear the delivery_time if the skb finally
    egresses to a fq@phy-dev.  For example, when forwarding from egress to
    ingress and then finally back to egress:

                tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns
                                         ^              ^
                                         reset          reset

    This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp
    is storing the mono delivery_time (EDT) instead of the (rcv) timestamp.

    The current use case is to keep the TCP mono delivery_time (EDT) and
    to be used with sch_fq.  A later patch will also allow tc-bpf@ingress
    to read and change the mono delivery_time.

    In the future, another bit (e.g. skb->user_delivery_time) can be added
    for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid.

    [ This patch is a prep work.  The following patches will
      get the other parts of the stack ready first.  Then another patch
      after that will finally set the skb->mono_delivery_time. ]

    skb_set_delivery_time() function is added.  It is used by the tcp_output.c
    and during ip[6] fragmentation to assign the delivery_time to
    the skb->tstamp and also set the skb->mono_delivery_time.

    A note on the change in ip_send_unicast_reply() in ip_output.c.
    It is only used by TCP to send reset/ack out of a ctl_sk.
    Like the new skb_set_delivery_time(), this patch sets
    the skb->mono_delivery_time to 0 for now as a
    placeholder.  It will be enabled in a later patch.
    A similar case in tcp_ipv6 can be done with
    skb_set_delivery_time() in tcp_v6_send_response().

    [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf

    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Hangbin Liu 090d473615 ipv6: do not use RT_TOS for IPv6 flowlabel
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135319
Upstream Status: net.git commit ab7e2e0dfa5d

commit ab7e2e0dfa5d37540ab1dc5376e9a2cb9188925d
Author: Matthias May <matthias.may@westermo.com>
Date:   Fri Aug 5 21:19:06 2022 +0200

    ipv6: do not use RT_TOS for IPv6 flowlabel

    According to Guillaume Nault RT_TOS should never be used for IPv6.

    Quote:
    RT_TOS() is an old macro used to interpret IPv4 TOS as described in
    the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
    code, although, given the current state of the code, most of the
    existing calls have no consequence.

    But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
    field to be interpreted the RFC 1349 way. There's no historical
    compatibility to worry about.

    Fixes: 571912c69f ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
    Acked-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: Matthias May <matthias.may@westermo.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2022-10-18 11:41:13 +08:00
Antoine Tenart 7c476f9b4c net: ip: add skb drop reasons to ip forwarding
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161
Upstream Status: linux.git

commit 2edc1a383fda8d2f580216292dfd9daeae691e47
Author: Menglong Dong <imagedong@tencent.com>
Date:   Wed Apr 13 16:15:55 2022 +0800

    net: ip: add skb drop reasons to ip forwarding

    Replace kfree_skb() which is used in ip6_forward() and ip_forward()
    with kfree_skb_reason().

    The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for
    the case where the length of the packet exceeds the MTU and the packet
    can't be fragmented.
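The condition behind the new drop reason can be illustrated with a deliberately simplified model. The function `forward_drop_reason` and its `may_fragment` parameter are hypothetical; the real forwarding paths take more inputs into account than this sketch does.

```python
from enum import Enum
from typing import Optional


class DropReason(Enum):
    PKT_TOO_BIG = "SKB_DROP_REASON_PKT_TOO_BIG"


def forward_drop_reason(pkt_len: int, mtu: int,
                        may_fragment: bool) -> Optional[DropReason]:
    """Return a drop reason for a forwarded packet, or None to forward it."""
    if pkt_len > mtu and not may_fragment:
        return DropReason.PKT_TOO_BIG
    return None
```

Reporting a named reason instead of a bare free lets tracing tools tell this case apart from other drops in the forwarding path.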

    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: Jiang Biao <benbjiang@tencent.com>
    Reviewed-by: Hao Peng <flyingpeng@tencent.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-10-13 14:53:24 +02:00
Antoine Tenart 620d4ff739 net: ip: add skb drop reasons for ip egress path
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161
Upstream Status: linux.git

commit 5e187189ec324f78035d33a4bc123a9c4ca6f3e3
Author: Menglong Dong <imagedong@tencent.com>
Date:   Sat Feb 26 12:18:29 2022 +0800

    net: ip: add skb drop reasons for ip egress path

    Replace kfree_skb() which is used in the packet egress path of IP layer
    with kfree_skb_reason(). Functions that are involved include:

    __ip_queue_xmit()
    ip_finish_output()
    ip_mc_finish_output()
    ip6_output()
    ip6_finish_output()
    ip6_finish_output2()

    The following new drop reasons are introduced:

    SKB_DROP_REASON_IP_OUTNOROUTES
    SKB_DROP_REASON_BPF_CGROUP_EGRESS
    SKB_DROP_REASON_IPV6DISABLED
    SKB_DROP_REASON_NEIGH_CREATEFAIL

    Reviewed-by: Mengen Sun <mengensun@tencent.com>
    Reviewed-by: Hao Peng <flyingpeng@tencent.com>
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-10-13 14:53:23 +02:00
Hangbin Liu e333d6a1da net-timestamp: convert sk->sk_tskey to atomic_t
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081920
Upstream Status: net.git commit a1cdec57e03a

commit a1cdec57e03a1352e92fbbe7974039dda4efcec0
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 17 09:05:02 2022 -0800

    net-timestamp: convert sk->sk_tskey to atomic_t

    UDP sendmsg() can be lockless, which causes all kinds
    of data races.

    This patch converts sk->sk_tskey to remove one of these races.
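Why an atomic counter removes this race can be shown with a user-space analogy. The sketch below is an assumption-laden model rather than kernel code: `TsKey`, `fetch_inc` and `collect_keys` are hypothetical names, and a lock stands in for the single indivisible read-modify-write that an atomic type provides.

```python
import threading
from typing import List


class TsKey:
    """Counter whose read-and-increment step cannot be interleaved."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()  # stands in for an atomic operation

    def fetch_inc(self) -> int:
        """Return the current key and advance the counter in one step."""
        with self._lock:
            old = self._value
            self._value += 1
            return old


def collect_keys(counter: TsKey, n_threads: int, per_thread: int) -> List[int]:
    """Let several senders draw keys concurrently and gather the results."""
    results: List[List[int]] = [[] for _ in range(n_threads)]

    def worker(slot: int) -> None:
        for _ in range(per_thread):
            results[slot].append(counter.fetch_inc())

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [key for chunk in results for key in chunk]
```

Because every draw is indivisible, concurrent senders never receive the same key and no increment is lost, which is the property a plain unlocked integer cannot guarantee.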

    BUG: KCSAN: data-race in __ip_append_data / __ip_append_data

    read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
     __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
     ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
     udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
     inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
     sock_sendmsg_nosec net/socket.c:705 [inline]
     sock_sendmsg net/socket.c:725 [inline]
     ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
     ___sys_sendmsg net/socket.c:2467 [inline]
     __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
     __do_sys_sendmmsg net/socket.c:2582 [inline]
     __se_sys_sendmmsg net/socket.c:2579 [inline]
     __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

    write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
     __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
     ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
     udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
     inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
     sock_sendmsg_nosec net/socket.c:705 [inline]
     sock_sendmsg net/socket.c:725 [inline]
     ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
     ___sys_sendmsg net/socket.c:2467 [inline]
     __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
     __do_sys_sendmmsg net/socket.c:2582 [inline]
     __se_sys_sendmmsg net/socket.c:2579 [inline]
     __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

    value changed: 0x0000054d -> 0x0000054e

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85fa6f-dirty #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Fixes: 09c2d251b7 ("net-timestamp: add key to disambiguate concurrent datagrams")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Willem de Bruijn <willemb@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2022-05-05 12:26:57 +08:00
Hangbin Liu 78e4beac3f ipv6: fix panic when forwarding a pkt with no in6 dev
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2077372
Upstream Status: net.git commit e3fa461d8b0e

commit e3fa461d8b0e185b7da8a101fe94dfe6dd500ac0
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date:   Fri Apr 8 16:03:42 2022 +0200

    ipv6: fix panic when forwarding a pkt with no in6 dev

    kongweibin reported a kernel panic in ip6_forward() when input interface
    has no in6 dev associated.

    The following tc commands were used to reproduce this panic:
    tc qdisc del dev vxlan100 root
    tc qdisc add dev vxlan100 root netem corrupt 5%

    CC: stable@vger.kernel.org
    Fixes: ccd27f05ae ("ipv6: fix 'disable_policy' for fwd packets")
    Reported-by: kongweibin <kongweibin2@huawei.com>
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2022-04-25 15:13:12 +08:00
Herton R. Krzesinski e635ed00b4 Merge: ipv6: 9.0 P2 backports from upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/368

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041401

v2: add followup update for commit 8837cbbf8542 ("net: ipv6: add fib6_nh_release_dsts stub").
Also add an optimization for the resilient nexthop group.

Signed-off-by: Hangbin Liu <haliu@redhat.com>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2022-02-15 01:29:58 +00:00
Herton R. Krzesinski 3e26d2a862 Merge: net: backports before kABI freeze
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/407

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041382
Tested: ENRT
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2037783

Includes patches that would break kABI without backporting the full
series they are taken from, which we will do later (post-freeze).

The following fixes were omitted as the backport of commit
f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
is a partial one not introducing the issues.

Omitted-fix: ffef737fd037 ("net/tls: Fix skb memory leak when running kTLS traffic")
Omitted-fix: db094aa8140e ("net/tls: Fix another skb memory leak when running kTLS traffic")
Omitted-fix: 79074a72d335 ("net: Flush deferred skb free on socket destroy")
Omitted-fix: ebdc1a030962 ("tcp: add a missing sk_defer_free_flush() in tcp_splice_read()")

Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Sabrina Dubroca <sdubroca@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2022-02-07 15:11:27 +00:00
Hangbin Liu b8154eaff8 ipv6: fix typos in __ip6_finish_output()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041401
Upstream Status: net.git commit 19d36c5f2948

commit 19d36c5f294879949c9d6f57cb61d39cc4c48553
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Nov 18 17:37:58 2021 -0800

    ipv6: fix typos in __ip6_finish_output()

    We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and
    IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED

    Found by code inspection, please double check that fixing this bug
    does not surface other bugs.

    Fixes: 09ee9dba96 ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Tobias Brunner <tobias@strongswan.org>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: David Ahern <dsahern@kernel.org>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Tested-by: Tobias Brunner <tobias@strongswan.org>
    Acked-by: Tobias Brunner <tobias@strongswan.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2022-02-07 11:46:18 +08:00
Antoine Tenart 2027864c47 net: remove sk_route_nocaps
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041382
Upstream Status: linux.git
Tested: ENRT

commit aba546565b613e74b84b8261999ea82b5561d3f1
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Nov 15 11:02:35 2021 -0800

    net: remove sk_route_nocaps

    Instead of using a full netdev_features_t, we can use a single bit,
    as sk_route_nocaps is only used to remove NETIF_F_GSO_MASK from
    sk->sk_route_cap.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-01-21 16:26:18 +01:00
Herton R. Krzesinski b8f20958b7 Merge: net: core stable backport for rhel 9.0
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/212

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1

This includes a few critical bugfixes for the core network stack.

Notably it includes 7f678def99d2 ("skb_expand_head() adjust skb->truesize incorrectly") and a whole series of pre-requisites. The bug addressed there is nasty and present even prior to skb_expand_head() introduction.

commit 719c57197010 ("net: make napi_disable() symmetric with enable") instead has been explicitly excluded, as it's not really a fix, is known to introduce problems and it's still quite new

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2022-01-14 16:53:21 +00:00
Paolo Abeni c3a17fdd3d ipv6: use skb_expand_head in ip6_xmit
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1

Upstream commit:
commit 0c9f227bee11910a49e1d159abe102d06e3745d5
Author: Vasily Averin <vvs@virtuozzo.com>
Date:   Mon Aug 2 11:52:29 2021 +0300

    ipv6: use skb_expand_head in ip6_xmit

    Unlike skb_realloc_headroom, new helper skb_expand_head
    does not allocate a new skb if possible.

    Additionally this patch replaces commonly used dereferencing with variables.
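The difference between the two helpers can be sketched with a toy buffer model. Everything here is hypothetical: `Buf`, `realloc_headroom` and `expand_head` only mimic the allocation strategy described above (copy always versus copy only when needed) and do not reproduce the kernel functions.

```python
from dataclasses import dataclass, replace


@dataclass
class Buf:
    headroom: int          # free bytes in front of the packet data
    shared: bool = False   # True if another user holds a reference


def realloc_headroom(buf: Buf, needed: int) -> Buf:
    """Copying strategy: always hand back a fresh buffer."""
    return replace(buf, headroom=max(buf.headroom, needed), shared=False)


def expand_head(buf: Buf, needed: int) -> Buf:
    """In-place strategy: copy only when the buffer is shared."""
    if buf.headroom >= needed:
        return buf
    if buf.shared:
        buf = replace(buf, shared=False)  # private copy before resizing
    buf.headroom = needed
    return buf
```

In this model an unshared buffer comes back as the same object with more headroom, so the caller does not have to free the old buffer and switch to a new one on the common path.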

    Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2021-12-09 10:44:30 +01:00
Paolo Abeni 578a8caa0b ipv6: use skb_expand_head in ip6_finish_output2
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1

Upstream commit:
commit e415ed3a4b8b246ee5e9d109ff5153efcf96b9f2
Author: Vasily Averin <vvs@virtuozzo.com>
Date:   Mon Aug 2 11:52:22 2021 +0300

    ipv6: use skb_expand_head in ip6_finish_output2

    Unlike skb_realloc_headroom, new helper skb_expand_head does not allocate
    a new skb if possible.

    Additionally this patch replaces commonly used dereferencing with variables.

    Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2021-12-09 10:44:30 +01:00
Hangbin Liu 890571c2b7 ipv6: When forwarding count rx stats on the orig netdev
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2025457
Upstream Status: net.git commit 0857d6f8c759

commit 0857d6f8c759d95f89d0436f86cdfd189ef99f20
Author: Stephen Suryaputra <ssuryaextr@gmail.com>
Date:   Thu Oct 14 09:08:45 2021 -0400

    ipv6: When forwarding count rx stats on the orig netdev

    Commit bdb7cc643f ("ipv6: Count interface receive statistics on the
    ingress netdev") does not work when ip6_forward() executes on the skbs
    with vrf-enslaved netdev. Use IP6CB(skb)->iif to get to the right one.

    Add a selftest script to verify.

    Fixes: bdb7cc643f ("ipv6: Count interface receive statistics on the ingress netdev")
    Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20211014130845.410602-1-ssuryaextr@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2021-11-22 17:25:50 +08:00
Kangmin Park 46c7655f0b ipv6: decrease hop limit counter in ip6_forward()
Decrease the hop limit counter when delivering the skb to the ndp proxy.

Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23 16:40:41 +01:00
Vasily Averin 2d85a1b31d ipv6: ip6_finish_output2: set sk into newly allocated nskb
skb_set_owner_w() should set sk not to old skb but to new nskb.

Fixes: 5796015fa9 ("ipv6: allocate enough headroom in ip6_finish_output2()")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Link: https://lore.kernel.org/r/70c0744f-89ae-1869-7e3e-4fa292158f4b@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-20 11:52:36 +02:00
Vasily Averin 5796015fa9 ipv6: allocate enough headroom in ip6_finish_output2()
When TEE target mirrors traffic to another interface, sk_buff may
not have enough headroom to be processed correctly.
ip_finish_output2() detects this situation for ipv4 and allocates
a new skb with enough headroom. However, ipv6 lacks this logic in
ip6_finish_output2() and it leads to skb_under_panic:

 skbuff: skb_under_panic: text:ffffffffc0866ad4 len:96 put:24
 head:ffff97be85e31800 data:ffff97be85e317f8 tail:0x58 end:0xc0 dev:gre0
 ------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:110!
 invalid opcode: 0000 [#1] SMP PTI
 CPU: 2 PID: 393 Comm: kworker/2:2 Tainted: G           OE     5.13.0 #13
 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:skb_panic+0x48/0x4a
 Call Trace:
  skb_push.cold.111+0x10/0x10
  ipgre_header+0x24/0xf0 [ip_gre]
  neigh_connected_output+0xae/0xf0
  ip6_finish_output2+0x1a8/0x5a0
  ip6_output+0x5c/0x110
  nf_dup_ipv6+0x158/0x1000 [nf_dup_ipv6]
  tee_tg6+0x2e/0x40 [xt_TEE]
  ip6t_do_table+0x294/0x470 [ip6_tables]
  nf_hook_slow+0x44/0xc0
  nf_hook.constprop.34+0x72/0xe0
  ndisc_send_skb+0x20d/0x2e0
  ndisc_send_ns+0xd1/0x210
  addrconf_dad_work+0x3c8/0x540
  process_one_work+0x1d1/0x370
  worker_thread+0x30/0x390
  kthread+0x116/0x130
  ret_from_fork+0x22/0x30

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-12 11:25:12 -07:00