JIRA: https://issues.redhat.com/browse/RHEL-73281
Upstream Status: net.git commit 252442f2ae31
Conflicts: context conflicts due to missing fa17a6d8a5bd
("ipv6: lockless IPV6_ADDR_PREFERENCES implementation").
commit 252442f2ae317d109ef0b4b39ce0608c09563042
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed Jul 10 10:14:28 2024 +0200
ipv6: fix source address selection with route leak
By default, an address assigned to the output interface is selected when
the source address is not specified. This is problematic when a route,
configured in a vrf, uses an interface from another vrf (aka route leak).
The original vrf does not own the selected source address.
Let's add a check against the output interface and call the appropriate
function to select the source address.
CC: stable@vger.kernel.org
Fixes: 0d240e7811 ("net: vrf: Implement get_saddr for IPv6")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20240710081521.3809742-3-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62203
Upstream Status: linux.git
commit 624d5aec487cf8c2955d9c5880685714f7fe8e6f
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Feb 28 13:54:35 2024 +0000
ipv6: annotate data-races around devconf->disable_policy
idev->cnf.disable_policy and net->ipv6.devconf_all->disable_policy
can be read locklessly. Add appropriate annotations on reads
and writes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62203
Upstream Status: linux.git
commit a8fbd4d90720b6c930661ed593d54aba77cec3c2
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Feb 28 13:54:34 2024 +0000
ipv6: annotate data-races around devconf->proxy_ndp
devconf->proxy_ndp can be read and written locklessly,
add appropriate annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62203
Upstream Status: linux.git
Conflicts:\
- One missing chunk as upstream commit f9a2fb73318e ("net/ipv6:
Introduce accept_unsolicited_na knob to implement router-side changes
for RFC9131") is not in c9s.
commit 32f754176e889cdfe989ef08ece19859427755df
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Feb 28 13:54:30 2024 +0000
ipv6: annotate data-races around cnf.forwarding
idev->cnf.forwarding and net->ipv6.devconf_all->forwarding
might be read locklessly, add appropriate READ_ONCE()
and WRITE_ONCE() annotations.
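A minimal userspace sketch of what such annotations amount to. The macros below are simplified stand-ins (the kernel's own READ_ONCE()/WRITE_ONCE() carry additional compile-time checks), and `struct devconf_model`, `set_forwarding()` and `is_router()` are hypothetical names used only for illustration:

```c
/* Simplified userspace stand-ins for the kernel macros; the real
 * definitions carry extra checks. Needs GCC or Clang for __typeof__. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical model of a per-device configuration block. */
struct devconf_model {
	int forwarding;
};

/* Writer side: one store that the compiler may not tear or repeat. */
static void set_forwarding(struct devconf_model *cnf, int val)
{
	WRITE_ONCE(cnf->forwarding, val);
}

/* Reader side: load the field exactly once, then use the local copy. */
static int is_router(const struct devconf_model *cnf)
{
	int fwd = READ_ONCE(cnf->forwarding);

	return fwd > 0;
}
```

Reading into a local variable once matters when the value is tested several times: every test then sees the same snapshot even if a writer changes the field concurrently.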
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-60232
commit 2d5ff7e339d04622d8282661df36151906d0e1c7
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Aug 20 16:08:59 2024 +0000
ipv6: prevent possible UAF in ip6_xmit()
If skb_expand_head() returns NULL, skb has been freed
and the associated dst/idev could also have been freed.
We must use rcu_read_lock() to prevent a possible UAF.
Fixes: 0c9f227bee11 ("ipv6: use skb_expand_head in ip6_xmit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vasily Averin <vasily.averin@linux.dev>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240820160859.3786976-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-60232
commit da273b377ae0d9bd255281ed3c2adb228321687b
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Aug 20 16:08:58 2024 +0000
ipv6: fix possible UAF in ip6_finish_output2()
If skb_expand_head() returns NULL, skb has been freed
and associated dst/idev could also have been freed.
We need to hold rcu_read_lock() to make sure the dst and
associated idev are alive.
Fixes: 5796015fa9 ("ipv6: allocate enough headroom in ip6_finish_output2()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vasily Averin <vasily.averin@linux.dev>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240820160859.3786976-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4465
JIRA: https://issues.redhat.com/browse/RHEL-40130
Tested: Using routing and tunneling self-tests
Depends: !4435
Commits:
```
537dd2d9fb9f ("net: Add helper function to parse netlink msg of ip_tunnel_encap")
b86fca800a6a ("net: Add helper function to parse netlink msg of ip_tunnel_parm")
63c15822b8dd ("lib/bitmap: add bitmap_{read,write}()")
117aef12a7b1 ("ip_tunnel: use a separate struct to store tunnel params in the kernel")
020e8f60aa8b ("ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX")
b11ebf2ca2c1 ("ip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX")
45490ce2ff83 ("nfp: flower: add support for tunnel offload without key ID")
bf3fcbf7e7a0 ("ipv4: rename and move ip_route_output_tunnel()")
78f3655adcb5 ("ipv4: remove "proto" argument from udp_tunnel_dst_lookup()")
72fc68c6356b ("ipv4: add new arguments to udp_tunnel_dst_lookup()")
3ae983a603a4 ("ipv4: use tunnel flow flags for tunnel route lookups")
60a77d11cd5d ("geneve: add dsfield helper function")
daa2ba7ed1d1 ("geneve: use generic function for tunnel IPv4 route lookup")
6f19b2c136d9 ("vxlan: use generic function for tunnel IPv4 route lookup")
fc47e86dbfb7 ("ipv6: rename and move ip6_dst_lookup_tunnel()")
7e937dcf96d0 ("ipv6: remove "proto" argument from udp_tunnel6_dst_lookup()")
946fcfdbc5b9 ("ipv6: add new arguments to udp_tunnel6_dst_lookup()")
69d72587c17b ("geneve: use generic function for tunnel IPv6 route lookup")
f25e621f5d4c ("ipv6: mark address parameters of udp_tunnel6_xmit_skb() as const")
2aceb896ee18 ("vxlan: use generic function for tunnel IPv6 route lookup")
3e7e5baaaba7 ("bitmap: don't assume compiler evaluates small mem*() builtins calls")
c1023f5634b9 ("s390/cio: rename bitmap_size() -> idset_bitmap_size()")
10a04ff09bcc ("tools: move alignment-related macros to new <linux/align.h>")
a37fbe666c01 ("bitmap: introduce generic optimized bitmap_size()")
5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
5a66cda52d7d ("ip_tunnel: harden copying IP tunnel params to userspace")
```
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-39786
Upstream Status: linux.git
Conflicts:\
- Removed code differs due to missing upstream commit cafbe182a467
("inet: move inet->hdrincl to inet->inet_flags") in c9s.
commit 4e13d3a9c25b7080f8a619f961e943fe08c2672c
Author: Shigeru Yoshida <syoshida@redhat.com>
Date: Mon May 6 23:11:29 2024 +0900
ipv6: Fix potential uninit-value access in __ip6_make_skb()
As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in
__ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags
instead of testing HDRINCL on the socket to avoid a race condition which
causes uninit-value access.
Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-40130
commit fc47e86dbfb75a864c0c9dd8e78affb6506296bb
Author: Beniamino Galvani <b.galvani@gmail.com>
Date: Fri Oct 20 13:55:25 2023 +0200
ipv6: rename and move ip6_dst_lookup_tunnel()
At the moment ip6_dst_lookup_tunnel() is used only by bareudp.
Ideally, other UDP tunnel implementations should use it, but to do so
the function needs to accept new parameters that are specific for UDP
tunnels, such as the ports.
Prepare for these changes by renaming the function to
udp_tunnel6_dst_lookup() and move it to file
net/ipv6/ip6_udp_tunnel.c.
This is similar to what was already done for IPv4 in commit bf3fcbf7e7a0
("ipv4: rename and move ip_route_output_tunnel()").
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-21152
Upstream Status: net.git commit 03d6c848bfb4
commit 03d6c848bfb406e9ef6d9846d759e97beaeea113
Author: Yan Zhai <yan@cloudflare.com>
Date: Tue Oct 24 07:26:40 2023 -0700
ipv6: avoid atomic fragment on GSO packets
When the ipv6 stack outputs a GSO packet, if its gso_size is larger than
dst MTU, then all segments would be fragmented. However, it is possible
for a GSO packet to have a trailing segment with smaller actual size
than both gso_size as well as the MTU, which leads to an "atomic
fragment". Atomic fragments are considered harmful in RFC-8021. An
existing report from APNIC also shows that atomic fragments are more
likely to be dropped even though they are equivalent to a no-op [1].
Add an extra check in the GSO slow output path. For each segment from
the original over-sized packet, if it fits with the path MTU, then avoid
generating an atomic fragment.
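The per-segment check can be modelled in userspace as follows. This is only a sketch under stated assumptions: `segments_needing_fragmentation()` and its parameters are hypothetical and do not mirror the kernel function that the patch touches.

```c
#include <stddef.h>

/* Illustrative model only: returns how many segments of a GSO packet
 * would be handed to the fragmentation path. payload_len, gso_size and
 * mtu are in bytes; hdr_len is the per-segment header overhead. */
static size_t segments_needing_fragmentation(size_t payload_len,
					      size_t gso_size,
					      size_t hdr_len, size_t mtu)
{
	size_t count = 0;

	if (gso_size == 0)
		return 0;

	while (payload_len > 0) {
		size_t seg = payload_len < gso_size ? payload_len : gso_size;

		/* Per-segment check: only an over-sized segment is fragmented. */
		if (seg + hdr_len > mtu)
			count++;
		payload_len -= seg;
	}
	return count;
}
```

With a 3000-byte payload, gso_size 1400, 60 bytes of headers and an MTU of 1400, the two full segments exceed the MTU while the 200-byte trailing segment fits, so only two segments go through fragmentation and the trailing one does not become an atomic fragment.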
Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1]
Fixes: b210de4f8c ("net: ipv6: Validate GSO SKB before finish IPv6 processing")
Reported-by: David Wragg <dwragg@cloudflare.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Link: https://lore.kernel.org/r/90912e3503a242dca0bc36958b11ed03a2696e5e.1698156966.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git
Conflicts:\
- Context diff due to missing upstream commit 09eed1192cec ("neighbour:
switch to standard rcu, instead of rcu_bh") in c9s.
- Context diff due to missing upstream commit cd3c74807736 ("ipv6:
optimise dst refcounting on skb init") in c9s.
commit b4a11b2033b7d3dfdd46592f7036a775b18cecd1
Author: Heng Guo <heng.guo@windriver.com>
Date: Thu Oct 19 09:20:53 2023 +0800
net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.
Reproduce environment:
A network of three Linux VMs is connected as below:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1500
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1500
Reproduce:
VM1 sends 1400 bytes of UDP data to VM3 using scapy with flags=0.
scapy command:
send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1,
inter=1.000000)
Result:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
FragOKs FragFails FragCreates
Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
FragOKs FragFails FragCreates
Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
"ForwDatagrams" increases from 4 to 5 and "OutRequests" also increases
from 7 to 8.
Issue description and patch:
IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS
("OutOctets") in ip_finish_output2().
According to RFC 4293, "OutOctets" is counted together with
"OutTransmits", not with "OutRequests". "OutRequests" does not include
any datagrams counted in "ForwDatagrams".
ipSystemStatsOutOctets OBJECT-TYPE
DESCRIPTION
"The total number of octets in IP datagrams delivered to the
lower layers for transmission. Octets from datagrams
counted in ipIfStatsOutTransmits MUST be counted here.
ipSystemStatsOutRequests OBJECT-TYPE
DESCRIPTION
"The total number of IP datagrams that local IP user-
protocols (including ICMP) supplied to IP in requests for
transmission. Note that this counter does not include any
datagrams counted in ipSystemStatsOutForwDatagrams.
So this patch redefines IPSTATS_MIB_OUTPKTS as "OutTransmits" and adds
IPSTATS_MIB_OUTREQUESTS for "OutRequests".
Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add
IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6.
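The intended counter semantics can be sketched with a small userspace model. `struct ip_stats_model` and the `model_*()` helpers are hypothetical. They only illustrate which counter moves on which path and are not the kernel code.

```c
/* Hypothetical model of the counters discussed above. */
struct ip_stats_model {
	unsigned long out_requests;   /* "OutRequests"   */
	unsigned long forw_datagrams; /* "ForwDatagrams" */
	unsigned long out_transmits;  /* "OutTransmits"  */
	unsigned long out_octets;     /* "OutOctets"     */
};

/* Common transmit step, reached by both local and forwarded packets. */
static void model_finish_output(struct ip_stats_model *s, unsigned long len)
{
	s->out_transmits++;
	s->out_octets += len;
}

/* Locally generated packet: counted as a request, then transmitted. */
static void model_local_out(struct ip_stats_model *s, unsigned long len)
{
	s->out_requests++;
	model_finish_output(s, len);
}

/* Forwarded packet: counted as forwarded, never as a request. */
static void model_forward(struct ip_stats_model *s, unsigned long len)
{
	s->forw_datagrams++;
	model_finish_output(s, len);
}
```

Forwarding one 1428-byte datagram in this model raises "ForwDatagrams", "OutTransmits" and "OutOctets" but leaves "OutRequests" untouched.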
Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0
----------------------------------------------------------------------
"ForwDatagrams" increases from 1 to 2 while "OutRequests" stays at 3.
"OutTransmits" increases from 4 to 5 and "OutOctets" increases by 1428.
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Reviewed-by: Kun Song <Kun.Song@windriver.com>
Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git
commit e4da8c78973c1e307c0431e0b99a969ffb8aa3f1
Author: Heng Guo <heng.guo@windriver.com>
Date: Fri Aug 25 15:55:05 2023 +0800
net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated
commit edf391ff17 ("snmp: add missing counters for RFC 4293") had
already added OutOctets for RFC 4293. In commit 2d8dbb04c6 ("snmp: fix
OutOctets counter to include forwarded datagrams"), OutOctets was
counted again, but not removed from ip_output().
According to RFC 4293 "3.2.3. IP Statistics Tables",
ipIfStatsOutTransmits is not equal to ipIfStatsOutForwDatagrams. So
"IPSTATS_MIB_OUTOCTETS must be incremented when incrementing" is not
accurate. And IPSTATS_MIB_OUTOCTETS should be counted after fragmentation.
This patch reverts commit 2d8dbb04c6 ("snmp: fix OutOctets counter to
include forwarded datagrams") and moves IPSTATS_MIB_OUTOCTETS to
ip_finish_output2 for ipv4.
Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-16999
commit b071af523579df7341cabf0f16fc661125e9a13f
Author: Eric Dumazet <edumazet@google.com>
Date: Mon Mar 13 20:17:31 2023 +0000
neighbour: annotate lockless accesses to n->nud_state
We have many lockless accesses to n->nud_state.
Before adding another one in the following patch,
add annotations to readers and writers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14360
Upstream Status: linux.git
commit a171fbec88a2c730b108c7147ac5e7b2f5a02b47
Author: Yan Zhai <yan@cloudflare.com>
Date: Thu Aug 17 19:58:14 2023 -0700
lwt: Check LWTUNNEL_XMIT_CONTINUE strictly
LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2,
such that any positive return value from an xmit hook could cause
unexpected continue behavior, even though the related skb may have been
freed. This could be error-prone for future xmit hook ops. One of the
possible errors is to return statuses of dst_output directly.
To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to
distinguish from dst_output statuses and check the continue
condition explicitly.
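The difference between an implicit and an explicit continue check can be sketched as below. The enum values and helper names are assumptions made for this illustration. What matters is that the "continue" code does not collide with any status dst_output can return (0, small positive codes, negative errnos).

```c
#include <stdbool.h>

/* Illustrative values, assumed for this sketch. */
enum {
	MODEL_XMIT_DONE = 0,
	MODEL_XMIT_CONTINUE = 0x100,
};

/* Loose check: anything non-negative other than DONE means "continue". */
static bool continue_loose(int res)
{
	return !(res < 0 || res == MODEL_XMIT_DONE);
}

/* Strict check: continue only on the dedicated value. */
static bool continue_strict(int res)
{
	return res == MODEL_XMIT_CONTINUE;
}
```

A hook that returns a small positive status such as 2 is treated as "continue" by the loose check, although its skb may already be gone. The strict check stops in that case.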
Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12679
commit d457a0e329b0bfd3a1450e0b1a18cd2b47a25a08
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Jun 8 19:17:37 2023 +0000
net: move gso declarations and functions to their own files
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
commit e7d2b510165fff6bedc9cca88c071ad846850c74
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Fri Sep 23 17:39:04 2022 +0100
net: shrink struct ubuf_info
We can benefit from a smaller struct ubuf_info, so leave only mandatory
fields and let users decide how they want to extend it. Convert
MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields.
This reduces the size from 48 bytes to just 16.
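The "small base struct, extended by its users" pattern can be sketched as follows. The layouts and names (`struct ubuf_model`, `struct ubuf_model_msgzc`, `to_msgzc()`) are hypothetical simplifications and not the kernel definitions:

```c
#include <stddef.h>

/* Hypothetical, simplified layouts: a small base struct holding only
 * what every user needs, and an extended struct that embeds it. */
struct ubuf_model {
	void (*callback)(struct ubuf_model *uarg);
	unsigned int refcnt;
	unsigned char flags;
};

struct ubuf_model_msgzc {
	struct ubuf_model ubuf; /* embedded, not pointed to */
	unsigned long desc;     /* extension fields live here */
	unsigned int id;
	unsigned short len;
};

/* Recover the extended struct from a pointer to the embedded base,
 * in the style of the kernel's container_of(). */
static struct ubuf_model_msgzc *to_msgzc(struct ubuf_model *uarg)
{
	return (struct ubuf_model_msgzc *)
		((char *)uarg - offsetof(struct ubuf_model_msgzc, ubuf));
}
```

Code that only needs the callback and reference count works with the base struct. Only the user that owns the extra fields converts back to the extended type.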
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
commit 1fd3ae8c906c0f521238d436566323af3f0282e8
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Jul 12 21:52:34 2022 +0100
ipv6/udp: support externally provided ubufs
Teach ipv6/udp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
commit 773ba4fe9104a64a54d1c00f0fb6ffb95def2b03
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Jul 12 21:52:26 2022 +0100
ipv6: avoid partial copy for zc
Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles
on copy and iter manipulations and also misaligns potentially aligned
data. Avoid such copies. And as a bonus we can allocate smaller skb.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
commit 58f71be58b8713e41f8568938a0199190f723d1d
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 28 11:58:48 2022 +0100
ipv6: refactor ip6_finish_output2()
Throw neigh checks in ip6_finish_output2() under a single slow path if,
so we don't have the overhead in the hot path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
commit 4b143ed7dde59d8a4f94c39aa7c4e92842c3ecc1
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 28 11:58:47 2022 +0100
ipv6: help __ip6_finish_output() inlining
There are two callers of __ip6_finish_output(), both are in
ip6_finish_output(). We can combine the call sites into one and handle
return code after, that will inline __ip6_finish_output().
Note, error handling under NET_XMIT_CN will only return 0 if
__ip6_finish_output() succeeded, and in this case it returns 0.
Considering that NET_XMIT_SUCCESS is 0, it'll be returning exactly the
same result for it as before.
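The equivalence argued above can be checked with a small userspace sketch. The two helpers are hypothetical models of the "before" and "after" control flow, with `hook` standing for the verdict that selects the case and `inner` for the return value of the inner output step. They are not the kernel functions themselves.

```c
/* Values as defined in the kernel's netdevice.h. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_CN      0x02

/* Before: two call sites, one per case. */
static int finish_two_sites(int hook, int inner)
{
	switch (hook) {
	case NET_XMIT_SUCCESS:
		return inner;
	case NET_XMIT_CN:
		return inner ? inner : hook;
	default:
		return hook;
	}
}

/* After: one call site shared by both cases. */
static int finish_one_site(int hook, int inner)
{
	switch (hook) {
	case NET_XMIT_SUCCESS:
	case NET_XMIT_CN:
		return inner ? inner : hook;
	default:
		return hook;
	}
}
```

Because NET_XMIT_SUCCESS is 0, "inner if non-zero, else hook" yields exactly `inner` in the success case, so both variants return the same value for every input.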
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161893
Upstream Status: net.git commit 6596a0229541
commit 6596a0229541270fb8d38d989f91b78838e5e9da
Author: Jiri Bohac <jbohac@suse.cz>
Date: Wed Jan 19 10:22:53 2022 +0100
xfrm: fix MTU regression
Commit 749439bfac ("ipv6: fix udpv6
sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
A Packet Too Big ICMPv6 message received in response to an ESP
packet will prevent all further communication through the tunnel
if the reported MTU minus the ESP overhead is smaller than 1280.
E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead
is 92 bytes. Receiving a PTB with MTU of 1371 or less will result
in all further packets in the tunnel being dropped. A ping through the
tunnel fails with "ping: sendmsg: Invalid argument".
Apparently the MTU on the xfrm route is smaller than 1280 and
fails the check inside ip6_setup_cork() added by 749439bf.
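The arithmetic behind the 1371-byte threshold, as a small sketch. The helper names are hypothetical. The 92-byte overhead is the sha256/aes tunnel-mode example given above, and 1280 is the IPv6 minimum MTU.

```c
#include <stdbool.h>

#define IPV6_MIN_MTU 1280

/* MTU left for the inner packet once the ESP overhead is subtracted
 * from the MTU reported in a Packet Too Big message. */
static int inner_mtu(int ptb_mtu, int esp_overhead)
{
	return ptb_mtu - esp_overhead;
}

/* A plain "MTU must be at least the IPv6 minimum" check of the kind
 * the message describes. */
static bool passes_min_mtu_check(int mtu)
{
	return mtu >= IPV6_MIN_MTU;
}
```

With a reported MTU of 1371 the inner MTU is 1371 - 92 = 1279, one byte short of 1280, so the check fails. A reported MTU of 1372 gives exactly 1280 and passes.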
We found this by debugging USGv6/ipv6ready failures. Failing
tests are: "Phase-2 Interoperability Test Scenario IPsec" /
5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
Commit b515d26372 ("xfrm:
xfrm_state_mtu should return at least 1280 for ipv6") attempted
to fix this but caused another regression in TCP MSS calculations
and had to be reverted.
The patch below fixes the situation by dropping the MTU
check and instead checking for the underflows described in the
749439bf commit message.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 749439bfac ("ipv6: fix udpv6 sendmsg crash caused by too small MTU")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161893
Upstream Status: net.git commit 9f535c870e49
commit 9f535c870e493841ac7be390610ff2edec755762
Author: Gergely Risko <gergely.risko@gmail.com>
Date: Thu Jan 19 14:40:41 2023 +0100
ipv6: fix reachability confirmation with proxy_ndp
When proxying IPv6 NDP requests, the adverts to the initial multicast
solicits are correct and working. On the other hand, when later a
reachability confirmation is requested (on unicast), no reply is sent.
This causes the neighbor entry to expire on the sending node, which is
mostly a non-issue, as a new multicast request is sent. There are
routers, where the multicast requests are intentionally delayed, and in
these environments the current implementation causes periodic packet
loss for the proxied endpoints.
The root cause is the erroneous decrease of the hop limit, as this
is checked in ndisc.c and no answer is generated when it's 254 instead
of the correct 255.
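A userspace sketch of why the decrement breaks the reply. The helpers and the `decrement_proxied` switch are hypothetical and only model the behaviour described above: Neighbour Discovery input requires a hop limit of exactly 255, and a proxied packet is not really forwarded.

```c
#include <stdbool.h>

/* Neighbour Discovery messages are only accepted with hop limit 255. */
#define ND_REQUIRED_HOP_LIMIT 255

static bool nd_hop_limit_ok(unsigned char hop_limit)
{
	return hop_limit == ND_REQUIRED_HOP_LIMIT;
}

/* Hypothetical model of the forwarding path: proxied is true when the
 * packet is handed to the local NDP proxy rather than routed onwards.
 * decrement_proxied selects the old (true) or fixed (false) behaviour. */
static unsigned char hop_limit_after_forward(unsigned char hop_limit,
					     bool proxied,
					     bool decrement_proxied)
{
	if (proxied && !decrement_proxied)
		return hop_limit;	/* proxying is not forwarding */
	return (unsigned char)(hop_limit - 1);
}
```

In this model a unicast solicitation arriving with hop limit 255 reaches the ND check as 254 under the old behaviour and is ignored. Without the decrement it stays at 255 and is answered.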
Cc: stable@vger.kernel.org
Fixes: 46c7655f0b ("ipv6: decrease hop limit counter in ip6_forward()")
Signed-off-by: Gergely Risko <gergely.risko@gmail.com>
Tested-by: Gergely Risko <gergely.risko@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Merge conflicts:
-----------------
arch/x86/net/bpf_jit_comp.c
- bpf_arch_text_poke()
HEAD(!1464) contains b73b002f7f ("x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline")
Resolved in favour of !1464, but keep the return statement from !1477
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1477
Bugzilla: https://bugzilla.redhat.com/2120966
Rebase BPF and XDP to the upstream kernel version 5.18
Patch applied, then reverted:
```
544356 selftests/bpf: switch to new libbpf XDP APIs
0bfb95 selftests, bpf: Do not yet switch to new libbpf XDP APIs
```
Taken in the perf rebase:
```
23fcfc perf: use generic bpf_program__set_type() to set BPF prog type
```
Unsupported arches:
```
5c1011 libbpf: Fix riscv register names
cf0b5b libbpf: Fix accessing syscall arguments on riscv
```
Depends on changes of other subsystems:
```
7fc8c3 s390/bpf: encode register within extable entry
aebfd1 x86/ibt,ftrace: Search for __fentry__ location
589127 x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline
```
Broken selftest:
```
edae34 selftests net: add UDP GRO fraglist + bpf self-tests
cf6783 selftests net: fix bpf build error
7b92aa selftests net: fix kselftest net fatal error
```
Out of scope:
```
baebdf net: dev: Makes sure netif_rx() can be invoked in any context.
5c8166 kbuild: replace $(if A,A,B) with $(or A,B)
1a97ce perf maps: Use a pointer for kmaps
967747 uaccess: remove CONFIG_SET_FS
42b01a s390: always use the packed stack layout
bf0882 flow_dissector: Add support for HSR
d09a30 s390/extable: move EX_TABLE define to asm-extable.h
3d6671 s390/extable: convert to relative table with data
4efd41 s390: raise minimum supported machine generation to z10
f65e58 flow_dissector: Add support for HSRv0
1a6d7a netdevsim: Introduce support for L3 offload xstats
9b1894 selftests: netdevsim: hw_stats_l3: Add a new test
84005b perf ftrace latency: Add -n/--use-nsec option
36c4a7 kasan, arm64: don't tag executable vmalloc allocations
8df013 docs: netdev: move the netdev-FAQ to the process pages
4d4d00 perf tools: Update copy of libbpf's hashmap.c
0df6ad perf evlist: Rename cpus to user_requested_cpus
1b8089 flow_dissector: fix false-positive __read_overflow2_field() warning
0ae065 perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
8994e9 perf test bpf: Skip test if clang is not present
735346 perf build: Fix btf__load_from_kernel_by_id() feature check
f037ac s390/stack: merge empty stack frame slots
335220 docs: netdev: update maintainer-netdev.rst reference
a0b098 s390/nospec: remove unneeded header includes
34513a netdevsim: Fix hwstats debugfs file permissions
```
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Torez Smith <torez@redhat.com>
Approved-by: Jan Stancek <jstancek@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Felix Maurer <fmaurer@redhat.com>
Approved-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139501
Conflicts:
- context conflict due to missing e41294408c56 ("icmp: ICMPV6: Examine
invoking packet for Segment Route Headers.")
commit 80e425b613421911f89664663a7060216abcaed2
Author: Coco Li <lixiaoyan@google.com>
Date: Fri May 13 11:34:04 2022 -0700
ipv6: Add hop-by-hop header to jumbograms in ip6_output
Instead of simply forcing a 0 payload_len in IPv6 header,
implement RFC 2675 and insert a custom extension header.
Note that only the TCP stack is currently potentially generating
jumbograms, and that this extension header is purely local;
it won't be sent on a physical link.
This is needed so that packet capture (tcpdump and friends)
can properly dissect these large packets.
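For reference, the on-wire layout of a Hop-by-Hop Options header carrying the RFC 2675 Jumbo Payload option can be sketched like this. `build_hbh_jumbo()` and `HBH_JUMBO_HDR_LEN` are hypothetical names for illustration, and the caller is assumed to supply the correct length value:

```c
#include <stdint.h>

#define HBH_JUMBO_HDR_LEN 8
#define IPV6_TLV_JUMBO    0xC2	/* option type from RFC 2675 */

/* Fill buf (at least 8 bytes) with a Hop-by-Hop Options header carrying
 * a single Jumbo Payload option. nexthdr is the protocol that follows
 * (e.g. 6 for TCP); jumbo_len is the 32-bit jumbo payload length. */
static void build_hbh_jumbo(uint8_t buf[HBH_JUMBO_HDR_LEN],
			    uint8_t nexthdr, uint32_t jumbo_len)
{
	buf[0] = nexthdr;
	buf[1] = 0;			/* Hdr Ext Len 0: header is 8 bytes */
	buf[2] = IPV6_TLV_JUMBO;
	buf[3] = 4;			/* option data length */
	buf[4] = (uint8_t)(jumbo_len >> 24);	/* network byte order */
	buf[5] = (uint8_t)(jumbo_len >> 16);
	buf[6] = (uint8_t)(jumbo_len >> 8);
	buf[7] = (uint8_t)jumbo_len;
}
```

A dissector that sees payload_len 0 in the IPv6 header can then read the real length from bytes 4 to 7 of this header.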
Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2129093
Upstream Status: commit 40867d74c374b
commit 40867d74c374b235e14d839f3a77f26684feefe5
Author: David Ahern <dsahern@kernel.org>
Date: Mon Mar 14 14:45:51 2022 -0600
net: Add l3mdev index to flow struct and avoid oif reset for port devices
The fundamental premise of VRF and l3mdev core code is binding a socket
to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
Legacy code resets flowi_oif to the l3mdev losing any original port
device binding. Ben (among others) has demonstrated use cases where the
original port device binding is important and needs to be retained.
This patch handles that by adding a new entry to the common flow struct
that can indicate the l3mdev index for later rule and table matching
avoiding the need to reset flowi_oif.
In addition to allowing more use cases that require port device binds,
this patch brings a few datapath simplifications:
1. l3mdev_fib_rule_match is only called when walking fib rules and
always after l3mdev_update_flow. That allows an optimization to bail
early for non-VRF type use cases when flowi_l3mdev is not set. Also,
only that index needs to be checked for the FIB table id.
2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
(e.g., VRF) device. By resetting flowi_oif only for this case the
FLOWI_FLAG_SKIP_NH_OIF flag is no longer needed and can be removed,
removing several checks in the datapath. The flowi_iif path can be
simplified to only be called if it is not loopback (loopback can
not be assigned to an L3 domain) and the l3mdev index is not already
set.
3. Avoid another device lookup in the output path when the fib lookup
returns a reject failure.
Note: 2 functional tests for local traffic with reject fib rules are
updated to reflect the new direct failure at FIB lookup time for ping
rather than the failure on packet path. The current code fails like this:
HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: Warning: source address might be selected on device other than: eth1
PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.
--- 172.16.3.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
where the test now directly fails:
HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: connect: No route to host
Signed-off-by: David Ahern <dsahern@kernel.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Phil Sutter <psutter@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120966
commit de799101519aad23c6096041ba2744d7b5517e6a
Author: Martin KaFai Lau <kafai@fb.com>
Date: Wed Mar 2 11:55:31 2022 -0800
net: Add skb_clear_tstamp() to keep the mono delivery_time
Right now, skb->tstamp is reset to 0 whenever the skb is forwarded.
If skb->tstamp has the mono delivery_time, clearing it can hurt
the performance when it finally transmits out to fq@phy-dev.
The earlier patch added a skb->mono_delivery_time bit to
flag the skb->tstamp carrying the mono delivery_time.
This patch adds skb_clear_tstamp() helper which keeps
the mono delivery_time and clears everything else.
The delivery_time clearing will be postponed until the stack knows the
skb will be delivered locally. It will be done in a later patch.
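A minimal user-space sketch of the behavior described for skb_clear_tstamp(), assuming a mock structure that carries only the two fields involved. struct mock_skb and mock_skb_clear_tstamp() are illustrative names, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields discussed above. */
struct mock_skb {
	int64_t tstamp;           /* rcv timestamp or delivery time, in ns */
	bool mono_delivery_time;  /* true: tstamp holds a mono delivery_time */
};

/* Keep a mono delivery_time across forwarding; clear any other timestamp. */
static void mock_skb_clear_tstamp(struct mock_skb *skb)
{
	if (skb->mono_delivery_time)
		return;

	skb->tstamp = 0;
}
```

With this model, a forwarded skb that carries a mono delivery_time reaches the final qdisc with its timestamp intact, while a receive timestamp is still zeroed as before.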
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120966
Conflicts:
- [minor] different context in tcp_fragment() due to missing
a52fe46ef160 ("tcp: factorize ip_summed setting")
commit a1ac9c8acec1605c6b43af418f79facafdced680
Author: Martin KaFai Lau <kafai@fb.com>
Date: Wed Mar 2 11:55:25 2022 -0800
net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp
skb->tstamp was first used as the (rcv) timestamp.
The major usage is to report it to the user (e.g. SO_TIMESTAMP).
Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP)
during egress and used by the qdisc (e.g. sch_fq) to make decision on when
the skb can be passed to the dev.
Currently, there is no way to tell whether skb->tstamp holds the (rcv) timestamp
or the delivery_time, so it is always reset to 0 whenever forwarded
between egress and ingress.
While it makes sense to always clear the (rcv) timestamp in skb->tstamp
to avoid confusing sch_fq that expects the delivery_time, it is a
performance issue [0] to clear the delivery_time if the skb finally
egresses to a fq@phy-dev. For example, when forwarding from egress to
ingress and then finally back to egress:
tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns
                     ^              ^
                     reset          reset
This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp
is storing the mono delivery_time (EDT) instead of the (rcv) timestamp.
The current use case is to keep the TCP mono delivery_time (EDT) and
to be used with sch_fq. A later patch will also allow tc-bpf@ingress
to read and change the mono delivery_time.
In the future, another bit (e.g. skb->user_delivery_time) can be added
for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid.
[ This patch is prep work. The following patches will
get the other parts of the stack ready first. Then another patch
after that will finally set the skb->mono_delivery_time. ]
skb_set_delivery_time() function is added. It is used by the tcp_output.c
and during ip[6] fragmentation to assign the delivery_time to
the skb->tstamp and also set the skb->mono_delivery_time.
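As a rough illustration of the helper's intended contract once the bit is enabled, here is a user-space sketch. The mock structure, the function name and the rule that a zero time never counts as a mono delivery_time are assumptions made for this example; in this preparatory patch the flag is left at 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields discussed above. */
struct mock_skb {
	int64_t tstamp;           /* rcv timestamp or delivery time, in ns */
	bool mono_delivery_time;  /* true: tstamp holds a mono delivery_time */
};

/* Store the delivery time and record whether it is a mono delivery_time. */
static void mock_skb_set_delivery_time(struct mock_skb *skb, int64_t kt,
				       bool mono)
{
	skb->tstamp = kt;
	/* Assumption: a zero time means "no delivery time", so no flag. */
	skb->mono_delivery_time = (kt != 0) && mono;
}
```

Keeping the assignment and the flag update in one helper means callers such as the TCP output path and the fragmentation code cannot set one without the other.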
A note on the change in ip_send_unicast_reply() in ip_output.c.
It is only used by TCP to send reset/ack out of a ctl_sk.
Like the new skb_set_delivery_time(), this patch sets
the skb->mono_delivery_time to 0 for now as a
placeholder. It will be enabled in a later patch.
A similar case in tcp_ipv6 can be done with
skb_set_delivery_time() in tcp_v6_send_response().
[0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135319
Upstream Status: net.git commit ab7e2e0dfa5d
commit ab7e2e0dfa5d37540ab1dc5376e9a2cb9188925d
Author: Matthias May <matthias.may@westermo.com>
Date: Fri Aug 5 21:19:06 2022 +0200
ipv6: do not use RT_TOS for IPv6 flowlabel
According to Guillaume Nault RT_TOS should never be used for IPv6.
Quote:
RT_TOS() is an old macro used to interpret IPv4 TOS as described in
the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
code, although, given the current state of the code, most of the
existing calls have no consequence.
But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
field to be interpreted the RFC 1349 way. There's no historical
compatibility to worry about.
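To see why the macro is harmful for IPv6, consider what it does to a traffic class value. The sketch below uses local copies of the mask (0x1E) and macro so that it builds in user space; the MOCK_ names and tclass_via_rt_tos() are hypothetical.

```c
#include <stdint.h>

/* Local copies so the sketch builds outside the kernel tree. */
#define MOCK_IPTOS_TOS_MASK 0x1E
#define MOCK_RT_TOS(tos)    ((tos) & MOCK_IPTOS_TOS_MASK)

/* Traffic class as it would look after passing through the IPv4-only macro. */
static uint8_t tclass_via_rt_tos(uint8_t tclass)
{
	return (uint8_t)MOCK_RT_TOS(tclass);
}
```

For example, a traffic class of 0xB8 (DSCP 46, no ECN bits) comes out as 0x18, so the upper DSCP bits are silently dropped before the value reaches the flow label.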
Fixes: 571912c69f ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Matthias May <matthias.may@westermo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161
Upstream Status: linux.git
commit 2edc1a383fda8d2f580216292dfd9daeae691e47
Author: Menglong Dong <imagedong@tencent.com>
Date: Wed Apr 13 16:15:55 2022 +0800
net: ip: add skb drop reasons to ip forwarding
Replace kfree_skb() which is used in ip6_forward() and ip_forward()
with kfree_skb_reason().
The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for
the case where the packet length exceeds the MTU and the packet can't
be fragmented.
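The idea of attaching a reason to a forwarding drop can be sketched in user space as follows. The enum, struct and function names are hypothetical and the size check is deliberately simplified; it does not mirror the real checks in ip_forward() or ip6_forward().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of drop reasons for this sketch. */
enum mock_drop_reason {
	MOCK_DROP_NOT_SPECIFIED,
	MOCK_DROP_PKT_TOO_BIG,
};

struct mock_pkt {
	size_t len;         /* packet length in bytes */
	bool may_fragment;  /* false when the packet must not be fragmented */
};

/* Returns true if the packet can be forwarded; otherwise sets *reason. */
static bool mock_forward(const struct mock_pkt *pkt, size_t mtu,
			 enum mock_drop_reason *reason)
{
	if (pkt->len > mtu && !pkt->may_fragment) {
		*reason = MOCK_DROP_PKT_TOO_BIG;
		return false;
	}
	return true;
}
```

A caller that previously saw only "dropped" can now tell an oversized packet apart from any other failure by inspecting the reported reason.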
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161
Upstream Status: linux.git
commit 5e187189ec324f78035d33a4bc123a9c4ca6f3e3
Author: Menglong Dong <imagedong@tencent.com>
Date: Sat Feb 26 12:18:29 2022 +0800
net: ip: add skb drop reasons for ip egress path
Replace kfree_skb() which is used in the packet egress path of IP layer
with kfree_skb_reason(). Functions that are involved include:
__ip_queue_xmit()
ip_finish_output()
ip_mc_finish_output()
ip6_output()
ip6_finish_output()
ip6_finish_output2()
The following new drop reasons are introduced:
SKB_DROP_REASON_IP_OUTNOROUTES
SKB_DROP_REASON_BPF_CGROUP_EGRESS
SKB_DROP_REASON_IPV6DISABLED
SKB_DROP_REASON_NEIGH_CREATEFAIL
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2077372
Upstream Status: net.git commit e3fa461d8b0e
commit e3fa461d8b0e185b7da8a101fe94dfe6dd500ac0
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri Apr 8 16:03:42 2022 +0200
ipv6: fix panic when forwarding a pkt with no in6 dev
kongweibin reported a kernel panic in ip6_forward() when the input interface
has no in6 dev associated.
The following tc commands were used to reproduce this panic:
tc qdisc del dev vxlan100 root
tc qdisc add dev vxlan100 root netem corrupt 5%
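The class of bug is a per-device setting being read through a pointer that may be NULL. A user-space sketch of a guarded check is shown below; struct mock_inet6_dev and the helper are hypothetical, and treating a missing device as "policy not disabled" is an assumption of this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device IPv6 configuration. */
struct mock_inet6_dev {
	bool disable_policy;
};

/*
 * Decide whether the forwarding policy check must run.
 * A missing device (NULL) is treated as "policy not disabled".
 */
static bool mock_policy_check_needed(const struct mock_inet6_dev *idev)
{
	return !idev || !idev->disable_policy;
}
```

The NULL test comes first, so the disable_policy field is only read when the input interface actually has an IPv6 device attached.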
CC: stable@vger.kernel.org
Fixes: ccd27f05ae ("ipv6: fix 'disable_policy' for fwd packets")
Reported-by: kongweibin <kongweibin2@huawei.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/407
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041382
Tested: ENRT
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2037783
Includes patches that would break kABI without backporting the full
series they are taken from, which we will do later (post-freeze).
The following fixes were omitted as the backport of commit
f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
is a partial one not introducing the issues.
Omitted-fix: ffef737fd037 ("net/tls: Fix skb memory leak when running kTLS traffic")
Omitted-fix: db094aa8140e ("net/tls: Fix another skb memory leak when running kTLS traffic")
Omitted-fix: 79074a72d335 ("net: Flush deferred skb free on socket destroy")
Omitted-fix: ebdc1a030962 ("tcp: add a missing sk_defer_free_flush() in tcp_splice_read()")
Signed-off-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Sabrina Dubroca <sdubroca@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041401
Upstream Status: net.git commit 19d36c5f2948
commit 19d36c5f294879949c9d6f57cb61d39cc4c48553
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Nov 18 17:37:58 2021 -0800
ipv6: fix typos in __ip6_finish_output()
We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and
IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED
Found by code inspection, please double check that fixing this bug
does not surface other bugs.
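A simplified model of this kind of mix-up: the same flags word tested with the constant that belongs to the other protocol. The flag values and names below are illustrative assumptions for this sketch, and the model ignores any difference in control block layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; assumptions for this sketch only. */
#define MOCK_IPSKB_REROUTED   (1u << 4)
#define MOCK_IP6SKB_REROUTED  (1u << 2)

/* Hypothetical stand-in for the IPv6 control block flags. */
struct mock_ip6cb {
	uint16_t flags;
};

/* Buggy form: tests an IPv6 control block with the IPv4 constant. */
static bool mock_rerouted_wrong(const struct mock_ip6cb *cb)
{
	return (cb->flags & MOCK_IPSKB_REROUTED) != 0;
}

/* Fixed form: uses the constant that matches the control block. */
static bool mock_rerouted_right(const struct mock_ip6cb *cb)
{
	return (cb->flags & MOCK_IP6SKB_REROUTED) != 0;
}
```

In this model a packet marked as rerouted is only recognized by the second helper, so the buggy form would send it through the reroute path again.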
Fixes: 09ee9dba96 ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tobias Brunner <tobias@strongswan.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041382
Upstream Status: linux.git
Tested: ENRT
commit aba546565b613e74b84b8261999ea82b5561d3f1
Author: Eric Dumazet <edumazet@google.com>
Date: Mon Nov 15 11:02:35 2021 -0800
net: remove sk_route_nocaps
Instead of using a full netdev_features_t, we can use a single bit,
as sk_route_nocaps is only used to remove NETIF_F_GSO_MASK from
sk->sk_route_cap.
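The simplification can be pictured with a user-space sketch in which a single boolean replaces a full feature mask. The structure, the field name gso_disabled and the mask value are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask standing in for the GSO feature bits. */
#define MOCK_GSO_MASK 0x00F0u

/* Hypothetical stand-in for the socket fields involved. */
struct mock_sock {
	uint32_t route_caps;  /* features usable on this socket's route */
	bool gso_disabled;    /* single bit replacing a whole feature mask */
};

/* Derive the socket's capabilities from the device features. */
static void mock_setup_caps(struct mock_sock *sk, uint32_t dev_features)
{
	sk->route_caps = dev_features;
	if (sk->gso_disabled)
		sk->route_caps &= ~MOCK_GSO_MASK;
}
```

Since the only thing ever masked out is the GSO group, one bit carries the same information as the full mask did.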
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/212
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1
This includes a few critical bugfixes for the core network stack.
Notably it includes 7f678def99d2 ("skb_expand_head() adjust skb->truesize incorrectly") and a whole series of pre-requisites. The bug addressed there is nasty and present even prior to skb_expand_head() introduction.
Commit 719c57197010 ("net: make napi_disable() symmetric with enable") has instead been explicitly excluded: it's not really a fix, is known to introduce problems, and is still quite new.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1
Upstream commit:
commit 0c9f227bee11910a49e1d159abe102d06e3745d5
Author: Vasily Averin <vvs@virtuozzo.com>
Date: Mon Aug 2 11:52:29 2021 +0300
ipv6: use skb_expand_head in ip6_xmit
Unlike skb_realloc_headroom, new helper skb_expand_head
does not allocate a new skb if possible.
Additionally this patch replaces commonly used dereferencing with variables.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1
Upstream commit:
commit e415ed3a4b8b246ee5e9d109ff5153efcf96b9f2
Author: Vasily Averin <vvs@virtuozzo.com>
Date: Mon Aug 2 11:52:22 2021 +0300
ipv6: use skb_expand_head in ip6_finish_output2
Unlike skb_realloc_headroom, new helper skb_expand_head does not allocate
a new skb if possible.
Additionally this patch replaces commonly used dereferencing with variables.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2025457
Upstream Status: net.git commit 0857d6f8c759
commit 0857d6f8c759d95f89d0436f86cdfd189ef99f20
Author: Stephen Suryaputra <ssuryaextr@gmail.com>
Date: Thu Oct 14 09:08:45 2021 -0400
ipv6: When forwarding count rx stats on the orig netdev
Commit bdb7cc643f ("ipv6: Count interface receive statistics on the
ingress netdev") does not work when ip6_forward() executes on skbs
with a vrf-enslaved netdev. Use IP6CB(skb)->iif to get to the right one.
Add a selftest script to verify.
Fixes: bdb7cc643f ("ipv6: Count interface receive statistics on the ingress netdev")
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20211014130845.410602-1-ssuryaextr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Decrease hop limit counter when delivering skb to ndp proxy.
Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TEE target mirrors traffic to another interface, sk_buff may
not have enough headroom to be processed correctly.
ip_finish_output2() detects this situation for ipv4 and allocates
a new skb with enough headroom. However, ipv6 lacks this logic in
ip6_finish_output2() and it leads to skb_under_panic:
skbuff: skb_under_panic: text:ffffffffc0866ad4 len:96 put:24
head:ffff97be85e31800 data:ffff97be85e317f8 tail:0x58 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:110!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 393 Comm: kworker/2:2 Tainted: G OE 5.13.0 #13
Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:skb_panic+0x48/0x4a
Call Trace:
skb_push.cold.111+0x10/0x10
ipgre_header+0x24/0xf0 [ip_gre]
neigh_connected_output+0xae/0xf0
ip6_finish_output2+0x1a8/0x5a0
ip6_output+0x5c/0x110
nf_dup_ipv6+0x158/0x1000 [nf_dup_ipv6]
tee_tg6+0x2e/0x40 [xt_TEE]
ip6t_do_table+0x294/0x470 [ip6_tables]
nf_hook_slow+0x44/0xc0
nf_hook.constprop.34+0x72/0xe0
ndisc_send_skb+0x20d/0x2e0
ndisc_send_ns+0xd1/0x210
addrconf_dad_work+0x3c8/0x540
process_one_work+0x1d1/0x370
worker_thread+0x30/0x390
kthread+0x116/0x130
ret_from_fork+0x22/0x30
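The missing check amounts to comparing the link-layer header length against the available headroom before pushing. A user-space sketch follows, with a hypothetical mock structure and helper name. The numbers in the usage note echo the trace above, where a 24-byte push left data 8 bytes below head, meaning only 16 bytes of headroom were available.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the headroom of a buffer. */
struct mock_skb {
	size_t headroom;  /* bytes available in front of the packet data */
};

/* True when the header to be pushed does not fit in the current headroom. */
static bool mock_needs_more_headroom(const struct mock_skb *skb, size_t hh_len)
{
	return hh_len > skb->headroom;
}
```

With headroom = 16 and hh_len = 24 the helper reports that the buffer must be expanded or reallocated first, which is the step the IPv6 output path was missing.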
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>