JIRA: https://issues.redhat.com/browse/RHEL-57756
Upstream commit(s):
commit 9cf621bd5fcbeadc2804951d13d487e22e95b363
Author: Eric Dumazet <edumazet@google.com>
Date: Fri May 3 19:20:59 2024 +0000
rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
All rtnl_link_ops->get_link_net() methods already using dev_net()
are ready. I added READ_ONCE() annotations on others.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit eb4f99c56ad30cb0f8c8e93a78b1200f5987e41e
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Tue Oct 15 16:28:30 2024 +0800
net: vxlan: replace VXLAN_INVALID_HDR with VNI_NOT_FOUND
Replace the drop reason "SKB_DROP_REASON_VXLAN_INVALID_HDR" with
"SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND" in encap_bypass_if_local(), as the
latter is more accurate.
Fixes: 790961d88b0e ("net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit 790961d88b0e63d993e112b747746dfd94a7c823
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:30 2024 +0800
net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()
Replace kfree_skb() with kfree_skb_reason() in encap_bypass_if_local, and
no new skb drop reason is added in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit c106479b612d34739c9337a18ce5332ca613f993
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:29 2024 +0800
net: vxlan: use kfree_skb_reason() in vxlan_encap_bypass()
Replace kfree_skb with kfree_skb_reason in vxlan_encap_bypass, and no new
skb drop reason is added in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit e7c700aaa67a59c28da07072fbaae207b5f27519
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:27 2024 +0800
net: vxlan: add drop reasons support to vxlan_xmit_one()
Replace kfree_skb/dev_kfree_skb with kfree_skb_reason in vxlan_xmit_one.
No drop reasons are introduced in this commit.
The only concern of mine is replacing dev_kfree_skb with
kfree_skb_reason. The dev_kfree_skb is equal to consume_skb, and I'm not
sure if we can change it to kfree_skb here. In my option, the skb is
"dropped" here, isn't it?
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit b71a576e452b800efeac49ecca116d954601d911
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:26 2024 +0800
net: vxlan: use kfree_skb_reason() in vxlan_xmit()
Replace kfree_skb() with kfree_skb_reason() in vxlan_xmit(). Following
new skb drop reasons are introduced for vxlan:
/* no remote found for xmit */
SKB_DROP_REASON_VXLAN_NO_REMOTE
/* packet without necessary metadata reached a device which is
* in "external" mode
*/
SKB_DROP_REASON_TUNNEL_TXINFO
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit d209706f562ee4fa81bdf24cf6b679c3222aa06c
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:25 2024 +0800
net: vxlan: make vxlan_set_mac() return drop reasons
Change the return type of vxlan_set_mac() from bool to enum
skb_drop_reason. In this commit, the drop reason
"SKB_DROP_REASON_LOCAL_MAC" is introduced for the case that the source
mac of the packet is a local mac.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
Conflicts:\
- One chunk missing due to upstream commit f58f45c1e5b9 ("vxlan: drop
packets from invalid src-address") not in c9s.
commit 289fd4e75219a96f77c5d679166035cd5118d139
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:24 2024 +0800
net: vxlan: make vxlan_snoop() return drop reasons
Change the return type of vxlan_snoop() from bool to enum
skb_drop_reason. In this commit, two drop reasons are introduced:
SKB_DROP_REASON_MAC_INVALID_SOURCE
SKB_DROP_REASON_VXLAN_ENTRY_EXISTS
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit 7b3e018f3eefc6144638800d9f92b3a5e120c537
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:23 2024 +0800
net: vxlan: make vxlan_remcsum() return drop reasons
Make vxlan_remcsum() support skb drop reasons by changing the return
value type of it from bool to enum skb_drop_reason.
The only drop reason in vxlan_remcsum() comes from pskb_may_pull_reason(),
so we just return it.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
commit 4c06d9daf8e6215447ca8a2ddd59fa09862c9bae
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:22 2024 +0800
net: vxlan: add skb drop reasons to vxlan_rcv()
Introduce skb drop reasons to the function vxlan_rcv(). Following new
drop reasons are added:
SKB_DROP_REASON_VXLAN_INVALID_HDR
SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND
SKB_DROP_REASON_IP_TUNNEL_ECN
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
Conflicts:\
- Context difference (in changed lines) because of c9s-only
modifications introduced by c9s commit b3455ab162 ("geneve: Fix
incorrect inner network header offset when innerprotoinherit is set"(.
commit 9990ddf47d4168088e2246c3d418bf526e40830d
Author: Menglong Dong <menglong8.dong@gmail.com>
Date: Wed Oct 9 10:28:21 2024 +0800
net: tunnel: make skb_vlan_inet_prepare() return drop reasons
Make skb_vlan_inet_prepare return the skb drop reasons, which is just
what pskb_may_pull_reason() returns. Meanwhile, adjust all the call of
it.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59091
commit 00d066a4d4edbe559ba6c35153da71d4b2b8a383
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Thu Aug 29 14:33:37 2024 +0200
netdev_features: convert NETIF_F_LLTX to dev->lltx
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Conflicts:
drivers/net/macsec.c
drivers/net/veth.c
net/ipv6/ip6_tunnel.c
- Context.
drivers/net/amt.c
drivers/net/netkit.c
- Non-existent in RHEL 9.
drivers/net/ethernet/chelsio/cxgb/cxgb2.c
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
- Drivers disabled in RHEL 9. Skipped.
net/dsa/user.c
- This is slave.c in RHEL 9, but CONFIG_NET_DSA is disabled,
so skipped the hunk.
net/core/net-sysfs.c
- Code not present because of missing commit 74293ea1c4db
("net: sysfs: Do not create sysfs for non BQL device")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59091
commit beb5a9bea8239cdf4adf6b62672e30db3e9fa5ce
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Thu Aug 29 14:33:36 2024 +0200
netdevice: convert private flags > BIT(31) to bitfields
Make dev->priv_flags `u32` back and define bits higher than 31 as
bitfield booleans as per Jakub's suggestion. This simplifies code
which accesses these bits with no optimization loss (testb both
before/after), allows to not extend &netdev_priv_flags each time,
but also scales better as bits > 63 in the future would only add
a new u64 to the structure with no complications, comparing to
that extending ::priv_flags would require converting it to a bitmap.
Note that I picked `unsigned long :1` to not lose any potential
optimizations comparing to `bool :1` etc.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Conflicts:
drivers/net/ethernet/microchip/lan966x/lan966x_main.c
- Driver not present in RHEL 9.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-35248
Upstream Status: linux.git
commit 31392048f55f98cb01ca709d32d06d926ab9760a
Author: Guillaume Nault <gnault@redhat.com>
Date: Wed Jun 19 15:34:57 2024 +0200
vxlan: Pull inner IP header in vxlan_xmit_one().
Ensure the inner IP header is part of the skb's linear data before
setting old_iph. Otherwise, on a non-linear skb, old_iph could point
outside of the packet data.
Unlike classical VXLAN, which always encapsulates Ethernet packets,
VXLAN-GPE can transport IP packets directly. In that case, we need to
look at skb->protocol to figure out if an Ethernet header is present.
Fixes: d342894c5d ("vxlan: virtual extensible lan")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/2aa75f6fa62ac9dbe4f16ad5ba75dd04a51d4b99.1718804000.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-35248
Upstream Status: net.git
Conflicts:
* Missing upstream commit 4095e0e1328a ("drivers: vxlan: vnifilter:
per vni stats"):
Drop the vxlan_vnifilter_count() as Centos Stream 9 doesn't have
this feature.
* (context) Missing upstream commit 6dee402daba4 ("vxlan: Fix racy
device stats updates."):
Centos Stream 9 still updates statistics non-atomically (that is,
not using DEV_STATS_INC()). Keep the new DEV_STATS_INC() calls of
the original patch though, since this macro is defined in Centos
Stream 9 and that's the proper way for updating these counters.
commit f7789419137b18e3847d0cc41afd788c3c00663d
Author: Guillaume Nault <gnault@redhat.com>
Date: Tue Apr 30 18:50:13 2024 +0200
vxlan: Pull inner IP header in vxlan_rcv().
Ensure the inner IP header is part of skb's linear data before reading
its ECN bits. Otherwise we might read garbage.
One symptom is the system erroneously logging errors like
"vxlan: non-ECT from xxx.xxx.xxx.xxx with TOS=xxxx".
Similar bugs have been fixed in geneve, ip_tunnel and ip6_tunnel (see
commit 1ca1ba465e55 ("geneve: make sure to pull inner header in
geneve_rx()") for example). So let's reuse the same code structure for
consistency. Maybe we'll can add a common helper in the future.
Fixes: d342894c5d ("vxlan: virtual extensible lan")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/1239c8db54efec341dd6455c77e0380f58923a3c.1714495737.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4465
JIRA: https://issues.redhat.com/browse/RHEL-40130
Tested: Using routing and tunneling self-tests
Depends: !4435
Commits:
```
537dd2d9fb9f ("net: Add helper function to parse netlink msg of ip_tunnel_encap")
b86fca800a6a ("net: Add helper function to parse netlink msg of ip_tunnel_parm")
63c15822b8dd ("lib/bitmap: add bitmap_{read,write}()")
117aef12a7b1 ("ip_tunnel: use a separate struct to store tunnel params in the kernel")
020e8f60aa8b ("ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX")
b11ebf2ca2c1 ("ip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX")
45490ce2ff83 ("nfp: flower: add support for tunnel offload without key ID")
bf3fcbf7e7a0 ("ipv4: rename and move ip_route_output_tunnel()")
78f3655adcb5 ("ipv4: remove "proto" argument from udp_tunnel_dst_lookup()")
72fc68c6356b ("ipv4: add new arguments to udp_tunnel_dst_lookup()")
3ae983a603a4 ("ipv4: use tunnel flow flags for tunnel route lookups")
60a77d11cd5d ("geneve: add dsfield helper function")
daa2ba7ed1d1 ("geneve: use generic function for tunnel IPv4 route lookup")
6f19b2c136d9 ("vxlan: use generic function for tunnel IPv4 route lookup")
fc47e86dbfb7 ("ipv6: rename and move ip6_dst_lookup_tunnel()")
7e937dcf96d0 ("ipv6: remove "proto" argument from udp_tunnel6_dst_lookup()")
946fcfdbc5b9 ("ipv6: add new arguments to udp_tunnel6_dst_lookup()")
69d72587c17b ("geneve: use generic function for tunnel IPv6 route lookup")
f25e621f5d4c ("ipv6: mark address parameters of udp_tunnel6_xmit_skb() as const")
2aceb896ee18 ("vxlan: use generic function for tunnel IPv6 route lookup")
3e7e5baaaba7 ("bitmap: don't assume compiler evaluates small mem*() builtins calls")
c1023f5634b9 ("s390/cio: rename bitmap_size() -> idset_bitmap_size()")
10a04ff09bcc ("tools: move alignment-related macros to new <linux/align.h>")
a37fbe666c01 ("bitmap: introduce generic optimized bitmap_size()")
5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
5a66cda52d7d ("ip_tunnel: harden copying IP tunnel params to userspace")
```
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4387
JIRA: https://issues.redhat.com/browse/RHEL-39583
Tested: Just built... no way to test
Commit(s):
```
1eb2cded45b3 ("net: annotate writes on dev->mtu from ndo_change_mtu()")
```
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-40130
Conflicts:
- hunk for non-existing net/ipv4/fou_bpf.c skipped
- conflict in ip_gre.c resolved in the same way as upstream merge
commit cf1ca1f66d30 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") did
- simple context conflict ip_tunnel.c due to missing commit
c4794d22251b9 ("ipv4: tunnels: use DEV_STATS_INC()")
- simple context conflict in ip6_gre.c and ip6_tunnel.c due to missing
commit 2fad1ba354d4a ("ipv6: tunnels: use DEV_STATS_INC()")
- simple conflict in nft_tunnel.c due to missing ffb3d9a30cc67 ("netfilter:
nf_tables: use correct integer types")
commit 5832c4a77d6931cebf9ba737129ae8f14b66ee1d
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Wed Mar 27 16:23:53 2024 +0100
ip_tunnel: convert __be16 tunnel flags to bitmaps
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
have been defined as __be16. Now all of those 16 bits are occupied
and there's no more free space for new flags.
It can't be simply switched to a bigger container with no
adjustments to the values, since it's an explicit Endian storage,
and on LE systems (__be16)0x0001 equals to
(__be64)0x0001000000000000.
We could probably define new 64-bit flags depending on the
Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
LE, but that would introduce an Endianness dependency and spawn a
ton of Sparse warnings. To mitigate them, all of those places which
were adjusted with this change would be touched anyway, so why not
define stuff properly if there's no choice.
Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
value already coded and a fistful of <16 <-> bitmap> converters and
helpers. The two flags which have a different bit position are
SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
__cpu_to_be16(), but as (__force __be16), i.e. had different
positions on LE and BE. Now they both have strongly defined places.
Change all __be16 fields which were used to store those flags, to
IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
unsigned long[1] for now, and replace all TUNNEL_* occurrences to
their bitmap counterparts. Use the converters in the places which talk
to the userspace, hardware (NFP) or other hosts (GRE header). The rest
must explicitly use the new flags only. This must be done at once,
otherwise there will be too many conversions throughout the code in
the intermediate commits.
Finally, disable the old __be16 flags for use in the kernel code
(except for the two 'irregular' flags mentioned above), to prevent
any accidental (mis)use of them. For the userspace, nothing is
changed, only additions were made.
Most noticeable bloat-o-meter difference (.text):
vmlinux: 307/-1 (306)
gre.ko: 62/0 (62)
ip_gre.ko: 941/-217 (724) [*]
ip_tunnel.ko: 390/-900 (-510) [**]
ip_vti.ko: 138/0 (138)
ip6_gre.ko: 534/-18 (516) [*]
ip6_tunnel.ko: 118/-10 (108)
[*] gre_flags_to_tnl_flags() grew, but still is inlined
[**] ip_tunnel_find() got uninlined, hence such decrease
The average code size increase in non-extreme case is 100-200 bytes
per module, mostly due to sizeof(long) > sizeof(__be16), as
%__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers
are able to expand the majority of bitmap_*() calls here into direct
operations on scalars.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-40130
commit 2aceb896ee18ae35b21b14c978d8c2ef8c7b439d
Author: Beniamino Galvani <b.galvani@gmail.com>
Date: Fri Oct 20 13:55:29 2023 +0200
vxlan: use generic function for tunnel IPv6 route lookup
The route lookup can be done now via generic function
udp_tunnel6_dst_lookup() to replace the custom implementation in
vxlan6_get_route().
This is similar to what already done for IPv4 in commit 6f19b2c136d9
("vxlan: use generic function for tunnel IPv4 route lookup").
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-40130
commit 6f19b2c136d98a84d79030b53e23d405edfdc783
Author: Beniamino Galvani <b.galvani@gmail.com>
Date: Mon Oct 16 09:15:26 2023 +0200
vxlan: use generic function for tunnel IPv4 route lookup
The route lookup can be done now via generic function
udp_tunnel_dst_lookup() to replace the custom implementations in
vxlan_get_route().
Note that this patch only touches IPv4, while IPv6 still uses
vxlan6_get_route(). After IPv6 route lookup gets converted as well,
vxlan_xmit_one() can be simplified by removing local variables that
will be passed via "struct ip_tunnel_key", such as remote_ip,
local_ip, flow_flags, label.
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-39583
Conflicts:
- hunks for non-existing files and non-applicable hunks for unsupported
drivers, batman-adv and DSA were skipped
commit 1eb2cded45b35816085c1f962933c187d970f9dc
Author: Eric Dumazet <edumazet@google.com>
Date: Mon May 6 10:28:12 2024 +0000
net: annotate writes on dev->mtu from ndo_change_mtu()
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c945 ("inet: protect against too small
mtu values.")
We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.
It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4133
JIRA: https://issues.redhat.com/browse/RHEL-6066
Upstream Status: all mainline in net-next.git
Tested: boot-tested only
Conflicts: None
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Conflicts:
- drivers/net/vxlan/vxlan_core.c: resolved conflict between
3f27054994 ('net: add netdev_lockdep_set_classes() to virtual
drivers') and 3158288aac ('vxlan: Fix memory leaks in error path')
which came in with MR!4249.
Merged-by: Scott Weaver <scweaver@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-29681
Upstream Status: linux.git
commit 110d3047a3ec033de00322b1a8068b1215efa97a
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 6 14:43:05 2024 +0000
vxlan: use exit_batch_rtnl() method
exit_batch_rtnl() is called while RTNL is held,
and devices to be unregistered can be queued in the dev_kill_list.
This saves one rtnl_lock()/rtnl_unlock() pair per netns
and one unregister_netdevice_many() call.
v4: (Paolo feedback : https://netdev-3.bots.linux.dev/vmksft-net/results/453141/17-udpgro-fwd-sh/stdout )
- Changed vxlan_destroy_tunnels() to use vxlan_dellink()
instead of unregister_netdevice_queue to propely remove
devices from vn->vxlan_list.
- vxlan_destroy_tunnels() can simply iterate one list (vn->vxlan_list)
to find all devices in the most efficient way.
- Moved sanity checks in a separate vxlan_exit_net() method.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20240206144313.2050392-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36219
commit 4cde72fead4cebb5b6b2fe9425904c2064739184
Author: Ido Schimmel <idosch@nvidia.com>
Date: Sun Dec 17 10:32:41 2023 +0200
vxlan: mdb: Add MDB bulk deletion support
Implement MDB bulk deletion support in the VXLAN driver, allowing MDB
entries to be deleted in bulk according to provided parameters.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36219
commit 32d9673e96dc636cbfca2381b2c93b7a15dc3369
Author: Ido Schimmel <idosch@nvidia.com>
Date: Wed Oct 25 15:30:17 2023 +0300
vxlan: mdb: Add MDB get support
Implement support for MDB get operation by looking up a matching MDB
entry, allocating the skb according to the entry's size and then filling
in the response.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit b22ea4ef4c3438817fcb604255b55b0058ed8c64
Author: Guillaume Nault <gnault@redhat.com>
Date: Fri Apr 26 17:27:19 2024 +0200
vxlan: Add missing VNI filter counter update in arp_reduce().
VXLAN stores per-VNI statistics using vxlan_vnifilter_count().
These statistics were not updated when arp_reduce() failed its
pskb_may_pull() call.
Use vxlan_vnifilter_count() to update the VNI counter when that
happens.
Fixes: 4095e0e1328a ("drivers: vxlan: vnifilter: per vni stats")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 6dee402daba4eb8677a9438ebdcd8fe90ddd4326
Author: Guillaume Nault <gnault@redhat.com>
Date: Fri Apr 26 17:27:17 2024 +0200
vxlan: Fix racy device stats updates.
VXLAN devices update their stats locklessly. Therefore these counters
should either be stored in per-cpu data structures or the updates
should be done using atomic increments.
Since the net_device_core_stats infrastructure is already used in
vxlan_rcv(), use it for the other rx_dropped and tx_dropped counter
updates. Update the other counters atomically using DEV_STATS_INC().
Fixes: d342894c5d ("vxlan: virtual extensible lan")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 6d90b64256f39511d0083111e167321dc6a6add2
Author: Benjamin Poirier <bpoirier@nvidia.com>
Date: Fri Oct 27 14:44:10 2023 -0400
vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size()
This patch is basically a followup to commit 4e4b1798cc90 ("vxlan: Add
missing entries to vxlan_get_size()"). All of the attributes in
vxlan_get_size() appear in the same order that they are filled in
vxlan_fill_info() except for IFLA_VXLAN_PORT_RANGE. For consistency, move
that entry to match its order and add a comment, like for all other
entries.
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20231027184410.236671-1-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 4e4b1798cc90e376b8b61d0098b4093898a32227
Author: Benjamin Poirier <bpoirier@nvidia.com>
Date: Mon Sep 18 11:40:15 2023 -0400
vxlan: Add missing entries to vxlan_get_size()
There are some attributes added by vxlan_fill_info() which are not
accounted for in vxlan_get_size(). Add them.
I didn't find a way to trigger an actual problem from this miscalculation
since there is usually extra space in netlink size calculations like
if_nlmsg_size(); but maybe I just didn't search long enough.
Fixes: 3511494ce2 ("vxlan: Group Policy extension")
Fixes: e1e5314de0 ("vxlan: implement GPE")
Fixes: 0ace2ca89c ("vxlan: Use checksum partial with remote checksum offload")
Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 3c0930b491f8995de974f459648e4aad4ca996ff
Author: Li Zetao <lizetao1@huawei.com>
Date: Thu Aug 10 16:56:42 2023 +0800
vxlan: Use helper functions to update stats
Use the helper functions dev_sw_netstats_rx_add() and
dev_sw_netstats_tx_add() to update stats, which helps to
provide code readability.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit d977e1c8e3a143bceb63a0042890f4a0268a9990
Author: Ido Schimmel <idosch@nvidia.com>
Date: Mon Jul 17 11:12:27 2023 +0300
vxlan: Add support for nexthop ID metadata
VXLAN FDB entries can point to FDB nexthop objects. Each such object
includes the IP address(es) of remote VTEP(s) via which the target host
is accessible. Example:
# ip nexthop add id 1 via 192.0.2.1 fdb
# ip nexthop add id 2 via 192.0.2.17 fdb
# ip nexthop add id 1000 group 1/2 fdb
# bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 1000 src_vni 10020
This is useful for EVPN multihoming where a single host can be connected
to multiple VTEPs. The source VTEP will calculate the flow hash of the
skb and forward it towards the IP address of one of the VTEPs member in
the nexthop group.
There are cases where an external entity (e.g., the bridge driver) can
provide not only the tunnel ID (i.e., VNI) of the skb, but also the ID
of the nexthop object via which the skb should be forwarded.
Therefore, in order to support such cases, when the VXLAN device is in
external / collect metadata mode and the tunnel info attached to the skb
is of bridge type, extract the nexthop ID from the tunnel info. If the
ID is valid (i.e., non-zero), forward the skb via the nexthop object
associated with the ID, as if the skb hit an FDB entry associated with
this ID.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 69474a8a5837be63f13c6f60a7d622b98ed5c539
Author: Vladimir Nikishkin <vladimir@nikishkin.pw>
Date: Fri May 12 11:40:33 2023 +0800
net: vxlan: Add nolocalbypass option to vxlan.
If a packet needs to be encapsulated towards a local destination IP, the
packet will undergo a "local bypass" and be injected into the Rx path as
if it was received by the target VXLAN device without undergoing
encapsulation. If such a device does not exist, the packet will be
dropped.
There are scenarios where we do not want to perform such a bypass, but
instead want the packet to be encapsulated and locally received by a
user space program for post-processing.
To that end, add a new VXLAN device attribute that controls whether a
"local bypass" is performed or not. Default to performing a bypass to
maintain existing behavior.
Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 08f876a7d79ed235f90af0373d1e548a71c1f4f6
Author: Ido Schimmel <idosch@nvidia.com>
Date: Wed Mar 15 15:11:54 2023 +0200
vxlan: Enable MDB support
Now that the VXLAN MDB control and data paths are in place we can expose
the VXLAN MDB functionality to user space.
Set the VXLAN MDB net device operations to the appropriate functions,
thereby allowing the rtnetlink code to reach the VXLAN driver.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 0f83e69f44bf8dc8ab48ff0196b3475c1f0f6c07
Author: Ido Schimmel <idosch@nvidia.com>
Date: Wed Mar 15 15:11:53 2023 +0200
vxlan: Add MDB data path support
Integrate MDB support into the Tx path of the VXLAN driver, allowing it
to selectively forward IP multicast traffic according to the matched MDB
entry.
If MDB entries are configured (i.e., 'VXLAN_F_MDB' is set) and the
packet is an IP multicast packet, perform up to three different lookups
according to the following priority:
1. For an (S, G) entry, using {Source VNI, Source IP, Destination IP}.
2. For a (*, G) entry, using {Source VNI, Destination IP}.
3. For the catchall MDB entry (0.0.0.0 or ::), using the source VNI.
The catchall MDB entry is similar to the catchall FDB entry
(00:00:00:00:00:00) that is currently used to transmit BUM (broadcast,
unknown unicast and multicast) traffic. However, unlike the catchall FDB
entry, this entry is only used to transmit unregistered IP multicast
traffic that is not link-local. Therefore, when configured, the catchall
FDB entry will only transmit BULL (broadcast, unknown unicast,
link-local multicast) traffic.
The catchall MDB entry is useful in deployments where inter-subnet
multicast forwarding is used and not all the VTEPs in a tenant domain
are members in all the broadcast domains. In such deployments it is
advantageous to transmit BULL (broadcast, unknown unicast and link-local
multicast) and unregistered IP multicast traffic on different tunnels.
If the same tunnel was used, a VTEP only interested in IP multicast
traffic would also pull all the BULL traffic and drop it as it is not a
member in the originating broadcast domain [1].
If the packet did not match an MDB entry (or if the packet is not an IP
multicast packet), return it to the Tx path, allowing it to be forwarded
according to the FDB.
If the packet did match an MDB entry, forward it to the associated
remote VTEPs. However, if the entry is a (*, G) entry and the associated
remote is in INCLUDE mode, then skip over it as the source IP is not in
its source list (otherwise the packet would have matched on an (S, G)
entry). Similarly, if the associated remote is marked as BLOCKED (can
only be set on (S, G) entries), then skip over it as well as the remote
is in EXCLUDE mode and the source IP is in its source list.
[1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit a3a48de5eade770e911d35291217bdd69ce04ef1
Author: Ido Schimmel <idosch@nvidia.com>
Date: Wed Mar 15 15:11:51 2023 +0200
vxlan: mdb: Add MDB control path support
Implement MDB control path support, enabling the creation, deletion,
replacement and dumping of MDB entries in a similar fashion to the
bridge driver. Unlike the bridge driver, each entry stores a list of
remote VTEPs to which matched packets need to be replicated to and not a
list of bridge ports.
The motivating use case is the installation of MDB entries by a user
space control plane in response to received EVPN routes. As such, only
allow permanent MDB entries to be installed and do not implement
snooping functionality, avoiding a lot of unnecessary complexity.
Since entries can only be modified by user space under RTNL, use RTNL as
the write lock. Use RCU to ensure that MDB entries and remotes are not
freed while being accessed from the data path during transmission.
In terms of uAPI, reuse the existing MDB netlink interface, but add a
few new attributes to request and response messages:
* IP address of the destination VXLAN tunnel endpoint where the
multicast receivers reside.
* UDP destination port number to use to connect to the remote VXLAN
tunnel endpoint.
* VXLAN VNI Network Identifier to use to connect to the remote VXLAN
tunnel endpoint. Required when Ingress Replication (IR) is used and
the remote VTEP is not a member of originating broadcast domain
(VLAN/VNI) [1].
* Source VNI Network Identifier the MDB entry belongs to. Used only when
the VXLAN device is in external mode.
* Interface index of the outgoing interface to reach the remote VXLAN
tunnel endpoint. This is required when the underlay destination IP is
multicast (P2MP), as the multicast routing tables are not consulted.
All the new attributes are added under the 'MDBA_SET_ENTRY_ATTRS' nest
which is strictly validated by the bridge driver, thereby automatically
rejecting the new attributes.
[1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-3.2.2
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 6ab271aaad25351ea8587d67c6837678b875eb2c
Author: Ido Schimmel <idosch@nvidia.com>
Date: Wed Mar 15 15:11:50 2023 +0200
vxlan: Expose vxlan_xmit_one()
Given a packet and a remote destination, the function will take care of
encapsulating the packet and transmitting it to the destination.
Expose it so that it could be used in subsequent patches by the MDB code
to transmit a packet to the remote destination(s) stored in the MDB
entry.
It will allow us to keep the MDB code self-contained, not exposing its
data structures to the rest of the VXLAN driver.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit f307c8bf37a346ed3e8b6090b64b4ca8d61e1bcd
Author: Ido Schimmel <idosch@nvidia.com>
Date: Wed Mar 15 15:11:49 2023 +0200
vxlan: Move address helpers to private headers
Move the helpers out of the core C file to the private header so that
they could be used by the upcoming MDB code.
While at it, constify the second argument of vxlan_nla_get_addr().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
Conflicts:
- modified due to already applied commit b0b672c4d095 ("vxlan: fix
GRO with VXLAN-GPE")
- hunk for fou is applied into fou_core.c instead of fou.c due to
existing backport of 08d323234d10 ("net: fou: rename the source for
linking")
commit 35ffb66547295c72650978f9c28e670e014d0957
Author: Richard Gobert <richardbgobert@gmail.com>
Date: Tue Aug 23 09:10:49 2022 +0200
net: gro: skb_gro_header helper function
Introduce a simple helper function to replace a common pattern.
When accessing the GRO header, we fetch the pointer from frag0,
then test its validity and fetch it from the skb when necessary.
This leads to the pattern
skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow
recurring many times throughout GRO code.
This patch replaces these patterns with a single inlined function
call, improving code readability.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220823071034.GA56142@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit c2e10f53455c898050738d6a5f8c237f27aec225
Author: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Date: Fri May 20 02:36:14 2022 +0200
net: vxlan: Fix kernel coding style
The continuation line does not align with the opening bracket
and this patch fix it.
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520003614.6073-1-eng.alaamohamedsoliman.am@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit e92695e506d663bc4868ffc5bc187488a4f4d5c8
Author: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Date: Thu May 5 17:09:58 2022 +0200
net: vxlan: Add extack support to vxlan_fdb_delete
This patch adds extack msg support to vxlan_fdb_delete and vxlan_fdb_parse.
extack is used to propagate meaningful error msgs to the user of vxlan
fdb netlink api
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit 8daf4e75fc09d6b0ca8fea0988959c99643aa8a8
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date: Mon Mar 7 15:57:36 2022 +0300
vxlan_core: delete unnecessary condition
The previous check handled the "if (!nh)" condition so we know "nh"
is non-NULL here. Delete the check and pull the code in one tab.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Roopa Prabhu <roopa@nvidia.com>
Link: https://lore.kernel.org/r/20220307125735.GC16710@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
Conflicts:
- adjusted due to already applied commits 625788b58445 ("net: add
per-cpu storage and net->core_stats") and 3d391f6518fd ("tun: vxlan:
Use netif_rx().")
commit 4095e0e1328a3cd9e3b30174d6cb0edb3824256d
Author: Nikolay Aleksandrov <nikolay@nvidia.com>
Date: Tue Mar 1 05:04:38 2022 +0000
drivers: vxlan: vnifilter: per vni stats
Add per-vni statistics for vni filter mode. Counting Rx/Tx
bytes/packets/drops/errors at the appropriate places.
This patch changes vxlan_vs_find_vni to also return the
vxlan_vni_node in cases where the vni belongs to a vni
filtering vxlan device
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36610
commit f9c4bb0b245cee35ef66f75bf409c9573d934cf9
Author: Roopa Prabhu <roopa@nvidia.com>
Date: Tue Mar 1 05:04:36 2022 +0000
vxlan: vni filtering support on collect metadata device
This patch adds vnifiltering support to collect metadata device.
Motivation:
You can only use a single vxlan collect metadata device for a given
vxlan udp port in the system today. The vxlan collect metadata device
terminates all received vxlan packets. As shown in the below diagram,
there are use-cases where you need to support multiple such vxlan devices in
independent bridge domains. Each vxlan device must terminate the vni's
it is configured for.
Example usecase: In a service provider network a service provider
typically supports multiple bridge domains with overlapping vlans.
One bridge domain per customer. Vlans in each bridge domain are
mapped to globally unique vxlan ranges assigned to each customer.
vnifiltering support in collect metadata devices terminates only configured
vnis. This is similar to vlan filtering in bridge driver. The vni filtering
capability is provided by a new flag on collect metadata device.
In the below pic:
- customer1 is mapped to br1 bridge domain
- customer2 is mapped to br2 bridge domain
- customer1 vlan 10-11 is mapped to vni 1001-1002
- customer2 vlan 10-11 is mapped to vni 2001-2002
- br1 and br2 are vlan filtering bridges
- vxlan1 and vxlan2 are collect metadata devices with
vnifiltering enabled
┌──────────────────────────────────────────────────────────────────┐
│ switch │
│ │
│ ┌───────────┐ ┌───────────┐ │
│ │ │ │ │ │
│ │ br1 │ │ br2 │ │
│ └┬─────────┬┘ └──┬───────┬┘ │
│ vlans│ │ vlans │ │ │
│ 10,11│ │ 10,11│ │ │
│ │ vlanvnimap: │ vlanvnimap: │
│ │ 10-1001,11-1002 │ 10-2001,11-2002 │
│ │ │ │ │ │
│ ┌──────┴┐ ┌──┴─────────┐ ┌───┴────┐ │ │
│ │ swp1 │ │vxlan1 │ │ swp2 │ ┌┴─────────────┐ │
│ │ │ │ vnifilter:│ │ │ │vxlan2 │ │
│ └───┬───┘ │ 1001,1002│ └───┬────┘ │ vnifilter: │ │
│ │ └────────────┘ │ │ 2001,2002 │ │
│ │ │ └──────────────┘ │
│ │ │ │
└───────┼──────────────────────────────────┼───────────────────────┘
│ │
│ │
┌─────┴───────┐ │
│ customer1 │ ┌─────┴──────┐
│ host/VM │ │customer2 │
└─────────────┘ │ host/VM │
└────────────┘
With this implementation, vxlan dst metadata device can
be associated with range of vnis.
struct vxlan_vni_node is introduced to represent
a configured vni. We start with vni and its
associated remote_ip in this structure. This
structure can be extended to bring in other
per vni attributes if there are usecases for it.
A vni inherits an attribute from the base vxlan device
if there is no per vni attributes defined.
struct vxlan_dev gets a new rhashtable for
vnis called vxlan_vni_group. vxlan_vnifilter.c
implements the necessary netlink api, notifications
and helper functions to process and manage lifecycle
of vxlan_vni_node.
This patch also adds new helper functions in vxlan_multicast.c
to handle per vni remote_ip multicast groups which are part
of vxlan_vni_group.
Fix build problems:
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>