Commit Graph

71 Commits

Author SHA1 Message Date
CKI Backport Bot ffa18c8ddf vxlan: check vxlan_vnigroup_init() return value
JIRA: https://issues.redhat.com/browse/RHEL-81516
CVE: CVE-2025-21790

commit 5805402dcc56241987bca674a1b4da79a249bab7
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Feb 10 10:52:42 2025 +0000

    vxlan: check vxlan_vnigroup_init() return value

    vxlan_init() must check vxlan_vnigroup_init() success
    otherwise a crash happens later, spotted by syzbot.

    Oops: general protection fault, probably for non-canonical address 0xdffffc000000002c: 0000 [#1] PREEMPT SMP KASAN NOPTI
    KASAN: null-ptr-deref in range [0x0000000000000160-0x0000000000000167]
    CPU: 0 UID: 0 PID: 7313 Comm: syz-executor147 Not tainted 6.14.0-rc1-syzkaller-00276-g69b54314c975 #0
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
     RIP: 0010:vxlan_vnigroup_uninit+0x89/0x500 drivers/net/vxlan/vxlan_vnifilter.c:912
    Code: 00 48 8b 44 24 08 4c 8b b0 98 41 00 00 49 8d 86 60 01 00 00 48 89 c2 48 89 44 24 10 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 04 00 00 49 8b 86 60 01 00 00 48 ba 00 00 00
    RSP: 0018:ffffc9000cc1eea8 EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: 0000000000000001 RCX: ffffffff8672effb
    RDX: 000000000000002c RSI: ffffffff8672ecb9 RDI: ffff8880461b4f18
    RBP: ffff8880461b4ef4 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000020000
    R13: ffff8880461b0d80 R14: 0000000000000000 R15: dffffc0000000000
    FS:  00007fecfa95d6c0(0000) GS:ffff88806a600000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fecfa95cfb8 CR3: 000000004472c000 CR4: 0000000000352ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
      vxlan_uninit+0x1ab/0x200 drivers/net/vxlan/vxlan_core.c:2942
      unregister_netdevice_many_notify+0x12d6/0x1f30 net/core/dev.c:11824
      unregister_netdevice_many net/core/dev.c:11866 [inline]
      unregister_netdevice_queue+0x307/0x3f0 net/core/dev.c:11736
      register_netdevice+0x1829/0x1eb0 net/core/dev.c:10901
      __vxlan_dev_create+0x7c6/0xa30 drivers/net/vxlan/vxlan_core.c:3981
      vxlan_newlink+0xd1/0x130 drivers/net/vxlan/vxlan_core.c:4407
      rtnl_newlink_create net/core/rtnetlink.c:3795 [inline]
      __rtnl_newlink net/core/rtnetlink.c:3906 [inline]

    Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
    Reported-by: syzbot+6a9624592218c2c5e7aa@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/67a9d9b4.050a0220.110943.002d.GAE@google.com/T/#u
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Roopa Prabhu <roopa@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://patch.msgid.link/20250210105242.883482-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-02-27 22:52:42 +00:00
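The failure mode addressed by "vxlan: check vxlan_vnigroup_init() return value" can be modelled in userspace. What follows is a minimal Python sketch of the pattern, not kernel code: `Device`, `vnigroup_init`, `dev_init_unchecked`, `dev_init_checked` and `dev_uninit` are hypothetical stand-ins, and the `ENOMEM` value is only illustrative.

```python
ENOMEM = 12  # illustrative errno value


class Device:
    """Hypothetical stand-in for a netdevice with an optional VNI group."""

    def __init__(self):
        self.vnigrp = None  # stays None if the group allocation fails


def vnigroup_init(dev, fail):
    """Allocate the VNI group; return 0 on success or a negative errno."""
    if fail:
        return -ENOMEM
    dev.vnigrp = {"vni_list": []}
    return 0


def dev_init_unchecked(dev, fail):
    """Buggy pattern: the return value is ignored, so init always 'succeeds'."""
    vnigroup_init(dev, fail)
    return 0


def dev_init_checked(dev, fail):
    """Fixed pattern: propagate the error so no half-built device escapes."""
    err = vnigroup_init(dev, fail)
    if err:
        return err
    return 0


def dev_uninit(dev):
    """Teardown assumes init fully succeeded and dereferences the group."""
    dev.vnigrp["vni_list"].clear()
```

In this model, calling `dev_uninit()` after a failed `dev_init_unchecked()` raises on the missing group, which loosely mirrors the null dereference in the trace above; `dev_init_checked()` reports the error before teardown can ever run on a partially initialised device.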
Rado Vrbovsky 65ee7b65eb Merge: net: visibility patches for 9.6
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5833

JIRA: https://issues.redhat.com/browse/RHEL-68063

Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2025-01-06 08:26:06 +00:00
Petr Oros 5f99ca47f6 rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 9cf621bd5fcbeadc2804951d13d487e22e95b363
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 3 19:20:59 2024 +0000

    rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection

    We want to be able to run rtnl_fill_ifinfo() under RCU protection
    instead of RTNL in the future.

    All rtnl_link_ops->get_link_net() methods already using dev_net()
    are ready. I added READ_ONCE() annotations on others.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:54 +01:00
Antoine Tenart 2bca16cb90 net: vxlan: replace VXLAN_INVALID_HDR with VNI_NOT_FOUND
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit eb4f99c56ad30cb0f8c8e93a78b1200f5987e41e
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Tue Oct 15 16:28:30 2024 +0800

    net: vxlan: replace VXLAN_INVALID_HDR with VNI_NOT_FOUND

    Replace the drop reason "SKB_DROP_REASON_VXLAN_INVALID_HDR" with
    "SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND" in encap_bypass_if_local(), as the
    latter is more accurate.

    Fixes: 790961d88b0e ("net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()")
    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:34:04 +01:00
Antoine Tenart 3217b16116 net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit 790961d88b0e63d993e112b747746dfd94a7c823
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:30 2024 +0800

    net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()

    Replace kfree_skb() with kfree_skb_reason() in encap_bypass_if_local, and
    no new skb drop reason is added in this commit.

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:33:24 +01:00
Antoine Tenart f94a63199b net: vxlan: use kfree_skb_reason() in vxlan_encap_bypass()
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit c106479b612d34739c9337a18ce5332ca613f993
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:29 2024 +0800

    net: vxlan: use kfree_skb_reason() in vxlan_encap_bypass()

    Replace kfree_skb with kfree_skb_reason in vxlan_encap_bypass, and no new
    skb drop reason is added in this commit.

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:32:16 +01:00
Antoine Tenart ea152001e0 net: vxlan: add drop reasons support to vxlan_xmit_one()
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit e7c700aaa67a59c28da07072fbaae207b5f27519
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:27 2024 +0800

    net: vxlan: add drop reasons support to vxlan_xmit_one()

    Replace kfree_skb/dev_kfree_skb with kfree_skb_reason in vxlan_xmit_one.
    No drop reasons are introduced in this commit.

    The only concern of mine is replacing dev_kfree_skb with
    kfree_skb_reason. The dev_kfree_skb is equal to consume_skb, and I'm not
    sure if we can change it to kfree_skb here. In my opinion, the skb is
    "dropped" here, isn't it?

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:28:21 +01:00
Antoine Tenart 06bc7fdfb3 net: vxlan: use kfree_skb_reason() in vxlan_xmit()
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit b71a576e452b800efeac49ecca116d954601d911
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:26 2024 +0800

    net: vxlan: use kfree_skb_reason() in vxlan_xmit()

    Replace kfree_skb() with kfree_skb_reason() in vxlan_xmit(). Following
    new skb drop reasons are introduced for vxlan:

    /* no remote found for xmit */
    SKB_DROP_REASON_VXLAN_NO_REMOTE
    /* packet without necessary metadata reached a device which is
     * in "external" mode
     */
    SKB_DROP_REASON_TUNNEL_TXINFO

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:25:00 +01:00
Antoine Tenart c9f96a26fc net: vxlan: make vxlan_set_mac() return drop reasons
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit d209706f562ee4fa81bdf24cf6b679c3222aa06c
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:25 2024 +0800

    net: vxlan: make vxlan_set_mac() return drop reasons

    Change the return type of vxlan_set_mac() from bool to enum
    skb_drop_reason. In this commit, the drop reason
    "SKB_DROP_REASON_LOCAL_MAC" is introduced for the case that the source
    mac of the packet is a local mac.

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:23:15 +01:00
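The bool-to-enum conversion described in "net: vxlan: make vxlan_set_mac() return drop reasons" can be illustrated with a small userspace model. This is a Python sketch under stated assumptions, not the kernel implementation: `DropReason`, `LOCAL_MACS`, `set_mac_bool`, `set_mac_reason` and `receive` are hypothetical names, and the numeric enum values are made up for the example.

```python
from enum import IntEnum


class DropReason(IntEnum):
    # Numeric values are illustrative only.
    NOT_DROPPED_YET = 0
    LOCAL_MAC = 1


LOCAL_MACS = {"02:00:00:00:00:01"}  # hypothetical address owned by the device


def set_mac_bool(src_mac):
    """Old style: True means 'accept', False means 'drop' with no explanation."""
    return src_mac not in LOCAL_MACS


def set_mac_reason(src_mac):
    """New style: zero means 'accept'; any other value says why to drop."""
    if src_mac in LOCAL_MACS:
        return DropReason.LOCAL_MAC
    return DropReason.NOT_DROPPED_YET


def receive(src_mac):
    """Caller forwards the reason to the drop site instead of inventing one."""
    reason = set_mac_reason(src_mac)
    if reason:
        return f"dropped: {reason.name}"
    return "delivered"
```

The design point is that a zero value still reads as "not dropped" in a truth test, so callers keep a simple `if reason:` check while gaining a specific cause to report.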
Antoine Tenart 1f10cc7533 net: vxlan: make vxlan_snoop() return drop reasons
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
Conflicts:
- One chunk missing due to upstream commit f58f45c1e5b9 ("vxlan: drop
  packets from invalid src-address") not in c9s.

commit 289fd4e75219a96f77c5d679166035cd5118d139
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:24 2024 +0800

    net: vxlan: make vxlan_snoop() return drop reasons

    Change the return type of vxlan_snoop() from bool to enum
    skb_drop_reason. In this commit, two drop reasons are introduced:

      SKB_DROP_REASON_MAC_INVALID_SOURCE
      SKB_DROP_REASON_VXLAN_ENTRY_EXISTS

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:11:58 +01:00
Antoine Tenart 932e782b03 net: vxlan: make vxlan_remcsum() return drop reasons
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit 7b3e018f3eefc6144638800d9f92b3a5e120c537
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:23 2024 +0800

    net: vxlan: make vxlan_remcsum() return drop reasons

    Make vxlan_remcsum() support skb drop reasons by changing the return
    value type of it from bool to enum skb_drop_reason.

    The only drop reason in vxlan_remcsum() comes from pskb_may_pull_reason(),
    so we just return it.

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 15:04:49 +01:00
Antoine Tenart b0c107198c net: vxlan: add skb drop reasons to vxlan_rcv()
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git

commit 4c06d9daf8e6215447ca8a2ddd59fa09862c9bae
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:22 2024 +0800

    net: vxlan: add skb drop reasons to vxlan_rcv()

    Introduce skb drop reasons to the function vxlan_rcv(). Following new
    drop reasons are added:

      SKB_DROP_REASON_VXLAN_INVALID_HDR
      SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND
      SKB_DROP_REASON_IP_TUNNEL_ECN

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 14:45:44 +01:00
Antoine Tenart 973d441bf5 net: tunnel: make skb_vlan_inet_prepare() return drop reasons
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: net-next.git
Conflicts:
- Context difference (in changed lines) because of c9s-only
  modifications introduced by c9s commit b3455ab162 ("geneve: Fix
  incorrect inner network header offset when innerprotoinherit is set").

commit 9990ddf47d4168088e2246c3d418bf526e40830d
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Oct 9 10:28:21 2024 +0800

    net: tunnel: make skb_vlan_inet_prepare() return drop reasons

    Make skb_vlan_inet_prepare return the skb drop reasons, which is just
    what pskb_may_pull_reason() returns. Meanwhile, adjust all the call of
    it.

    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 14:42:14 +01:00
Michal Schmidt 555cb3d84d netdev_features: convert NETIF_F_LLTX to dev->lltx
JIRA: https://issues.redhat.com/browse/RHEL-59091

commit 00d066a4d4edbe559ba6c35153da71d4b2b8a383
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Thu Aug 29 14:33:37 2024 +0200

    netdev_features: convert NETIF_F_LLTX to dev->lltx

    NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
    rather an attribute, very similar to IFF_NO_QUEUE (and hot).
    Free one netdev_features_t bit and make it a "hot" private flag.

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Conflicts:
	drivers/net/macsec.c
	drivers/net/veth.c
	net/ipv6/ip6_tunnel.c
	- Context.

	drivers/net/amt.c
	drivers/net/netkit.c
	- Non-existent in RHEL 9.

	drivers/net/ethernet/chelsio/cxgb/cxgb2.c
	drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
	- Drivers disabled in RHEL 9. Skipped.

	net/dsa/user.c
	- This is slave.c in RHEL 9, but CONFIG_NET_DSA is disabled,
	  so skipped the hunk.

	net/core/net-sysfs.c
	- Code not present because of missing commit 74293ea1c4db
	  ("net: sysfs: Do not create sysfs for non BQL device")

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2024-10-03 17:59:44 +02:00
Michal Schmidt 12a989692f netdevice: convert private flags > BIT(31) to bitfields
JIRA: https://issues.redhat.com/browse/RHEL-59091

commit beb5a9bea8239cdf4adf6b62672e30db3e9fa5ce
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Thu Aug 29 14:33:36 2024 +0200

    netdevice: convert private flags > BIT(31) to bitfields

    Make dev->priv_flags `u32` back and define bits higher than 31 as
    bitfield booleans as per Jakub's suggestion. This simplifies code
    which accesses these bits with no optimization loss (testb both
    before/after), allows to not extend &netdev_priv_flags each time,
    but also scales better as bits > 63 in the future would only add
    a new u64 to the structure with no complications, comparing to
    that extending ::priv_flags would require converting it to a bitmap.
    Note that I picked `unsigned long :1` to not lose any potential
    optimizations comparing to `bool :1` etc.

    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Conflicts:
	drivers/net/ethernet/microchip/lan966x/lan966x_main.c
	- Driver not present in RHEL 9.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2024-10-03 17:59:39 +02:00
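The layout idea in "netdevice: convert private flags > BIT(31) to bitfields" (a 32-bit flag word plus one-bit booleans for the higher flags) can be sketched with `ctypes`. This is only an illustration, assuming hypothetical field names `flag_a` and `flag_b`; it is not the real `struct net_device` layout.

```python
import ctypes


class PrivFlags(ctypes.Structure):
    """Hypothetical layout: a 32-bit flag word followed by one-bit booleans."""

    _fields_ = [
        ("priv_flags", ctypes.c_uint32),  # bits 0..31 stay in a mask
        ("flag_a", ctypes.c_ulong, 1),    # stands in for a flag above bit 31
        ("flag_b", ctypes.c_ulong, 1),
    ]


dev = PrivFlags()
dev.priv_flags |= 1 << 3  # mask-style access for the low bits
dev.flag_a = 1            # plain assignment for the former high bits
```

Adding another high flag in this model means adding one more one-bit field rather than widening the mask type, which is the scaling argument the commit message makes.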
Lucas Zampieri f0fafe0f6d Merge: vxlan: Ensure headers are in skb's linear data part before accessing them.
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4163

JIRA: https://issues.redhat.com/browse/RHEL-35248
Upstream Status: net.git

Signed-off-by: Guillaume Nault <gnault@redhat.com>

Approved-by: Sabrina Dubroca <sdubroca@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-07-15 13:28:25 +00:00
Lucas Zampieri f943c5e738 Merge: net: ease rtnl contention in cleanup paths
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4360

JIRA: https://issues.redhat.com/browse/RHEL-29681  
  
Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Hangbin Liu <haliu@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-07-04 12:25:29 +00:00
Guillaume Nault 529660f666 vxlan: Pull inner IP header in vxlan_xmit_one().
JIRA: https://issues.redhat.com/browse/RHEL-35248
Upstream Status: linux.git

commit 31392048f55f98cb01ca709d32d06d926ab9760a
Author: Guillaume Nault <gnault@redhat.com>
Date:   Wed Jun 19 15:34:57 2024 +0200

    vxlan: Pull inner IP header in vxlan_xmit_one().

    Ensure the inner IP header is part of the skb's linear data before
    setting old_iph. Otherwise, on a non-linear skb, old_iph could point
    outside of the packet data.

    Unlike classical VXLAN, which always encapsulates Ethernet packets,
    VXLAN-GPE can transport IP packets directly. In that case, we need to
    look at skb->protocol to figure out if an Ethernet header is present.

    Fixes: d342894c5d ("vxlan: virtual extensible lan")
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Link: https://patch.msgid.link/2aa75f6fa62ac9dbe4f16ad5ba75dd04a51d4b99.1718804000.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2024-07-02 13:02:49 +02:00
Guillaume Nault 21ade3f50b vxlan: Pull inner IP header in vxlan_rcv().
JIRA: https://issues.redhat.com/browse/RHEL-35248
Upstream Status: net.git
Conflicts:
  * Missing upstream commit 4095e0e1328a ("drivers: vxlan: vnifilter:
    per vni stats"):
    Drop the vxlan_vnifilter_count() as Centos Stream 9 doesn't have
    this feature.

  * (context) Missing upstream commit 6dee402daba4 ("vxlan: Fix racy
    device stats updates."):
    Centos Stream 9 still updates statistics non-atomically (that is,
    not using DEV_STATS_INC()). Keep the new DEV_STATS_INC() calls of
    the original patch though, since this macro is defined in Centos
    Stream 9 and that's the proper way for updating these counters.

commit f7789419137b18e3847d0cc41afd788c3c00663d
Author: Guillaume Nault <gnault@redhat.com>
Date:   Tue Apr 30 18:50:13 2024 +0200

    vxlan: Pull inner IP header in vxlan_rcv().

    Ensure the inner IP header is part of skb's linear data before reading
    its ECN bits. Otherwise we might read garbage.
    One symptom is the system erroneously logging errors like
    "vxlan: non-ECT from xxx.xxx.xxx.xxx with TOS=xxxx".

    Similar bugs have been fixed in geneve, ip_tunnel and ip6_tunnel (see
    commit 1ca1ba465e55 ("geneve: make sure to pull inner header in
    geneve_rx()") for example). So let's reuse the same code structure for
    consistency. Maybe we can add a common helper in the future.

    Fixes: d342894c5d ("vxlan: virtual extensible lan")
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
    Link: https://lore.kernel.org/r/1239c8db54efec341dd6455c77e0380f58923a3c.1714495737.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2024-07-02 11:23:14 +02:00
Lucas Zampieri df4c7fc3d2 Merge: CNB95: convert tunnel metadata flags
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4465

JIRA: https://issues.redhat.com/browse/RHEL-40130  
Tested: Using routing and tunneling self-tests  
Depends: !4435 

Commits:
```
537dd2d9fb9f ("net: Add helper function to parse netlink msg of ip_tunnel_encap")
b86fca800a6a ("net: Add helper function to parse netlink msg of ip_tunnel_parm")
63c15822b8dd ("lib/bitmap: add bitmap_{read,write}()")
117aef12a7b1 ("ip_tunnel: use a separate struct to store tunnel params in the kernel")
020e8f60aa8b ("ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX")
b11ebf2ca2c1 ("ip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX")
45490ce2ff83 ("nfp: flower: add support for tunnel offload without key ID")
bf3fcbf7e7a0 ("ipv4: rename and move ip_route_output_tunnel()")
78f3655adcb5 ("ipv4: remove "proto" argument from udp_tunnel_dst_lookup()")
72fc68c6356b ("ipv4: add new arguments to udp_tunnel_dst_lookup()")
3ae983a603a4 ("ipv4: use tunnel flow flags for tunnel route lookups")
60a77d11cd5d ("geneve: add dsfield helper function")
daa2ba7ed1d1 ("geneve: use generic function for tunnel IPv4 route lookup")
6f19b2c136d9 ("vxlan: use generic function for tunnel IPv4 route lookup")
fc47e86dbfb7 ("ipv6: rename and move ip6_dst_lookup_tunnel()")
7e937dcf96d0 ("ipv6: remove "proto" argument from udp_tunnel6_dst_lookup()")
946fcfdbc5b9 ("ipv6: add new arguments to udp_tunnel6_dst_lookup()")
69d72587c17b ("geneve: use generic function for tunnel IPv6 route lookup")
f25e621f5d4c ("ipv6: mark address parameters of udp_tunnel6_xmit_skb() as const")
2aceb896ee18 ("vxlan: use generic function for tunnel IPv6 route lookup")
3e7e5baaaba7 ("bitmap: don't assume compiler evaluates small mem*() builtins calls")
c1023f5634b9 ("s390/cio: rename bitmap_size() -> idset_bitmap_size()")
10a04ff09bcc ("tools: move alignment-related macros to new <linux/align.h>")
a37fbe666c01 ("bitmap: introduce generic optimized bitmap_size()")
5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
5a66cda52d7d ("ip_tunnel: harden copying IP tunnel params to userspace")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Petr Oros <poros@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-07-01 12:48:47 +00:00
Lucas Zampieri a14ac2400e Merge: CNB95: net: annotate writes on dev->mtu from ndo_change_mtu()
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4387

JIRA: https://issues.redhat.com/browse/RHEL-39583  
Tested: Just built... no way to test  

Commit(s):
```
1eb2cded45b3 ("net: annotate writes on dev->mtu from ndo_change_mtu()")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-06-19 18:24:31 +00:00
Ivan Vecera a4a12f7632 ip_tunnel: convert __be16 tunnel flags to bitmaps
JIRA: https://issues.redhat.com/browse/RHEL-40130

Conflicts:
- hunk for non-existing net/ipv4/fou_bpf.c skipped
- conflict in ip_gre.c resolved in the same way as upstream merge
  commit cf1ca1f66d30 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") did
- simple context conflict ip_tunnel.c due to missing commit
  c4794d22251b9 ("ipv4: tunnels: use DEV_STATS_INC()")
- simple context conflict in ip6_gre.c and ip6_tunnel.c due to missing
  commit 2fad1ba354d4a ("ipv6: tunnels: use DEV_STATS_INC()")
- simple conflict in nft_tunnel.c due to missing ffb3d9a30cc67 ("netfilter:
  nf_tables: use correct integer types")

commit 5832c4a77d6931cebf9ba737129ae8f14b66ee1d
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Wed Mar 27 16:23:53 2024 +0100

    ip_tunnel: convert __be16 tunnel flags to bitmaps

    Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
    have been defined as __be16. Now all of those 16 bits are occupied
    and there's no more free space for new flags.
    It can't be simply switched to a bigger container with no
    adjustments to the values, since it's an explicit Endian storage,
    and on LE systems (__be16)0x0001 equals
    (__be64)0x0001000000000000.
    We could probably define new 64-bit flags depending on the
    Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
    LE, but that would introduce an Endianness dependency and spawn a
    ton of Sparse warnings. To mitigate them, all of those places which
    were adjusted with this change would be touched anyway, so why not
    define stuff properly if there's no choice.

    Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
    value already coded and a fistful of <16 <-> bitmap> converters and
    helpers. The two flags which have a different bit position are
    SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
    __cpu_to_be16(), but as (__force __be16), i.e. had different
    positions on LE and BE. Now they both have strongly defined places.
    Change all __be16 fields which were used to store those flags, to
    IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
    unsigned long[1] for now, and replace all TUNNEL_* occurrences to
    their bitmap counterparts. Use the converters in the places which talk
    to the userspace, hardware (NFP) or other hosts (GRE header). The rest
    must explicitly use the new flags only. This must be done at once,
    otherwise there will be too many conversions throughout the code in
    the intermediate commits.
    Finally, disable the old __be16 flags for use in the kernel code
    (except for the two 'irregular' flags mentioned above), to prevent
    any accidental (mis)use of them. For the userspace, nothing is
    changed, only additions were made.

    Most noticeable bloat-o-meter difference (.text):

    vmlinux:        307/-1 (306)
    gre.ko:         62/0 (62)
    ip_gre.ko:      941/-217 (724)  [*]
    ip_tunnel.ko:   390/-900 (-510) [**]
    ip_vti.ko:      138/0 (138)
    ip6_gre.ko:     534/-18 (516)   [*]
    ip6_tunnel.ko:  118/-10 (108)

    [*] gre_flags_to_tnl_flags() grew, but still is inlined
    [**] ip_tunnel_find() got uninlined, hence such decrease

    The average code size increase in non-extreme case is 100-200 bytes
    per module, mostly due to sizeof(long) > sizeof(__be16), as
    %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers
    are able to expand the majority of bitmap_*() calls here into direct
    operations on scalars.

    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-12 14:49:18 +02:00
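The endianness argument in "ip_tunnel: convert __be16 tunnel flags to bitmaps" can be checked byte by byte. The sketch below uses Python's `struct` module to imitate what a little-endian CPU would see; `FLAG_BIT` is a hypothetical bit index chosen for the example, not a real IP_TUNNEL_*_BIT value.

```python
import struct

# (__be16)0x0001 is stored as the two bytes 00 01.
be16_bytes = struct.pack(">H", 0x0001)

# A little-endian CPU that loads those bytes as a native integer sees 0x0100.
raw_on_le = struct.unpack("<H", be16_bytes)[0]

# Widening that raw value to 64 bits and storing it little-endian ...
widened = struct.pack("<Q", raw_on_le)

# ... gives bytes that, read back as big-endian, no longer mean 0x0001.
as_be64 = struct.unpack(">Q", widened)[0]

# A bit number has no byte order: the same index names the same flag anywhere.
FLAG_BIT = 0  # hypothetical bit index
bitmap = 1 << FLAG_BIT
```

Here `as_be64` comes out as 0x0001000000000000, matching the value quoted in the commit message, while the bit-number form is unaffected by byte order.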
Ivan Vecera e46fe31763 vxlan: use generic function for tunnel IPv6 route lookup
JIRA: https://issues.redhat.com/browse/RHEL-40130

commit 2aceb896ee18ae35b21b14c978d8c2ef8c7b439d
Author: Beniamino Galvani <b.galvani@gmail.com>
Date:   Fri Oct 20 13:55:29 2023 +0200

    vxlan: use generic function for tunnel IPv6 route lookup

    The route lookup can be done now via generic function
    udp_tunnel6_dst_lookup() to replace the custom implementation in
    vxlan6_get_route().

    This is similar to what was already done for IPv4 in commit 6f19b2c136d9
    ("vxlan: use generic function for tunnel IPv4 route lookup").

    Suggested-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-11 11:22:58 +02:00
Ivan Vecera 8ae3d4002b vxlan: use generic function for tunnel IPv4 route lookup
JIRA: https://issues.redhat.com/browse/RHEL-40130

commit 6f19b2c136d98a84d79030b53e23d405edfdc783
Author: Beniamino Galvani <b.galvani@gmail.com>
Date:   Mon Oct 16 09:15:26 2023 +0200

    vxlan: use generic function for tunnel IPv4 route lookup

    The route lookup can be done now via generic function
    udp_tunnel_dst_lookup() to replace the custom implementations in
    vxlan_get_route().

    Note that this patch only touches IPv4, while IPv6 still uses
    vxlan6_get_route(). After IPv6 route lookup gets converted as well,
    vxlan_xmit_one() can be simplified by removing local variables that
    will be passed via "struct ip_tunnel_key", such as remote_ip,
    local_ip, flow_flags, label.

    Suggested-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-11 11:22:47 +02:00
Lucas Zampieri 1cc33b9d3b Merge: CNB95: bridge: update bridge core to upstream v6.8
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4261

JIRA: https://issues.redhat.com/browse/RHEL-36219  
Depends: !4249  
Tested: using existing bridge self-tests  

Commits:
```
29cfb2aaa442 ("bridge: Add backup nexthop ID support")
b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test")
cbf51acbc5d5 ("net: bridge: Set BR_FDB_ADDED_BY_USER early in fdb_add_entry")
bdb4dfda3b41 ("net: bridge: Track and limit dynamically learned FDB entries")
ddd1ad68826d ("net: bridge: Add netlink knobs for number / max learned FDB entries")
19297c3ab23c ("net: bridge: Set strict_start_type for br_policy")
6f84090333bb ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest")
ee6f05dcd672 ("br_netfilter: use single forward hook for ip and arp")
b9109b5b77f0 ("bridge: mcast: Dump MDB entries even when snooping is disabled")
1b6d993509c1 ("bridge: mcast: Account for missing attributes")
62ef9cba98a2 ("bridge: mcast: Factor out a helper for PG entry size calculation")
6d0259dd6c53 ("bridge: mcast: Rename MDB entry get function")
ff97d2a956a1 ("vxlan: mdb: Adjust function arguments")
14c32a46d992 ("vxlan: mdb: Factor out a helper for remote entry size calculation")
68b380a395a7 ("bridge: mcast: Add MDB get support")
32d9673e96dc ("vxlan: mdb: Add MDB get support")
ddd17a54e692 ("rtnetlink: Add MDB get support")
e8bba9e83c88 ("selftests: bridge_mdb: Use MDB get instead of dump")
0514dd05939a ("selftests: vxlan_mdb: Use MDB get instead of dump")
6808918343a8 ("net: bridge: fill in MODULE_DESCRIPTION()")
e8a4195d843f ("docs: bridge: update doc format to rst")
8ebe06611666 ("net: bridge: add document for IFLA_BR enum")
8c4bafdb01cc ("net: bridge: add document for IFLA_BRPORT enum")
bcc1f84e4d34 ("docs: bridge: Add kAPI/uAPI fields")
567d2608209f ("docs: bridge: add STP doc")
041a6ac4bf79 ("docs: bridge: add VLAN doc")
75ceac88efb8 ("docs: bridge: add multicast doc")
3c37f17d6ca9 ("docs: bridge: add switchdev doc")
1b1a4c7e82ae ("docs: bridge: add netfilter doc")
d2afc2cd7f1f ("docs: bridge: add other features")
25ae948b4478 ("selftests/net: add lib.sh")
4624a78c18c6 ("selftests/net: convert test_bridge_backup_port.sh to run it in unique namespace")
312abe3d93a3 ("selftests/net: convert test_bridge_neigh_suppress.sh to run it in unique namespace")
e37a11fca418 ("bridge: add MDB state mask uAPI attribute")
a6acb535afb2 ("bridge: mdb: Add MDB bulk deletion support")
4cde72fead4c ("vxlan: mdb: Add MDB bulk deletion support")
bd2dcb94c81e ("selftests: bridge_mdb: Add MDB bulk deletion test")
c3e87a7fcd0b ("selftests: vxlan_mdb: Add MDB bulk deletion test")
c2b2ee36250d ("bridge: cfm: fix enum typo in br_cc_ccm_tx_parse")
2114e83381d3 ("selftests: forwarding: Avoid failures to source net/lib.sh")
49078c1b80b6 ("selftests: forwarding: Remove executable bits from lib.sh")
fc836129f708 ("selftests/net/lib: update busywait timeout value")
f5c3eb4b7251 ("bridge: mcast: fix disabled snooping after long uptime")
b40f873a7c80 ("selftests: net: Add missing matchall classifier")
96cd5ac4c0e6 ("selftests: forwarding: List helper scripts in TEST_FILES Makefile variable")
38ee0cb2a2e2 ("selftests: net: Fix bridge backup port test flakiness")
93590849a05e ("selftests: forwarding: Fix layer 2 miss test flakiness")
7399e2ce4d42 ("selftests: forwarding: Fix bridge MDB test flakiness")
dd6b34589441 ("selftests: forwarding: Suppress grep warnings")
f97f1fcc9690 ("selftests: forwarding: Fix bridge locked port test flakiness")
dc489f86257c ("net: bridge: switchdev: Skip MDB replays of deferred events on offload")
f7a70d650b0b ("net: bridge: switchdev: Ensure deferred event delivery on unoffload")
9adcac650618 ("netlink: specs: Add missing bridge linkinfo attrs")
83e93942796d ("selftests/net/lib: no need to record ns name if it already exist")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Hangbin Liu <haliu@redhat.com>
Approved-by: Kamal Heib <kheib@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-06-10 13:42:40 +00:00
Ivan Vecera 24ef7349da net: annotate writes on dev->mtu from ndo_change_mtu()
JIRA: https://issues.redhat.com/browse/RHEL-39583

Conflicts:
- hunks for non-existing files and non-applicable hunks for unsupported
  drivers, batman-adv and DSA were skipped

commit 1eb2cded45b35816085c1f962933c187d970f9dc
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon May 6 10:28:12 2024 +0000

    net: annotate writes on dev->mtu from ndo_change_mtu()

    Simon reported that ndo_change_mtu() methods were never
    updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
    in commit 501a90c945 ("inet: protect against too small
    mtu values.")

    We read dev->mtu without holding RTNL in many places,
    with READ_ONCE() annotations.

    It is time to take care of ndo_change_mtu() methods
    to use the corresponding WRITE_ONCE().

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Simon Horman <horms@kernel.org>
    Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Acked-by: Shannon Nelson <shannon.nelson@amd.com>
    Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-31 21:00:33 +02:00
Scott Weaver e8e3d43224 Merge: net/sched: fix false lockdep warning on qdisc root lock
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4133

JIRA: https://issues.redhat.com/browse/RHEL-6066
Upstream Status: all mainline in net-next.git
Tested: boot-tested only
Conflicts: None

Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Approved-by: Xin Long <lxin@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Conflicts:
  - drivers/net/vxlan/vxlan_core.c: resolved conflict between
    3f27054994 ('net: add netdev_lockdep_set_classes() to virtual
    drivers') and 3158288aac ('vxlan: Fix memory leaks in error path')
    which came in with MR!4249.

Merged-by: Scott Weaver <scweaver@redhat.com>
2024-05-30 09:56:25 -04:00
Antoine Tenart b143dae11a vxlan: use exit_batch_rtnl() method
JIRA: https://issues.redhat.com/browse/RHEL-29681
Upstream Status: linux.git

commit 110d3047a3ec033de00322b1a8068b1215efa97a
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 6 14:43:05 2024 +0000

    vxlan: use exit_batch_rtnl() method

    exit_batch_rtnl() is called while RTNL is held,
    and devices to be unregistered can be queued in the dev_kill_list.

    This saves one rtnl_lock()/rtnl_unlock() pair per netns
    and one unregister_netdevice_many() call.

    v4: (Paolo feedback : https://netdev-3.bots.linux.dev/vmksft-net/results/453141/17-udpgro-fwd-sh/stdout )
      - Changed vxlan_destroy_tunnels() to use vxlan_dellink()
        instead of unregister_netdevice_queue() to properly remove
        devices from vn->vxlan_list.
      - vxlan_destroy_tunnels() can simply iterate one list (vn->vxlan_list)
        to find all devices in the most efficient way.
      - Moved sanity checks in a separate vxlan_exit_net() method.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Antoine Tenart <atenart@kernel.org>
    Link: https://lore.kernel.org/r/20240206144313.2050392-10-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-05-28 15:24:06 +02:00
Ivan Vecera 6bbcd271c4 vxlan: mdb: Add MDB bulk deletion support
JIRA: https://issues.redhat.com/browse/RHEL-36219

commit 4cde72fead4cebb5b6b2fe9425904c2064739184
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Sun Dec 17 10:32:41 2023 +0200

    vxlan: mdb: Add MDB bulk deletion support

    Implement MDB bulk deletion support in the VXLAN driver, allowing MDB
    entries to be deleted in bulk according to provided parameters.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 13:49:23 +02:00
Ivan Vecera 73a27755db vxlan: mdb: Add MDB get support
JIRA: https://issues.redhat.com/browse/RHEL-36219

commit 32d9673e96dc636cbfca2381b2c93b7a15dc3369
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Oct 25 15:30:17 2023 +0300

    vxlan: mdb: Add MDB get support

    Implement support for MDB get operation by looking up a matching MDB
    entry, allocating the skb according to the entry's size and then filling
    in the response.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 13:48:16 +02:00
Ivan Vecera fd7bad51b0 vxlan: Add missing VNI filter counter update in arp_reduce().
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit b22ea4ef4c3438817fcb604255b55b0058ed8c64
Author: Guillaume Nault <gnault@redhat.com>
Date:   Fri Apr 26 17:27:19 2024 +0200

    vxlan: Add missing VNI filter counter update in arp_reduce().

    VXLAN stores per-VNI statistics using vxlan_vnifilter_count().
    These statistics were not updated when arp_reduce() failed its
    pskb_may_pull() call.

    Use vxlan_vnifilter_count() to update the VNI counter when that
    happens.

    Fixes: 4095e0e1328a ("drivers: vxlan: vnifilter: per vni stats")
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:44 +02:00
Ivan Vecera 7d468b23b2 vxlan: Fix racy device stats updates.
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 6dee402daba4eb8677a9438ebdcd8fe90ddd4326
Author: Guillaume Nault <gnault@redhat.com>
Date:   Fri Apr 26 17:27:17 2024 +0200

    vxlan: Fix racy device stats updates.

    VXLAN devices update their stats locklessly. Therefore these counters
    should either be stored in per-cpu data structures or the updates
    should be done using atomic increments.

    Since the net_device_core_stats infrastructure is already used in
    vxlan_rcv(), use it for the other rx_dropped and tx_dropped counter
    updates. Update the other counters atomically using DEV_STATS_INC().

    Fixes: d342894c5d ("vxlan: virtual extensible lan")
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:44 +02:00
Ivan Vecera caaaeb0676 vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size()
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 6d90b64256f39511d0083111e167321dc6a6add2
Author: Benjamin Poirier <bpoirier@nvidia.com>
Date:   Fri Oct 27 14:44:10 2023 -0400

    vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size()

    This patch is basically a followup to commit 4e4b1798cc90 ("vxlan: Add
    missing entries to vxlan_get_size()"). All of the attributes in
    vxlan_get_size() appear in the same order that they are filled in
    vxlan_fill_info() except for IFLA_VXLAN_PORT_RANGE. For consistency, move
    that entry to match its order and add a comment, like for all other
    entries.

    Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
    Link: https://lore.kernel.org/r/20231027184410.236671-1-bpoirier@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:44 +02:00
Ivan Vecera 4167552b9a vxlan: Add missing entries to vxlan_get_size()
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 4e4b1798cc90e376b8b61d0098b4093898a32227
Author: Benjamin Poirier <bpoirier@nvidia.com>
Date:   Mon Sep 18 11:40:15 2023 -0400

    vxlan: Add missing entries to vxlan_get_size()

    There are some attributes added by vxlan_fill_info() which are not
    accounted for in vxlan_get_size(). Add them.

    I didn't find a way to trigger an actual problem from this miscalculation
    since there is usually extra space in netlink size calculations like
    if_nlmsg_size(); but maybe I just didn't search long enough.

    Fixes: 3511494ce2 ("vxlan: Group Policy extension")
    Fixes: e1e5314de0 ("vxlan: implement GPE")
    Fixes: 0ace2ca89c ("vxlan: Use checksum partial with remote checksum offload")
    Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
    Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:44 +02:00
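
To illustrate the size accounting that the vxlan_get_size() fix above is about, here is a minimal sketch. It assumes the usual netlink attribute layout (a 4-byte header, with the total padded to 4 bytes). The attribute names and the `FILLED_ATTRS` table are hypothetical and do not mirror the driver's real attribute list.

```python
# Sketch only: netlink attribute size arithmetic, not the driver's code.
NLA_HDRLEN = 4    # attribute header: 16-bit length + 16-bit type
NLA_ALIGNTO = 4   # attributes are padded to a 4-byte boundary


def nla_align(length):
    """Round length up to the next multiple of NLA_ALIGNTO."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def nla_total_size(payload):
    """Bytes one attribute occupies: header plus payload, padded."""
    return nla_align(NLA_HDRLEN + payload)


# Hypothetical attributes emitted by a fill function: name -> payload bytes.
FILLED_ATTRS = {"ID": 4, "TTL": 1, "PORT_RANGE": 4, "FLAG_A": 0}


def get_size(accounted):
    """Size estimate that only covers the attributes listed in `accounted`."""
    return sum(nla_total_size(FILLED_ATTRS[name]) for name in accounted)


def fill_size():
    """Bytes actually needed when every attribute is filled in."""
    return get_size(FILLED_ATTRS)
```

If the estimate leaves out an attribute that the fill function emits, `get_size()` comes out smaller than `fill_size()`. That is the kind of mismatch the commit corrects, even though the message notes it is usually hidden by slack elsewhere in the netlink size calculation.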
Ivan Vecera 16b04b2c75 vxlan: Use helper functions to update stats
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 3c0930b491f8995de974f459648e4aad4ca996ff
Author: Li Zetao <lizetao1@huawei.com>
Date:   Thu Aug 10 16:56:42 2023 +0800

    vxlan: Use helper functions to update stats

    Use the helper functions dev_sw_netstats_rx_add() and
    dev_sw_netstats_tx_add() to update stats, which improves
    code readability.

    Signed-off-by: Li Zetao <lizetao1@huawei.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:44 +02:00
Ivan Vecera 7de70876ff vxlan: Add support for nexthop ID metadata
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit d977e1c8e3a143bceb63a0042890f4a0268a9990
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Mon Jul 17 11:12:27 2023 +0300

    vxlan: Add support for nexthop ID metadata

    VXLAN FDB entries can point to FDB nexthop objects. Each such object
    includes the IP address(es) of remote VTEP(s) via which the target host
    is accessible. Example:

     # ip nexthop add id 1 via 192.0.2.1 fdb
     # ip nexthop add id 2 via 192.0.2.17 fdb
     # ip nexthop add id 1000 group 1/2 fdb
     # bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 1000 src_vni 10020

    This is useful for EVPN multihoming where a single host can be connected
    to multiple VTEPs. The source VTEP will calculate the flow hash of the
    skb and forward it towards the IP address of one of the member VTEPs in
    the nexthop group.

    There are cases where an external entity (e.g., the bridge driver) can
    provide not only the tunnel ID (i.e., VNI) of the skb, but also the ID
    of the nexthop object via which the skb should be forwarded.

    Therefore, in order to support such cases, when the VXLAN device is in
    external / collect metadata mode and the tunnel info attached to the skb
    is of bridge type, extract the nexthop ID from the tunnel info. If the
    ID is valid (i.e., non-zero), forward the skb via the nexthop object
    associated with the ID, as if the skb hit an FDB entry associated with
    this ID.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:44 +02:00
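
The forwarding decision described in the nexthop ID commit above can be sketched roughly as follows. This is an illustration under assumptions, not the driver's logic: the `NEXTHOP_GROUPS` table and `pick_remote()` are hypothetical names, and CRC32 modulo the group size stands in for the kernel's flow hash and nexthop selection. The sketch does not model what the driver does with an unknown ID.

```python
import zlib

# Hypothetical table mirroring the example in the commit message:
# nexthop group 1000 contains the remote VTEPs 192.0.2.1 and 192.0.2.17.
NEXTHOP_GROUPS = {1000: ["192.0.2.1", "192.0.2.17"]}


def pick_remote(nhid, flow_key):
    """Return the remote VTEP for a packet carrying nexthop ID `nhid`.

    nhid == 0 means no valid ID was attached, so the caller should use
    the regular forwarding path; None is returned in that case.
    """
    if nhid == 0:
        return None
    members = NEXTHOP_GROUPS.get(nhid)
    if not members:
        return None  # unknown ID: handling is left out of this sketch
    # Stand-in flow hash: the same flow always maps to the same member.
    return members[zlib.crc32(flow_key) % len(members)]
```

Because the choice depends only on the flow key, packets of one flow keep going to the same VTEP while different flows can spread across the group.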
Ivan Vecera 4fd8bcec73 net: vxlan: Add nolocalbypass option to vxlan.
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 69474a8a5837be63f13c6f60a7d622b98ed5c539
Author: Vladimir Nikishkin <vladimir@nikishkin.pw>
Date:   Fri May 12 11:40:33 2023 +0800

    net: vxlan: Add nolocalbypass option to vxlan.

    If a packet needs to be encapsulated towards a local destination IP, the
    packet will undergo a "local bypass" and be injected into the Rx path as
    if it was received by the target VXLAN device without undergoing
    encapsulation. If such a device does not exist, the packet will be
    dropped.

    There are scenarios where we do not want to perform such a bypass, but
    instead want the packet to be encapsulated and locally received by a
    user space program for post-processing.

    To that end, add a new VXLAN device attribute that controls whether a
    "local bypass" is performed or not. Default to performing a bypass to
    maintain existing behavior.

    Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:43 +02:00
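
The behavior controlled by the nolocalbypass option above can be summarized as a small decision function. This is a sketch based on the commit message alone. The function name, its parameters and the returned strings are hypothetical and do not correspond to identifiers in the driver.

```python
def tx_action(dst_is_local, localbypass, target_vxlan_exists):
    """Decide what happens to a packet on transmit.

    dst_is_local: the encapsulation destination IP is a local address.
    localbypass: True is the default behavior, False models "nolocalbypass".
    target_vxlan_exists: a matching local VXLAN device can receive the packet.
    """
    if dst_is_local and localbypass:
        # Local bypass: inject into the Rx path without encapsulation,
        # or drop if no target VXLAN device exists.
        return "rx_inject" if target_vxlan_exists else "drop"
    # Otherwise the packet is encapsulated; with nolocalbypass it can then
    # be received locally, for example by a user space program.
    return "encapsulate"
```

With the default setting, existing behavior is preserved. Only the combination of a local destination and `localbypass=False` changes the outcome.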
Ivan Vecera 909fba68f5 vxlan: Enable MDB support
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 08f876a7d79ed235f90af0373d1e548a71c1f4f6
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Mar 15 15:11:54 2023 +0200

    vxlan: Enable MDB support

    Now that the VXLAN MDB control and data paths are in place we can expose
    the VXLAN MDB functionality to user space.

    Set the VXLAN MDB net device operations to the appropriate functions,
    thereby allowing the rtnetlink code to reach the VXLAN driver.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:43 +02:00
Ivan Vecera 83e1cb7adf vxlan: Add MDB data path support
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 0f83e69f44bf8dc8ab48ff0196b3475c1f0f6c07
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Mar 15 15:11:53 2023 +0200

    vxlan: Add MDB data path support

    Integrate MDB support into the Tx path of the VXLAN driver, allowing it
    to selectively forward IP multicast traffic according to the matched MDB
    entry.

    If MDB entries are configured (i.e., 'VXLAN_F_MDB' is set) and the
    packet is an IP multicast packet, perform up to three different lookups
    according to the following priority:

    1. For an (S, G) entry, using {Source VNI, Source IP, Destination IP}.
    2. For a (*, G) entry, using {Source VNI, Destination IP}.
    3. For the catchall MDB entry (0.0.0.0 or ::), using the source VNI.

    The catchall MDB entry is similar to the catchall FDB entry
    (00:00:00:00:00:00) that is currently used to transmit BUM (broadcast,
    unknown unicast and multicast) traffic. However, unlike the catchall FDB
    entry, this entry is only used to transmit unregistered IP multicast
    traffic that is not link-local. Therefore, when configured, the catchall
    FDB entry will only transmit BULL (broadcast, unknown unicast,
    link-local multicast) traffic.

    The catchall MDB entry is useful in deployments where inter-subnet
    multicast forwarding is used and not all the VTEPs in a tenant domain
    are members in all the broadcast domains. In such deployments it is
    advantageous to transmit BULL (broadcast, unknown unicast and link-local
    multicast) and unregistered IP multicast traffic on different tunnels.
    If the same tunnel was used, a VTEP only interested in IP multicast
    traffic would also pull all the BULL traffic and drop it as it is not a
    member in the originating broadcast domain [1].

    If the packet did not match an MDB entry (or if the packet is not an IP
    multicast packet), return it to the Tx path, allowing it to be forwarded
    according to the FDB.

    If the packet did match an MDB entry, forward it to the associated
    remote VTEPs. However, if the entry is a (*, G) entry and the associated
    remote is in INCLUDE mode, then skip over it as the source IP is not in
    its source list (otherwise the packet would have matched on an (S, G)
    entry). Similarly, if the associated remote is marked as BLOCKED (can
    only be set on (S, G) entries), then skip over it as well as the remote
    is in EXCLUDE mode and the source IP is in its source list.

    [1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:43 +02:00
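
The lookup priority and remote filtering described in the MDB data path commit above can be sketched as follows. This is a simplified model under assumptions: the dictionary-based MDB, the `Remote` and `Entry` classes and both function names are hypothetical. The checks for "is this an IP multicast packet" and "is it link-local" are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    ip: str
    filter_mode: str = "exclude"  # "include" or "exclude"
    blocked: bool = False         # only meaningful on (S, G) entries


@dataclass
class Entry:
    remotes: list = field(default_factory=list)


def mdb_lookup(mdb, src_vni, src_ip, dst_ip, catchall="0.0.0.0"):
    """Try (S, G), then (*, G), then the catchall entry for the source VNI."""
    candidates = (
        (src_vni, src_ip, dst_ip),   # 1. (S, G)
        (src_vni, None, dst_ip),     # 2. (*, G)
        (src_vni, None, catchall),   # 3. catchall
    )
    for key in candidates:
        entry = mdb.get(key)
        if entry is not None:
            return key, entry
    return None, None  # no match: forward according to the FDB instead


def select_remotes(key, entry):
    """Return the remote VTEP IPs the packet should be replicated to."""
    is_star_g = key[1] is None
    selected = []
    for remote in entry.remotes:
        if is_star_g and remote.filter_mode == "include":
            continue  # source is not in this remote's source list
        if remote.blocked:
            continue  # EXCLUDE mode and the source is in its source list
        selected.append(remote.ip)
    return selected
```

A caller would run `mdb_lookup()` first and fall back to the FDB path when it returns `(None, None)`.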
Ivan Vecera 9289ff99a9 vxlan: mdb: Add MDB control path support
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit a3a48de5eade770e911d35291217bdd69ce04ef1
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Mar 15 15:11:51 2023 +0200

    vxlan: mdb: Add MDB control path support

    Implement MDB control path support, enabling the creation, deletion,
    replacement and dumping of MDB entries in a similar fashion to the
    bridge driver. Unlike the bridge driver, each entry stores a list of
    remote VTEPs to which matched packets need to be replicated, and not a
    list of bridge ports.

    The motivating use case is the installation of MDB entries by a user
    space control plane in response to received EVPN routes. As such, only
    allow permanent MDB entries to be installed and do not implement
    snooping functionality, avoiding a lot of unnecessary complexity.

    Since entries can only be modified by user space under RTNL, use RTNL as
    the write lock. Use RCU to ensure that MDB entries and remotes are not
    freed while being accessed from the data path during transmission.

    In terms of uAPI, reuse the existing MDB netlink interface, but add a
    few new attributes to request and response messages:

    * IP address of the destination VXLAN tunnel endpoint where the
      multicast receivers reside.

    * UDP destination port number to use to connect to the remote VXLAN
      tunnel endpoint.

    * VXLAN VNI Network Identifier to use to connect to the remote VXLAN
      tunnel endpoint. Required when Ingress Replication (IR) is used and
      the remote VTEP is not a member of the originating broadcast domain
      (VLAN/VNI) [1].

    * Source VNI Network Identifier the MDB entry belongs to. Used only when
      the VXLAN device is in external mode.

    * Interface index of the outgoing interface to reach the remote VXLAN
      tunnel endpoint. This is required when the underlay destination IP is
      multicast (P2MP), as the multicast routing tables are not consulted.

    All the new attributes are added under the 'MDBA_SET_ENTRY_ATTRS' nest
    which is strictly validated by the bridge driver, thereby automatically
    rejecting the new attributes.

    [1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-3.2.2

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:43 +02:00
Ivan Vecera eae34c8677 vxlan: Expose vxlan_xmit_one()
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 6ab271aaad25351ea8587d67c6837678b875eb2c
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Mar 15 15:11:50 2023 +0200

    vxlan: Expose vxlan_xmit_one()

    Given a packet and a remote destination, the function will take care of
    encapsulating the packet and transmitting it to the destination.

    Expose it so that it could be used in subsequent patches by the MDB code
    to transmit a packet to the remote destination(s) stored in the MDB
    entry.

    It will allow us to keep the MDB code self-contained, not exposing its
    data structures to the rest of the VXLAN driver.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:43 +02:00
Ivan Vecera ea2d7ae35e vxlan: Move address helpers to private headers
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit f307c8bf37a346ed3e8b6090b64b4ca8d61e1bcd
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Mar 15 15:11:49 2023 +0200

    vxlan: Move address helpers to private headers

    Move the helpers out of the core C file to the private header so that
    they could be used by the upcoming MDB code.

    While at it, constify the second argument of vxlan_nla_get_addr().

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:43 +02:00
Ivan Vecera 3158288aac vxlan: Fix memory leaks in error path
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 06bf62944144a92d83dd14fd1378d2a288259561
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Mon Jan 2 08:55:56 2023 +0200

    vxlan: Fix memory leaks in error path

    The memory allocated by vxlan_vnigroup_init() is not freed in the error
    path, leading to memory leaks [1]. Fix by calling
    vxlan_vnigroup_uninit() in the error path.

    The leaks can be reproduced by annotating gro_cells_init() with
    ALLOW_ERROR_INJECTION() and then running:

     # echo "100" > /sys/kernel/debug/fail_function/probability
     # echo "1" > /sys/kernel/debug/fail_function/times
     # echo "gro_cells_init" > /sys/kernel/debug/fail_function/inject
     # printf %#x -12 > /sys/kernel/debug/fail_function/gro_cells_init/retval
     # ip link add name vxlan0 type vxlan dstport 4789 external vnifilter
     RTNETLINK answers: Cannot allocate memory

    [1]
    unreferenced object 0xffff88810db84a00 (size 512):
      comm "ip", pid 330, jiffies 4295010045 (age 66.016s)
      hex dump (first 32 bytes):
        f8 d5 76 0e 81 88 ff ff 01 00 00 00 00 00 00 02  ..v.............
        03 00 04 00 48 00 00 00 00 00 00 01 04 00 01 00  ....H...........
      backtrace:
        [<ffffffff81a3097a>] kmalloc_trace+0x2a/0x60
        [<ffffffff82f049fc>] vxlan_vnigroup_init+0x4c/0x160
        [<ffffffff82ecd69e>] vxlan_init+0x1ae/0x280
        [<ffffffff836858ca>] register_netdevice+0x57a/0x16d0
        [<ffffffff82ef67b7>] __vxlan_dev_create+0x7c7/0xa50
        [<ffffffff82ef6ce6>] vxlan_newlink+0xd6/0x130
        [<ffffffff836d02ab>] __rtnl_newlink+0x112b/0x18a0
        [<ffffffff836d0a8c>] rtnl_newlink+0x6c/0xa0
        [<ffffffff836c0ddf>] rtnetlink_rcv_msg+0x43f/0xd40
        [<ffffffff83908ce0>] netlink_rcv_skb+0x170/0x440
        [<ffffffff839066af>] netlink_unicast+0x53f/0x810
        [<ffffffff839072d8>] netlink_sendmsg+0x958/0xe70
        [<ffffffff835c319f>] ____sys_sendmsg+0x78f/0xa90
        [<ffffffff835cd6da>] ___sys_sendmsg+0x13a/0x1e0
        [<ffffffff835cd94c>] __sys_sendmsg+0x11c/0x1f0
        [<ffffffff8424da78>] do_syscall_64+0x38/0x80
    unreferenced object 0xffff88810e76d5f8 (size 192):
      comm "ip", pid 330, jiffies 4295010045 (age 66.016s)
      hex dump (first 32 bytes):
        04 00 00 00 00 00 00 00 db e1 4f e7 00 00 00 00  ..........O.....
        08 d6 76 0e 81 88 ff ff 08 d6 76 0e 81 88 ff ff  ..v.......v.....
      backtrace:
        [<ffffffff81a3162e>] __kmalloc_node+0x4e/0x90
        [<ffffffff81a0e166>] kvmalloc_node+0xa6/0x1f0
        [<ffffffff8276e1a3>] bucket_table_alloc.isra.0+0x83/0x460
        [<ffffffff8276f18b>] rhashtable_init+0x43b/0x7c0
        [<ffffffff82f04a1c>] vxlan_vnigroup_init+0x6c/0x160
        [<ffffffff82ecd69e>] vxlan_init+0x1ae/0x280
        [<ffffffff836858ca>] register_netdevice+0x57a/0x16d0
        [<ffffffff82ef67b7>] __vxlan_dev_create+0x7c7/0xa50
        [<ffffffff82ef6ce6>] vxlan_newlink+0xd6/0x130
        [<ffffffff836d02ab>] __rtnl_newlink+0x112b/0x18a0
        [<ffffffff836d0a8c>] rtnl_newlink+0x6c/0xa0
        [<ffffffff836c0ddf>] rtnetlink_rcv_msg+0x43f/0xd40
        [<ffffffff83908ce0>] netlink_rcv_skb+0x170/0x440
        [<ffffffff839066af>] netlink_unicast+0x53f/0x810
        [<ffffffff839072d8>] netlink_sendmsg+0x958/0xe70
        [<ffffffff835c319f>] ____sys_sendmsg+0x78f/0xa90

    Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:42 +02:00
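
The error-path rule behind the memory leak fix above (undo every step that already succeeded when a later step fails) can be sketched generically. This is an illustration, not the driver's code: `InitError`, `device_init()` and the step list are hypothetical, and a Python exception stands in for a negative errno return value.

```python
class InitError(Exception):
    """Stands in for an init step returning an error such as -ENOMEM."""


def device_init(steps):
    """Run init steps in order and unwind completed ones on failure.

    steps: list of (init_fn, uninit_fn) pairs. If an init_fn raises
    InitError, every previously completed step is undone in reverse
    order before the error is propagated to the caller.
    """
    completed = []
    for init_fn, uninit_fn in steps:
        try:
            init_fn()
        except InitError:
            for undo in reversed(completed):
                undo()
            raise
        completed.append(uninit_fn)
```

In the scenario from the commit message, the first step would correspond to vxlan_vnigroup_init() and the failing second step to gro_cells_init(). Without the unwind loop, whatever the first step allocated would stay allocated.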
Ivan Vecera 61dfc52e63 net: gro: skb_gro_header helper function
JIRA: https://issues.redhat.com/browse/RHEL-36610

Conflicts:
- modified due to already applied commit b0b672c4d095 ("vxlan: fix
  GRO with VXLAN-GPE")
- hunk for fou is applied into fou_core.c instead of fou.c due to
  existing backport of 08d323234d10 ("net: fou: rename the source for
  linking")

commit 35ffb66547295c72650978f9c28e670e014d0957
Author: Richard Gobert <richardbgobert@gmail.com>
Date:   Tue Aug 23 09:10:49 2022 +0200

    net: gro: skb_gro_header helper function

    Introduce a simple helper function to replace a common pattern.
    When accessing the GRO header, we fetch the pointer from frag0,
    then test its validity and fetch it from the skb when necessary.

    This leads to the pattern
    skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow
    recurring many times throughout GRO code.

    This patch replaces these patterns with a single inlined function
    call, improving code readability.

    Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20220823071034.GA56142@debian
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:42 +02:00
Ivan Vecera 3d0fe3b695 net: vxlan: Fix kernel coding style
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit c2e10f53455c898050738d6a5f8c237f27aec225
Author: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Date:   Fri May 20 02:36:14 2022 +0200

    net: vxlan: Fix kernel coding style

    The continuation line does not align with the opening bracket
    and this patch fixes it.

    Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
    Link: https://lore.kernel.org/r/20220520003614.6073-1-eng.alaamohamedsoliman.am@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:42 +02:00
Ivan Vecera 4c6030b2ad net: vxlan: Add extack support to vxlan_fdb_delete
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit e92695e506d663bc4868ffc5bc187488a4f4d5c8
Author: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Date:   Thu May 5 17:09:58 2022 +0200

    net: vxlan: Add extack support to vxlan_fdb_delete

    This patch adds extack msg support to vxlan_fdb_delete and vxlan_fdb_parse.
    extack is used to propagate meaningful error messages to the user of the
    vxlan fdb netlink API.

    Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:42 +02:00
Ivan Vecera 50157ec86f drivers: vxlan: fix returnvar.cocci warning
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit e58bc864630f0eb5e7bff8ac3c2d5816591189de
Author: Guo Zhengkui <guozhengkui@vivo.com>
Date:   Tue Mar 8 21:43:09 2022 +0800

    drivers: vxlan: fix returnvar.cocci warning

    Fix the following coccicheck warning:

    drivers/net/vxlan/vxlan_core.c:2995:5-8:
    Unneeded variable: "ret". Return "0" on line 3004.

    Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
    Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
    Acked-by: Roopa Prabhu <roopa@nvidia.com>
    Link: https://lore.kernel.org/r/20220308134321.29862-1-guozhengkui@vivo.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:42 +02:00
Ivan Vecera a680539ac0 vxlan_core: delete unnecessary condition
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit 8daf4e75fc09d6b0ca8fea0988959c99643aa8a8
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date:   Mon Mar 7 15:57:36 2022 +0300

    vxlan_core: delete unnecessary condition

    The previous check handled the "if (!nh)" condition so we know "nh"
    is non-NULL here.  Delete the check and pull the code in one tab.

    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Roopa Prabhu <roopa@nvidia.com>
    Link: https://lore.kernel.org/r/20220307125735.GC16710@kili
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:42 +02:00
Ivan Vecera 16c6655923 drivers: vxlan: vnifilter: per vni stats
JIRA: https://issues.redhat.com/browse/RHEL-36610

Conflicts:
- adjusted due to already applied commits 625788b58445 ("net: add
  per-cpu storage and net->core_stats") and 3d391f6518fd ("tun: vxlan:
  Use netif_rx().")

commit 4095e0e1328a3cd9e3b30174d6cb0edb3824256d
Author: Nikolay Aleksandrov <nikolay@nvidia.com>
Date:   Tue Mar 1 05:04:38 2022 +0000

    drivers: vxlan: vnifilter: per vni stats

    Add per-vni statistics for vni filter mode, counting Rx/Tx
    bytes/packets/drops/errors at the appropriate places.

    This patch changes vxlan_vs_find_vni to also return the
    vxlan_vni_node in cases where the vni belongs to a vni
    filtering vxlan device.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:42 +02:00
Ivan Vecera 8d598dbaec vxlan: vni filtering support on collect metadata device
JIRA: https://issues.redhat.com/browse/RHEL-36610

commit f9c4bb0b245cee35ef66f75bf409c9573d934cf9
Author: Roopa Prabhu <roopa@nvidia.com>
Date:   Tue Mar 1 05:04:36 2022 +0000

    vxlan: vni filtering support on collect metadata device

    This patch adds vnifiltering support to collect metadata devices.

    Motivation:
    You can only use a single vxlan collect metadata device for a given
    vxlan udp port in the system today. The vxlan collect metadata device
    terminates all received vxlan packets. As shown in the diagram below,
    there are use cases where you need to support multiple such vxlan devices in
    independent bridge domains. Each vxlan device must terminate only the vnis
    it is configured for.
    Example use case: In a service provider network, a service provider
    typically supports multiple bridge domains with overlapping vlans,
    one bridge domain per customer. Vlans in each bridge domain are
    mapped to globally unique vxlan ranges assigned to each customer.

    vnifiltering support in collect metadata devices terminates only configured
    vnis. This is similar to vlan filtering in the bridge driver. The vni filtering
    capability is provided by a new flag on the collect metadata device.

    In the diagram below:
            - customer1 is mapped to br1 bridge domain
            - customer2 is mapped to br2 bridge domain
            - customer1 vlan 10-11 is mapped to vni 1001-1002
            - customer2 vlan 10-11 is mapped to vni 2001-2002
            - br1 and br2 are vlan filtering bridges
            - vxlan1 and vxlan2 are collect metadata devices with
              vnifiltering enabled

    ┌──────────────────────────────────────────────────────────────────┐
    │  switch                                                          │
    │                                                                  │
    │         ┌───────────┐                 ┌───────────┐              │
    │         │           │                 │           │              │
    │         │   br1     │                 │   br2     │              │
    │         └┬─────────┬┘                 └──┬───────┬┘              │
    │     vlans│         │               vlans │       │               │
    │     10,11│         │                10,11│       │               │
    │          │     vlanvnimap:               │    vlanvnimap:        │
    │          │       10-1001,11-1002         │      10-2001,11-2002  │
    │          │         │                     │       │               │
    │   ┌──────┴┐     ┌──┴─────────┐       ┌───┴────┐  │               │
    │   │ swp1  │     │vxlan1      │       │ swp2   │ ┌┴─────────────┐ │
    │   │       │     │  vnifilter:│       │        │ │vxlan2        │ │
    │   └───┬───┘     │   1001,1002│       └───┬────┘ │ vnifilter:   │ │
    │       │         └────────────┘           │      │  2001,2002   │ │
    │       │                                  │      └──────────────┘ │
    │       │                                  │                       │
    └───────┼──────────────────────────────────┼───────────────────────┘
            │                                  │
            │                                  │
      ┌─────┴───────┐                          │
      │  customer1  │                    ┌─────┴──────┐
      │ host/VM     │                    │customer2   │
      └─────────────┘                    │ host/VM    │
                                         └────────────┘

    With this implementation, a vxlan dst metadata device can
    be associated with a range of vnis.
    struct vxlan_vni_node is introduced to represent
    a configured vni. We start with the vni and its
    associated remote_ip in this structure. This
    structure can be extended to bring in other
    per-vni attributes if there are use cases for them.
    A vni inherits an attribute from the base vxlan device
    if no per-vni attribute is defined.
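
    The filtering and inheritance rules described above can be
    illustrated with a small userspace model. This is only a sketch and
    is not taken from the patch: struct vni_node, struct vxlan_model,
    vni_find() and vni_effective_remote() are hypothetical names, a
    linear array stands in for the kernel's rhashtable, and a remote_ip
    of 0 is assumed to mean "not set".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a configured vni; not the kernel's struct layout. */
struct vni_node {
	uint32_t vni;       /* configured vni */
	uint32_t remote_ip; /* per-vni remote; 0 is assumed to mean "not set" */
};

/* Hypothetical model of a vnifilter-enabled collect metadata device. */
struct vxlan_model {
	uint32_t default_remote_ip;  /* attribute of the base vxlan device */
	const struct vni_node *vnis; /* configured vnis (array instead of rhashtable) */
	size_t nvnis;
};

/* Return the node for a configured vni, or NULL if the vni is filtered out. */
const struct vni_node *vni_find(const struct vxlan_model *dev, uint32_t vni)
{
	assert(dev != NULL);
	for (size_t i = 0; i < dev->nvnis; i++) {
		if (dev->vnis[i].vni == vni)
			return &dev->vnis[i];
	}
	return NULL;
}

/*
 * Store the effective remote for a vni in *out and return 0.
 * The per-vni value wins; otherwise the base device value is inherited.
 * Return -1 if the vni is not configured on this device.
 */
int vni_effective_remote(const struct vxlan_model *dev, uint32_t vni,
			 uint32_t *out)
{
	const struct vni_node *node = vni_find(dev, vni);

	if (node == NULL)
		return -1;
	*out = node->remote_ip ? node->remote_ip : dev->default_remote_ip;
	return 0;
}
```

    In this model, a device configured with vnis 1001 and 1002 rejects
    vni 2001 outright, and a vni without its own remote_ip falls back to
    the value set on the base device.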

    struct vxlan_dev gets a new rhashtable for
    vnis called vxlan_vni_group. vxlan_vnifilter.c
    implements the necessary netlink api, notifications
    and helper functions to process and manage the lifecycle
    of vxlan_vni_node.

    This patch also adds new helper functions in vxlan_multicast.c
    to handle per-vni remote_ip multicast groups, which are part
    of vxlan_vni_group.

    Fix build problems:
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-17 11:06:41 +02:00