244 Commits

Author SHA1 Message Date
Davide Caratti 2714cd666b flow_dissector: set encapsulation control flags for non-IP
JIRA: https://issues.redhat.com/browse/RHEL-3647
Upstream Status: net-next.git commit 706bf4f44c6d2ae2fdeefeb816b2c35a173ecfa4

commit 706bf4f44c6d2ae2fdeefeb816b2c35a173ecfa4
Author: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Date:   Sat Jul 13 02:19:09 2024 +0000

    flow_dissector: set encapsulation control flags for non-IP

    Make sure to set encapsulated control flags also for non-IP
    packets, such that it's possible to allow matching on e.g.
    TUNNEL_OAM on a geneve packet carrying a non-IP packet.

    Suggested-by: Davide Caratti <dcaratti@redhat.com>
    Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
    Tested-by: Davide Caratti <dcaratti@redhat.com>
    Reviewed-by: Davide Caratti <dcaratti@redhat.com>
    Link: https://patch.msgid.link/20240713021911.1631517-13-ast@fiberby.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-10-21 16:23:54 +02:00
Davide Caratti dc3f4b244f flow_dissector: cleanup FLOW_DISSECTOR_KEY_ENC_FLAGS
JIRA: https://issues.redhat.com/browse/RHEL-3647
Upstream Status: net-next.git commit db5271d50ec155abf287a27fa84e2e33a81dbd55

commit db5271d50ec155abf287a27fa84e2e33a81dbd55
Author: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Date:   Sat Jul 13 02:19:08 2024 +0000

    flow_dissector: cleanup FLOW_DISSECTOR_KEY_ENC_FLAGS

    Now that TCA_FLOWER_KEY_ENC_FLAGS is unused, as its
    former data is stored behind TCA_FLOWER_KEY_ENC_CONTROL,
    remove the last bits of FLOW_DISSECTOR_KEY_ENC_FLAGS.

    FLOW_DISSECTOR_KEY_ENC_FLAGS is unreleased, and has been
    in net-next since 2024-06-04.

    Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
    Tested-by: Davide Caratti <dcaratti@redhat.com>
    Reviewed-by: Davide Caratti <dcaratti@redhat.com>
    Link: https://patch.msgid.link/20240713021911.1631517-12-ast@fiberby.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-10-21 16:23:54 +02:00
Davide Caratti 4d1830ce6b flow_dissector: set encapsulated control flags from tun_flags
JIRA: https://issues.redhat.com/browse/RHEL-3647
Upstream Status: net-next.git commit 03afeb613bfe6b0c28e8b843959f716a3d2c42df

commit 03afeb613bfe6b0c28e8b843959f716a3d2c42df
Author: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Date:   Sat Jul 13 02:19:04 2024 +0000

    flow_dissector: set encapsulated control flags from tun_flags

    Set the new FLOW_DIS_F_TUNNEL_* encapsulated control flags, based
    on whether their counterparts are set in tun_flags.

    These flags are not userspace visible yet, as the code to dump
    encapsulated control flags will first be added, and later activated
    in the following patches.

    Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
    Tested-by: Davide Caratti <dcaratti@redhat.com>
    Reviewed-by: Davide Caratti <dcaratti@redhat.com>
    Link: https://patch.msgid.link/20240713021911.1631517-8-ast@fiberby.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-10-21 16:23:53 +02:00
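
Note on "set encapsulated control flags from tun_flags": the mapping described in that message can be sketched in plain C. This is only an illustration. The enum and macro names below are hypothetical stand-ins for the kernel's tunnel flag bits and FLOW_DIS_F_TUNNEL_* flags, and the bit positions are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: bit numbers inside the tunnel-flags bitmap. */
enum {
	TUN_CSUM_BIT,
	TUN_DONT_FRAGMENT_BIT,
	TUN_OAM_BIT,
	TUN_CRIT_OPT_BIT,
};

/* Hypothetical stand-ins for the encapsulated control flags.
 * They deliberately use different bit positions than tun_flags. */
#define CTRL_F_TUNNEL_CSUM          (1u << 2)
#define CTRL_F_TUNNEL_DONT_FRAGMENT (1u << 3)
#define CTRL_F_TUNNEL_OAM           (1u << 4)
#define CTRL_F_TUNNEL_CRIT_OPT      (1u << 5)

/* Set each control flag only if its counterpart is set in tun_flags. */
uint32_t ctrl_flags_from_tun_flags(unsigned long tun_flags)
{
	uint32_t ctrl = 0;

	if (tun_flags & (1UL << TUN_CSUM_BIT))
		ctrl |= CTRL_F_TUNNEL_CSUM;
	if (tun_flags & (1UL << TUN_DONT_FRAGMENT_BIT))
		ctrl |= CTRL_F_TUNNEL_DONT_FRAGMENT;
	if (tun_flags & (1UL << TUN_OAM_BIT))
		ctrl |= CTRL_F_TUNNEL_OAM;
	if (tun_flags & (1UL << TUN_CRIT_OPT_BIT))
		ctrl |= CTRL_F_TUNNEL_CRIT_OPT;

	return ctrl;
}
```

Keeping the two flag spaces separate, as in this sketch, means the control flags do not have to follow the layout of the tunnel metadata.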
Davide Caratti a8a743ad5b flow_dissector: prepare for encapsulated control flags
JIRA: https://issues.redhat.com/browse/RHEL-3647
Upstream Status: net-next.git commit 4d0aed380f9ddf24dfb1d06a05096b778442c403

commit 4d0aed380f9ddf24dfb1d06a05096b778442c403
Author: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Date:   Sat Jul 13 02:19:03 2024 +0000

    flow_dissector: prepare for encapsulated control flags

    Rename skb_flow_dissect_set_enc_addr_type() to
    skb_flow_dissect_set_enc_control(), and make it set both
    addr_type and flags in FLOW_DISSECTOR_KEY_ENC_CONTROL.

    Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
    Tested-by: Davide Caratti <dcaratti@redhat.com>
    Reviewed-by: Davide Caratti <dcaratti@redhat.com>
    Link: https://patch.msgid.link/20240713021911.1631517-7-ast@fiberby.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-10-21 16:23:53 +02:00
Davide Caratti 5bb0ce0378 flow_dissector: add support for tunnel control flags
JIRA: https://issues.redhat.com/browse/RHEL-3647
Upstream Status: net-next.git commit 668b6a2ef832a878494cc1b12a881c8ec0494b25

commit 668b6a2ef832a878494cc1b12a881c8ec0494b25
Author: Davide Caratti <dcaratti@redhat.com>
Date:   Thu May 30 19:08:34 2024 +0200

    flow_dissector: add support for tunnel control flags

    Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata.
    This is a prerequisite for matching these control flags using TC flower.

    Suggested-by: Ilya Maximets <i.maximets@ovn.org>
    Signed-off-by: Davide Caratti <dcaratti@redhat.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-10-21 16:23:52 +02:00
Ivan Vecera b2b69e4f5f net: ipv4: Add a sysctl to set multipath hash seed
JIRA: https://issues.redhat.com/browse/RHEL-59087

commit 4ee2a8cace3fb9a34aea6a56426f89d26dd514f3
Author: Petr Machata <petrm@nvidia.com>
Date:   Fri Jun 7 17:13:54 2024 +0200

    net: ipv4: Add a sysctl to set multipath hash seed

    When calculating hashes for the purpose of multipath forwarding, both IPv4
    and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
    randomly-generated seed. That's a fine choice by default, but unfortunately
    some deployments may need a tighter control over the seed used.

    In this patch, make the seed configurable by adding a new sysctl key,
    net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
    specifically for multipath forwarding and not for the other concerns that
    flow_hash_from_keys() is used for, such as queue selection. Expose the knob
    as sysctl because other such settings, such as headers to hash, are also
    handled that way. Like those, the multipath hash seed is a per-netns
    variable.

    Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
    is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
    variables.

    The seed used by flow_hash_from_keys() is a 128-bit quantity. However it
    seems that usually the seed is a much more modest value. 32 bits seem
    typical (Cisco, Cumulus), some systems go even lower. For that reason, and
    to decouple the user interface from implementation details, go with a
    32-bit quantity, which is then quadruplicated to form the siphash key.

    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240607151357.421181-3-petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-24 17:04:46 +02:00
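
Note on "Add a sysctl to set multipath hash seed": the last paragraph of that message says the 32-bit seed is repeated four times to fill the 128-bit siphash key. A minimal userspace sketch of that step follows. The struct and function names are hypothetical and are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit siphash key (two 64-bit words). */
struct hash_key128 {
	uint64_t key[2];
};

/* Repeat the 32-bit seed four times so it fills all 128 key bits. */
struct hash_key128 key_from_seed32(uint32_t seed)
{
	uint64_t twice = ((uint64_t)seed << 32) | seed;
	struct hash_key128 k = { .key = { twice, twice } };

	return k;
}
```

With this scheme two hosts configured with the same 32-bit value derive the same 128-bit key, which is what makes the seed usable as a deployment-wide setting.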
Hangbin Liu 5dc2b04050 net: add and use __skb_get_hash_symmetric_net
JIRA: https://issues.redhat.com/browse/RHEL-54921
Upstream Status: net.git commit d1dab4f71d37

commit d1dab4f71d372e00e2d34a9c32bf261623e3a95c
Author: Florian Westphal <fw@strlen.de>
Date:   Sun Jun 9 00:10:40 2024 +0200

    net: add and use __skb_get_hash_symmetric_net

    Similar to previous patch: apply same logic for
    __skb_get_hash_symmetric and let callers pass the netns to the dissector
    core.

    Existing function is turned into a wrapper to avoid adjusting all
    callers, nft_hash.c uses new function.

    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240608221057.16070-3-fw@strlen.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2024-08-29 08:59:51 +08:00
Hangbin Liu 68bfdd12cd net: add and use skb_get_hash_net
JIRA: https://issues.redhat.com/browse/RHEL-54921
Upstream Status: net.git commit b975d3ee5962

commit b975d3ee5962237c1e2f5d5aeeaaf0dc2173486c
Author: Florian Westphal <fw@strlen.de>
Date:   Sun Jun 9 00:10:39 2024 +0200

    net: add and use skb_get_hash_net

    Years ago flow dissector gained ability to delegate flow dissection
    to a bpf program, scoped per netns.

    Unfortunately, skb_get_hash() only gets an sk_buff argument instead
    of both net+skb.  This means the flow dissector needs to obtain the
    netns pointer from somewhere else.

    The netns is derived from skb->dev, and if that is not available, from
    skb->sk.  If neither is set, we hit a (benign) WARN_ON_ONCE().

    Trying both dev and sk covers most cases, but not all, as recently
    reported by Christoph Paasch.

    In case of nf-generated tcp reset, both sk and dev are NULL:

    WARNING: .. net/core/flow_dissector.c:1104
     skb_flow_dissect_flow_keys include/linux/skbuff.h:1536 [inline]
     skb_get_hash include/linux/skbuff.h:1578 [inline]
     nft_trace_init+0x7d/0x120 net/netfilter/nf_tables_trace.c:320
     nft_do_chain+0xb26/0xb90 net/netfilter/nf_tables_core.c:268
     nft_do_chain_ipv4+0x7a/0xa0 net/netfilter/nft_chain_filter.c:23
     nf_hook_slow+0x57/0x160 net/netfilter/core.c:626
     __ip_local_out+0x21d/0x260 net/ipv4/ip_output.c:118
     ip_local_out+0x26/0x1e0 net/ipv4/ip_output.c:127
     nf_send_reset+0x58c/0x700 net/ipv4/netfilter/nf_reject_ipv4.c:308
     nft_reject_ipv4_eval+0x53/0x90 net/ipv4/netfilter/nft_reject_ipv4.c:30
     [..]

    syzkaller did something like this:
    table inet filter {
      chain input {
        type filter hook input priority filter; policy accept;
        meta nftrace set 1
        tcp dport 42 reject with tcp reset
       }
       chain output {
        type filter hook output priority filter; policy accept;
        # empty chain is enough
       }
    }

    ... then sends a tcp packet to port 42.

    Initial attempt to simply set skb->dev from nf_reject_ipv4 doesn't cover
    all cases: skbs generated via ipv4 igmp_send_report trigger similar splat.

    Moreover, Pablo Neira found that nft_hash.c uses __skb_get_hash_symmetric()
    which would trigger same warn splat for such skbs.

    Let's allow callers to pass the current netns explicitly.
    The nf_trace infrastructure is adjusted to use the new helper.

    __skb_get_hash_symmetric is handled in the next patch.

    Reported-by: Christoph Paasch <cpaasch@apple.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240608221057.16070-2-fw@strlen.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2024-08-29 08:59:41 +08:00
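
Note on "net: add and use skb_get_hash_net": the netns lookup problem described in that message can be modelled with toy types. This sketch assumes that a netns passed by the caller is used first and the skb is only a fallback. All struct and function names are simplified stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; only the fields used here. */
struct net { int id; };
struct net_device { struct net *net; };
struct sock { struct net *net; };
struct sk_buff { struct net_device *dev; struct sock *sk; };

/* Old-style lookup: try skb->dev, then skb->sk; may return NULL. */
struct net *net_from_skb(const struct sk_buff *skb)
{
	if (skb->dev)
		return skb->dev->net;
	if (skb->sk)
		return skb->sk->net;
	return NULL;
}

/* New-style lookup: prefer the netns the caller passed explicitly. */
struct net *net_for_dissection(struct net *caller_net,
			       const struct sk_buff *skb)
{
	return caller_net ? caller_net : net_from_skb(skb);
}
```

An skb with neither dev nor sk set, like the nf-generated TCP reset in the report, yields NULL from the first function but still resolves through the second when the caller knows its netns.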
CKI Backport Bot ad584b3d7b net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE
JIRA: https://issues.redhat.com/browse/RHEL-54921
CVE: CVE-2024-42321

commit 120f1c857a73e52132e473dee89b340440cb692b
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Mon Jul 15 16:14:42 2024 +0200

    net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE

    The following splat is easy to reproduce upstream as well as in -stable
    kernels. Florian Westphal provided the following commit:

      d1dab4f71d37 ("net: add and use __skb_get_hash_symmetric_net")

    but this complementary fix has also been suggested by Willem de Bruijn
    and it can be easily backported to -stable kernels. It consists of
    using DEBUG_NET_WARN_ON_ONCE instead, to silence the following splat,
    given that __skb_get_hash() is used by the nftables tracing
    infrastructure to identify packets in traces.

    [69133.561393] ------------[ cut here ]------------
    [69133.561404] WARNING: CPU: 0 PID: 43576 at net/core/flow_dissector.c:1104 __skb_flow_dissect+0x134f/
    [...]
    [69133.561944] CPU: 0 PID: 43576 Comm: socat Not tainted 6.10.0-rc7+ #379
    [69133.561959] RIP: 0010:__skb_flow_dissect+0x134f/0x2ad0
    [69133.561970] Code: 83 f9 04 0f 84 b3 00 00 00 45 85 c9 0f 84 aa 00 00 00 41 83 f9 02 0f 84 81 fc ff
    ff 44 0f b7 b4 24 80 00 00 00 e9 8b f9 ff ff <0f> 0b e9 20 f3 ff ff 41 f6 c6 20 0f 84 e4 ef ff ff 48 8d 7b 12 e8
    [69133.561979] RSP: 0018:ffffc90000006fc0 EFLAGS: 00010246
    [69133.561988] RAX: 0000000000000000 RBX: ffffffff82f33e20 RCX: ffffffff81ab7e19
    [69133.561994] RDX: dffffc0000000000 RSI: ffffc90000007388 RDI: ffff888103a1b418
    [69133.562001] RBP: ffffc90000007310 R08: 0000000000000000 R09: 0000000000000000
    [69133.562007] R10: ffffc90000007388 R11: ffffffff810cface R12: ffff888103a1b400
    [69133.562013] R13: 0000000000000000 R14: ffffffff82f33e2a R15: ffffffff82f33e28
    [69133.562020] FS:  00007f40f7131740(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
    [69133.562027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [69133.562033] CR2: 00007f40f7346ee0 CR3: 000000015d200001 CR4: 00000000001706f0
    [69133.562040] Call Trace:
    [69133.562044]  <IRQ>
    [69133.562049]  ? __warn+0x9f/0x1a0
    [ 1211.841384]  ? __skb_flow_dissect+0x107e/0x2860
    [...]
    [ 1211.841496]  ? bpf_flow_dissect+0x160/0x160
    [ 1211.841753]  __skb_get_hash+0x97/0x280
    [ 1211.841765]  ? __skb_get_hash_symmetric+0x230/0x230
    [ 1211.841776]  ? mod_find+0xbf/0xe0
    [ 1211.841786]  ? get_stack_info_noinstr+0x12/0xe0
    [ 1211.841798]  ? bpf_ksym_find+0x56/0xe0
    [ 1211.841807]  ? __rcu_read_unlock+0x2a/0x70
    [ 1211.841819]  nft_trace_init+0x1b9/0x1c0 [nf_tables]
    [ 1211.841895]  ? nft_trace_notify+0x830/0x830 [nf_tables]
    [ 1211.841964]  ? get_stack_info+0x2b/0x80
    [ 1211.841975]  ? nft_do_chain_arp+0x80/0x80 [nf_tables]
    [ 1211.842044]  nft_do_chain+0x79c/0x850 [nf_tables]

    Fixes: 9b52e3f267 ("flow_dissector: handle no-skb use case")
    Suggested-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Link: https://patch.msgid.link/20240715141442.43775-1-pablo@netfilter.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-08-19 14:13:01 +00:00
Ivan Vecera a4a12f7632 ip_tunnel: convert __be16 tunnel flags to bitmaps
JIRA: https://issues.redhat.com/browse/RHEL-40130

Conflicts:
- hunk for non-existing net/ipv4/fou_bpf.c skipped
- conflict in ip_gre.c resolved in the same way as upstream merge
  commit cf1ca1f66d30 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") did
- simple context conflict ip_tunnel.c due to missing commit
  c4794d22251b9 ("ipv4: tunnels: use DEV_STATS_INC()")
- simple context conflict in ip6_gre.c and ip6_tunnel.c due to missing
  commit 2fad1ba354d4a ("ipv6: tunnels: use DEV_STATS_INC()")
- simple conflict in nft_tunnel.c due to missing ffb3d9a30cc67 ("netfilter:
  nf_tables: use correct integer types")

commit 5832c4a77d6931cebf9ba737129ae8f14b66ee1d
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Wed Mar 27 16:23:53 2024 +0100

    ip_tunnel: convert __be16 tunnel flags to bitmaps

    Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
    have been defined as __be16. Now all of those 16 bits are occupied
    and there's no more free space for new flags.
    It can't be simply switched to a bigger container with no
    adjustments to the values, since it's an explicit Endian storage,
    and on LE systems (__be16)0x0001 equals
    (__be64)0x0001000000000000.
    We could probably define new 64-bit flags depending on the
    Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
    LE, but that would introduce an Endianness dependency and spawn a
    ton of Sparse warnings. To mitigate them, all of those places which
    were adjusted with this change would be touched anyway, so why not
    define stuff properly if there's no choice.

    Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
    value already coded and a fistful of <16 <-> bitmap> converters and
    helpers. The two flags which have a different bit position are
    SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
    __cpu_to_be16(), but as (__force __be16), i.e. had different
    positions on LE and BE. Now they both have strongly defined places.
    Change all __be16 fields which were used to store those flags, to
    IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
    unsigned long[1] for now, and replace all TUNNEL_* occurrences with
    their bitmap counterparts. Use the converters in the places which talk
    to the userspace, hardware (NFP) or other hosts (GRE header). The rest
    must explicitly use the new flags only. This must be done at once,
    otherwise there will be too many conversions throughout the code in
    the intermediate commits.
    Finally, disable the old __be16 flags for use in the kernel code
    (except for the two 'irregular' flags mentioned above), to prevent
    any accidental (mis)use of them. For the userspace, nothing is
    changed, only additions were made.

    Most noticeable bloat-o-meter difference (.text):

    vmlinux:        307/-1 (306)
    gre.ko:         62/0 (62)
    ip_gre.ko:      941/-217 (724)  [*]
    ip_tunnel.ko:   390/-900 (-510) [**]
    ip_vti.ko:      138/0 (138)
    ip6_gre.ko:     534/-18 (516)   [*]
    ip6_tunnel.ko:  118/-10 (108)

    [*] gre_flags_to_tnl_flags() grew, but still is inlined
    [**] ip_tunnel_find() got uninlined, hence such decrease

    The average code size increase in non-extreme case is 100-200 bytes
    per module, mostly due to sizeof(long) > sizeof(__be16), as
    %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers
    are able to expand the majority of bitmap_*() calls here into direct
    operations on scalars.

    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-12 14:49:18 +02:00
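
Note on "convert __be16 tunnel flags to bitmaps": the claim that (__be16)0x0001 turns into (__be64)0x0001000000000000 on little-endian systems can be checked with a byte-level simulation. The helpers below are hypothetical and model little-endian storage explicitly, so the result does not depend on the host running them.

```c
#include <assert.h>
#include <stdint.h>

/* Store a raw 64-bit register value the way a little-endian CPU would. */
void store_le64(uint64_t raw, uint8_t buf[8])
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(raw >> (8 * i));
}

/* Read eight bytes as a big-endian (network order) number. */
uint64_t load_be64(const uint8_t buf[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}

/* Raw register value of a __be16 holding `host` on a little-endian CPU. */
uint16_t raw_be16_on_le(uint16_t host)
{
	return (uint16_t)((host << 8) | (host >> 8));
}

/* Zero-extend the raw __be16 into 64 bits, then interpret it as __be64. */
uint64_t widen_and_read_as_be64(uint16_t host)
{
	uint8_t buf[8];

	store_le64(raw_be16_on_le(host), buf);
	return load_be64(buf);
}
```

Calling widen_and_read_as_be64(0x0001) gives 0x0001000000000000, so the flag lands in a different bit once the container grows. Bit numbers, as used by the bitmap conversion, do not have this problem.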
Ivan Vecera ee322f63ac net/ipv6: SKB symmetric hash should incorporate transport ports
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit a5e2151ff9d5852d0ababbbcaeebd9646af9c8d9
Author: Quan Tian <qtian@vmware.com>
Date:   Tue Sep 5 10:36:10 2023 +0000

    net/ipv6: SKB symmetric hash should incorporate transport ports

    __skb_get_hash_symmetric() was added to compute a symmetric hash over
    the protocol, addresses and transport ports, by commit eb70db8756
    ("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses
    flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate
    IPv4 addresses, IPv6 addresses and ports. However, it should not specify
    the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further
    dissection when an IPv6 flow label is encountered, so that transport
    ports are not incorporated in that case.

    As a consequence, the symmetric hash is based on 5-tuple for IPv4 but
    3-tuple for IPv6 when flow label is present. It caused a few problems,
    e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash
    to perform load balancing as different L4 flows between two given IPv6
    addresses would always get the same symmetric hash, leading to uneven
    traffic distribution.

    Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the
    symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.

    Fixes: eb70db8756 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.")
    Reported-by: Lars Ekman <uablrek@gmail.com>
    Closes: https://github.com/antrea-io/antrea/issues/5457
    Signed-off-by: Quan Tian <qtian@vmware.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:17 +02:00
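
Note on "SKB symmetric hash should incorporate transport ports": the difference between a 3-tuple and a 5-tuple symmetric hash can be shown with a toy model. Everything here is an assumption made for illustration: addresses are shortened to 64 bits, the mixing function is a simple FNV-1a style loop, and sorting each pair is just one way to get a direction-independent result. None of it is the kernel's siphash-based code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy flow tuple; real IPv6 addresses are 128 bits, shortened here. */
struct flow {
	uint64_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

/* FNV-1a style mixing step, chosen only for illustration. */
static uint32_t mix(uint32_t h, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		h ^= (uint8_t)(v >> (8 * i));
		h *= 16777619u;
	}
	return h;
}

/* Direction-independent hash: sort each pair so both directions agree.
 * with_ports == 0 models the 3-tuple case, with_ports != 0 the 5-tuple. */
uint32_t symmetric_hash(const struct flow *f, int with_ports)
{
	uint64_t a_lo = f->src < f->dst ? f->src : f->dst;
	uint64_t a_hi = f->src < f->dst ? f->dst : f->src;
	uint16_t p_lo = f->sport < f->dport ? f->sport : f->dport;
	uint16_t p_hi = f->sport < f->dport ? f->dport : f->sport;
	uint32_t h = 2166136261u;

	h = mix(h, f->proto);
	h = mix(h, a_lo);
	h = mix(h, a_hi);
	if (with_ports) {
		h = mix(h, p_lo);
		h = mix(h, p_hi);
	}
	return h;
}
```

Without ports, every L4 flow between the same two addresses hashes to the same value, which is the uneven load balancing the message describes.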
Ivan Vecera 9dcd4a3dda net: flow_dissector: Add IPSEC dissector
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit a57c34a80cbe15e36e12d42a4ddc5160a5bbb1a4
Author: Ratheesh Kannoth <rkannoth@marvell.com>
Date:   Tue Aug 1 07:10:58 2023 +0530

    net: flow_dissector: Add IPSEC dissector

    Support for dissecting IPSEC field SPI (which is
    32bits in size) for ESP and AH packets.

    Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 11:14:12 +02:00
Ivan Vecera a2e239b543 net: flow_dissector: Use 64bits for used_keys
JIRA: https://issues.redhat.com/browse/RHEL-29648

Conflicts:
- many conflicts caused by the absence of certain changes in network drivers
  and netfilter core. They were resolved by replacing BIT with BIT_ULL
  where FLOW_DISSECTOR_KEY_* are used and by replacing '%x' with '%llx'
  where used_keys is used as a format string argument.

commit 2b3082c6ef3b0104d822f6f18d2afbe5fc9a5c2c
Author: Ratheesh Kannoth <rkannoth@marvell.com>
Date:   Sat Jul 29 04:52:15 2023 +0530

    net: flow_dissector: Use 64bits for used_keys

    As 32bits of dissector->used_keys are exhausted,
    increase the size to 64bits.

    This is the base change for the ESP/AH flow dissector patch.
    Please find patch and discussions at
    https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t

    Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
    Tested-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-03-25 14:11:42 +01:00
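
Note on "Use 64bits for used_keys": the conflict resolution above (BIT to BIT_ULL, '%x' to '%llx') follows from widening the mask. A small sketch shows why. The key numbers are hypothetical and BIT_ULL is redefined locally so the example builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local copy of the kernel-style helper: a 64-bit single-bit mask. */
#define BIT_ULL(nr) (1ULL << (nr))

/* Hypothetical key numbers: one below bit 32 and one above it. */
enum { KEY_LOW = 3, KEY_HIGH = 33 };

/* Build a used_keys mask containing both example keys. */
uint64_t used_keys_example(void)
{
	return BIT_ULL(KEY_LOW) | BIT_ULL(KEY_HIGH);
}

/* Test one key; the mask must be 64 bits wide for keys >= 32. */
int key_is_used(uint64_t used_keys, unsigned int key)
{
	return (used_keys & BIT_ULL(key)) != 0;
}

/* A 64-bit mask needs a 64-bit conversion specifier when printed. */
void print_used_keys(uint64_t used_keys)
{
	printf("used_keys=0x%llx\n", (unsigned long long)used_keys);
}
```

Truncating the example mask to 32 bits keeps KEY_LOW but silently drops KEY_HIGH, which is the failure the wider type avoids.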
Jan Stancek 25a74f04f2 Merge: net-core: stable backports for 9.4 phase 1
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3251

net-core: stable backports for 9.4 phase 1

JIRA: https://issues.redhat.com/browse/RHEL-14364
Tested: LNST, Tier1

A bunch of fixes for the networking core, comprising a few
serious ones.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Sabrina Dubroca <sdubroca@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-11-20 21:49:02 +01:00
Paolo Abeni 43fb5bf8ba net/core: Fix ETH_P_1588 flow dissector
JIRA: https://issues.redhat.com/browse/RHEL-14364
Tested: LNST, Tier1

Upstream commit:
commit 75ad80ed88a182ab2ad5513e448cf07b403af5c3
Author: Sasha Neftin <sasha.neftin@intel.com>
Date:   Wed Sep 13 09:39:05 2023 +0300

    net/core: Fix ETH_P_1588 flow dissector

    When a PTP ethernet raw frame with a size of more than 256 bytes followed
    by a 0xff pattern is sent to __skb_flow_dissect, nhoff value calculation
    is wrong. For example: hdr->message_length takes the wrong value (0xffff)
    and it does not reflect the real header length. In this case, 'nhoff' value
    was overridden and the PTP header was badly dissected. This leads to a
    kernel crash.

    net/core: flow_dissector
    net/core flow dissector nhoff = 0x0000000e
    net/core flow dissector hdr->message_length = 0x0000ffff
    net/core flow dissector nhoff = 0x0001000d (u16 overflow)
    ...
    skb linear:   00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88
    skb frag:     00000000: f7 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

    Using the size of the ptp_header struct will allow the corrected
    calculation of the nhoff value.

    net/core flow dissector nhoff = 0x0000000e
    net/core flow dissector nhoff = 0x00000030 (sizeof ptp_header)
    ...
    skb linear:   00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88 f7 ff ff
    skb linear:   00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    skb linear:   00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    skb frag:     00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

    Kernel trace:
    [   74.984279] ------------[ cut here ]------------
    [   74.989471] kernel BUG at include/linux/skbuff.h:2440!
    [   74.995237] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    [   75.001098] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G     U            5.15.85-intel-ese-standard-lts #1
    [   75.011629] Hardware name: Intel Corporation A-Island (CPU:AlderLake)/A-Island (ID:06), BIOS SB_ADLP.01.01.00.01.03.008.D-6A9D9E73-dirty Mar 30 2023
    [   75.026507] RIP: 0010:eth_type_trans+0xd0/0x130
    [   75.031594] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9
    [   75.052612] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297
    [   75.058473] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003
    [   75.066462] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300
    [   75.074458] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800
    [   75.082466] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010
    [   75.090461] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800
    [   75.098464] FS:  0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000
    [   75.107530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   75.113982] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0
    [   75.121980] PKRU: 55555554
    [   75.125035] Call Trace:
    [   75.127792]  <IRQ>
    [   75.130063]  ? eth_get_headlen+0xa4/0xc0
    [   75.134472]  igc_process_skb_fields+0xcd/0x150
    [   75.139461]  igc_poll+0xc80/0x17b0
    [   75.143272]  __napi_poll+0x27/0x170
    [   75.147192]  net_rx_action+0x234/0x280
    [   75.151409]  __do_softirq+0xef/0x2f4
    [   75.155424]  irq_exit_rcu+0xc7/0x110
    [   75.159432]  common_interrupt+0xb8/0xd0
    [   75.163748]  </IRQ>
    [   75.166112]  <TASK>
    [   75.168473]  asm_common_interrupt+0x22/0x40
    [   75.173175] RIP: 0010:cpuidle_enter_state+0xe2/0x350
    [   75.178749] Code: 85 c0 0f 8f 04 02 00 00 31 ff e8 39 6c 67 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 50 02 00 00 31 ff e8 52 b0 6d ff fb 45 85 f6 <0f> 88 b1 00 00 00 49 63 ce 4c 2b 2c 24 48 89 c8 48 6b d1 68 48 c1
    [   75.199757] RSP: 0018:ffff9948c013bea8 EFLAGS: 00000202
    [   75.205614] RAX: ffff8e4e8fb00000 RBX: ffffb948bfd23900 RCX: 000000000000001f
    [   75.213619] RDX: 0000000000000004 RSI: ffffffff94206161 RDI: ffffffff94212e20
    [   75.221620] RBP: 0000000000000004 R08: 000000117568973a R09: 0000000000000001
    [   75.229622] R10: 000000000000afc8 R11: ffff8e4e8fb29ce4 R12: ffffffff945ae980
    [   75.237628] R13: 000000117568973a R14: 0000000000000004 R15: 0000000000000000
    [   75.245635]  ? cpuidle_enter_state+0xc7/0x350
    [   75.250518]  cpuidle_enter+0x29/0x40
    [   75.254539]  do_idle+0x1d9/0x260
    [   75.258166]  cpu_startup_entry+0x19/0x20
    [   75.262582]  secondary_startup_64_no_verify+0xc2/0xcb
    [   75.268259]  </TASK>
    [   75.270721] Modules linked in: 8021q snd_sof_pci_intel_tgl snd_sof_intel_hda_common tpm_crb snd_soc_hdac_hda snd_sof_intel_hda snd_hda_ext_core snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress iTCO_wdt ac97_bus intel_pmc_bxt mei_hdcp iTCO_vendor_support snd_hda_codec_hdmi pmt_telemetry intel_pmc_core pmt_class snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel snd_pcm snd_timer kvm snd mei_me soundcore tpm_tis irqbypass i2c_i801 mei tpm_tis_core pcspkr intel_rapl_msr tpm i2c_smbus intel_pmt thermal sch_fq_codel uio uhid i915 drm_buddy video drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm fuse configfs
    [   75.342736] ---[ end trace 3785f9f360400e3a ]---
    [   75.347913] RIP: 0010:eth_type_trans+0xd0/0x130
    [   75.352984] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9
    [   75.373994] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297
    [   75.379860] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003
    [   75.387856] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300
    [   75.395864] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800
    [   75.403857] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010
    [   75.411863] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800
    [   75.419875] FS:  0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000
    [   75.428946] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   75.435403] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0
    [   75.443410] PKRU: 55555554
    [   75.446477] Kernel panic - not syncing: Fatal exception in interrupt
    [   75.453738] Kernel Offset: 0x11c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [   75.465794] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

    Fixes: 4f1cc51f34 ("net: flow_dissector: Parse PTP L2 packet header")
    Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-20 13:49:24 +02:00
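
Note on "Fix ETH_P_1588 flow dissector": the two nhoff calculations shown in that message's debug output can be reproduced with a short sketch. The function names are hypothetical. The header size of 0x22 bytes is taken from the log itself (0x30 minus 0x0e) rather than from the kernel's struct definition.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed PTP header size implied by the log: 0x30 - 0x0e = 0x22 bytes. */
#define PTP_HEADER_LEN 0x22

/* Old behaviour: advance by the length field carried in the packet,
 * which an attacker or a garbage frame can set to any 16-bit value. */
int next_offset_from_length_field(int nhoff, uint16_t message_length)
{
	return nhoff + message_length;
}

/* Fixed behaviour: advance by the constant header size only. */
int next_offset_from_header_size(int nhoff)
{
	return nhoff + PTP_HEADER_LEN;
}
```

With nhoff = 0x0e and message_length = 0xffff the first function returns 0x1000d, matching the bad value in the log, while the second returns 0x30.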
Ivan Vecera c83fde248b net: flow_dissector: add support for cfm packets
JIRA: https://issues.redhat.com/browse/RHEL-1773

commit d7ad70b5ef5ab8dedaa403e0e5c711ca1aa8cb14
Author: Zahari Doychev <zdoychev@maxlinear.com>
Date:   Thu Jun 8 12:56:46 2023 +0200

    net: flow_dissector: add support for cfm packets

    Add support for dissecting cfm packets. The cfm packet header
    fields maintenance domain level and opcode can be dissected.

    Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-13 09:03:10 +02:00
Ivan Vecera 08c3e62443 flow_dissector: Dissect layer 2 miss from tc skb extension
JIRA: https://issues.redhat.com/browse/RHEL-1773

commit d5ccfd90df7fd0a50038a68634c131b8fd081bac
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Mon May 29 14:48:29 2023 +0300

    flow_dissector: Dissect layer 2 miss from tc skb extension

    Extend the 'FLOW_DISSECTOR_KEY_META' key with a new 'l2_miss' field and
    populate it from a field with the same name in the tc skb extension.
    This field is set by the bridge driver for packets that incur an FDB or
    MDB miss.

    The next patch will extend the flower classifier to be able to match on
    layer 2 misses.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-13 09:03:09 +02:00
Ivan Vecera b7be5e5971 flow_dissector: fix false-positive __read_overflow2_field() warning
JIRA: https://issues.redhat.com/browse/RHEL-1773

commit 1b808993e19447731e823b1313ee4e8da7fd92a0
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Apr 6 14:15:21 2022 -0700

    flow_dissector: fix false-positive __read_overflow2_field() warning

    Bounds checking is unhappy that we try to copy both Ethernet
    addresses but pass pointer to the first one. Luckily destination
    address is the first field so pass the pointer to the entire header,
    whatever.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-13 09:03:06 +02:00
Felix Maurer 91971717b4 flow_dissector: Add support for HSRv0
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2177256

commit f65e58440d4fca277233ebef78f402a0dbd02da5
Author: Kurt Kanzenbach <kurt@linutronix.de>
Date:   Thu Mar 10 08:35:05 2022 +0100

    flow_dissector: Add support for HSRv0
    
    Commit bf08824a0f47 ("flow_dissector: Add support for HSR") added support for
    HSR within the flow dissector. However, it only works for HSR in version
    1. Version 0 uses a different Ether Type. Add support for it.
    
    Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
    Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-07-26 17:35:28 +02:00
Felix Maurer 6e9d77f095 flow_dissector: Add support for HSR
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2177256

commit bf08824a0f4776fc0626b82b6924fa1a5643eacb
Author: Kurt Kanzenbach <kurt@linutronix.de>
Date:   Mon Feb 28 20:58:56 2022 +0100

    flow_dissector: Add support for HSR

    Network drivers such as igb or igc call eth_get_headlen() to determine the
    header length for the skbs they construct in the receive path.

    When running HSR on top of these drivers, it results in triggering BUG_ON() in
    skb_pull(). The reason is the skb headlen is not sufficient for HSR to work
    correctly. skb_pull() notices that.

    For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which
    is not correct. The problem is, the flow dissection code does not take HSR into
    account. Therefore, add support for it.

    Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
    Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
    Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-07-26 17:35:28 +02:00
Jan Stancek 2a0d6a3cdd Merge: CNB: net/sched: L2TPv3 offload support
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2206

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178211
Tested: Using simple L2TP and session ID match (more details in https://bugzilla.redhat.com/show_bug.cgi?id=2178211#c2)

Commits:
```
65b32f801bfb ("uapi: move IPPROTO_L2TP to in.h")
dda2fa08a13c ("flow_dissector: Add L2TPv3 dissectors")
8b189ea08c33 ("net/sched: flower: Add L2TPv3 filter")
2c1befaced50 ("flow_offload: Introduce flow_match_l2tpv3")
036b8f5b8970 ("tools headers uapi: Update linux/in.h copy")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-04-29 10:45:33 +02:00
Jan Stancek cb25836a90 Merge: netfilter: conntrack: Fix data-races around ct mark
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2237

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2180943
Upstream Status: All mainline in linux.git.
Conflicts: clean cherry-picks

nf_conn:mark can be read from and written to in parallel. Use
READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted
compiler optimizations.

Also grab the two followup fixes to avoid a compiler warning
and make sure ctnetlink events still include the ctmark in the
delete notification.

Signed-off-by: Florian Westphal <fwestpha@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Eelco Chaudron <echaudro@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-03-30 12:36:45 +02:00
Florian Westphal 539491426c netfilter: conntrack: Fix data-races around ct mark
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2180943
Upstream Status: commit 52d1aa8b8249f

commit 52d1aa8b8249ff477aaa38b6f74a8ced780d079c
Author: Daniel Xu <dxu@dxuuu.xyz>
Date:   Wed Nov 9 12:39:07 2022 -0700

    netfilter: conntrack: Fix data-races around ct mark

    nf_conn:mark can be read from and written to in parallel. Use
    READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted
    compiler optimizations.

    Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
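
A minimal userspace sketch of the access pattern described above. The `READ_ONCE_U32`/`WRITE_ONCE_U32` macros and `struct conn` are hypothetical stand-ins, not the kernel's definitions; the volatile casts only approximate what READ_ONCE()/WRITE_ONCE() do, namely forcing the compiler to emit exactly one load or store for the marked access.

```c
#include <stdint.h>

/* Assumption: simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * The volatile access keeps the compiler from tearing, repeating or
 * eliding the load/store. */
#define READ_ONCE_U32(x)     (*(const volatile uint32_t *)&(x))
#define WRITE_ONCE_U32(x, v) (*(volatile uint32_t *)&(x) = (v))

/* Hypothetical stand-in for the connection object carrying a mark. */
struct conn {
    uint32_t mark;
};

/* Read the mark with a single marked load. */
static uint32_t conn_get_mark(const struct conn *ct)
{
    return READ_ONCE_U32(ct->mark);
}

/* Update only the bits selected by mask, using one marked load and,
 * if the value changes, one marked store. */
static void conn_set_mark(struct conn *ct, uint32_t mark, uint32_t mask)
{
    uint32_t old = READ_ONCE_U32(ct->mark);
    uint32_t updated = (old & ~mask) | (mark & mask);

    if (updated != old)
        WRITE_ONCE_U32(ct->mark, updated);
}
```

Reading the value once into a local and working on the local is the point of the pattern: later code never re-reads a field that another CPU may have changed in the meantime.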
2023-03-24 11:20:55 +01:00
Ivan Vecera 7b63c6d047 flow_dissector: Add L2TPv3 dissectors
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178211

commit dda2fa08a13c688bed320ef2e4ba541abb4d6c17
Author: Wojciech Drewek <wojciech.drewek@intel.com>
Date:   Thu Sep 8 10:16:41 2022 -0700

    flow_dissector: Add L2TPv3 dissectors

    Allow to dissect L2TPv3 specific field which is:
    - session ID (32 bits)

    L2TPv3 might be transported over IP or over UDP,
    this implementation is only about L2TPv3 over IP.
    IP protocol carries L2TPv3 when ip_proto is
    IPPROTO_L2TP (115).

    Acked-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
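
As an illustration of the field described above, the following sketch extracts a 32-bit session ID from an IPv4 packet whose protocol is 115. It is a standalone approximation, not the kernel dissector: the function name and the raw-byte parsing are assumptions, and it takes the session ID to be the first four bytes after the IPv4 header, in network byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_L2TP 115 /* IPPROTO_L2TP, per the commit message */

/* Hypothetical helper: returns true and stores the session ID if the
 * buffer holds an IPv4 packet carrying L2TPv3 directly over IP. */
static bool l2tpv3_session_id(const uint8_t *ip, size_t len, uint32_t *sid)
{
    size_t ihl;

    if (len < 20 || (ip[0] >> 4) != 4)
        return false;

    ihl = (size_t)(ip[0] & 0x0f) * 4; /* IPv4 header length in bytes */
    if (ihl < 20 || len < ihl + 4 || ip[9] != PROTO_L2TP)
        return false;

    /* Assemble the big-endian 32-bit session ID. */
    *sid = ((uint32_t)ip[ihl] << 24) | ((uint32_t)ip[ihl + 1] << 16) |
           ((uint32_t)ip[ihl + 2] << 8) | (uint32_t)ip[ihl + 3];
    return true;
}
```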
2023-03-20 17:52:53 +01:00
Felix Maurer cc802bdac6 bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progs
Bugzilla: https://bugzilla.redhat.com/2166911

commit 91350fe152930c0d61a362af68272526490efea5
Author: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Date:   Sun Aug 21 14:35:17 2022 +0300

    bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progs
    
    Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely
    replaces the flow-dissector logic with custom dissection logic. This
    forces implementors to write programs that handle dissection for any
    flows expected in the namespace.
    
    It makes sense for flow-dissector BPF programs to just augment the
    dissector with custom logic (e.g. dissecting certain flows or custom
    protocols), while enjoying the broad capabilities of the standard
    dissector for any other traffic.
    
    Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF
    programs may return this to indicate no dissection was made, and
    fallback to the standard dissector is requested.
    
    Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
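
The fallback behaviour described above can be sketched in plain C as follows. Everything here is hypothetical (the enum, its values, the function names and the toy "flow id"); only the idea of a "continue" return code that hands the packet back to the standard dissector comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes for a custom dissector program. */
enum prog_ret { PROG_OK, PROG_DROP, PROG_CONTINUE };

typedef enum prog_ret (*custom_dissector_fn)(const unsigned char *pkt,
                                             size_t len, int *flow_id);

/* Stand-in for the built-in dissector: uses the first byte as flow id. */
static bool builtin_dissect(const unsigned char *pkt, size_t len, int *flow_id)
{
    if (len < 1)
        return false;
    *flow_id = pkt[0];
    return true;
}

/* Run the custom program first; fall back to the built-in dissector
 * only when the program asks for it (or when no program is attached). */
static bool dissect(custom_dissector_fn prog, const unsigned char *pkt,
                    size_t len, int *flow_id)
{
    if (prog) {
        enum prog_ret ret = prog(pkt, len, flow_id);

        if (ret != PROG_CONTINUE)
            return ret == PROG_OK;
    }
    return builtin_dissect(pkt, len, flow_id);
}

/* Example program: handles only packets that start with 0xEE. */
static enum prog_ret only_custom_proto(const unsigned char *pkt, size_t len,
                                       int *flow_id)
{
    if (len < 2 || pkt[0] != 0xEE)
        return PROG_CONTINUE;
    *flow_id = 1000 + pkt[1];
    return PROG_OK;
}
```

With this shape the custom program only needs to know about the protocols it cares about, which is the motivation given in the commit message.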
2023-03-06 14:54:33 +01:00
Felix Maurer e10f4efe42 flow_dissector: Make 'bpf_flow_dissect' return the bpf program retcode
Bugzilla: https://bugzilla.redhat.com/2166911

commit 0ba985024ae7db226776725d9aa436b5c1c9fca2
Author: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Date:   Sun Aug 21 14:35:16 2022 +0300

    flow_dissector: Make 'bpf_flow_dissect' return the bpf program retcode
    
    Let 'bpf_flow_dissect' callers know the BPF program's retcode and act
    accordingly.
    
    Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20220821113519.116765-2-shmulik.ladkani@gmail.com

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-03-06 14:54:33 +01:00
Frantisek Hrbata 740744db2f Merge: CNB: flow_dissector: add support to dissect PPPoE fields and number of VLAN tags
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1448

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133511
Tested: Just build, new dissectors are not used yet. Will be tested by ice rebase.

Commits:
```
2e861e5e9717 ("dissector: do not set invalid PPP protocol")
34951fcf26c5 ("flow_dissector: Add number of vlan tags dissector")
46126db9c861 ("flow_dissector: Add PPPoE dissectors")
6a21b0856daa ("flow_offload: Introduce flow_match_pppoe")
9f87eb424699 ("flow_dissector: Do not count vlan tags inside tunnel payload")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-12 03:10:43 -05:00
Paolo Abeni 7c56a9b480 net: core: fix flow symmetric hash
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134161
Tested: LNST, Tier1

Upstream commit:
commit 64ae13ed478428135cddc2f1113dff162d8112d4
Author: Ludovic Cintrat <ludovic.cintrat@gatewatcher.com>
Date:   Wed Sep 7 12:08:13 2022 +0200

    net: core: fix flow symmetric hash

    __flow_hash_consistentify() wrongly swaps IPv4 addresses in a few cases.
    This function is indirectly used by __skb_get_hash_symmetric(), which is
    used to fanout packets in AF_PACKET.
    Intrusion detection systems may be impacted by this issue.

    __flow_hash_consistentify() computes the difference between the addresses,
    then swaps them if the difference is negative. In a few cases src - dst
    and dst - src are both negative.

    The following snippet mimics __flow_hash_consistentify():

    ```
     #include <stdio.h>
     #include <stdint.h>

     int main(int argc, char** argv) {

         int diffs_d, diffd_s;
         uint32_t dst  = 0xb225a8c0; /* 178.37.168.192 --> 192.168.37.178 */
         uint32_t src  = 0x3225a8c0; /*  50.37.168.192 --> 192.168.37.50  */
         uint32_t dst2 = 0x3325a8c0; /*  51.37.168.192 --> 192.168.37.51  */

         diffs_d = src - dst;
         diffd_s = dst - src;

         printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n",
                 src, dst, diffs_d, diffs_d, diffd_s, diffd_s);

         diffs_d = src - dst2;
         diffd_s = dst2 - src;

         printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n",
                 src, dst2, diffs_d, diffs_d, diffd_s, diffd_s);

         return 0;
     }
    ```

    Results:

    src:3225a8c0 dst:b225a8c0, \
        diff(s-d)=-2147483648(0x80000000) \
        diff(d-s)=-2147483648(0x80000000)

    src:3225a8c0 dst:3325a8c0, \
        diff(s-d)=-16777216(0xff000000) \
        diff(d-s)=16777216(0x1000000)

    In the first case both address differences are < 0, therefore
    __flow_hash_consistentify() always swaps, and dst->src and src->dst
    packets have different hashes.

    Fixes: c3f8324188 ("net: Add full IPv6 addresses to flow_keys")
    Signed-off-by: Ludovic Cintrat <ludovic.cintrat@gatewatcher.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
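
One way to avoid the problem shown above is to order the two addresses with an unsigned comparison instead of a signed difference. The sketch below is a standalone illustration with hypothetical names (`struct v4_pair`, `canonical_pair`), not a copy of the kernel change; it contrasts the difference-based test with a comparison-based one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the IPv4 address pair in the flow keys. */
struct v4_pair {
    uint32_t src;
    uint32_t dst;
};

/* Difference-based test: squeezing the unsigned difference into an int
 * makes a distance of exactly 0x80000000 look negative in both
 * directions (on the usual two's-complement targets). */
static bool needs_swap_by_diff(uint32_t src, uint32_t dst)
{
    int diff = (int)(dst - src);

    return diff < 0;
}

/* Comparison-based test: for two distinct addresses exactly one
 * direction returns true. */
static bool needs_swap_by_compare(uint32_t src, uint32_t dst)
{
    return dst < src;
}

/* Put the pair into canonical order so both directions hash alike. */
static struct v4_pair canonical_pair(uint32_t src, uint32_t dst)
{
    struct v4_pair p = { src, dst };

    if (needs_swap_by_compare(src, dst)) {
        p.src = dst;
        p.dst = src;
    }
    return p;
}
```

For the first address pair from the commit message, the difference-based test asks for a swap in both directions, while `canonical_pair()` returns the same ordered pair whichever way the packet travels.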
2022-10-13 13:00:04 +02:00
Ivan Vecera 7203dea82f flow_dissector: Do not count vlan tags inside tunnel payload
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133511

commit 9f87eb4246994e32a4e4ea88476b20ab3b412840
Author: Qingqing Yang <qingqing.yang@broadcom.com>
Date:   Mon Sep 19 15:48:08 2022 +0800

    flow_dissector: Do not count vlan tags inside tunnel payload

    We ran into a problem: when there is a vlan tag inside
    GRE encapsulation, the match on num_of_vlans fails.
    The cause is that the vlan tag inside the GRE payload is
    counted into num_of_vlans, which is not expected.

    One example packet is like this:
    Ethernet II, Src: Broadcom_68:56:07 (00:10:18:68:56:07)
                       Dst: Broadcom_68:56:08 (00:10:18:68:56:08)
    802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
    Internet Protocol Version 4, Src: 192.168.1.4, Dst: 192.168.1.200
    Generic Routing Encapsulation (Transparent Ethernet bridging)
    Ethernet II, Src: Broadcom_68:58:07 (00:10:18:68:58:07)
                       Dst: Broadcom_68:58:08 (00:10:18:68:58:08)
    802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 200
    ...
    It should match the (num_of_vlans 1) rule, but it matches
    the (num_of_vlans 2) rule.

    The vlan tags inside the GRE or other tunnel encapsulated payload
    should not be taken into num_of_vlans.
    The fix is to stop counting the vlan number when the encapsulation
    bit is set.

    Fixes: 34951fcf26c5 ("flow_dissector: Add number of vlan tags dissector")
    Signed-off-by: Qingqing Yang <qingqing.yang@broadcom.com>
    Reviewed-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
    Link: https://lore.kernel.org/r/20220919074808.136640-1-qingqing.yang@broadcom.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
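
The counting rule described above can be sketched as a walk over a simplified list of header types. The `enum layer` model and the function name are hypothetical; the sketch only shows the idea of no longer counting VLAN tags once an encapsulation header has been seen.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a packet as a list of header types. */
enum layer { L_ETH, L_VLAN, L_IPV4, L_GRE, L_END };

/* Count VLAN tags, ignoring any that appear after a tunnel header. */
static int count_outer_vlans(const enum layer *layers)
{
    bool encapsulated = false;
    int num_of_vlans = 0;

    for (; *layers != L_END; layers++) {
        if (*layers == L_GRE)
            encapsulated = true; /* everything after this is tunnel payload */
        else if (*layers == L_VLAN && !encapsulated)
            num_of_vlans++;
    }
    return num_of_vlans;
}
```

For the example packet in the commit message (Ethernet, VLAN 100, IPv4, GRE, Ethernet, VLAN 200) this returns 1, so the packet would match the (num_of_vlans 1) rule.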
2022-10-10 18:20:19 +02:00
Ivan Vecera 0f235093c2 flow_dissector: Add PPPoE dissectors
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133511

commit 46126db9c86110e5fc1e369b9bb89735ddefdae4
Author: Wojciech Drewek <wojciech.drewek@intel.com>
Date:   Mon Jul 18 14:18:10 2022 +0200

    flow_dissector: Add PPPoE dissectors

    Allow to dissect PPPoE specific fields which are:
    - session ID (16 bits)
    - ppp protocol (16 bits)
    - type (16 bits) - this is PPPoE ethertype, for now only
      ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC
      in the future

    The goal is to make the following TC command possible:

      # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \
          flower \
            pppoe_sid 12 \
            ppp_proto ip \
          action drop

    Note that only PPPoE Session is supported.

    Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Acked-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
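
To make the dissected fields concrete, here is a standalone sketch that pulls the session ID and PPP protocol out of a PPPoE session header. The struct and function names are hypothetical, and the layout is an assumption based on the usual PPPoE framing: one byte of version/type (0x11), one byte of code (0x00 in the session stage), a 16-bit session ID, a 16-bit length, then the 16-bit PPP protocol, all in network byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two dissected fields. */
struct pppoe_fields {
    uint16_t session_id;
    uint16_t ppp_proto;
};

/* Parse a PPPoE session header that starts right after the Ethernet
 * header. Returns false for short buffers or non-session frames. */
static bool parse_pppoe_session(const uint8_t *p, size_t len,
                                struct pppoe_fields *out)
{
    if (len < 8)
        return false;
    if (p[0] != 0x11 || p[1] != 0x00) /* ver 1, type 1, code 0 */
        return false;

    out->session_id = (uint16_t)((p[2] << 8) | p[3]);
    out->ppp_proto = (uint16_t)((p[6] << 8) | p[7]);
    return true;
}
```

A frame with session ID 12 and PPP protocol 0x0021 (IPv4) corresponds to the `pppoe_sid 12 ppp_proto ip` match in the TC command above.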
2022-10-10 18:20:19 +02:00
Ivan Vecera fff4b87e7a flow_dissector: Add number of vlan tags dissector
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133511

commit 34951fcf26c59e78ae430fba1fce7c08b1871249
Author: Boris Sukholitko <boris.sukholitko@broadcom.com>
Date:   Tue Apr 19 11:14:32 2022 +0300

    flow_dissector: Add number of vlan tags dissector

    Our customers in the fiber telecom world have network configurations
    where they would like to control their traffic according to the number
    of tags appearing in the packet.

    For example, TR247 GPON conformance test suite specification mostly
    talks about untagged, single, double tagged packets and gives lax
    guidelines on the vlan protocol vs. number of vlan tags.

    This is different from the common IT networks where 802.1Q and 802.1ad
    protocols usually describe single and double tagged packets. The GPON
    configurations that we work with have an arbitrary mix of the above
    protocols and number of vlan tags in the packet.

    The goal is to make the following TC commands possible:

    tc filter add dev eth1 ingress flower \
      num_of_vlans 1 vlan_prio 5 action drop

    From our logs, we have redirect rules such that:

    tc filter add dev $GPON ingress flower num_of_vlans $N \
         action mirred egress redirect dev $DEV

    where N can range from 0 to 3 and $DEV is the function of $N.

    Also there are rules setting skb mark based on the number of vlans:

    tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
        $P action skbedit mark $M

    This new dissector allows extracting the number of vlan tags existing in
    the packet.

    Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-10-10 18:20:18 +02:00
Ivan Vecera f4f1a0676a dissector: do not set invalid PPP protocol
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133511

commit 2e861e5e97175dfa7b7bc055c45acdc06d2301d3
Author: Boris Sukholitko <boris.sukholitko@broadcom.com>
Date:   Wed Sep 29 14:32:23 2021 +0300

    dissector: do not set invalid PPP protocol

    The following flower filter fails to match non-PPP_IP{V6} packets
    wrapped in PPP_SES protocol:

    tc filter add dev eth0 ingress protocol ppp_ses flower \
            action simple sdata hi64

    The reason is that proto local variable is being set even when
    FLOW_DISSECT_RET_OUT_BAD status is returned.

    The fix is to avoid setting proto variable if the PPP protocol is unknown.

    Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-10-10 18:20:18 +02:00
Herton R. Krzesinski a1d3a6c101 Merge: netfilter: conntrack: rebase to 5.19
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1186

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111270
Upstream Status: All mainline in nf-next.git
Conflicts: Minor only, see patches for details

The upstream kernel carries a number of enhancements in the connection tracking module:

1. Remove a few indirect calls.
2. Remove the unconfirmed/dying list
3. Avoid allocation of ct->ext area if possible.
   Detect if userspace requested the "ecache" feature.
   In almost all cases, the extension allocation can then be avoided.
4. Restrict a local_bh_disable/enable section to the "l7 helper (ftp, h323...) needed" case.

Improves the connections-per-second rate.

The first patch isn't related to netfilter but it avoids extra
surgery on a few followup patches.

Bernard Zhao (1):
  netfilter: ctnetlink: remove useless type conversion to bool

Bill Wendling (1):
  netfilter: conntrack: use correct format characters

Eric Dumazet (1):
  net: align static siphash keys

Florian Westphal (38):
  netfilter: ctnetlink: remove expired entries first
  netfilter: ctnetlink: add and use a helper for mark parsing
  netfilter: ctnetlink: allow to filter dump by status bits
  netfilter: nf_conntrack_netbios_ns: fix helper module alias
  netfilter: conntrack: revisit gc autotuning
  netfilter: conntrack: don't refresh sctp entries in closed state
  netfilter: conntrack: pptp: use single option structure
  netfilter: ecache: remove one indent level
  netfilter: ecache: remove another indent level
  netfilter: ecache: add common helper for nf_conntrack_eventmask_report
  netfilter: ecache: prepare for event notifier merge
  netfilter: ecache: remove nf_exp_event_notifier structure
  netfilter: ecache: don't use nf_conn spinlock
  netfilter: cttimeout: use option structure
  netfilter: ctnetlink: use dump structure instead of raw args
  netfilter: ecache: move to separate structure
  netfilter: conntrack: split inner loop of list dumping to own function
  netfilter: ecache: use dedicated list for event redelivery
  netfilter: conntrack: include ecache dying list in dumps
  netfilter: conntrack: remove the percpu dying list
  netfilter: cttimeout: decouple unlink and free on netns destruction
  netfilter: remove nf_ct_unconfirmed_destroy helper
  netfilter: extensions: introduce extension genid count
  netfilter: cttimeout: decouple unlink and free on netns destruction
  netfilter: conntrack: remove __nf_ct_unconfirmed_destroy
  netfilter: conntrack: remove unconfirmed list
  netfilter: conntrack: avoid unconditional local_bh_disable
  netfilter: nfnetlink: allow to detect if ctnetlink listeners exist
  netfilter: conntrack: un-inline nf_ct_ecache_ext_add
  netfilter: conntrack: add nf_conntrack_events autodetect mode
  netfilter: prefer extension check to pointer check
  netfilter: conntrack: remove pr_debug callsites from tcp tracker
  netfilter: nfnetlink: fix warn in nfnetlink_unbind
  netfilter: cttimeout: fix slab-out-of-bounds read in
    cttimeout_net_exit
  netfilter: cttimeout: fix slab-out-of-bounds read typo in
    cttimeout_net_exit
  netfilter: nf_conntrack: add missing __rcu annotations
  netfilter: nf_conntrack: use rcu accessors where needed
  netfilter: h323: merge nat hook pointers into one

Jackie Liu (1):
  netfilter: conntrack: use fallthrough to cleanup

Kees Cook (1):
  netfilter: conntrack: Use memset_startat() to zero struct nf_conn

Pablo Neira Ayuso (2):
  netfilter: ctnetlink: missing counters and timestamp in
    nfnetlink_{log,queue}
  netfilter: conntrack: add nf_ct_iter_data object for
    nf_ct_iterate_cleanup*()

Stephen Rothwell (1):
  netfilter: ctnetlink: fix up for "netfilter: conntrack: remove
    unconfirmed list"

luo penghao (1):
  netfilter: conntrack: Remove useless assignment statements

Signed-off-by: Florian Westphal <fwestpha@redhat.com>

Approved-by: Phil Sutter <psutter@redhat.com>
Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2022-08-19 14:36:24 +00:00
Florian Westphal 37df13d4bf net: align static siphash keys
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111270
Upstream Status: commit 49ecc2e9c3ab

commit 49ecc2e9c3abd269951972fa8b23a4d081111b80
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Nov 15 09:23:03 2021 -0800

    net: align static siphash keys

    siphash keys use 16 bytes.

    Define siphash_aligned_key_t macro so that we can make sure they
    are not crossing a cache line boundary.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Florian Westphal <fwestpha@redhat.com>
2022-07-27 00:34:24 +02:00
Petr Oros 21e2fb0e83 net: Don't include filter.h from net/sock.h
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101792

Conflicts:
drivers/infiniband/core/cache.c
- adjusted context conflict due to missing b74525f21e33ab ("RDMA/core:
  Delete useless module.h include")
drivers/infiniband/hw/mlx5/fs.c
- missing upstream commit ffa501ef196312 ("RDMA/mlx5: Add steering support in
  optional flow counters") adding net/inet_ecn.h. Without inet_ecn.h missing
  declarations for ether_addr_copy() and is_multicast_ether_addr()
  We add net/inet_ecn.h include in this commit.
drivers/net/amt.c
- Unmerged because file missing in RHEL

Upstream commit(s):
commit b6459415b384cb829f0b2a4268f211c789f6cf0b
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Dec 28 16:49:13 2021 -0800

    net: Don't include filter.h from net/sock.h

    sock.h is pretty heavily used (5k objects rebuilt on x86 after
    it's touched). We can drop the include of filter.h from it and
    add a forward declaration of struct sk_filter instead.
    This decreases the number of rebuilt objects when bpf.h
    is touched from ~5k to ~1k.

    There's a lot of missing includes this was masking. Primarily
    in networking tho, this time.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Acked-by: Florian Fainelli <f.fainelli@gmail.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Acked-by: Stefano Garzarella <sgarzare@redhat.com>
    Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org

Signed-off-by: Petr Oros <poros@redhat.com>
2022-07-13 10:49:16 +02:00
Ivan Vecera eb8b1a7420 net/sched: flower: fix parsing of ethertype following VLAN header
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 2105f700b53c24aa48b65c15652acc386044d26a
Author: Vlad Buslov <vladbu@nvidia.com>
Date:   Wed Apr 6 14:22:41 2022 +0300

    net/sched: flower: fix parsing of ethertype following VLAN header

    A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to
    match the L2 ethertype following the first VLAN header, as confirmed by
    linked discussion with the maintainer. However, such rule also matches
    packets that have additional second VLAN header, even though filter has
    both eth_type and vlan_ethtype set to "ipv4". Looking at the code this
    seems to be mostly an artifact of the way flower uses flow dissector.
    First, even though eth_type and vlan_ethtype appear to be distinct
    fields in the uAPI, in flower they are all mapped to the same
    key->basic.n_proto. Second, flow dissector skips the following VLAN header
    as no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the
    value of n_proto to the last parsed header. With these, such filters ignore
    any headers present between the first VLAN header and the first "non magic"
    header (ipv4 in this case) that doesn't result in
    FLOW_DISSECT_RET_PROTO_AGAIN.

    Fix the issue by extending flow dissector VLAN key structure with new
    'vlan_eth_type' field that matches first ethertype following previously
    parsed VLAN header. Modify flower classifier to set the new
    flow_dissector_key_vlan->vlan_eth_type with value obtained from
    TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs.

    Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/
    Fixes: 9399ae9a6c ("net_sched: flower: Add vlan support")
    Fixes: d64efd0926 ("net/sched: flower: Add supprt for matching on QinQ vlan headers")
    Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:32:47 +02:00
Ivan Vecera 6572d4f7b7 cls_flower: Fix inability to match GRE/IPIP packets
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 6de6e46d27ef386feecdbea56b3bfd6c3b3bc1f9
Author: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Date:   Fri Oct 29 09:21:41 2021 +0000

    cls_flower: Fix inability to match GRE/IPIP packets

    When a packet of a new flow arrives in the openvswitch kernel module, it dissects
    the packet and passes the extracted flow key to the ovs-vswitchd daemon. If
    hw-offload configuration is enabled, the daemon creates a new TC flower entry to
    bypass the openvswitch kernel module for the flow (TC flower can also offload flows
    to NICs but this time that does not matter).

    In this processing flow, I found the following issue in cases of GRE/IPIP
    packets.

    When ovs_flow_key_extract() in openvswitch module parses a packet of a new
    GRE (or IPIP) flow received on non-tunneling vports, it extracts information
    of the outer IP header for ip_proto/src_ip/dst_ip match keys.

    This means ovs-vswitchd creates a TC flower entry with IP protocol/addresses
    match keys whose values are those of the outer IP header. OTOH, TC flower,
    which uses flow_dissector (different parser from openvswitch module), extracts
    information of the inner IP header.

    The following flow is an example to describe the issue in more detail.

       <----------- Outer IP -----------------> <---------- Inner IP ---------->
      +----------+--------------+--------------+----------+----------+----------+
      | ip_proto | src_ip       | dst_ip       | ip_proto | src_ip   | dst_ip   |
      | 47 (GRE) | 192.168.10.1 | 192.168.10.2 | 6 (TCP)  | 10.0.0.1 | 10.0.0.2 |
      +----------+--------------+--------------+----------+----------+----------+

    In this case, TC flower entry and extracted information are shown as below:

      - ovs-vswitchd creates TC flower entry with:
          - ip_proto: 47
          - src_ip: 192.168.10.1
          - dst_ip: 192.168.10.2

      - TC flower extracts below for IP header matches:
          - ip_proto: 6
          - src_ip: 10.0.0.1
          - dst_ip: 10.0.0.2

    Thus, GRE or IPIP packets never match the TC flower entry, as each
    dissector behaves differently.

    IMHO, the behavior of TC flower (flow dissector) does not look correct,
    as ip_proto/src_ip/dst_ip in TC flower match means the outermost IP
    header information except for GRE/IPIP cases. This patch adds a new
    flow_dissector flag FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP which skips
    dissection of the encapsulated inner GRE/IPIP header in TC flower
    classifier.

    Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
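
The effect of stopping before the encapsulated header can be sketched with a toy model of stacked IP headers. The struct, the flag value and the function name below are hypothetical; only the behaviour (report the outer header when the flag is set, the inner one otherwise) follows the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_IPIP 4
#define PROTO_GRE  47

/* Hypothetical flag value standing in for FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP. */
#define F_STOP_BEFORE_ENCAP 0x1u

/* Simplified view of one IP header in the packet. */
struct ip_layer {
    uint8_t  proto;
    uint32_t src;
    uint32_t dst;
};

/* Return the IP header the dissector would report. layers[0] is the
 * outermost header and n must be at least 1. */
static struct ip_layer dissect_ip(const struct ip_layer *layers, size_t n,
                                  unsigned int flags)
{
    size_t i = 0;

    while (i + 1 < n &&
           (layers[i].proto == PROTO_GRE || layers[i].proto == PROTO_IPIP)) {
        if (flags & F_STOP_BEFORE_ENCAP)
            break; /* keep the outer header */
        i++;       /* otherwise descend into the tunnel payload */
    }
    return layers[i];
}
```

For the GRE example above, the call without the flag reports ip_proto 6 with the 10.0.0.x addresses, and the call with the flag reports ip_proto 47 with the 192.168.10.x addresses that ovs-vswitchd put into the TC flower entry.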
2022-06-06 16:30:55 +02:00
Jiri Benc 4b57058723 flow_dissector: Fix out-of-bounds warnings
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618

commit 323e0cb473e2a8706ff162b6b4f4fa16023c9ba7
Author: Gustavo A. R. Silva <gustavoars@kernel.org>
Date:   Mon Jul 26 14:25:11 2021 -0500

    flow_dissector: Fix out-of-bounds warnings

    Fix the following out-of-bounds warnings:

        net/core/flow_dissector.c: In function '__skb_flow_dissect':
    >> net/core/flow_dissector.c:1104:4: warning: 'memcpy' offset [24, 39] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'struct in6_addr' at offset 8 [-Warray-bounds]
         1104 |    memcpy(&key_addrs->v6addrs, &iph->saddr,
              |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         1105 |           sizeof(key_addrs->v6addrs));
              |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        In file included from include/linux/ipv6.h:5,
                         from net/core/flow_dissector.c:6:
        include/uapi/linux/ipv6.h:133:18: note: subobject 'saddr' declared here
          133 |  struct in6_addr saddr;
              |                  ^~~~~
    >> net/core/flow_dissector.c:1059:4: warning: 'memcpy' offset [16, 19] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'unsigned int' at offset 12 [-Warray-bounds]
         1059 |    memcpy(&key_addrs->v4addrs, &iph->saddr,
              |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         1060 |           sizeof(key_addrs->v4addrs));
              |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        In file included from include/linux/ip.h:17,
                         from net/core/flow_dissector.c:5:
        include/uapi/linux/ip.h:103:9: note: subobject 'saddr' declared here
          103 |  __be32 saddr;
              |         ^~~~~

    The problem is that the original code is trying to copy data into a
    couple of struct members adjacent to each other in a single call to
    memcpy().  So, the compiler legitimately complains about it. As these
    are just a couple of members, fix this by copying each one of them in
    separate calls to memcpy().

    This helps with the ongoing efforts to globally enable -Warray-bounds
    and get us closer to being able to tighten the FORTIFY_SOURCE routines
    on memcpy().

    Link: https://github.com/KSPP/linux/issues/109
    Reported-by: kernel test robot <lkp@intel.com>
    Link: https://lore.kernel.org/lkml/d5ae2e65-1f18-2577-246f-bada7eee6ccd@intel.com/
    Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
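
The fix described above, copying each member in its own memcpy() call, can be sketched outside the kernel as follows. The struct layouts and names are hypothetical and much simpler than the real IP header and flow key structures.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header with two adjacent address members. */
struct hdr {
    uint8_t  ttl;
    uint8_t  proto;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
};

/* Hypothetical flow key holding both addresses. */
struct key_v4 {
    uint32_t src;
    uint32_t dst;
};

/* Copy each address separately. A single
 * memcpy(key, &h->saddr, sizeof(*key)) would read past the 'saddr'
 * member, which is what -Warray-bounds complains about. */
static void copy_addrs(struct key_v4 *key, const struct hdr *h)
{
    memcpy(&key->src, &h->saddr, sizeof(key->src));
    memcpy(&key->dst, &h->daddr, sizeof(key->dst));
}
```

Each call now stays within the bounds of the member it names, so the compiler can check the copy without relying on the two fields being adjacent.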
2022-05-12 17:29:45 +02:00
Antoine Tenart ba7992f4a1 net/sched: flow_dissector: Fix matching on zone id for invalid conns
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2045048
Upstream Status: linux.git
Tested: Sanity only

commit 3849595866166b23bf6a0cb9ff87e06423167f67
Author: Paul Blakey <paulb@nvidia.com>
Date:   Tue Dec 14 19:24:34 2021 +0200

    net/sched: flow_dissector: Fix matching on zone id for invalid conns

    If ct rejects a flow, it removes the conntrack info from the skb.
    act_ct sets the post_ct variable so the dissector will see this case
    as an +tracked +invalid state, but the zone id is lost with the
    conntrack info.

    To restore the zone id on such cases, set the last executed zone,
    via the tc control block, when passing ct, and read it back in the
    dissector if there is no ct info on the skb (invalid connection).

    Fixes: 7baf2429a1 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
    Signed-off-by: Paul Blakey <paulb@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-01-26 16:54:01 +01:00
zhang kai 1e60cebf82 net: let flow have same hash in two directions
Use the same source and destination IP/port for the flow hash
calculation in both directions.

Signed-off-by: zhang kai <zhangkaiheb@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28 12:54:06 +01:00
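The idea in the commit above can be illustrated in user-space C. This is a minimal sketch, not the kernel code: struct flow_tuple, flow_tuple_canonical() and flow_tuple_hash() are hypothetical names, and the FNV-1a mixing step only stands in for whatever hash the kernel actually uses. It assumes that putting addresses and ports into a fixed order before hashing is enough to make both directions agree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple for illustration; not the kernel's struct flow_keys. */
struct flow_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Order addresses and ports so that A->B and B->A yield the same tuple. */
static struct flow_tuple flow_tuple_canonical(struct flow_tuple t)
{
	if (t.dst_ip < t.src_ip) {
		uint32_t tmp = t.src_ip;
		t.src_ip = t.dst_ip;
		t.dst_ip = tmp;
	}
	if (t.dst_port < t.src_port) {
		uint16_t tmp = t.src_port;
		t.src_port = t.dst_port;
		t.dst_port = tmp;
	}
	return t;
}

/* Toy hash (FNV-1a over the canonical fields); an assumption, not the kernel hash. */
static uint32_t flow_tuple_hash(struct flow_tuple t)
{
	uint32_t words[3];
	uint32_t h = 2166136261u;
	size_t i, b;

	t = flow_tuple_canonical(t);
	words[0] = t.src_ip;
	words[1] = t.dst_ip;
	words[2] = ((uint32_t)t.src_port << 16) | t.dst_port;

	for (i = 0; i < 3; i++) {
		for (b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xffu;
			h *= 16777619u;
		}
	}
	return h;
}
```

Because addresses and ports are ordered independently in this sketch, two distinct flows that differ only in which endpoint owns which port also hash to the same value; that trade-off belongs to the sketch and is not a claim about the kernel implementation.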
Vladimir Oltean ec13357263 net: flow_dissector: fix RPS on DSA masters
After the blamed patch, __skb_flow_dissect() on the DSA master stopped
adjusting for the length of the DSA headers. This is because it was told
to adjust only if the needed_headroom is zero, aka if there is no DSA
header. Of course, the adjustment should be done only if there _is_ a
DSA header.

Modify the comment too so it is clearer.

Fixes: 4e50025129 ("net: dsa: generalize overhead for taggers that use both headers and trailers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14 13:15:22 -07:00
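The inverted condition described in the commit above can be sketched as follows. This is a simplified user-space illustration, not the kernel code: struct tagger_ops and the three helpers are hypothetical, and the sketch assumes the adjustment is simply "skip needed_headroom bytes", which is a simplification of the real tagger callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tagger's size requirements. */
struct tagger_ops {
	size_t needed_headroom; /* tag bytes placed before the payload */
	size_t needed_tailroom; /* tag bytes placed after the payload */
};

/* Buggy check: adjusts only when there is no header tag at all. */
static bool should_adjust_buggy(const struct tagger_ops *ops)
{
	return ops->needed_headroom == 0;
}

/* Fixed check: adjusts only when a header tag is actually present. */
static bool should_adjust_fixed(const struct tagger_ops *ops)
{
	return ops->needed_headroom != 0;
}

/* Return the offset at which dissection should continue. */
static size_t dissect_offset(const struct tagger_ops *ops, size_t nhoff)
{
	if (should_adjust_fixed(ops))
		nhoff += ops->needed_headroom;
	return nhoff;
}
```

With the fixed check, a tail-only tagger leaves the offset untouched, while a header tagger moves it past the tag.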
Vladimir Oltean 4e50025129 net: dsa: generalize overhead for taggers that use both headers and trailers
Some really really weird switches just couldn't decide whether to use a
normal or a tail tagger, so they just did both.

This creates problems for DSA, because we only have the concept of an
'overhead' which can be applied to the headroom or to the tailroom of
the skb (like for example during the central TX reallocation procedure),
depending on the value of bool tail_tag, but not to both.

We need to generalize DSA to cater for these odd switches by
transforming the 'overhead / tail_tag' pair into 'needed_headroom /
needed_tailroom'.

The DSA master's MTU is increased to account for both.

The flow dissector code is modified such that it only calls the DSA
adjustment callback if the tagger has a non-zero header length.

Taggers are trivially modified to declare either needed_headroom or
needed_tailroom, based on the tail_tag value that they currently
declare.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11 12:45:38 -07:00
Gustavo A. R. Silva 1e3d976dbb flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()
Fix the following out-of-bounds warning:

net/core/flow_dissector.c:835:3: warning: 'memcpy' offset [33, 48] from the object at 'flow_keys' is out of the bounds of referenced subobject 'ipv6_src' with type '__u32[4]' {aka 'unsigned int[4]'} at offset 16 [-Warray-bounds]

The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy().  So, the compiler legitimately complains about it. As these
are just a couple of members, fix this by copying each one of them in
separate calls to memcpy().

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

Link: https://github.com/KSPP/linux/issues/109
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:02:27 -07:00
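The fix pattern described in the commit above (one memcpy() per struct member instead of a single copy spanning two adjacent members) can be sketched in user-space C. The struct and function names below are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical source layout with two adjacent address arrays. */
struct src_keys {
	uint32_t ipv6_src[4];
	uint32_t ipv6_dst[4];
};

/* Hypothetical destination layout. */
struct dst_addrs {
	uint8_t src[16];
	uint8_t dst[16];
};

static void copy_addrs(struct dst_addrs *out, const struct src_keys *in)
{
	/*
	 * One memcpy() per member, each bounded by that member's own size,
	 * so no copy runs past the end of the referenced subobject.
	 */
	memcpy(out->src, in->ipv6_src, sizeof(out->src));
	memcpy(out->dst, in->ipv6_dst, sizeof(out->dst));
}
```

Copying member by member keeps each call within the bounds the compiler can verify, which is what -Warray-bounds checks.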
David S. Miller efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Alexander Lobakin f96533cded flow_dissector: constify raw input data argument
Flow Dissector code never modifies the input buffer, neither skb nor
raw data.
Make 'data' argument const for all of the Flow dissector's functions.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 14:46:32 -07:00
Alexander Lobakin a25f822285 flow_dissector: fix byteorder of dissected ICMP ID
flow_dissector_key_icmp::id is of type u16 (CPU byteorder), while the
ICMP header carries its ID field in network byteorder.
Sparse says:

net/core/flow_dissector.c:178:43: warning: restricted __be16 degrades to integer

Convert ID value to CPU byteorder when storing it into
flow_dissector_key_icmp.

Fixes: 5dec597e5c ("flow_dissector: extract more ICMP information")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 14:30:20 -07:00
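The byte-order conversion described in the commit above can be sketched in user-space C with ntohs(). The helper name icmp_echo_id() is hypothetical; the sketch assumes an ICMP echo header, where the identifier occupies bytes 4 and 5 after type, code and checksum.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * ICMP echo header layout: type (1 byte), code (1), checksum (2),
 * identifier (2), sequence number (2). The identifier is on the wire
 * in network byte order.
 */
static uint16_t icmp_echo_id(const uint8_t *icmp_hdr)
{
	uint16_t id_be;

	/* memcpy() avoids an unaligned 16-bit load from the packet buffer. */
	memcpy(&id_be, icmp_hdr + 4, sizeof(id_be));

	/* Convert to CPU byte order before storing or comparing. */
	return ntohs(id_be);
}
```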
David S. Miller d489ded1a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
2021-02-16 17:51:13 -08:00
Davide Caratti d212683805 flow_dissector: fix TTL and TOS dissection on IPv4 fragments
The following command:

 # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
   $tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop

doesn't drop all IPv4 packets that match the configured TTL / destination
address. In particular, if "fragment offset" or "more fragments" have a
non-zero value in the IPv4 header, setting of FLOW_DISSECTOR_KEY_IP is
simply ignored. Fix this by dissecting IPv4 TTL and TOS before fragment
info; while at it, add a selftest for tc flower's match on 'ip_ttl' that
verifies the correct behavior.

Fixes: 518d8a2e9b ("net/flow_dissector: add support for dissection of misc ip header fields")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:03:51 -08:00
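The ordering fix described in the commit above can be sketched in user-space C: read TTL and TOS first, then look at the fragment fields, so that fragments still carry the IP key. The struct ipv4_info type, the FRAG_* constants and parse_ipv4_fields() are hypothetical names for this illustration; only the IPv4 header offsets are standard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the 16-bit flags/fragment-offset field (host order after assembly). */
#define FRAG_MORE_FRAGMENTS 0x2000u
#define FRAG_OFFSET_MASK    0x1fffu

/* Hypothetical result type for this sketch. */
struct ipv4_info {
	uint8_t tos;
	uint8_t ttl;
	bool is_fragment;
};

/* hdr must point to at least the first 20 bytes of an IPv4 header. */
static struct ipv4_info parse_ipv4_fields(const uint8_t *hdr)
{
	struct ipv4_info info;
	uint16_t frag;

	/* Extract TOS (byte 1) and TTL (byte 8) before any fragment handling. */
	info.tos = hdr[1];
	info.ttl = hdr[8];

	/* Flags and fragment offset live in bytes 6-7, big-endian on the wire. */
	frag = (uint16_t)((hdr[6] << 8) | hdr[7]);
	info.is_fragment = (frag & (FRAG_MORE_FRAGMENTS | FRAG_OFFSET_MASK)) != 0;

	return info;
}
```

A caller that stops dissecting once is_fragment is set still has valid tos and ttl values to match on.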
wenxu 7baf2429a1 net/sched: cls_flower add CT_FLAGS_INVALID flag support
This patch adds the TCA_FLOWER_KEY_CT_FLAGS_INVALID flag to
match the invalid ct_state for conntrack.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/1611045110-682-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20 21:09:44 -08:00
Eran Ben Elisha 4f1cc51f34 net: flow_dissector: Parse PTP L2 packet header
Add support for parsing PTP L2 packet header. Such packet consists
of an L2 header (with ethertype of ETH_P_1588), PTP header, body
and an optional suffix.

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 18:24:54 -08:00