Commit Graph

277 Commits

Author SHA1 Message Date
Rado Vrbovsky 65ee7b65eb Merge: net: visibility patches for 9.6
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5833

JIRA: https://issues.redhat.com/browse/RHEL-68063

Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2025-01-06 08:26:06 +00:00
Petr Oros ab39cead6a genetlink: remove linux/genetlink.h
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit cd7209628cdb2a7edd7656c126d2455e7102e949
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Mar 29 10:57:10 2024 -0700

    genetlink: remove linux/genetlink.h

    genetlink.h is a shell of what used to be a combined uAPI
    and kernel header over a decade ago. It has fewer than
    10 lines of code. Merge it into net/genetlink.h.
    In some ways it'd be better to keep the combined header
    under linux/ but it would make looking through git history
    harder.

    Acked-by: Sven Eckelmann <sven@narfation.org>
    Link: https://lore.kernel.org/r/20240329175710.291749-4-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:53 +01:00
Antoine Tenart dca204658f net: ovs: fix ovs_drop_reasons error
JIRA: https://issues.redhat.com/browse/RHEL-68063
Upstream Status: linux.git

commit 57fb67783c4011581882f32e656d738da1f82042
Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Aug 21 20:32:52 2024 +0800

    net: ovs: fix ovs_drop_reasons error

    There is something wrong with ovs_drop_reasons. ovs_drop_reasons[0] is
    "OVS_DROP_LAST_ACTION", but OVS_DROP_LAST_ACTION == __OVS_DROP_REASON + 1,
    which means that ovs_drop_reasons[1] should be "OVS_DROP_LAST_ACTION".

    And as Adrian tested, without the patch, adding flow to drop packets
    results in:

    drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
    origin: software
    input port ifindex: 8
    timestamp: Tue Aug 20 10:19:17 2024 859853461 nsec
    protocol: 0x800
    length: 98
    original length: 98
    drop reason: OVS_DROP_ACTION_ERROR

    With the patch, the same results in:

    drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
    origin: software
    input port ifindex: 8
    timestamp: Tue Aug 20 10:16:13 2024 475856608 nsec
    protocol: 0x800
    length: 98
    original length: 98
    drop reason: OVS_DROP_LAST_ACTION

    Fix this by initializing ovs_drop_reasons with index.

    Fixes: 9d802da40b7c ("net: openvswitch: add last-action drop reason")
    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Tested-by: Adrian Moreno <amorenoz@redhat.com>
    Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
    Link: https://patch.msgid.link/20240821123252.186305-1-dongml2@chinatelecom.cn
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-19 14:30:01 +01:00
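The indexing problem described in the ovs_drop_reasons commit above can be illustrated with a small user-space sketch. All `demo_*` names and the enum values are hypothetical stand-ins, not the kernel's definitions; the only point is the difference between positional and designated (indexed) array initialization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OVS drop-reason enum; values are illustrative. */
enum demo_drop_reason {
	DEMO_DROP_BASE = 0x100,   /* plays the role of __OVS_DROP_REASON */
	DEMO_DROP_LAST_ACTION,    /* DEMO_DROP_BASE + 1 */
	DEMO_DROP_ACTION_ERROR,   /* DEMO_DROP_BASE + 2 */
	DEMO_DROP_MAX,
};

/* Offset of a reason relative to the subsystem base. */
#define DEMO_IDX(r) ((r) - DEMO_DROP_BASE)

/* Positional initialization: the first string lands at index 0, one slot too early. */
const char *const demo_reasons_positional[] = {
	"OVS_DROP_LAST_ACTION",
	"OVS_DROP_ACTION_ERROR",
};

/* Designated initialization: every string is placed at the offset of its enum value. */
const char *const demo_reasons_indexed[DEMO_IDX(DEMO_DROP_MAX)] = {
	[DEMO_IDX(DEMO_DROP_LAST_ACTION)]  = "OVS_DROP_LAST_ACTION",
	[DEMO_IDX(DEMO_DROP_ACTION_ERROR)] = "OVS_DROP_ACTION_ERROR",
};

/* Return the name stored for a reason, or NULL when the offset is out of range. */
const char *demo_lookup(const char *const *table, size_t len, int reason)
{
	size_t idx;

	assert(table != NULL);
	idx = (size_t)DEMO_IDX(reason);
	return idx < len ? table[idx] : NULL;
}
```

With the positional table, looking up `DEMO_DROP_LAST_ACTION` returns the string of the next reason, which mirrors the misleading `OVS_DROP_ACTION_ERROR` output quoted in the commit message.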
Ivan Vecera 25a5e1ea3a genetlink: remove userhdr from struct genl_info
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit bffcc6882a1bb2be8c9420184966f4c2c822078e
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Aug 14 14:47:16 2023 -0700

    genetlink: remove userhdr from struct genl_info

    Only three families use info->userhdr today and going forward
    we discourage using fixed headers in new families.
    So having the pointer to user header in struct genl_info
    is an overkill. Compute the header pointer at runtime.

    Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Aaron Conole <aconole@redhat.com>
    Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:30 +02:00
Jan Stancek f25e7a1141 Merge: ovs: P1 backports from upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3247

JIRA: https://issues.redhat.com/browse/RHEL-14346

Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Eelco Chaudron <echaudro@redhat.com>
Approved-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-11-24 07:31:07 +01:00
Antoine Tenart 6d85d98a29 net: openvswitch: reject negative ifindex
JIRA: https://issues.redhat.com/browse/RHEL-14346
Upstream Status: linux.git

commit a552bfa16bab4ce901ee721346a28c4e483f4066
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Aug 14 13:38:40 2023 -0700

    net: openvswitch: reject negative ifindex

    Recent changes in net-next (commit 759ab1edb56c ("net: store netdevs
    in an xarray")) refactored the handling of pre-assigned ifindexes
    and let syzbot surface a latent problem in ovs. ovs does not validate
    ifindex, making it possible to create netdev ports with negative
    ifindex values. It's easy to repro with YNL:

    $ ./cli.py --spec netlink/specs/ovs_datapath.yaml \
             --do new \
             --json '{"upcall-pid": 1, "name":"my-dp"}'
    $ ./cli.py --spec netlink/specs/ovs_vport.yaml \
             --do new \
             --json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'

    $ ip link show
    -65536: some-port0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 7a:48:21:ad:0b:fb brd ff:ff:ff:ff:ff:ff
    ...

    Validate the inputs. Now the second command correctly returns:

    $ ./cli.py --spec netlink/specs/ovs_vport.yaml \
             --do new \
             --json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'

    lib.ynl.NlError: Netlink error: Numerical result out of range
    nl_len = 108 (92) nl_flags = 0x300 nl_type = 2
            error: -34      extack: {'msg': 'integer out of range', 'unknown': [[type:4 len:36] b'\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x03\x00\xff\xff\xff\x7f\x00\x00\x00\x00\x08\x00\x01\x00\x08\x00\x00\x00'], 'bad-attr': '.ifindex'}

    Accept 0 since it used to be silently ignored.

    Fixes: 54c4ef34c4b6 ("openvswitch: allow specifying ifindex of new interfaces")
    Reported-by: syzbot+7456b5dcf65111553320@syzkaller.appspotmail.com
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Reviewed-by: Aaron Conole <aconole@redhat.com>
    Link: https://lore.kernel.org/r/20230814203840.2908710-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-10-20 10:21:36 +02:00
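The negative ifindex in the commit above comes from a 32-bit unsigned attribute being stored in a signed int. A minimal sketch of the kind of range check the message describes follows; the `demo_*` helpers are hypothetical, and the real fix may express the same bounds differently (for example in the netlink attribute policy).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Show what a raw u32 value looks like once its bits are read as a signed 32-bit int. */
int32_t demo_reinterpret(uint32_t raw)
{
	int32_t as_signed;

	memcpy(&as_signed, &raw, sizeof(as_signed));
	return as_signed;
}

/*
 * Hypothetical validation helper: accept 0 (previously ignored silently)
 * and any value up to INT_MAX, reject everything that would turn negative.
 */
int demo_check_ifindex(uint32_t requested, int *out)
{
	assert(out != NULL);
	if (requested > (uint32_t)INT_MAX)
		return -ERANGE;
	*out = (int)requested;
	return 0;
}
```

Passing 4294901760 (0xffff0000), the value used in the reproducer, to `demo_reinterpret()` yields -65536, the ifindex shown by `ip link show`, while `demo_check_ifindex()` rejects it with `-ERANGE`.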
Ivan Vecera 497f645693 net: move gso declarations and functions to their own files
JIRA: https://issues.redhat.com/browse/RHEL-12679

commit d457a0e329b0bfd3a1450e0b1a18cd2b47a25a08
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jun 8 19:17:37 2023 +0000

    net: move gso declarations and functions to their own files

    Move declarations into include/net/gso.h and code into net/core/gso.c

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Stanislav Fomichev <sdf@google.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-11 13:35:27 +02:00
Adrian Moreno 10015df94f net: openvswitch: add last-action drop reason
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2232283
Upstream Status: net-next.git

commit 9d802da40b7c820deb9c60fc394457ea565cafc8
Author: Adrian Moreno <amorenoz@redhat.com>
Date:   Fri Aug 11 16:12:48 2023 +0200

    Create a new drop reason subsystem for openvswitch and add the first
    drop reason to represent last-action drops.

    Last-action drops happen when a flow has an empty action list or there
    is no action that consumes the packet (output, userspace, recirc, etc).
    It is the most common way in which OVS drops packets.

    Implementation-wise, most of these skb-consuming actions already call
    "consume_skb" internally and return directly from within the
    do_execute_actions() loop so with minimal changes we can assume that
    any skb that exits the loop normally is a packet drop.

    Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
2023-08-21 08:34:17 +02:00
Jan Stancek c088b1cb17 Merge: net: openvswitch: fix upcall counter access before allocation
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2662

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2203263
Upstream Status: Backport net-next.git commit de9df6c6b27e
Conflicts: none

Backport of upstream commit:

commit de9df6c6b27e22d7bdd20107947ef3a20e687de5
Author: Eelco Chaudron <echaudro@redhat.com>
Date:   Tue Jun 6 13:56:35 2023 +0200

    net: openvswitch: fix upcall counter access before allocation

    Currently, the per cpu upcall counters are allocated after the vport is
    created and inserted into the system. This could lead to the datapath
    accessing the counters before they are allocated resulting in a kernel
    Oops.

    Here is an example:

      PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
       ...

      PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"

    We moved the per cpu upcall counter allocation to the existing vport
    alloc and free functions to solve this.

    Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure")
    Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Acked-by: Aaron Conole <aconole@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

Approved-by: Paolo Abeni <pabeni@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Aaron Conole <aconole@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-07-19 08:47:03 +02:00
Eelco Chaudron 6785b5bdee net: openvswitch: fix upcall counter access before allocation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2203263
Upstream Status: Backport net-next.git commit de9df6c6b27e
Conflicts: none

Backport of upstream commit:

commit de9df6c6b27e22d7bdd20107947ef3a20e687de5
Author: Eelco Chaudron <echaudro@redhat.com>
Date:   Tue Jun 6 13:56:35 2023 +0200

    net: openvswitch: fix upcall counter access before allocation

    Currently, the per cpu upcall counters are allocated after the vport is
    created and inserted into the system. This could lead to the datapath
    accessing the counters before they are allocated resulting in a kernel
    Oops.

    Here is an example:

      PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
       #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
       #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
       #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
       #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
       #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
       #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
       #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
       #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
       ...

      PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
       #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
       #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
       #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
       #3 [ffff80000a5d3120] die at ffffb70f0628234c
       #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
       #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
       #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
       #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
       #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
       #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
      #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
      #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
      #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
      #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
      #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
      #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
      #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
      #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

    We moved the per cpu upcall counter allocation to the existing vport
    alloc and free functions to solve this.

    Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure")
    Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Acked-by: Aaron Conole <aconole@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-06-12 14:31:52 +02:00
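The ordering issue fixed by the commit above (counters allocated only after the vport became visible to the datapath) can be sketched in user space as follows. The `demo_*` types and functions are hypothetical and single-threaded; they only illustrate allocating everything a consumer may touch before the object is published.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical vport with a separately allocated upcall counter. */
struct demo_vport {
	unsigned long *upcall_stats;
};

/* The object the (simulated) datapath can currently reach. */
static struct demo_vport *demo_published;

/* Allocate the vport together with its counter, so it is never seen half-built. */
struct demo_vport *demo_vport_alloc(void)
{
	struct demo_vport *vport = calloc(1, sizeof(*vport));

	if (!vport)
		return NULL;
	vport->upcall_stats = calloc(1, sizeof(*vport->upcall_stats));
	if (!vport->upcall_stats) {
		free(vport);
		return NULL;
	}
	return vport;
}

/* Make a fully initialized vport visible to the datapath. */
void demo_vport_publish(struct demo_vport *vport)
{
	assert(vport != NULL && vport->upcall_stats != NULL);
	demo_published = vport;
}

/* Unpublish if needed, then free the counter in the same place the vport is freed. */
void demo_vport_free(struct demo_vport *vport)
{
	if (!vport)
		return;
	if (demo_published == vport)
		demo_published = NULL;
	free(vport->upcall_stats);
	free(vport);
}

/* Simulated datapath path: bump the counter of whatever vport is published. */
int demo_datapath_upcall(void)
{
	if (!demo_published)
		return -ENODEV;
	(*demo_published->upcall_stats)++;
	return 0;
}
```

Because the counter is allocated inside `demo_vport_alloc()`, there is no window in which `demo_datapath_upcall()` can dereference an unallocated counter.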
Ivan Vecera 1cb324e3cc net: Remove the obsolte u64_stats_fetch_*_irq() users (net).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193170

Conflicts:
* net/netfilter/ipvs/ip_vs_ctl.c
  - the change was already applied by RHEL commit 914c1e31d9 ("ipvs:
    use u64_stats_t for the per-cpu counters")
* net/core/devlink.c
  - hunk was applied in different file (net/devlink/leftover.c)

commit d120d1a63b2c484d6175873d8ee736a633f74b70
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Oct 26 15:22:15 2022 +0200

    net: Remove the obsolte u64_stats_fetch_*_irq() users (net).

    Now that the 32bit UP oddity is gone and 32bit always uses a sequence
    count, there is no need for the fetch_irq() variants anymore.

    Convert to the regular interface.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-06-08 13:38:11 +02:00
Antoine Tenart 4117d32fcf net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2190207
Upstream Status: linux.git

commit 0c598aed445eb45b0ee7ba405f7ece99ee349c30
Author: Fedor Pchelkin <pchelkin@ispras.ru>
Date:   Thu Feb 2 00:02:18 2023 +0300

    net: openvswitch: fix flow memory leak in ovs_flow_cmd_new

    Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
    not freed when an allocation of a key fails.

    BUG: memory leak
    unreferenced object 0xffff888116668000 (size 632):
      comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
        [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
        [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77
        [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957
        [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739
        [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
        [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
        [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515
        [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
        [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
        [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339
        [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934
        [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
        [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
        [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356
        [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
        [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
        [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
        [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

    To fix this the patch rearranges the goto labels to reflect the order of
    object allocations and adds appropriate goto statements on the error
    paths.

    Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

    Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key")
    Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
    Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-04-27 16:16:00 +02:00
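The cleanup pattern mentioned in the commit above, goto labels arranged to mirror the order of allocations, can be sketched as follows. This is a user-space illustration with hypothetical `demo_*` names and a fault-injection parameter, not the code of `ovs_flow_cmd_new()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flow object with two separately allocated members. */
struct demo_flow {
	int *key;
	int *actions;
};

/* Number of allocations currently outstanding, used to detect leaks. */
static int demo_live;

/* Allocation wrapper; 'fail' simulates an out-of-memory condition. */
static void *demo_alloc(size_t size, int fail)
{
	void *p = fail ? NULL : malloc(size);

	if (p)
		demo_live++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live--;
		free(p);
	}
}

int demo_live_allocations(void)
{
	return demo_live;
}

/* fail_step: 0 = no failure, 1/2/3 = fail the first/second/third allocation. */
int demo_flow_new(int fail_step, struct demo_flow **out)
{
	struct demo_flow *flow;
	int err = -ENOMEM;

	assert(out != NULL);

	flow = demo_alloc(sizeof(*flow), fail_step == 1);
	if (!flow)
		goto err_out;
	flow->key = demo_alloc(sizeof(*flow->key), fail_step == 2);
	if (!flow->key)
		goto err_free_flow;
	flow->actions = demo_alloc(sizeof(*flow->actions), fail_step == 3);
	if (!flow->actions)
		goto err_free_key;

	*out = flow;
	return 0;

	/* Labels appear in reverse allocation order, each one falling through to the next. */
err_free_key:
	demo_free(flow->key);
err_free_flow:
	demo_free(flow);
err_out:
	return err;
}

void demo_flow_free(struct demo_flow *flow)
{
	if (!flow)
		return;
	demo_free(flow->actions);
	demo_free(flow->key);
	demo_free(flow);
}
```

Each failure point jumps to the label that releases exactly what has been allocated so far, so no path leaves an object behind.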
Ivan Vecera 3ea8b38ecd net: openvswitch: add missing .resv_start_op
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175249

commit e4ba4554209f626c52e2e57f26cba49a62663c8b
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Oct 27 20:25:01 2022 -0700

    net: openvswitch: add missing .resv_start_op

    I missed one of the families in OvS when annotating .resv_start_op.
    This triggers the warning added in commit ce48ebdd5651 ("genetlink:
    limit the use of validation workarounds to old ops").

    Reported-by: syzbot+40eb8c0447c0e47a7e9b@syzkaller.appspotmail.com
    Fixes: 9c5d03d36251 ("genetlink: start to validate reserved header bytes")
    Link: https://lore.kernel.org/r/20221028032501.2724270-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-03-06 16:17:54 +01:00
Ivan Vecera 6fb59586eb genetlink: start to validate reserved header bytes
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175249

Conflicts:
* kernel/taskstats.c
  context conflict due to missing edc73c7261ca ("kernel: make taskstats
  available from all net namespaces")
* fs/ksmbd/transport_ipc.c
* net/ipv6/ioam6.c
  hunks skipped as the files are not present in RHEL kernel

commit 9c5d03d362519f36cd551aec596388f895c93d2d
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Aug 24 17:18:30 2022 -0700

    genetlink: start to validate reserved header bytes

    We had historically not checked that genlmsghdr.reserved
    is 0 on input which prevents us from using those precious
    bytes in the future.

    One use case would be to extend the cmd field, which is
    currently just 8 bits wide and 256 is not a lot of commands
    for some core families.

    To make sure that new families do the right thing by default
    put the onus of opting out of validation on existing families.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-03-06 15:42:45 +01:00
Herton R. Krzesinski 98f0fb10c6 Merge: net: openvswitch: Add support to count upcall packets
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1936

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2163678

Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Eelco Chaudron <echaudro@redhat.com>
Approved-by: Aaron Conole <aconole@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: Andrea Claudi <aclaudi@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2023-02-07 00:09:02 +00:00
Antoine Tenart 988b7c89c2 net: openvswitch: release vport resources on failure
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2163678
Upstream Status: linux.git

commit 95637d91fefdb94d6e7389222ba9ddab0e9f5abe
Author: Aaron Conole <aconole@redhat.com>
Date:   Tue Dec 20 16:27:17 2022 -0500

    net: openvswitch: release vport resources on failure

    A recent commit introducing upcall packet accounting failed to properly
    release the vport object when the per-cpu stats struct couldn't be
    allocated.  This can cause dangling pointers to dp objects long after
    they've been released.

    Cc: wangchuanlei <wangchuanlei@inspur.com>
    Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
    Reported-by: syzbot+8f4e2dcfcb3209ac35f9@syzkaller.appspotmail.com
    Signed-off-by: Aaron Conole <aconole@redhat.com>
    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Link: https://lore.kernel.org/r/20221220212717.526780-1-aconole@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-01-24 09:35:30 +01:00
Antoine Tenart ff50616f2f net: openvswitch: Add support to count upcall packets
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2163678
Upstream Status: linux.git

commit 1933ea365aa7a48ce26bea2ea09c9f7cc48cc668
Author: wangchuanlei <wangchuanlei@inspur.com>
Date:   Tue Dec 6 20:38:57 2022 -0500

    net: openvswitch: Add support to count upcall packets

    Add support to count upcall packets. When the openvswitch kernel module
    performs an upcall, count the packets for which the upcall succeeded and
    failed, which is a better way to see how many packets were upcalled on
    each interface.

    Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-01-24 09:35:25 +01:00
Antoine Tenart b4ba873b1e openvswitch: Fix flow lookup to use unmasked key
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2163374
Upstream Status: linux.git

commit 68bb10101e6b0a6bb44e9c908ef795fc4af99eae
Author: Eelco Chaudron <echaudro@redhat.com>
Date:   Thu Dec 15 15:46:33 2022 +0100

    openvswitch: Fix flow lookup to use unmasked key

    The commit mentioned below causes the ovs_flow_tbl_lookup() function
    to be called with the masked key. However, it's supposed to be called
    with the unmasked key. This is due to the fact that the datapath supports
    installing wider flows, and OVS relies on this behavior. For example
    if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
    flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
    128.0.0.0) is allowed to be added.

    However, if we try to add a wildcard rule, the installation fails:

    $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
      ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
    $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
      ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
    ovs-vswitchd: updating flow table (File exists)

    The reason is that the key used to determine if the flow is already
    present in the system uses the original key ANDed with the mask.
    This results in the IP address not being part of the (miniflow) key,
    i.e., being substituted with an all-zero value. When doing the actual
    lookup, this results in the key wrongfully matching the first flow,
    and therefore the flow does not get installed.

    This change reverses the commit below, but rather than having the key
    on the stack, it's allocated.

    Fixes: 190aa3e778 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")

    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-01-23 14:40:36 +01:00
Antoine Tenart f86b7a4b84 openvswitch: switch from WARN to pr_warn
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2163374
Upstream Status: linux.git

commit fd954cc1919e35cb92f78671cab6e42d661945a3
Author: Aaron Conole <aconole@redhat.com>
Date:   Tue Oct 25 06:50:17 2022 -0400

    openvswitch: switch from WARN to pr_warn

    As noted by Paolo Abeni, pr_warn doesn't generate any splat and can still
    preserve the warning to the user that feature downgrade occurred.  We
    likely cannot introduce other kinds of checks / enforcement here because
    syzbot can generate different genl versions to the datapath.

    Reported-by: syzbot+31cde0bef4bbf8ba2d86@syzkaller.appspotmail.com
    Fixes: 44da5ae5fb ("openvswitch: Drop user features if old user space attempted to create datapath")
    Cc: Thomas Graf <tgraf@suug.ch>
    Signed-off-by: Aaron Conole <aconole@redhat.com>
    Acked-by: Ilya Maximets <i.maximets@ovn.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-01-23 14:23:42 +01:00
Antoine Tenart 70ea3487bd openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134560
Upstream Status: linux.git

commit 347541e299d50c154f69ead0fcac2917a63e4481
Author: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Date:   Thu Aug 25 05:04:50 2022 +0300

    openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests

    CRIU needs OVS_DP_ATTR_PER_CPU_PIDS to checkpoint/restore newest
    openvswitch versions.
    Add pids to generic datapath reply. Limit exported pids amount to
    nr_cpu_ids.

    Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
    Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-10-17 16:39:23 +02:00
Antoine Tenart c108e071fc openvswitch: allow specifying ifindex of new interfaces
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134560
Upstream Status: linux.git

commit 54c4ef34c4b6f9720fded620e2893894f9f2c554
Author: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Date:   Thu Aug 25 05:04:49 2022 +0300

    openvswitch: allow specifying ifindex of new interfaces

    CRIU preserves the ifindexes of net devices across restoration. However,
    the current Open vSwitch API does not allow requesting a specific ifindex,
    so we cannot correctly restore the OVS configuration.

    Add new OVS_DP_ATTR_IFINDEX for OVS_DP_CMD_NEW and use it as desired
    ifindex.
    Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
    ifindex.

    Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
    Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-10-17 16:39:18 +02:00
Antoine Tenart 4ceb4396cf openvswitch: Fix overreporting of drops in dropwatch
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134560
Upstream Status: linux.git

commit c21ab2afa2c64896a7f0e3cbc6845ec63dcfad2e
Author: Mike Pattrick <mkp@redhat.com>
Date:   Wed Aug 17 11:06:35 2022 -0400

    openvswitch: Fix overreporting of drops in dropwatch

    Currently queue_userspace_packet will call kfree_skb for all frames,
    whether or not an error occurred. This can result in a single dropped
    frame being reported as multiple drops in dropwatch. This function's
    caller may also call kfree_skb in case of an error. This patch will
    consume the skbs instead and allow callers to use kfree_skb.

    Signed-off-by: Mike Pattrick <mkp@redhat.com>
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109957
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-10-17 16:39:09 +02:00
Antoine Tenart 90b26de603 openvswitch: Fix double reporting of drops in dropwatch
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134560
Upstream Status: linux.git

commit 1100248a5c5ccd57059eb8d02ec077e839a23826
Author: Mike Pattrick <mkp@redhat.com>
Date:   Wed Aug 17 11:06:34 2022 -0400

    openvswitch: Fix double reporting of drops in dropwatch

    Frames sent to userspace can be reported as dropped in
    ovs_dp_process_packet, however, if they are dropped in the netlink code
    then netlink_attachskb will report the same frame as dropped.

    This patch checks for error codes which indicate that the frame has
    already been freed.

    Signed-off-by: Mike Pattrick <mkp@redhat.com>
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-10-17 16:39:04 +02:00
Antoine Tenart 8f873cff6f openvswitch: fix memory leak at failed datapath creation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134560
Upstream Status: linux.git

commit a87406f4adee9c53b311d8a1ba2849c69e29a6d0
Author: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Date:   Thu Aug 25 05:03:26 2022 +0300

    openvswitch: fix memory leak at failed datapath creation

    ovs_dp_cmd_new()->ovs_dp_change()->ovs_dp_set_upcall_portids()
    allocates array via kmalloc.
    If for some reason new_vport() fails during ovs_dp_cmd_new()
    dp->upcall_portids must be freed.
    Add missing kfree.

    Kmemleak example:
    unreferenced object 0xffff88800c382500 (size 64):
      comm "dump_state", pid 323, jiffies 4294955418 (age 104.347s)
      hex dump (first 32 bytes):
        5e c2 79 e4 1f 7a 38 c7 09 21 38 0c 80 88 ff ff  ^.y..z8..!8.....
        03 00 00 00 0a 00 00 00 14 00 00 00 28 00 00 00  ............(...
      backtrace:
        [<0000000071bebc9f>] ovs_dp_set_upcall_portids+0x38/0xa0
        [<000000000187d8bd>] ovs_dp_change+0x63/0xe0
        [<000000002397e446>] ovs_dp_cmd_new+0x1f0/0x380
        [<00000000aa06f36e>] genl_family_rcv_msg_doit+0xea/0x150
        [<000000008f583bc4>] genl_rcv_msg+0xdc/0x1e0
        [<00000000fa10e377>] netlink_rcv_skb+0x50/0x100
        [<000000004959cece>] genl_rcv+0x24/0x40
        [<000000004699ac7f>] netlink_unicast+0x23e/0x360
        [<00000000c153573e>] netlink_sendmsg+0x24e/0x4b0
        [<000000006f4aa380>] sock_sendmsg+0x62/0x70
        [<00000000d0068654>] ____sys_sendmsg+0x230/0x270
        [<0000000012dacf7d>] ___sys_sendmsg+0x88/0xd0
        [<0000000011776020>] __sys_sendmsg+0x59/0xa0
        [<000000002e8f2dc1>] do_syscall_64+0x3b/0x90
        [<000000003243e7cb>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Fixes: b83d23a2a38b ("openvswitch: Introduce per-cpu upcall dispatch")
    Acked-by: Aaron Conole <aconole@redhat.com>
    Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
    Link: https://lore.kernel.org/r/20220825020326.664073-1-andrey.zhadchenko@virtuozzo.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-10-17 16:38:58 +02:00
Antoine Tenart 7d6882629c net/sched: Enable tc skb ext allocation on chain miss only when needed
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2082155
Upstream Status: linux.git
Tested: Sanity only

commit 35d39fecbc242150af5587506e58ec1f8541fb68
Author: Paul Blakey <paulb@nvidia.com>
Date:   Thu Feb 3 10:44:30 2022 +0200

    net/sched: Enable tc skb ext allocation on chain miss only when needed

    Currently tc skb extension is used to send miss info from
    tc to ovs datapath module, and driver to tc. For the tc to ovs
    miss it is currently always allocated even if it will not
    be used by ovs datapath (as it depends on a requested feature).

    Export the static key which is used by openvswitch module to
    guard this code path as well, so it will be skipped if ovs
    datapath doesn't need it. Enable this code path once
    ovs datapath needs it.

    Signed-off-by: Paul Blakey <paulb@nvidia.com>
    Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-05-18 09:30:11 +02:00
Antoine Tenart 58691f070f openvswitch: fix sparse warning incorrect type
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2045048
Upstream Status: linux.git
Tested: Sanity only

commit 076999e460279cec45c4653513a4f3121fe236d7
Author: Mark Gray <mark.d.gray@redhat.com>
Date:   Fri Jul 23 10:24:14 2021 -0400

    openvswitch: fix sparse warning incorrect type

    fix incorrect type in argument 1 (different address spaces)

    ../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces)
    ../net/openvswitch/datapath.c:169:17:    expected void const *
    ../net/openvswitch/datapath.c:169:17:    got struct dp_nlsk_pids [noderef] __rcu *upcall_portids

    Found at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159

    Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-01-26 16:54:01 +01:00
Antoine Tenart ff54292215 openvswitch: fix alignment issues
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2045048
Upstream Status: linux.git
Tested: Sanity only

commit 784dcfa56e0453bb197601ba0b8196f6f892ebcb
Author: Mark Gray <mark.d.gray@redhat.com>
Date:   Fri Jul 23 10:24:13 2021 -0400

    openvswitch: fix alignment issues

    Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-01-26 16:54:00 +01:00
Antoine Tenart 79b9456df0 openvswitch: Introduce per-cpu upcall dispatch
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2045048
Upstream Status: linux.git
Tested: Sanity only

commit b83d23a2a38b1770da0491257ae81d52307f7816
Author: Mark Gray <mark.d.gray@redhat.com>
Date:   Thu Jul 15 08:27:54 2021 -0400

    openvswitch: Introduce per-cpu upcall dispatch

    The Open vSwitch kernel module uses the upcall mechanism to send
    packets from kernel space to user space when it misses in the kernel
    space flow table. The upcall sends packets via a Netlink socket.
    Currently, a Netlink socket is created for every vport. In this way,
    there is a 1:1 mapping between a vport and a Netlink socket.
    When a packet is received by a vport, if it needs to be sent to
    user space, it is sent via the corresponding Netlink socket.

    This mechanism, with various iterations of the corresponding user
    space code, has seen some limitations and issues:

    * On systems with a large number of vports, there is a correspondingly
    large number of Netlink sockets which can limit scaling.
    (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
    * Packet reordering on upcalls.
    (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
    * A thundering herd issue.
    (https://bugzilla.redhat.com/show_bug.cgi?id=1834444)

    This patch introduces an alternative, feature-negotiated, upcall
    mode using a per-cpu dispatch rather than a per-vport dispatch.

    In this mode, the Netlink socket to be used for the upcall is
    selected based on the CPU of the thread that is executing the upcall.
    In this way, it resolves the issues above as:

    a) The number of Netlink sockets scales with the number of CPUs
    rather than the number of vports.
    b) Ordering per-flow is maintained as packets are distributed to
    CPUs based on mechanisms such as RSS and flows are distributed
    to a single user space thread.
    c) Packets from a flow can only wake up one user space thread.
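
The per-cpu selection described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct upcall_portids` and `upcall_portid_for_cpu()` are hypothetical names, and the wrap-around for CPU ids beyond the table size is an assumed fallback, not something this commit message specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the per-datapath table of
 * Netlink port IDs; the names are illustrative, not the kernel's. */
struct upcall_portids {
	size_t n_pids;        /* number of entries in pids[] */
	const uint32_t *pids; /* one Netlink port ID per handler socket */
};

/* Pick the port ID for the CPU that is executing the upcall.
 * Returns 0 (meaning "no socket") when the table is missing or empty. */
static uint32_t upcall_portid_for_cpu(const struct upcall_portids *tbl,
				      uint32_t cpu_id)
{
	if (!tbl || tbl->n_pids == 0)
		return 0;
	if (cpu_id < tbl->n_pids)
		return tbl->pids[cpu_id];
	/* Assumption: more CPUs than sockets, so wrap around the table. */
	return tbl->pids[cpu_id % tbl->n_pids];
}
```

Because the index depends only on the CPU id, the number of sockets follows the CPU count, and all packets steered to one CPU reach the same user space socket.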

    The corresponding user space code can be found at:
    https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html

    Bugzilla: https://bugzilla.redhat.com/1844576
    Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
    Acked-by: Flavio Leitner <fbl@sysclose.org>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2022-01-26 16:54:00 +01:00
Aaron Conole c4ab7b56be openvswitch: add trace points
This makes openvswitch module use the event tracing framework
to log the upcall interface and action execution pipeline.  When
using openvswitch as the packet forwarding engine, some types of
debugging are made possible simply by using the ovs-vswitchd's
ofproto/trace command.  However, such a command has some
limitations:

  1. When trying to trace packets that go through the CT action,
     the state of the packet can't be determined, and would
     potentially be wrong.

  2. Deducing problem packets can sometimes be difficult as well
     even if many of the flows are known

  3. It's possible to use the openvswitch module even without
     the ovs-vswitchd (although, not common use).

Introduce the event tracing points here to make it possible to
work through these problems in kernel space.  The style is
copied from the mac80211 driver-trace / trace code for
consistency - this creates some checkpatch splats, but the
official 'guide' for adding tracepoints, as well as the existing
examples all add the same splats so it seems acceptable.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:47:32 -07:00
Eelco Chaudron fea07a487c net: openvswitch: silence suspicious RCU usage warning
Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
by replacing rcu_dereference() with rcu_dereference_ovsl().

In addition, when creating a new datapath, make sure it's configured under
the ovs_lock.

Fixes: 9bf24f594c ("net: openvswitch: make masks cache size configurable")
Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03 16:57:42 -08:00
Jakub Kicinski 66a9b9287d genetlink: move to smaller ops wherever possible
Bulk of the genetlink users can use smaller ops, move them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 19:11:11 -07:00
Eelco Chaudron e0afe91443 net: openvswitch: fixes crash if nf_conncount_init() fails
If nf_conncount_init fails currently the dispatched work is not canceled,
causing problems when the timer fires. This change fixes this by not
scheduling the work until all initialization is successful.

Fixes: a65878d6f0 ("net: openvswitch: fixes potential deadlock in dp cleanup code")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01 13:23:23 -07:00
Tonghao Zhang cf3266ad48 net: openvswitch: improve the coding style
No change to the logic; just improve the coding style.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01 11:42:15 -07:00
Tonghao Zhang 1f3a090b90 net: openvswitch: introduce common code for flushing flows
To avoid some issues, for example RCU usage warning and double free,
we should flush the flows under ovs_lock. This patch refactors
table_instance_destroy and introduces table_instance_flow_flush
which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

Fixes: 50b0e61b32 ("net: openvswitch: fix possible memleak on destroy flow-table")
Reported-by: Johan Knöös <jknoos@google.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-13 15:53:30 -07:00
Eelco Chaudron 9bf24f594c net: openvswitch: make masks cache size configurable
This patch makes the masks cache size configurable, or with
a size of 0, disable it.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:17:48 -07:00
Eelco Chaudron 9d2f627b7e net: openvswitch: add masks cache hit counter
Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:17:48 -07:00
Eelco Chaudron a65878d6f0 net: openvswitch: fixes potential deadlock in dp cleanup code
The previous patch introduced a deadlock, this patch fixes it by making
sure the work is canceled without holding the global ovs lock. This is
done by moving the reorder processing one layer up to the netns level.

Fixes: eac87c413b ("net: openvswitch: reorder masks array based on usage")
Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com
Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com
Reviewed-by: Paolo <pabeni@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 16:58:38 -07:00
Eelco Chaudron eac87c413b net: openvswitch: reorder masks array based on usage
This patch reorders the masks array every 4 seconds based on their
usage count. This greatly reduces the number of masks hit per packet,
and hence improves the overall performance, especially in the OVS/OVN
case for OpenShift.
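
A minimal user-space sketch of the idea, assuming a flat array of masks with a per-mask hit counter. `struct mask_entry`, `reorder_masks()` and `probes_until()` are hypothetical names for illustration; the kernel code works on its own mask array under its own locking and is not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mask record: an identifier plus its usage count. */
struct mask_entry {
	int id;
	uint64_t hits;
};

/* qsort comparator: most frequently hit masks first. */
static int cmp_hits_desc(const void *a, const void *b)
{
	const struct mask_entry *ma = a;
	const struct mask_entry *mb = b;

	if (ma->hits > mb->hits)
		return -1;
	if (ma->hits < mb->hits)
		return 1;
	return 0;
}

/* Reorder the array so that hot masks are probed first. */
static void reorder_masks(struct mask_entry *masks, size_t n)
{
	qsort(masks, n, sizeof(*masks), cmp_hits_desc);
}

/* Number of masks probed by a linear scan before 'id' matches,
 * or 0 if 'id' is not present. */
static size_t probes_until(const struct mask_entry *masks, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (masks[i].id == id)
			return i + 1;
	return 0;
}
```

With a linear scan, moving the most used mask to the front lowers the number of probes for the bulk of the traffic, which is the effect the numbers below measure.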

Here are some results from the OVS/OVN OpenShift test, which use
8 pods, each pod having 512 uperf connections, each connection
sends a 64-byte request and gets a 1024-byte response (TCP).
All uperf clients are on 1 worker node while all uperf servers are
on the other worker node.

Kernel without this patch     :  7.71 Gbps
Kernel with this patch applied: 14.52 Gbps

We also ran some tests to verify that the rebalance activity does not
lower the flow insertion rate, which it does not.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Andrew Theurer <atheurer@redhat.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 10:36:50 -07:00
Tonghao Zhang 27de77cec9 net: openvswitch: ovs_ct_exit to be done under ovs_lock
syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread

To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.

Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb04 ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20 10:53:54 -07:00
Cambda Zhu a08e7fd912 net: Fix typo of SKB_SGO_CB_OFFSET
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.

Fixes: 9207f9d45b ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:53:18 -07:00
Jakub Kicinski b5ab1f1be6 openvswitch: add missing attribute validation for hash
Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.

Fixes: bd1903b7c4 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03 13:28:48 -08:00
Madhuparna Bhowmik 53742e69e8 datapath.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:27 -08:00
Jason A. Donenfeld 2cec4448db net: openvswitch: use skb_list_walk_safe helper for gso segments
This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 11:48:41 -08:00
Pankaj Bharadiya c593642c8b treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().
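
For readers unfamiliar with the macro, a small sketch of what sizeof_field() computes. The macro body below follows the common null-pointer-member form; `struct example_cb` is a hypothetical type used only for illustration.

```c
#include <stddef.h>

/* Size of a struct member without needing an instance of the struct.
 * The operand of sizeof is not evaluated, so the null pointer is
 * never dereferenced. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical structure, for illustration only. */
struct example_cb {
	unsigned char flags;
	unsigned int  mark;
	char          name[16];
};
```

For example, `sizeof_field(struct example_cb, name)` evaluates to 16 at compile time, which makes the macro usable in static assertions and array bounds.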

This patch is generated using the following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09 10:36:44 -08:00
Paolo Abeni 8a574f8665 openvswitch: remove another BUG_ON()
If we can't build the flow del notification, we can simply delete
the flow, no need to crash the kernel. Still keep a WARN_ON to
preserve debuggability.

Note: the BUG_ON() predates the Fixes tag, but this change
can be applied only after the mentioned commit.

v1 -> v2:
 - do not leak an skb on error

Fixes: aed067783e ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-01 13:21:24 -08:00
Paolo Abeni 8ffeb03fbb openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
All the callers of ovs_flow_cmd_build_info() already deal with
error return code correctly, so we can handle the error condition
in a more graceful way. Still dump a warning to preserve
debuggability.

v1 -> v2:
 - clarify the commit message
 - clean the skb and report the error (DaveM)

Fixes: ccb1352e76 ("net: Add Open vSwitch kernel components.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-01 13:21:24 -08:00
Paolo Abeni 4e81c0b3fa openvswitch: fix flow command message size
When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant
flow has no UFID, we can exceed the computed size, as
ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY
attribute.
Take the above into account when computing the flow command message
size.

Fixes: 74ed7ab926 ("openvswitch: Add support for unique flow IDs.")
Reported-by: Qi Jun Ding <qding@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26 15:13:46 -08:00
Tonghao Zhang 61ca533c0e net: openvswitch: don't call pad_packet if not necessary
The nla_put_u16/nla_put_u32 helpers make sure that
*attrlen is aligned. The call tree is:

nla_put_u16/nla_put_u32
  -> nla_put		attrlen = sizeof(u16) or sizeof(u32)
  -> __nla_put		attrlen
  -> __nla_reserve	attrlen
  -> skb_put(skb, nla_total_size(attrlen))

nla_total_size returns the total length of attribute
including padding.
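
The arithmetic can be checked with a small stand-alone sketch. The `SK_`/`sk_` names are local to this example and only mirror the usual Netlink attribute layout (4-byte header, 4-byte alignment); they are not the kernel's own definitions.

```c
/* Local stand-ins for the Netlink attribute size helpers. */
#define SK_NLA_ALIGNTO 4
#define SK_NLA_ALIGN(len) \
	(((len) + SK_NLA_ALIGNTO - 1) & ~(SK_NLA_ALIGNTO - 1))
#define SK_NLA_HDRLEN 4 /* u16 length + u16 type */

/* Header plus payload, without trailing padding. */
static int sk_nla_attr_size(int payload)
{
	return SK_NLA_HDRLEN + payload;
}

/* Header plus payload, rounded up to the alignment boundary. */
static int sk_nla_total_size(int payload)
{
	return SK_NLA_ALIGN(sk_nla_attr_size(payload));
}
```

A u16 payload gives 4 + 2 = 6 bytes, padded to 8; a u32 payload gives 4 + 4 = 8 bytes, already aligned. In both cases the space reserved in the skb is a multiple of 4, so no separate padding step is needed afterwards.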

Cc: Joe Stringer <joe@ovn.org>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:43:27 -08:00
Tonghao Zhang bd1903b7c4 net: openvswitch: add hash info to upcall
When using the kernel datapath, the upcall doesn't
include the related skb hash info. That introduces
some problems, because the skb hash is important
in the kernel stack. For example, the VXLAN module uses
it to select the UDP src port. The tx queue selection
may also use the hash in the stack.

Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculating the hash is not easy.

Hash of TCP socket is computed:
tcp_v4_connect
    -> sk_set_txhash (is random)

__tcp_transmit_skb
    -> skb_set_hash_from_sk

There will be one upcall, without skb hash information,
to ovs-vswitchd, for the first packet of a TCP
session. The remaining packets will be processed in the Open vSwitch
module, with the hash kept. If this TCP session is forwarded to the
VXLAN module, then the UDP src port of the first TCP packet
is different from that of the remaining packets.

TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info in the upcall, and restore the hash
when packets are sent back.

+---------------+          +-------------------------+
|   Docker/VMs  |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |  tap netdev         |                         |   vxlan module
     +--------------->     +-->  Open vSwitch ko     +-->
       or internal type    |                         |
                           +-------------------------+
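
The store-and-restore flow in the diagram can be sketched as below. This is a simplified user-space model: `struct sketch_pkt`, `struct sketch_upcall` and both functions are hypothetical, and the actual attribute encoding used by the patch is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet model: only the fields relevant to the hash. */
struct sketch_pkt {
	uint32_t hash;
	bool hash_valid;
};

/* Hypothetical upcall message carrying the optional hash. */
struct sketch_upcall {
	uint32_t hash;
	bool has_hash;
};

/* Kernel to user space: copy the hash into the upcall, if present. */
static void upcall_save_hash(const struct sketch_pkt *pkt,
			     struct sketch_upcall *up)
{
	up->has_hash = pkt->hash_valid;
	up->hash = pkt->hash_valid ? pkt->hash : 0;
}

/* User space to kernel: restore the saved hash instead of recomputing it. */
static void execute_restore_hash(const struct sketch_upcall *up,
				 struct sketch_pkt *pkt)
{
	if (!up->has_hash)
		return;
	pkt->hash = up->hash;
	pkt->hash_valid = true;
}
```

Since the first packet of a session gets back the hash it arrived with, anything derived from the hash later in the stack (such as the UDP src port mentioned above) matches the value used for the remaining packets.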

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 17:29:46 -08:00
Tonghao Zhang eec62eadd1 net: openvswitch: simplify the ovs_dp_cmd_new
Use the specified functions to initialize resources.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:04 -08:00