Commit Graph

479 Commits

CKI Backport Bot 29b595faa5 RDMA/mlx5: Move events notifier registration to be after device registration
JIRA: https://issues.redhat.com/browse/RHEL-72349
CVE: CVE-2024-53224

commit ede132a5cf559f3ab35a4c28bac4f4a6c20334d8
Author: Patrisious Haddad <phaddad@nvidia.com>
Date:   Wed Nov 13 13:23:19 2024 +0200

    RDMA/mlx5: Move events notifier registration to be after device registration

    Move pkey change work initialization and cleanup from the device resources
    stage to the notifier stage, since this is the stage which handles these
    work events.

    Fix a race between device deregistration and the pkey change work by moving
    MLX5_IB_STAGE_DEVICE_NOTIFIER to be after MLX5_IB_STAGE_IB_REG, so that the
    notifier is deregistered before the device during cleanup. This ensures that
    no work is still executing after the device has already been unregistered,
    which can cause the panic below.

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 1 PID: 630071 Comm: kworker/1:2 Kdump: loaded Tainted: G W OE --------- --- 5.14.0-162.6.1.el9_1.x86_64 #1
    Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 02/27/2023
    Workqueue: events pkey_change_handler [mlx5_ib]
    RIP: 0010:setup_qp+0x38/0x1f0 [mlx5_ib]
    Code: ee 41 54 45 31 e4 55 89 f5 53 48 89 fb 48 83 ec 20 8b 77 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 48 8b 07 48 8d 4c 24 16 <4c> 8b 38 49 8b 87 80 0b 00 00 4c 89 ff 48 8b 80 08 05 00 00 8b 40
    RSP: 0018:ffffbcc54068be20 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff954054494128 RCX: ffffbcc54068be36
    RDX: ffff954004934000 RSI: 0000000000000001 RDI: ffff954054494128
    RBP: 0000000000000023 R08: ffff954001be2c20 R09: 0000000000000001
    R10: ffff954001be2c20 R11: ffff9540260133c0 R12: 0000000000000000
    R13: 0000000000000023 R14: 0000000000000000 R15: ffff9540ffcb0905
    FS: 0000000000000000(0000) GS:ffff9540ffc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000010625c001 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    mlx5_ib_gsi_pkey_change+0x20/0x40 [mlx5_ib]
    process_one_work+0x1e8/0x3c0
    worker_thread+0x50/0x3b0
    ? rescuer_thread+0x380/0x380
    kthread+0x149/0x170
    ? set_kthread_struct+0x50/0x50
    ret_from_fork+0x22/0x30
    Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_fwctl(OE) fwctl(OE) ib_uverbs(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlx_compat(OE) psample mlxfw(OE) tls knem(OE) netconsole nfsv3 nfs_acl nfs lockd grace fscache netfs qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common rapl hv_balloon hv_utils i2c_piix4 pcspkr joydev fuse ext4 mbcache jbd2 sr_mod sd_mod cdrom t10_pi sg ata_generic pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper drm_kms_helper hv_storvsc syscopyarea hv_netvsc sysfillrect sysimgblt hid_hyperv fb_sys_fops scsi_transport_fc hyperv_keyboard drm ata_piix crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel hv_vmbus serio_raw [last unloaded: ib_core]
    CR2: 0000000000000000
    ---[ end trace f6f8be4eae12f7bc ]---

    Fixes: 7722f47e71 ("IB/mlx5: Create GSI transmission QPs when P_Key table is changed")
    Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
    Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
    Link: https://patch.msgid.link/d271ceeff0c08431b3cbbbb3e2d416f09b6d1621.1731496944.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2025-01-13 23:46:40 +00:00
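
A minimal sketch of the teardown ordering the fix above relies on: stop the event source, cancel any work it queued, and only then unregister the IB device. Names such as my_dev, events_nb and pkey_change_work are placeholders, not the driver's actual symbols.

    #include <linux/notifier.h>
    #include <linux/workqueue.h>
    #include <rdma/ib_verbs.h>

    struct my_dev {
        struct notifier_block events_nb;      /* device event notifier    */
        struct work_struct pkey_change_work;  /* queued from the notifier */
        struct ib_device *ibdev;
    };

    /* Cleanup runs in reverse order of setup: deregister the notifier first,
     * then make sure no already-queued work is still running, and only then
     * unregister the IB device that the work dereferences.
     */
    static void my_dev_cleanup(struct my_dev *dev)
    {
        /* 1. Notifier deregistration stops new events from arriving
         *    (mlx5_notifier_unregister(...) in the real driver).
         */

        /* 2. Wait for any pkey_change_work already in flight. */
        cancel_work_sync(&dev->pkey_change_work);

        /* 3. Only now is it safe to tear down the IB device. */
        ib_unregister_device(dev->ibdev);
    }
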
Benjamin Poirier 5a175ac333 RDMA/mlx5: Use IB set_netdev and get_netdev functions
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit 8d159eb2117b2e3697a31785662b653938f007cb
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date:   Mon Sep 9 20:30:23 2024 +0300

    RDMA/mlx5: Use IB set_netdev and get_netdev functions

    The IB layer provides a common interface to store and get net
    devices associated to an IB device port (ib_device_set_netdev()
    and ib_device_get_netdev()).
    Previously, mlx5_ib stored and managed the associated net devices
    internally.

    Replace internal net device management in mlx5_ib with
    ib_device_set_netdev() when attaching/detaching a net device and
    ib_device_get_netdev() when retrieving the net device.

    Export ib_device_get_netdev().

    For mlx5 representors/PFs/VFs and lag creation we replace the netdev
    assignments with the IB set/get netdev functions.

    In active-backup lag mode, the active slave net device is stored in the
    lag itself. To ensure that the net device stored in a lag bond IB device
    is the active slave, we implement the following:
    - mlx5_core: when modifying the slave of a bond we send the internal driver event
      MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE.
    - mlx5_ib: when catching the event call ib_device_set_netdev()

    This patch also ensures the correct IB events are sent in switchdev lag.

    While at it, when in multiport eswitch mode, only a single IB device is
    created for all ports. The said IB device will receive all netdev events
    of its VFs once loaded, thus to avoid overwriting the mapping of PF IB
    device to PF netdev, ignore NETDEV_REGISTER events if the ib device has
    already been mapped to a netdev.

    Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:09 -05:00
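
For reference, a small sketch of how the IB-core helpers mentioned above are used; port number 1 and the example_* names are illustrative only.

    #include <linux/netdevice.h>
    #include <linux/printk.h>
    #include <rdma/ib_verbs.h>

    /* Attach a net device to IB port 1; passing NULL detaches it. */
    static int example_attach(struct ib_device *ibdev, struct net_device *ndev)
    {
        return ib_device_set_netdev(ibdev, ndev, 1);
    }

    /* ib_device_get_netdev() returns the bound netdev with a reference
     * held, so the caller must release it with dev_put().
     */
    static void example_query(struct ib_device *ibdev)
    {
        struct net_device *ndev = ib_device_get_netdev(ibdev, 1);

        if (ndev) {
            pr_info("port 1 netdev: %s\n", ndev->name);
            dev_put(ndev);
        }
    }
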
Benjamin Poirier 892c5f7ca0 RDMA/mlx5: Add implicit MR handling to ODP memory scheme
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit 6f2487bfafce5e6cd6f89e7238a82012f7b9f5ac
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Mon Sep 9 13:05:03 2024 +0300

    RDMA/mlx5: Add implicit MR handling to ODP memory scheme

    Implicit MRs in the ODP memory scheme require allocating a private null mkey
    and assigning the mkey and va differently in the KSM mkey.
    The page faults are received on the null mkey, so we also store the
    null mkey in the odp_mkey xarray.

    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Link: https://patch.msgid.link/20240909100504.29797-8-michaelgur@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:09 -05:00
Benjamin Poirier c8b0960396 net/mlx5: Expand mkey page size to support 6 bits
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit cef7dde8836ab09a3bfe96ada4f18ef2496eacc9
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Mon Sep 9 13:04:57 2024 +0300

    net/mlx5: Expand mkey page size to support 6 bits

    Protect the usage of the 6th bit with the relevant capability to ensure
    we are using the new page sizes with FW that supports the bit extension.

    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Link: https://patch.msgid.link/20240909100504.29797-2-michaelgur@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:08 -05:00
Benjamin Poirier 35555eab92 RDMA/mlx5: Fix MR cache temp entries cleanup
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit 7ebb00cea49db641b458edef0ede389f7004821d
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Tue Sep 3 14:24:50 2024 +0300

    RDMA/mlx5: Fix MR cache temp entries cleanup

    Fix the cleanup of the temp cache entries that are dynamically created
    in the MR cache.

    The cleanup of the temp cache entries is currently scheduled only when a
    new entry is created. Since in the cleanup of the entries only the mkeys
    are destroyed and the cache entry stays in the cache, subsequent
    registrations might reuse the entry and it will eventually be filled with
    new mkeys without cleanup ever getting scheduled again.

    On workloads that register and deregister MRs with a wide range of
    properties we see the cache ends up holding many cache entries, each
    holding the max number of mkeys that were ever used through it.

    Additionally, as the cleanup work is scheduled to run over the whole
    cache, any mkey that is returned to the cache after the cleanup was
    scheduled will be held for less than the intended 30 seconds timeout.

    Solve both issues by dropping the existing remove_ent_work and reusing
    the existing per-entry work to also handle the temp entries cleanup.

    Schedule the work to run with a 30 seconds delay every time we push an
    mkey to a clean temp entry.
    This ensures the cleanup runs on each entry only 30 seconds after the
    first mkey was pushed to an empty entry.

    As we already distinguish between persistent and temp entries when
    scheduling cache_work_func, it is not scheduled in any other flow for
    the temp entries.

    Another benefit of moving to a per-entry cleanup is that we are no longer
    required to hold the rb_tree mutex, thus enabling other flows to run
    concurrently.

    Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Link: https://patch.msgid.link/e4fa4bb03bebf20dceae320f26816cd2dde23a26.1725362530.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:08 -05:00
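
A sketch of the "arm the per-entry delayed work only when the first mkey lands in an empty temp entry" scheme described above; struct cache_ent here is a simplified stand-in, not the driver's real structure.

    #include <linux/jiffies.h>
    #include <linux/spinlock.h>
    #include <linux/workqueue.h>

    struct cache_ent {
        spinlock_t lock;
        unsigned int stored;        /* mkeys currently parked here      */
        bool is_tmp;                /* temporary (non-persistent) entry */
        struct delayed_work dwork;  /* per-entry cleanup work           */
    };

    /* Return an mkey to a temp entry. The cleanup work is armed only on the
     * empty -> non-empty transition, so the entry is reaped 30 seconds after
     * the first mkey was pushed into it, not 30 seconds after the last one.
     */
    static void cache_ent_push(struct cache_ent *ent)
    {
        bool was_empty;

        spin_lock(&ent->lock);
        was_empty = (ent->stored++ == 0);
        spin_unlock(&ent->lock);

        if (ent->is_tmp && was_empty)
            queue_delayed_work(system_wq, &ent->dwork,
                               msecs_to_jiffies(30 * 1000));
    }
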
Benjamin Poirier e74ca62c4c RDMA/mlx5: Remove two unused declarations
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit 53ffc09a3e6d39d7a9b3758be4a8795fb57a7989
Author: Yue Haibing <yuehaibing@huawei.com>
Date:   Fri Aug 16 18:13:58 2024 +0800

    RDMA/mlx5: Remove two unused declarations

    Commit e6fb246cca ("RDMA/mlx5: Consolidate MR destruction to
    mlx5_ib_dereg_mr()") removed mlx5_ib_free_implicit_mr() but left
    the declaration.

    Commit d98995b4bf98 ("net/mlx5: Reimplement write combining test") left
    mlx5_ib_test_wc().

    Remove the unused declarations.

    Link: https://patch.msgid.link/r/20240816101358.881247-1-yuehaibing@huawei.com
    Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:08 -05:00
Benjamin Poirier 35e261e6d5 RDMA/mlx5: Add support for DMABUF MR registrations with Data-direct
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit de8f847a5114ff7cfcdfc114af8485c431dec703
Author: Yishai Hadas <yishaih@nvidia.com>
Date:   Thu Aug 1 15:05:16 2024 +0300

    RDMA/mlx5: Add support for DMABUF MR registrations with Data-direct

    Add support for DMABUF MR registrations with Data-direct device.

    Upon userspace calling to register a DMABUF MR with the data direct bit
    set, the below algorithm will be followed.

    1) Obtain a pinned DMABUF umem from the IB core using the user input
    parameters (FD, offset, length) and the DMA PF device.  The DMA PF
    device is needed to allow the IOMMU to enable the DMA PF to access the
    user buffer over PCI.

    2) Create a KSM MKEY by setting its entries according to the user buffer
    VA to IOVA mapping, with the MKEY being the data direct device-crossed
    MKEY. This KSM MKEY is umrable and will be used as part of the MR cache.
    The PD for creating it is the internal device 'data direct' kernel one.

    3) Create a crossing MKEY that points to the KSM MKEY using the crossing
    access mode.

    4) Manage the KSM MKEY by adding it to a list of 'data direct' MKEYs
    managed on the mlx5_ib device.

    5) Return the crossing MKEY to the user, created with its supplied PD.

    Upon DMA PF unbind flow, the driver will revoke the KSM entries.
    The final deregistration will occur under the hood once the application
    deregisters its MKEY.

    Notes:
    - This version supports only the PINNED UMEM mode, so there is no
      dependency on ODP.
    - The IOVA supplied by the application must be system page aligned due to
      HW translations of KSM.
    - The crossing MKEY will not be umrable or part of the MR cache, as we
      cannot change its crossed (i.e. KSM) MKEY over UMR.

    Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
    Link: https://patch.msgid.link/1f99d8020ed540d9702b9e2252a145a439609ba6.1722512548.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:08 -05:00
Benjamin Poirier 0726e29ee2 RDMA/mlx5: Add the initialization flow to utilize the 'data direct' device
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit 2e8e631d7a41e3a4edc94f3c9dd5cb32c2aa539e
Author: Yishai Hadas <yishaih@nvidia.com>
Date:   Thu Aug 1 15:05:12 2024 +0300

    RDMA/mlx5: Add the initialization flow to utilize the 'data direct' device

    Add the NET device initialization flow to utilize the 'data
    direct' device.

    When a NET mlx5_ib device is capable of 'data direct', the following
    sequence of actions will occur:
    - Find its affiliated 'data direct' VUID via a firmware command.
    - Create its own private PD and 'data direct' mkey.
    - Register to be notified when its 'data direct' driver is probed or removed.

    The DMA device of the affiliated 'data direct' device, including the
    private PD and the 'data direct' mkey, will be used later during MR
    registrations that request the data direct functionality.

    Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
    Link: https://patch.msgid.link/b11fa87b2a65bce4db8d40341bb6cee490fa4d06.1722512548.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:08 -05:00
Benjamin Poirier d93fd28f29 RDMA/mlx5: Introduce the 'data direct' driver
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1

commit 6910e3660d86c1a5654f742a40181d2c9154f26f
Author: Yishai Hadas <yishaih@nvidia.com>
Date:   Thu Aug 1 15:05:11 2024 +0300

    RDMA/mlx5: Introduce the 'data direct' driver

    Introduce the 'data direct' driver for a ConnectX-8 Data Direct device.

    The 'data direct' driver functions as the affiliated DMA device for one
    or more capable mlx5_ib devices. This DMA device, as the name suggests,
    is used exclusively for DMA operations. It can be considered a DMA engine
    managed by a PF/VF, lacking network capabilities and having minimal overall
    capabilities.

    Consequently, the DMA NIC PF will not be exposed to or directly used by
    software applications. The driver will not have any direct interface or
    interaction with the firmware (no command interface, no capabilities,
    etc.). It will operate solely over PCI to enable its DMA functionality.

    Registration and un-registration of the driver are handled as part of
    the mlx5_ib initialization and exit processes, as the mlx5_ib devices
    will effectively be its clients.

    The driver will serve as the DMA device for accessing another PCI device
    to achieve optimal performance (both on the same NUMA node, P2P access,
    etc.).

    Upon probing, it will read its VUID over PCI to handle mlx5_ib device
    registrations with the same VUID.

    Upon removal, it will notify its clients to allow them to clean up the
    resources that were mmaped with its DMA device.

    Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
    Link: https://patch.msgid.link/b77edecfd476c3f445da96ab6aef499ae47b2829.1722512548.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:08 -05:00
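
The "data direct" function described above is essentially a DMA-only PCI function with a minimal driver. A generic PCI-driver skeleton of that shape is sketched below; the device ID, names and the probe/remove bodies are placeholders, not the actual mlx5 data-direct driver.

    #include <linux/module.h>
    #include <linux/pci.h>

    static const struct pci_device_id dd_pci_table[] = {
        { PCI_DEVICE(0x15b3, 0x0001) },   /* placeholder device ID */
        { }
    };

    static int dd_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        int err = pci_enable_device(pdev);

        if (err)
            return err;
        pci_set_master(pdev);
        /* Read the VUID over PCI and register with matching mlx5_ib clients. */
        return 0;
    }

    static void dd_remove(struct pci_dev *pdev)
    {
        /* Notify clients so they can release mappings backed by this DMA dev. */
        pci_disable_device(pdev);
    }

    /* Registered/unregistered from the mlx5_ib module init/exit in the real
     * driver via pci_register_driver()/pci_unregister_driver().
     */
    static struct pci_driver dd_driver = {
        .name     = "example_data_direct",
        .id_table = dd_pci_table,
        .probe    = dd_probe,
        .remove   = dd_remove,
    };
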
Benjamin Poirier 7ed9c90b8c RDMA/mlx5: Support plane device and driver APIs to add and delete it
JIRA: https://issues.redhat.com/browse/RHEL-52869
JIRA: https://issues.redhat.com/browse/RHEL-52874
Upstream-status: v6.11-rc1

commit 026a425990af6969a15b57d6d7fa0138a7e21958
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Sun Jun 16 19:08:39 2024 +0300

    RDMA/mlx5: Support plane device and driver APIs to add and delete it

    This patch supports driver APIs "add_sub_dev" and "del_sub_dev", to
    add and delete a plane device respectively.
    A mlx5 plane device is a rdma SMI device; It provides the SMI capability
    through user MAD for it's parent, the logical multi-plane aggregated
    device. For a plane port:
    - It supports QP0 only;
    - When adding a plane device, all plane ports are added;
    - For some commands like mad_ifc, both plane_index and native portnum
      are needed;
    - When querying or modifying a plane port context, the native portnum
      must be used, as the query/modify_hca_vport_context command doesn't
      support plane port.

    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Link: https://lore.kernel.org/r/e933cd0562aece181f8657af2ca0f5b387d0f14e.1718553901.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:03 -05:00
Benjamin Poirier 73d6c04f0f RDMA/mlx5: Add support to multi-plane device and port
JIRA: https://issues.redhat.com/browse/RHEL-52869
JIRA: https://issues.redhat.com/browse/RHEL-52874
Upstream-status: v6.11-rc1

commit 2a5db20fa532198639671713c6213f96ff285b85
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Sun Jun 16 19:08:35 2024 +0300

    RDMA/mlx5: Add support to multi-plane device and port

    When multi-plane is supported, a logical port, which is aggregation of
    multiple physical plane ports, is exposed for data transmission.
    Compared with a normal mlx5 IB port, this logical port supports all
    functionalities except Subnet Management.

    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Link: https://lore.kernel.org/r/7e37c06c9cb243be9ac79930cd17053903785b95.1718553901.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:03 -05:00
Benjamin Poirier 5b25a87734 RDMA/mlx5: Send UAR page index as ioctl attribute
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1

commit 589b844f1bf04850d9fabcaa2e943325dc6768b4
Author: Akiva Goldberger <agoldberger@nvidia.com>
Date:   Thu Jun 27 21:23:50 2024 +0300

    RDMA/mlx5: Send UAR page index as ioctl attribute

    Add UAR page index as a driver ioctl attribute to increase the number of
    supported indices, previously limited to 16 bits by mlx5_ib_create_cq
    struct.

    Link: https://lore.kernel.org/r/0e18b34d7ec3b1ae02d694b0d545aed7413c0ef7.1719512393.git.leon@kernel.org
    Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:03 -05:00
Benjamin Poirier 04a3bdaf8f RDMA/mlx5: Set mkeys for dmabuf at PAGE_SIZE
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1

commit a4e540119be565f47c305f295ed43f8e0bc3f5c3
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date:   Thu Jun 13 21:01:42 2024 +0300

    RDMA/mlx5: Set mkeys for dmabuf at PAGE_SIZE

    Set the mkey for dmabuf at PAGE_SIZE to support any SGL
    after a move operation.

    ib_umem_find_best_pgsz returns 0 on error, so it is
    incorrect to check the returned page_size against PAGE_SIZE.

    Fixes: 90da7dc820 ("RDMA/mlx5: Support dma-buf based userspace memory region")
    Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
    Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
    Link: https://lore.kernel.org/r/1e2289b9133e89f273a4e68d459057d032cbc2ce.1718301631.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:03 -05:00
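
The second point above, that ib_umem_find_best_pgsz() returns 0 on failure, calls for the kind of check sketched here; example_pick_page_size() is an illustrative helper, not driver code.

    #include <linux/errno.h>
    #include <rdma/ib_umem.h>

    static int example_pick_page_size(struct ib_umem *umem, u64 iova,
                                      unsigned long pgsz_bitmap,
                                      unsigned long *page_size)
    {
        /* Returns 0 when no page size in the bitmap can map this umem,
         * so the result must be tested for 0, not compared against
         * PAGE_SIZE.
         */
        *page_size = ib_umem_find_best_pgsz(umem, pgsz_bitmap, iova);
        if (!*page_size)
            return -EINVAL;
        return 0;
    }
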
Benjamin Poirier 4afc011f2b IB/mlx5: Allocate resources just before first QP/SRQ is created
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1

commit 5895e70f2e6e8dc67b551ca554d6fcde0a7f0467
Author: Jianbo Liu <jianbol@nvidia.com>
Date:   Mon Jun 3 13:26:39 2024 +0300

    IB/mlx5: Allocate resources just before first QP/SRQ is created

    Previously, all IB dev resources were initialized on driver load. As
    they are not always used, move the initialization to the time when
    they are needed.

    To be more specific, move PD (p0) and CQ (c0) initialization to the
    time when the first SRQ is created, and move SRQ (s0 and s1)
    initialization to the time the first QP is created. To avoid concurrent
    creations, two new mutexes are also added.

    Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
    Link: https://lore.kernel.org/r/98c3e53a8cc0bdfeb6dec6e5bb8b037d78ab00d8.1717409369.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:03 -05:00
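
A minimal sketch of the lazy, mutex-protected first-use initialization pattern the commit describes; the struct and function names are placeholders.

    #include <linux/mutex.h>

    struct lazy_resources {
        struct mutex lock;   /* serializes first-use creation */
        bool ready;
        /* PD (p0), CQ (c0) and SRQs (s0, s1) would live here. */
    };

    /* Create the shared resources the first time they are needed rather than
     * at driver load; the mutex prevents concurrent creators. The real driver
     * can add a lockless fast path once 'ready' is observed true.
     */
    static int lazy_resources_get(struct lazy_resources *res)
    {
        int err = 0;

        mutex_lock(&res->lock);
        if (!res->ready) {
            /* ... allocate PD, CQ and SRQs here; set err on failure ... */
            res->ready = true;
        }
        mutex_unlock(&res->lock);
        return err;
    }
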
Benjamin Poirier 20c8b9560d IB/mlx5: Create UMR QP just before first reg_mr occurs
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1

commit 638420115cc4ad6c3a2683bf46a052b505abb202
Author: Jianbo Liu <jianbol@nvidia.com>
Date:   Mon Jun 3 13:26:38 2024 +0300

    IB/mlx5: Create UMR QP just before first reg_mr occurs

    UMR QP is not used in some cases, so move QP and its CQ creations from
    driver load flow to the time first reg_mr occurs, that is when MR
    interfaces are first called.

    The initialization of dev->umrc.pd and dev->umrc.lock is still done in
    driver load because pd is needed for mlx5_mkey_cache_init and the lock
    is reused to protect against the concurrent creation.

    When testing 4GB memory registration latency with rtool [1] and 8
    threads in parallel, a minor performance degradation (<5% for
    the max latency) is seen for the first reg_mr with this change.

    Link: https://github.com/paravmellanox/rtool [1]

    Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
    Link: https://lore.kernel.org/r/55d3c4f8a542fd974d8a4c5816eccfb318a59b38.1717409369.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:03 -05:00
Benjamin Poirier 0daec69495 net/mlx5: Reimplement write combining test
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1

commit d98995b4bf981519dde4af0a081c393d62474039
Author: Jianbo Liu <jianbol@nvidia.com>
Date:   Mon Jun 3 13:26:37 2024 +0300

    net/mlx5: Reimplement write combining test

    The test of write combining was added before in mlx5_ib driver. It
    opens UD QP and posts NOP WQEs, and uses BlueFlame doorbell. When
    BlueFlame is used, WQEs get written directly to a PCI BAR of the
    device (in addition to memory) so that the device handles them without
    having to access memory.

    In this test, the WQEs written in memory are different from the ones
    written to the BlueFlame which request CQE update. By checking the
    completion reports posted on CQ, we can know if BlueFlame succeeds or
    not. The write combining must be supported if BlueFlame succeeds as
    its register is written using write combining.

    This patch reimplements the test in the same way, but using a pair of
    SQ and CQ only. It is moved to mlx5_core as a general feature used by
    both mlx5_core and mlx5_ib.

    Besides, save the write combining test result of the PCI function, so
    that its thousands of child functions, such as SFs, can query it without
    paying the time and resource penalty themselves. The test function is
    called only after failing to get the cached result. With this enhancement,
    the thousands of SFs of a PF attached to the same driver no longer need
    to perform the WC check explicitly, as it has already been done in the
    system. This saves several commands per SF, thereby speeding up SF
    creation, and also saves completion EQ creation.

    Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/4ff5a8cc4c5b5b0d98397baa45a5019bcdbf096e.1717409369.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:32:03 -05:00
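
A sketch of caching the write-combining verdict per PCI function so later callers (e.g. SFs) skip the probe; run_wc_probe() is a stand-in for the SQ/CQ BlueFlame test, not a real function.

    #include <linux/mutex.h>
    #include <linux/types.h>

    enum wc_state { WC_UNKNOWN, WC_SUPPORTED, WC_UNSUPPORTED };

    struct wc_cache {
        struct mutex lock;
        enum wc_state state;
    };

    /* Stand-in for the real probe: post a NOP WQE via BlueFlame and check
     * whether the requested CQE shows up.
     */
    static bool run_wc_probe(void)
    {
        return true;
    }

    /* Run the expensive probe at most once per PCI function and serve the
     * cached verdict to every later caller.
     */
    static bool wc_supported(struct wc_cache *wc)
    {
        mutex_lock(&wc->lock);
        if (wc->state == WC_UNKNOWN)
            wc->state = run_wc_probe() ? WC_SUPPORTED : WC_UNSUPPORTED;
        mutex_unlock(&wc->lock);
        return wc->state == WC_SUPPORTED;
    }
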
Benjamin Poirier 8b9ed7593d RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.9-rc1

commit a400073ce3dd3dbdf843e6c9c0a0a7f6ca9f05d7
Author: Alexey Dobriyan <adobriyan@gmail.com>
Date:   Tue Jan 23 13:35:38 2024 +0300

    RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype

    mlx5_ib_copy_pas() doesn't exist anymore.

    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
    Link: https://lore.kernel.org/r/a2cb861e-d11e-4567-8a73-73763d1dc199@p183
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-12-05 10:31:58 -05:00
Kamal Heib bd8a5c9fe8 RDMA: Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API
JIRA: https://issues.redhat.com/browse/RHEL-56245

commit 3aa73c6b795b9aaaf933f3c95495d85fc0de39e3
Author: Yishai Hadas <yishaih@nvidia.com>
Date:   Thu Aug 1 15:05:15 2024 +0300

    RDMA: Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API

    Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API instead of
    udata.

    This enables passing some new ioctl attributes to the drivers, as will
    be introduced in the next patches for mlx5 driver.

    Change the involved drivers accordingly.

    Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
    Link: https://patch.msgid.link/9a25b2fc02443f7c36c2d93499ae25252b6afd40.1722512548.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2024-10-27 19:32:22 -04:00
Kamal Heib 60620ca443 RDMA: Pass entire uverbs attr bundle to create cq function
JIRA: https://issues.redhat.com/browse/RHEL-56247
Conflicts:
Drop hunks for non-existent drivers.

commit dd6d7f8574d7f8b6a0bf1aeef0b285d2706b8c2a
Author: Akiva Goldberger <agoldberger@nvidia.com>
Date:   Thu Jun 27 21:23:49 2024 +0300

    RDMA: Pass entire uverbs attr bundle to create cq function

    Changes the create_cq verb signature by sending the entire uverbs attr
    bundle as a parameter. This allows drivers to send driver specific attrs
    through ioctl for the create_cq verb and access them in their driver
    specific code.

    Also adds a new enum value for driver specific ioctl attributes for
    methods already supporting UHW.

    Link: https://lore.kernel.org/r/ed147343987c0d43fd391c1b2f85e2f425747387.1719512393.git.leon@kernel.org
    Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2024-10-07 11:55:54 -04:00
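
Assuming the post-change callback signature matches the description above, a driver-side create_cq stub would look roughly like this; the classic udata stays reachable through the bundle. example_create_cq is illustrative only.

    #include <linux/errno.h>
    #include <rdma/ib_verbs.h>
    #include <rdma/uverbs_ioctl.h>

    static int example_create_cq(struct ib_cq *ibcq,
                                 const struct ib_cq_init_attr *attr,
                                 struct uverbs_attr_bundle *attrs)
    {
        /* The whole attr bundle is now available, so driver-specific ioctl
         * attributes can be read from 'attrs'; the old-style udata is still
         * reachable as attrs->driver_udata.
         */
        struct ib_udata *udata = &attrs->driver_udata;

        if (!attr->cqe)
            return -EINVAL;
        (void)udata;
        /* ... program the hardware CQ ... */
        return 0;
    }
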
Benjamin Poirier 93e14b10b1 RDMA/mlx5: Change check for cacheable mkeys
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc1

commit 8c1185fef68cc603b954fece2a434c9f851d6a86
Author: Or Har-Toov <ohartoov@nvidia.com>
Date:   Wed Apr 3 13:36:00 2024 +0300

    RDMA/mlx5: Change check for cacheable mkeys

    umem can be NULL for user application mkeys in some cases. Therefore,
    umem can't be used to check whether the mkey is cacheable; instead,
    check a flag that indicates it. Also make sure that all mkeys which
    are not returned to the cache are destroyed.

    Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Link: https://lore.kernel.org/r/2690bc5c6896bcb937f89af16a1ff0343a7ab3d0.1712140377.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-08-07 09:20:55 -04:00
Benjamin Poirier ef5b55a3ba RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc1

commit 0611a8e8b475fc5230b9a24d29c8397aaab20b63
Author: Or Har-Toov <ohartoov@nvidia.com>
Date:   Wed Apr 3 13:35:59 2024 +0300

    RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent

    As some mkeys can't be modified with UMR due to some UMR limitations,
    like the size of translation that can be updated, not all user mkeys can
    be cached.

    Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Link: https://lore.kernel.org/r/f2742dd934ed73b2d32c66afb8e91b823063880c.1712140377.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-08-07 09:20:55 -04:00
Benjamin Poirier 9edb9be3ae RDMA/mlx5: Implement mkeys management via LIFO queue
JIRA: https://issues.redhat.com/browse/RHEL-24466
Upstream-status: v6.7-rc1
Conflicts:
- drivers/infiniband/hw/mlx5/mr.c
	Due to commit 78e2a0dd38 (tags/kernel-5.14.0-417.el9) which
	is an OOO backport of
	a53e215f9007 RDMA/mlx5: Fix mkey cache WQ flush (v6.7-rc1)
	-> Adjust context

commit 57e7071683ef6148c9f5ea0ba84598d2ba681375
Author: Shay Drory <shayd@nvidia.com>
Date:   Thu Sep 21 11:07:16 2023 +0300

    RDMA/mlx5: Implement mkeys management via LIFO queue

    Currently, mkeys are managed via an xarray. This implementation leads to
    a degradation in cases where many MRs are unregistered in parallel, due to
    the xarray's internal implementation; for example, deregistering 1M MRs via
    64 threads takes ~15% more time [1].

    Hence, implement mkey management via a LIFO queue, which solves the
    degradation.

    [1]
    2.8us in kernel v5.19 compared to 3.2us in kernel v6.4

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Link: https://lore.kernel.org/r/fde3d4cfab0f32f0ccb231cd113298256e1502c5.1695283384.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
2024-07-22 15:33:43 -04:00
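
A simplified LIFO of mkeys of the kind the commit describes: push/pop only touch the top of a preallocated array under a short spinlock, avoiding xarray node management on the hot path. The names and fixed-capacity layout are illustrative, not the driver's implementation.

    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct mkeys_stack {
        spinlock_t lock;
        u32 *keys;              /* backing array with 'capacity' slots */
        unsigned int top;
        unsigned int capacity;
    };

    static bool mkeys_push(struct mkeys_stack *s, u32 mkey)
    {
        bool ok = false;

        spin_lock(&s->lock);
        if (s->top < s->capacity) {
            s->keys[s->top++] = mkey;
            ok = true;
        }
        spin_unlock(&s->lock);
        return ok;
    }

    /* Pop the most recently returned mkey (LIFO keeps hot mkeys hot). */
    static bool mkeys_pop(struct mkeys_stack *s, u32 *mkey)
    {
        bool ok = false;

        spin_lock(&s->lock);
        if (s->top) {
            *mkey = s->keys[--s->top];
            ok = true;
        }
        spin_unlock(&s->lock);
        return ok;
    }
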
Amir Tzin 37765490f3 RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion
JIRA: https://issues.redhat.com/browse/RHEL-22227
Upstream-status: v6.6-rc1

commit 58dbd6428a6819e55a3c52ec60126b5d00804a38
Author: Patrisious Haddad <phaddad@nvidia.com>
Date:   Thu Apr 13 12:04:59 2023 +0300

    RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion

    Add RoCE MACsec rules when a gid is added for the MACsec netdevice and
    handle their cleanup when the gid is removed or the MACsec SA is deleted.
    Also support alias IP for the MACsec device, as long as we don't have
    more IPs than the gid table can hold.
    In addition handle the case where a gid is added but there are still no
    SAs added for the MACsec device, so the rules are added later on when
    the SAs are added.

    Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Amir Tzin <atzin@redhat.com>
2024-04-21 13:52:29 +00:00
Amir Tzin 33bd3be29e RDMA/mlx5: Implement MACsec gid addition and deletion
JIRA: https://issues.redhat.com/browse/RHEL-22227
Upstream-status: v6.6-rc1

commit 758ce14aee825f8f3ca8f76c9991c108094cae8b
Author: Patrisious Haddad <phaddad@nvidia.com>
Date:   Tue May 3 08:37:48 2022 +0300

    RDMA/mlx5: Implement MACsec gid addition and deletion

    Handle MACsec IP ambiguity issue, since mlx5 hw can't support
    programming both the MACsec and the physical gid when they have the same
    IP address, because it wouldn't know to whom to steer the traffic.
    Hence in such case we delete the physical gid from the hw gid table,
    which would then cause all traffic sent over it to fail, and we'll only
    be able to send traffic over the MACsec gid.

    Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
    Reviewed-by: Raed Salem <raeds@nvidia.com>
    Reviewed-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Amir Tzin <atzin@redhat.com>
2024-04-21 13:52:28 +00:00
Amir Tzin 9fd4a7a96f RDMA/mlx5: Reduce QP table exposure
JIRA: https://issues.redhat.com/browse/RHEL-22227
Upstream-status: v6.5-rc1

commit 2ecfd946169e7f56534db2a5f6935858be3005ba
Author: Leon Romanovsky <leon@kernel.org>
Date:   Mon Jun 5 13:14:05 2023 +0300

    RDMA/mlx5: Reduce QP table exposure

    driver.h is a common header for the whole mlx5 code base, but struct
    mlx5_qp_table is used in the mlx5_ib driver only. So move that struct
    to be under the sole responsibility of mlx5_ib.

    Link: https://lore.kernel.org/r/bec0dc1158e795813b135d1143147977f26bf668.1685953497.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Amir Tzin <atzin@redhat.com>
2024-04-21 13:52:25 +00:00
Mohammad Kabat 60f8094ff7 RDMA/mlx5: Remove not-used cache disable flag
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.6-rc5

commit c99a7457e5bb873914a74307ba2df85f6799203b
Author: Leon Romanovsky <leon@kernel.org>
Date:   Thu Sep 28 20:20:47 2023 +0300

    RDMA/mlx5: Remove not-used cache disable flag

    During execution of mlx5_mkey_cache_cleanup(), there is a guarantee
    that MRs are not registered and/or destroyed. It means that we don't
    need the newly introduced cache disable flag.

    Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
    Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2024-01-16 09:00:54 +00:00
Mohammad Kabat bd35524428 RDMA/mlx5: Fix mkey cache possible deadlock on cleanup
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.6-rc5

commit 374012b0045780b7ad498be62e85153009bb7fe9
Author: Shay Drory <shayd@nvidia.com>
Date:   Tue Sep 12 13:07:45 2023 +0300

    RDMA/mlx5: Fix mkey cache possible deadlock on cleanup

    Fix the deadlock by refactoring the MR cache cleanup flow to flush the
    workqueue without holding the rb_lock.
    This adds a race between cache cleanup and creation of new entries, which
    we solve by denying creation of new entries after cache cleanup has started.

    Lockdep:
    WARNING: possible circular locking dependency detected
     [ 2785.326074 ] 6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted
     [ 2785.339778 ] ------------------------------------------------------
     [ 2785.340848 ] devlink/53872 is trying to acquire lock:
     [ 2785.341701 ] ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900
     [ 2785.343403 ]
     [ 2785.343403 ] but task is already holding lock:
     [ 2785.344464 ] ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]
     [ 2785.346273 ]
     [ 2785.346273 ] which lock already depends on the new lock.
     [ 2785.346273 ]
     [ 2785.347720 ]
     [ 2785.347720 ] the existing dependency chain (in reverse order) is:
     [ 2785.349003 ]
     [ 2785.349003 ] -> #1 (&dev->cache.rb_lock){+.+.}-{3:3}:
     [ 2785.350160 ]        __mutex_lock+0x14c/0x15c0
     [ 2785.350962 ]        delayed_cache_work_func+0x2d1/0x610 [mlx5_ib]
     [ 2785.352044 ]        process_one_work+0x7c2/0x1310
     [ 2785.352879 ]        worker_thread+0x59d/0xec0
     [ 2785.353636 ]        kthread+0x28f/0x330
     [ 2785.354370 ]        ret_from_fork+0x1f/0x30
     [ 2785.355135 ]
     [ 2785.355135 ] -> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}:
     [ 2785.356515 ]        __lock_acquire+0x2d8a/0x5fe0
     [ 2785.357349 ]        lock_acquire+0x1c1/0x540
     [ 2785.358121 ]        __flush_work+0xe8/0x900
     [ 2785.358852 ]        __cancel_work_timer+0x2c7/0x3f0
     [ 2785.359711 ]        mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib]
     [ 2785.360781 ]        mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib]
     [ 2785.361969 ]        __mlx5_ib_remove+0x68/0x120 [mlx5_ib]
     [ 2785.362960 ]        mlx5r_remove+0x63/0x80 [mlx5_ib]
     [ 2785.363870 ]        auxiliary_bus_remove+0x52/0x70
     [ 2785.364715 ]        device_release_driver_internal+0x3c1/0x600
     [ 2785.365695 ]        bus_remove_device+0x2a5/0x560
     [ 2785.366525 ]        device_del+0x492/0xb80
     [ 2785.367276 ]        mlx5_detach_device+0x1a9/0x360 [mlx5_core]
     [ 2785.368615 ]        mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core]
     [ 2785.369934 ]        mlx5_devlink_reload_down+0x292/0x580 [mlx5_core]
     [ 2785.371292 ]        devlink_reload+0x439/0x590
     [ 2785.372075 ]        devlink_nl_cmd_reload+0xaef/0xff0
     [ 2785.372973 ]        genl_family_rcv_msg_doit.isra.0+0x1bd/0x290
     [ 2785.374011 ]        genl_rcv_msg+0x3ca/0x6c0
     [ 2785.374798 ]        netlink_rcv_skb+0x12c/0x360
     [ 2785.375612 ]        genl_rcv+0x24/0x40
     [ 2785.376295 ]        netlink_unicast+0x438/0x710
     [ 2785.377121 ]        netlink_sendmsg+0x7a1/0xca0
     [ 2785.377926 ]        sock_sendmsg+0xc5/0x190
     [ 2785.378668 ]        __sys_sendto+0x1bc/0x290
     [ 2785.379440 ]        __x64_sys_sendto+0xdc/0x1b0
     [ 2785.380255 ]        do_syscall_64+0x3d/0x90
     [ 2785.381031 ]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
     [ 2785.381967 ]
     [ 2785.381967 ] other info that might help us debug this:
     [ 2785.381967 ]
     [ 2785.383448 ]  Possible unsafe locking scenario:
     [ 2785.383448 ]
     [ 2785.384544 ]        CPU0                    CPU1
     [ 2785.385383 ]        ----                    ----
     [ 2785.386193 ]   lock(&dev->cache.rb_lock);
     [ 2785.386940 ]                                lock((work_completion)(&(&ent->dwork)->work));
     [ 2785.388327 ]                                lock(&dev->cache.rb_lock);
     [ 2785.389425 ]   lock((work_completion)(&(&ent->dwork)->work));
     [ 2785.390414 ]
     [ 2785.390414 ]  *** DEADLOCK ***
     [ 2785.390414 ]
     [ 2785.391579 ] 6 locks held by devlink/53872:
     [ 2785.392341 ]  #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
     [ 2785.393630 ]  #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0
     [ 2785.395324 ]  #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core]
     [ 2785.397322 ]  #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core]
     [ 2785.399231 ]  #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
     [ 2785.400864 ]  #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]

    Fixes: b95845178328 ("RDMA/mlx5: Change the cache structure to an RB-tree")
    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2024-01-16 09:00:50 +00:00
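
The shape of the fix above, reduced to a sketch: mark the cache as shutting down under the lock, drop the lock before waiting for the work (which itself takes the lock), then re-take it to free entries. struct cache and its fields are placeholders.

    #include <linux/mutex.h>
    #include <linux/workqueue.h>

    struct cache {
        struct mutex rb_lock;       /* protects the entries tree         */
        bool cleanup_started;       /* once set, new entries are refused */
        struct delayed_work dwork;  /* stands in for the per-entry works */
    };

    static void cache_cleanup(struct cache *c)
    {
        /* Refuse new entry creation from here on. */
        mutex_lock(&c->rb_lock);
        c->cleanup_started = true;
        mutex_unlock(&c->rb_lock);

        /* Wait for the work with the lock dropped; the work may take
         * rb_lock itself, so waiting while holding it would deadlock.
         */
        cancel_delayed_work_sync(&c->dwork);

        mutex_lock(&c->rb_lock);
        /* ... destroy the remaining entries ... */
        mutex_unlock(&c->rb_lock);
    }
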
Mohammad Kabat 9f9d3cc92b RDMA/mlx5: Fix affinity assignment
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.4-rc7

commit 617f5db1a626f18d5cbb7c7faf7bf8f9ea12be78
Author: Mark Bloch <mbloch@nvidia.com>
Date:   Mon Jun 5 13:33:26 2023 +0300

    RDMA/mlx5: Fix affinity assignment

    The cited commit aimed to ensure that Virtual Functions (VFs) assign a
    queue affinity to a Queue Pair (QP) to distribute traffic when
    the LAG master creates a hardware LAG. If the affinity was set while
    the hardware was not in LAG, the firmware would ignore the affinity value.

    However, this commit unintentionally assigned an affinity to QPs on the LAG
    master's VPORT even if the RDMA device was not marked as LAG-enabled.
    In most cases, this was not an issue because when the hardware entered
    hardware LAG configuration, the RDMA device of the LAG master would be
    destroyed and a new one would be created, marked as LAG-enabled.

    The problem arises when a user configures Equal-Cost Multipath (ECMP).
    In ECMP mode, traffic can be directed to different physical ports based on
    the queue affinity, which is intended for use by VPORTS other than the
    E-Switch manager. ECMP mode is supported only if both E-Switch managers are
    in switchdev mode and the appropriate route is configured via IP. In this
    configuration, the RDMA device is not destroyed, and we retain the RDMA
    device that is not marked as LAG-enabled.

    To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
    manager through verbs should be assigned strict affinity. This means they
    will only be able to communicate through the native physical port
    associated with the E-Switch manager. This will prevent the firmware from
    assigning affinity and will not allow the SQs to be remapped in case of
    failover.

    Fixes: 802dcc7fc5 ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
    Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
    Signed-off-by: Mark Bloch <mbloch@nvidia.com>
    Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2024-01-16 08:58:21 +00:00
Mohammad Kabat 43193c4444 RDMA/mlx5: Create an indirect flow table for steering anchor
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.4-rc7

commit e1f4a52ac171dd863fe89055e749ef5e0a0bc5ce
Author: Mark Bloch <mbloch@nvidia.com>
Date:   Mon Jun 5 13:33:18 2023 +0300

    RDMA/mlx5: Create an indirect flow table for steering anchor

    A misbehaved user can create a steering anchor that points to a kernel
    flow table and then destroy the anchor without freeing the associated
    STC. This creates a problem as the kernel can't destroy the flow
    table since there is still a reference to it. As a result, this can
    exhaust all available flow table resources, preventing other users from
    using the RDMA device.

    To prevent this problem, a solution is implemented where a special flow
    table with two steering rules is created when a user creates a steering
    anchor for the first time. The rules include one that drops all traffic
    and another that points to the kernel flow table. If the steering anchor
    is destroyed, only the rule pointing to the kernel's flow table is removed.
    Any traffic reaching the special flow table after that is dropped.

    Since the special flow table is not destroyed when the steering anchor is
    destroyed, any issues are prevented from occurring. The remaining resources
    are only destroyed when the RDMA device is destroyed, which happens after
    all DEVX objects are freed, including the STCs, thus mitigating the issue.

    Fixes: 0c6ab0ca9a66 ("RDMA/mlx5: Expose steering anchor to userspace")
    Signed-off-by: Mark Bloch <mbloch@nvidia.com>
    Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
    Link: https://lore.kernel.org/r/b4a88a871d651fa4e8f98d552553c1cfe9ba2cd6.1685960567.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2024-01-16 08:58:21 +00:00
Mohammad Kabat ba789aaf26 IB/mlx5: Extend debug control for CC parameters
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit 66fb1d5df6ace316a4a6e2c31e13fc123ea2b644
Author: Edward Srouji <edwards@nvidia.com>
Date:   Thu Feb 16 11:13:45 2023 +0200

    IB/mlx5: Extend debug control for CC parameters

    This patch adds rtt_resp_dscp to the current debug controllability of
    congestion control (CC) parameters.
    rtt_resp_dscp can be read or written through debugfs.
    If set, its value overwrites the DSCP of the generated RTT response.

    Signed-off-by: Edward Srouji <edwards@nvidia.com>
    Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
    Link: https://lore.kernel.org/r/1dcc3440ee53c688f19f579a051ded81a2aaa70a.1676538714.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:41:08 +00:00
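
A generic sketch of exposing such a knob through debugfs; here the value is just a u8 in memory, whereas the real driver routes reads and writes through firmware commands. The names are illustrative.

    #include <linux/debugfs.h>
    #include <linux/types.h>

    static u8 rtt_resp_dscp;   /* placeholder backing store */

    /* Create <debugfs>/.../rtt_resp_dscp, readable and writable by root. */
    static void example_cc_debugfs_init(struct dentry *parent)
    {
        debugfs_create_u8("rtt_resp_dscp", 0600, parent, &rtt_resp_dscp);
    }
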
Mohammad Kabat bd6f48e0a0 RDMA/mlx5: Use query_special_contexts for mkeys
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit 594cac11ab6a1be8022a3c96d181dde7cfb0b8cf
Author: Or Har-Toov <ohartoov@nvidia.com>
Date:   Tue Jan 17 15:14:52 2023 +0200

    RDMA/mlx5: Use query_special_contexts for mkeys

    Use query_special_contexts to get the correct value of mkeys such as
    null_mkey, terminate_scatter_list_mkey and dump_fill_mkey, as FW will
    change them in certain configurations.

    Link: https://lore.kernel.org/r/000236f0a9487d48809f87bcc3620a3964b2d3d3.1673960981.git.leon@kernel.org
    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:41:07 +00:00
Mohammad Kabat dc3bc8db65 RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit dca55da0a15717dde509d17163946e951bad56c4
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Tue Nov 1 15:36:01 2022 +0100

    RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister

    When removing a network namespace with an mlx5 devlink instance in it,
    the following call chain is performed:

    cleanup_net (takes down_read(&pernet_ops_rwsem)
    devlink_pernet_pre_exit()
    devlink_reload()
    mlx5_devlink_reload_down()
    mlx5_unload_one_devl_locked()
    mlx5_detach_device()
    del_adev()
    mlx5r_remove()
    __mlx5_ib_remove()
    mlx5_ib_roce_cleanup()
    mlx5_remove_netdev_notifier()
    unregister_netdevice_notifier (takes down_write(&pernet_ops_rwsem)

    This deadlocks.

    Resolve this by converting to register_netdevice_notifier_dev_net()
    which does not take pernet_ops_rwsem and moves the notifier block around
    according to netdev it takes as arg.

    Use previously introduced netdev added/removed events to track uplink
    netdev to be used for register_netdevice_notifier_dev_net() purposes.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:41:04 +00:00
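
For reference, the per-netdev notifier registration the fix converts to looks roughly like this; tracked_dev and the example_* names are placeholders.

    #include <linux/netdevice.h>
    #include <linux/notifier.h>

    struct tracked_dev {
        struct notifier_block nb;
        struct netdev_net_notifier nn;   /* lets the core move the notifier
                                          * when the netdev changes netns  */
    };

    static int example_netdev_event(struct notifier_block *nb,
                                    unsigned long event, void *ptr)
    {
        struct net_device *ndev = netdev_notifier_info_to_dev(ptr);

        /* Events arrive for this specific uplink netdev only. */
        (void)ndev;
        return NOTIFY_DONE;
    }

    /* Unlike register_netdevice_notifier(), this variant does not take
     * pernet_ops_rwsem, which is what breaks the deadlock described above.
     */
    static int example_track(struct tracked_dev *t, struct net_device *uplink)
    {
        t->nb.notifier_call = example_netdev_event;
        return register_netdevice_notifier_dev_net(uplink, &t->nb, &t->nn);
    }
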
Mohammad Kabat f578ac6b23 RDMA/mlx5: Remove impossible check of mkey cache cleanup failure
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit 85f9e38a5ac7d397f9bb5e57901b2d6af4dcc3b9
Author: Leon Romanovsky <leon@kernel.org>
Date:   Thu Feb 2 11:03:07 2023 +0200

    RDMA/mlx5: Remove impossible check of mkey cache cleanup failure

    mlx5_mkey_cache_cleanup() can't fail and can be changed to be void.

    Link: https://lore.kernel.org/r/1acd9528995d083114e7dec2a2afc59436406583.1675328463.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:41:01 +00:00
Mohammad Kabat 465d1920e3 RDMA/mlx5: Add work to remove temporary entries from the cache
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit 627122280c878cf5d3cda2d2c5a0a8f6a7e35cb7
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Thu Jan 26 00:28:07 2023 +0200

    RDMA/mlx5: Add work to remove temporary entries from the cache

    The non-cache mkeys are stored in the cache only to shorten application
    restart time. Don't store them longer than needed.

    Configure cache entries that store non-cache MRs as temporary entries.  If
    30 seconds have passed and no user reclaimed the temporarily cached mkeys,
    an asynchronous work will destroy the mkeys entries.

    Link: https://lore.kernel.org/r/20230125222807.6921-7-michaelgur@nvidia.com
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:40:58 +00:00
Mohammad Kabat 31ecbaf6d5 RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit dd1b913fb0d0e3e6d55e92d2319d954474dd66ac
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Thu Jan 26 00:28:06 2023 +0200

    RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow

    Currently, when dereging an MR, if the mkey doesn't belong to a cache
    entry, it will be destroyed.  As a result, the restart of applications
    with many non-cached mkeys is not efficient since all the mkeys are
    destroyed and then recreated.  This process takes a long time (for 100,000
    MRs, it is ~20 seconds for dereg and ~28 seconds for re-reg).

    To shorten the restart runtime, insert all cacheable mkeys to the cache.
    If there is no fitting entry to the mkey properties, create a temporary
    entry that fits it.

    After a predetermined timeout, the cache entries will shrink to the
    initial high limit.

    The mkeys will still be in the cache when consuming them again after an
    application restart. Therefore, the registration will be much faster
    (for 100,000 MRs, it is ~4 seconds for dereg and ~5 seconds for re-reg).

    The temporary cache entries created to store the non-cache mkeys are not
    exposed through sysfs like the default cache entries.

    Link: https://lore.kernel.org/r/20230125222807.6921-6-michaelgur@nvidia.com
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:40:58 +00:00
Mohammad Kabat 4b769bdd9c RDMA/mlx5: Introduce mlx5r_cache_rb_key
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit 73d09b2fe8336f5f37935e46418666ddbcd3c343
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Thu Jan 26 00:28:05 2023 +0200

    RDMA/mlx5: Introduce mlx5r_cache_rb_key

    Switch from using the mkey order to using the new struct as the key to the
    RB tree of cache entries.

    The key is all the mkey properties that UMR operations can't modify.
    Use this key to define the cache entries and to search for and create
    cache mkeys.

    Link: https://lore.kernel.org/r/20230125222807.6921-5-michaelgur@nvidia.com
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:40:58 +00:00
Mohammad Kabat 5ded6a3406 RDMA/mlx5: Change the cache structure to an RB-tree
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit b9584517832858a0f78d6851d09b697a829514cd
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Thu Jan 26 00:28:04 2023 +0200

    RDMA/mlx5: Change the cache structure to an RB-tree

    Currently, the cache structure is a static linear array. Therefore, its
    size is limited to the number of entries in it and is not expandable. The
    entries are dedicated to mkeys of size 2^x and no access_flags. Mkeys with
    different properties are not cacheable.

    In this patch, we change the cache structure to an RB-tree. This will
    allow extending the cache to support more entries with different mkey
    properties.

    Link: https://lore.kernel.org/r/20230125222807.6921-4-michaelgur@nvidia.com
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:40:58 +00:00
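
A generic sketch of an RB-tree keyed on the immutable mkey properties, in the spirit of the two cache commits above; cache_key/cache_entry and the chosen fields are illustrative, not the driver's structures.

    #include <linux/rbtree.h>
    #include <linux/string.h>

    struct cache_key {                /* properties UMR cannot change */
        unsigned int access_mode;
        unsigned int access_flags;
        unsigned int ndescs;
    };

    struct cache_entry {
        struct rb_node node;
        struct cache_key key;
    };

    static int key_cmp(const struct cache_key *a, const struct cache_key *b)
    {
        return memcmp(a, b, sizeof(*a));
    }

    /* Insert an entry keyed by its properties. Unlike a fixed array indexed
     * by mkey order, the tree can grow to hold any property combination.
     */
    static bool cache_insert(struct rb_root *root, struct cache_entry *ent)
    {
        struct rb_node **link = &root->rb_node, *parent = NULL;

        while (*link) {
            struct cache_entry *cur = rb_entry(*link, struct cache_entry, node);
            int c = key_cmp(&ent->key, &cur->key);

            parent = *link;
            if (c < 0)
                link = &(*link)->rb_left;
            else if (c > 0)
                link = &(*link)->rb_right;
            else
                return false;   /* an entry with this key already exists */
        }
        rb_link_node(&ent->node, parent, link);
        rb_insert_color(&ent->node, root);
        return true;
    }
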
Mohammad Kabat a16ac19808 RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1

commit a2a88b8e22d1b202225d0e40b02ad068afab2ccb
Author: Aharon Landau <aharonl@nvidia.com>
Date:   Thu Jan 26 00:28:02 2023 +0200

    RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries

    mkc.log_page_size can be changed using UMR. Therefore, don't treat it as a
    cache entry property.

    Removing it from struct mlx5_cache_ent.

    All cache mkeys will be created with default PAGE_SHIFT, and updated with
    the needed page_shift using UMR when passing them to a user.

    Link: https://lore.kernel.org/r/20230125222807.6921-2-michaelgur@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-07-25 07:40:58 +00:00
Mohammad Kabat 8255cc648b RDMA/mlx5: Don't set tx affinity when lag is in hash mode
Bugzilla: https://bugzilla.redhat.com/2165355
Upstream-status: v6.1-rc1

commit a83bb5df2ac604ab418fbe0a8720f55de46652eb
Author: Liu, Changcheng <jerrliu@nvidia.com>
Date:   Wed Sep 7 16:36:26 2022 -0700

    RDMA/mlx5: Don't set tx affinity when lag is in hash mode

    In hash mode, without setting tx affinity explicitly, the port select
    flow table decides which port is used for the traffic.
    If port_select_flow_table_bypass capability is supported and tx affinity
    is set explicitly for QP/TIS, they will be added into the explicit affinity
    table in FW to check which port is used for the traffic.
    1. The overloaded explicit affinity table may affect performance.
       To avoid this, do not set tx affinity explicitly by default.
    2. The packets of the same flow need to be transmitted on the same port.
       Because the packets of the same flow use different QPs in slow & fast
       path, it shouldn't set tx affinity explicitly for these QPs.

    Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-06-29 09:21:37 +00:00
Herton R. Krzesinski 9de1dafa38 Merge: RDMA: Add support of RDMA dmabuf for mlx5 driver
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1940

Upstream status: v6.1.
Bugzilla: https://bugzilla.redhat.com/2123401
Tested: cuda pyverbs tests passed.

Add support for DMABUF FD's when creating a devx umem in the RDMA mlx5 driver.
This allows applications to create work queues directly on GPU memory where
the GPU fully controls the data flow out of the RDMA NIC.

Signed-off-by: Kamal Heib <kheib@redhat.com>

Approved-by: Íñigo Huguet <ihuguet@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Jonathan Toppins <jtoppins@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2023-02-08 01:35:25 +00:00
Kamal Heib ee8da65a04 RDMA/mlx5: Enable ATS support for MRs and umems
Bugzilla: https://bugzilla.redhat.com/2123401

commit 72b2f7608a59727e7c2e5b11cff2749c2c080fac
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date:   Thu Sep 1 11:20:56 2022 -0300

    RDMA/mlx5: Enable ATS support for MRs and umems

    For mlx5 if ATS is enabled in the PCI config then the device will use ATS
    requests for only certain DMA operations. This has to be opted in by the
    SW side based on the mkey or umem settings.

    ATS slows down the PCI performance, so it should only be set in cases when
    it is needed. All of these cases revolve around optimizing PCI P2P
    transfers and avoiding bad cases where the bus just doesn't work.

    Link: https://lore.kernel.org/r/4-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2023-01-24 10:44:39 -05:00
Mohammad Kabat 470a289875 RDMA/mlx5: Fix UMR cleanup on error flow of driver init
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc5

commit 9b7d4be967f16f79a2283b2338709fcc750313ee
Author: Maor Gottlieb <maorg@nvidia.com>
Date:   Mon Aug 29 12:02:29 2022 +0300

    RDMA/mlx5: Fix UMR cleanup on error flow of driver init

    The cited commit removed the checks from the UMR cleanup flow that
    verified whether the resources had been created. This could lead to a
    null-ptr-deref if the mlx5_ib_stage_ib_reg_init stage failed.

    Fix it by adding a new UMR state that records whether the resources were
    created, and check it in the UMR cleanup flow before destroying the
    resources.
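
    An illustrative sketch of the guard (names are placeholders, not the
    driver's definitions):

        enum umr_state { UMR_UNINITIALIZED, UMR_INITIALIZED };

        struct umr_ctx {
                enum umr_state state;
                /* QP/CQ/PD handles live here */
        };

        static void umr_cleanup(struct umr_ctx *umr)
        {
                if (umr->state != UMR_INITIALIZED)
                        return;         /* init failed early, nothing to destroy */
                /* destroy the QP/CQ/PD here */
                umr->state = UMR_UNINITIALIZED;
        }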

    Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c")
    Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
    Link: https://lore.kernel.org/r/4cfa61386cf202e9ce330e8d228ce3b25a36326e.1661763459.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:42 +00:00
Mohammad Kabat 3b72ca1f74 RDMA/mlx5: Rename the mkey cache variables and functions
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1

commit 0113780870b1597ae49f30abfa4957c239f913d3
Author: Aharon Landau <aharonl@nvidia.com>
Date:   Tue Jul 26 10:19:11 2022 +0300

    RDMA/mlx5: Rename the mkey cache variables and functions

    After replacing the MR cache with an Mkey cache, rename the variables and
    functions to fit the new meaning.

    Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:38 +00:00
Mohammad Kabat 1f1e9871f0 RDMA/mlx5: Store in the cache mkeys instead of mrs
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1

commit 6b7533869523ae58e2b914551305b0e47cbeb247
Author: Aharon Landau <aharonl@nvidia.com>
Date:   Tue Jul 26 10:19:10 2022 +0300

    RDMA/mlx5: Store in the cache mkeys instead of mrs

    Currently, the driver stores an mlx5_ib_mr struct in the cache entries,
    although the only part of the cached MR that is actually used is its
    mkey. Store only the mkey in the cache.
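
    A simplified before/after sketch of the cache item layout (illustrative
    only, not the actual driver structs):

        struct cache_item_before {
                struct list_head list;
                struct mlx5_ib_mr *mr;  /* whole MR cached, only its mkey used */
        };

        struct cache_item_after {
                struct list_head list;
                u32 mkey;               /* the only thing consumers need */
        };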

    Link: https://lore.kernel.org/r/20220726071911.122765-5-michaelgur@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:38 +00:00
Mohammad Kabat 47033b9181 RDMA/mlx5: Store the number of in_use cache mkeys instead of total_mrs
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1

commit 19591f134c59703dfc272356808e6fe2037d0d40
Author: Aharon Landau <aharonl@nvidia.com>
Date:   Tue Jul 26 10:19:09 2022 +0300

    RDMA/mlx5: Store the number of in_use cache mkeys instead of total_mrs

    total_mrs is used only to calculate the number of mkeys currently in
    use. To simplify things, replace it with a new member called "in_use" and
    directly store the number of mkeys currently in use.
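
    A tiny sketch of the bookkeeping change (field names are illustrative):

        struct cache_ent_counts {
                unsigned long stored;   /* mkeys currently sitting in the cache */
                unsigned long in_use;   /* mkeys handed out to callers          */
        };

        /* take an mkey from the cache:  stored--; in_use++;
         * return an mkey to the cache:  in_use--; stored++;
         * no separate total_mrs counter needs to be maintained.
         */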

    Link: https://lore.kernel.org/r/20220726071911.122765-4-michaelgur@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:38 +00:00
Mohammad Kabat 0fad9155ff RDMA/mlx5: Replace cache list with Xarray
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1

commit 86457a92df1bebdcd8e20afa286427e4b525aa08
Author: Aharon Landau <aharonl@nvidia.com>
Date:   Tue Jul 26 10:19:08 2022 +0300

    RDMA/mlx5: Replace cache list with Xarray

    The Xarray allows us to store the cached mkeys in a memory-efficient way.

    Entries are reserved in the Xarray using xa_cmpxchg before calling the
    upcoming callbacks, to avoid allocations in interrupt context. The
    xa_cmpxchg call can sleep when using GFP_KERNEL, so we call it in a loop
    to ensure there is one reserved entry for each process trying to reserve.
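
    A simplified sketch of the reservation loop (not the driver's actual
    push_mkey logic; the slot indexing is illustrative, and the real driver
    picks the index under xa_lock):

        #include <linux/xarray.h>

        /* Reserve a slot in the mkeys xarray so the completion handler can
         * later fill it from interrupt context without allocating. The
         * GFP_KERNEL xa_cmpxchg() may sleep, so it is retried until this
         * caller owns a reserved (XA_ZERO_ENTRY) slot.
         */
        static int reserve_mkey_slot(struct xarray *mkeys, unsigned long *index)
        {
                void *old;

                for (;;) {
                        old = xa_cmpxchg(mkeys, *index, NULL, XA_ZERO_ENTRY,
                                         GFP_KERNEL);
                        if (xa_is_err(old))
                                return xa_err(old);
                        if (!old)
                                return 0;       /* slot *index is now reserved */
                        (*index)++;             /* slot already used, try the next one */
                }
        }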

    Link: https://lore.kernel.org/r/20220726071911.122765-3-michaelgur@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:37 +00:00
Mohammad Kabat 52ce7c8519 RDMA/mlx5: Replace ent->lock with xa_lock
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1

commit 17ae355926ed1832449d52748334b8fa799301f1
Author: Aharon Landau <aharonl@nvidia.com>
Date:   Tue Jul 26 10:19:07 2022 +0300

    RDMA/mlx5: Replace ent->lock with xa_lock

    In the next patch, ent->list will be replaced with an xarray. The xarray
    uses an internal lock to protect the indexes. Use it to protect all the
    entry fields, and get rid of ent->lock.
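
    A short sketch of the pattern (entry layout and field names are
    illustrative):

        /* The xarray's internal spinlock now protects the entry fields that
         * ent->lock used to cover; update them only while holding xa_lock.
         */
        static void store_cached_mkey(struct mlx5_cache_ent *ent,
                                      unsigned long index, u32 mkey)
        {
                xa_lock_irq(&ent->mkeys);
                ent->stored++;          /* formerly guarded by ent->lock */
                __xa_store(&ent->mkeys, index, xa_mk_value(mkey), GFP_ATOMIC);
                xa_unlock_irq(&ent->mkeys);
        }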

    Link: https://lore.kernel.org/r/20220726071911.122765-2-michaelgur@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:37 +00:00
Mohammad Kabat 3a331a3b0f RDMA/mlx5: Expose steering anchor to userspace
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1

commit 0c6ab0ca9a662d4ca9742d97156bac0d3067d72d
Author: Mark Bloch <mbloch@nvidia.com>
Date:   Sun Jul 3 13:54:07 2022 -0700

    RDMA/mlx5: Expose steering anchor to userspace

    Expose a steering anchor per priority to allow users to re-inject
    packets back into default NIC pipeline for additional processing.

    MLX5_IB_METHOD_STEERING_ANCHOR_CREATE returns a flow table ID which
    a user can use to re-inject packets at a specific priority.

    A FTE (flow table entry) can be created and the flow table ID
    used as a destination.

    When a packet is taken into an RDMA-controlled steering domain (like
    software steering), there may be a need to insert the packet back into
    the default NIC pipeline. This exposes a flow table ID to the user that
    can be used as a destination in a flow table entry.

    With this new method, priorities that are exposed to users via
    MLX5_IB_METHOD_FLOW_MATCHER_CREATE can be reached from a non-zero UID.

    Because user-created flow tables (via RDMA DEVX) are created with a
    non-zero UID, it is impossible to point to a NIC core flow table (core
    driver flow tables are created with a UID value of zero) from userspace.
    Create the flow tables that are exposed to users with the shared UID;
    this allows users to point to the default NIC flow tables.

    Steering loops are prevented at FW level as FW enforces that no flow
    table at level X can point to a table at level lower than X.
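
    A hedged userspace sketch of how the anchor is consumed (the rdma-core
    mlx5dv names and struct fields below are recalled from memory and should
    be treated as assumptions):

        #include <stdint.h>
        #include <infiniband/mlx5dv.h>

        /* Create a steering anchor at a NIC RX priority and return the flow
         * table ID, which can then be used as a destination in an FTE to
         * re-inject packets into the default NIC pipeline.
         */
        static uint32_t get_anchor_ft_id(struct ibv_context *ctx, uint16_t prio)
        {
                struct mlx5dv_steering_anchor_attr attr = {
                        .ft_type  = MLX5DV_FLOW_TABLE_TYPE_NIC_RX,
                        .priority = prio,
                };
                struct mlx5dv_steering_anchor *sa;

                sa = mlx5dv_steering_anchor_create(ctx, &attr);
                if (!sa)
                        return 0;       /* creation failed */
                return sa->id;          /* flow table ID to target in an FTE */
        }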

    Link: https://lore.kernel.org/all/20220703205407.110890-6-saeed@kernel.org/
    Signed-off-by: Mark Bloch <mbloch@nvidia.com>
    Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:36 +00:00
Mohammad Kabat e01810e974 RDMA/mlx5: Add a umr recovery flow
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1

commit 158e71bb69e368b8b33e8b7c4ac8c111da0c1ae2
Author: Aharon Landau <aharonl@nvidia.com>
Date:   Sun May 15 07:19:53 2022 +0300

    RDMA/mlx5: Add a umr recovery flow

    When a UMR fails, the UMR QP moves to the error state. Therefore, all
    further UMR operations will fail too.

    Add a recovery flow to the UMR QP, and repost the flushed WQEs.
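
    A hedged sketch of the recovery idea (not the driver's exact flow):

        /* After the UMR QP has flushed in the error state, cycle it back
         * through RESET towards RTS and repost the flushed work requests.
         */
        static int umr_recover(struct ib_qp *qp)
        {
                struct ib_qp_attr attr = { .qp_state = IB_QPS_RESET };
                int err;

                err = ib_modify_qp(qp, &attr, IB_QP_STATE);
                if (err)
                        return err;
                /* ...bring the QP back to INIT/RTR/RTS, then repost the
                 * flushed WQEs with ib_post_send()...
                 */
                return 0;
        }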

    Link: https://lore.kernel.org/r/6cc24816cca049bd8541317f5e41d3ac659445d3.1652588303.git.leonro@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2023-01-19 10:21:32 +00:00
Mohammad Kabat f8a2b29fbb net/mlx5: Lag, expose number of lag ports
Bugzilla: https://bugzilla.redhat.com/2112940
Upstream-status: v5.19-rc1

commit 34a30d7635a8e37275a7b63bec09035ed762969b
Author: Mark Bloch <mbloch@nvidia.com>
Date:   Tue Mar 1 15:42:01 2022 +0000

    net/mlx5: Lag, expose number of lag ports

    Downstream patches will add support for hardware lag with
    more than 2 ports. Add a way for users to query the number of lag ports.
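
    A small sketch of a consumer, assuming the exported helper added here is
    mlx5_lag_get_num_ports():

        static void init_per_port_state(struct mlx5_core_dev *mdev)
        {
                u8 num_ports = mlx5_lag_get_num_ports(mdev);
                u8 i;

                for (i = 0; i < num_ports; i++) {
                        /* size/initialize per-port resources here instead of
                         * hard-coding two ports
                         */
                }
        }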

    Signed-off-by: Mark Bloch <mbloch@nvidia.com>
    Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
2022-12-18 10:14:09 +00:00