JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 8d159eb2117b2e3697a31785662b653938f007cb
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date: Mon Sep 9 20:30:23 2024 +0300
RDMA/mlx5: Use IB set_netdev and get_netdev functions
The IB layer provides a common interface to store and get net
devices associated to an IB device port (ib_device_set_netdev()
and ib_device_get_netdev()).
Previously, mlx5_ib stored and managed the associated net devices
internally.
Replace internal net device management in mlx5_ib with
ib_device_set_netdev() when attaching/detaching a net device and
ib_device_get_netdev() when retrieving the net device.
Export ib_device_get_netdev().
For mlx5 representors/PFs/VFs and lag creation we replace the netdev
assignments with the IB set/get netdev functions.
In active-backup lag mode, the active slave net device is stored in the
lag itself. To ensure the net device stored in a lag bond IB device is
the active slave, we implement the following:
- mlx5_core: when modifying the slave of a bond, send the internal driver event
MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE.
- mlx5_ib: when catching the event, call ib_device_set_netdev().
This patch also ensures the correct IB events are sent in switchdev lag.
While at it, note that in multiport eswitch mode only a single IB device is
created for all ports. That IB device will receive all netdev events
of its VFs once loaded; thus, to avoid overwriting the mapping of the PF IB
device to the PF netdev, NETDEV_REGISTER events are ignored if the IB device has
already been mapped to a netdev.
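To illustrate the idea of letting the IB core own the port-to-netdev mapping instead of the driver, here is a minimal userspace C sketch. The names (ib_dev_model, core_set_netdev, core_get_netdev) are hypothetical stand-ins for the real ib_device_set_netdev()/ib_device_get_netdev() helpers, not kernel code.

/* Minimal userspace model of a core-owned port -> netdev mapping.
 * Hypothetical names; the real kernel helpers are
 * ib_device_set_netdev()/ib_device_get_netdev().
 */
#include <stdio.h>

#define MAX_PORTS 4

struct netdev {                 /* stand-in for struct net_device */
    const char *name;
};

struct ib_dev_model {           /* stand-in for struct ib_device */
    struct netdev *port_netdev[MAX_PORTS + 1];  /* 1-based ports */
};

/* The driver no longer caches the netdev; it hands it to the "core". */
static int core_set_netdev(struct ib_dev_model *dev, struct netdev *nd,
                           unsigned int port)
{
    if (port == 0 || port > MAX_PORTS)
        return -1;
    dev->port_netdev[port] = nd;        /* NULL detaches the netdev */
    return 0;
}

static struct netdev *core_get_netdev(struct ib_dev_model *dev,
                                      unsigned int port)
{
    if (port == 0 || port > MAX_PORTS)
        return NULL;
    return dev->port_netdev[port];
}

int main(void)
{
    struct ib_dev_model dev = { 0 };
    struct netdev active = { .name = "eth0" }, backup = { .name = "eth1" };

    core_set_netdev(&dev, &active, 1);  /* NETDEV_REGISTER / attach */
    printf("port 1 -> %s\n", core_get_netdev(&dev, 1)->name);

    /* Active-backup lag switchover: remap port 1 to the new active slave */
    core_set_netdev(&dev, &backup, 1);
    printf("port 1 -> %s\n", core_get_netdev(&dev, 1)->name);
    return 0;
}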
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 6f2487bfafce5e6cd6f89e7238a82012f7b9f5ac
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Mon Sep 9 13:05:03 2024 +0300
RDMA/mlx5: Add implicit MR handling to ODP memory scheme
Implicit MRs in the ODP memory scheme require allocating a private null mkey
and assigning the mkey and VA differently in the KSM mkey.
Page faults are received on the null mkey, so we also store the
null mkey in the odp_mkey xarray.
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-8-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit cef7dde8836ab09a3bfe96ada4f18ef2496eacc9
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Mon Sep 9 13:04:57 2024 +0300
net/mlx5: Expand mkey page size to support 6 bits
Protect the usage of the 6th bit with the relevant capability to ensure
we are using the new page sizes with FW that supports the bit extension.
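As a rough illustration of gating the extra page-size bit on a capability, the following self-contained C snippet clamps a log_page_size value to 5 or 6 usable bits; the capability flag and field widths here are illustrative only, not the actual mlx5 PRM/mlx5_ifc definitions.

/* Illustrative only: cap-gated width of the mkey log_page_size field. */
#include <stdio.h>
#include <stdbool.h>

static unsigned int clamp_log_page_size(unsigned int log_pgsz,
                                         bool fw_supports_6bit_pgsz)
{
    /* 5 bits -> max value 31, 6 bits -> max value 63 */
    unsigned int max = fw_supports_6bit_pgsz ? 0x3f : 0x1f;

    return log_pgsz > max ? max : log_pgsz;
}

int main(void)
{
    printf("old FW: %u\n", clamp_log_page_size(34, false)); /* clamped to 31 */
    printf("new FW: %u\n", clamp_log_page_size(34, true));  /* 34 allowed */
    return 0;
}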
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-2-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 7ebb00cea49db641b458edef0ede389f7004821d
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Tue Sep 3 14:24:50 2024 +0300
RDMA/mlx5: Fix MR cache temp entries cleanup
Fix the cleanup of the temp cache entries that are dynamically created
in the MR cache.
The cleanup of the temp cache entries is currently scheduled only when a
new entry is created. Since in the cleanup of the entries only the mkeys
are destroyed and the cache entry stays in the cache, subsequent
registrations might reuse the entry and it will eventually be filled with
new mkeys without cleanup ever getting scheduled again.
On workloads that register and deregister MRs with a wide range of
properties we see the cache ends up holding many cache entries, each
holding the max number of mkeys that were ever used through it.
Additionally, as the cleanup work is scheduled to run over the whole
cache, any mkey that is returned to the cache after the cleanup was
scheduled will be held for less than the intended 30 seconds timeout.
Solve both issues by dropping the existing remove_ent_work and reusing
the existing per-entry work to also handle the temp entries cleanup.
Schedule the work to run with a 30 seconds delay every time we push an
mkey to a clean temp entry.
This ensures the cleanup runs on each entry only 30 seconds after the
first mkey was pushed to an empty entry.
Since we already distinguish between persistent and temp entries
when scheduling the cache_work_func, it is not scheduled in any
other flows for the temp entries.
Another benefit of moving to a per-entry cleanup is that we are no longer
required to hold the rb_tree mutex, thus enabling other flows to run
concurrently.
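The per-entry delayed cleanup described above can be sketched with the following self-contained userspace C model: cleanup is armed only when the first mkey lands in an empty temp entry and fires 30 seconds later. All names here are illustrative, not the mlx5_ib implementation.

/* Simplified model of per-entry temp-mkey cleanup. */
#include <stdio.h>
#include <stdbool.h>
#include <time.h>

#define CLEANUP_DELAY 30 /* seconds */

struct cache_ent_model {
    int stored;          /* mkeys currently parked in the entry */
    bool is_tmp;         /* temporary (non-persistent) entry */
    bool work_armed;     /* per-entry delayed work scheduled? */
    time_t fire_at;      /* when the delayed work should run */
};

static void push_mkey(struct cache_ent_model *ent)
{
    ent->stored++;
    /* Arm the per-entry work only on the first mkey pushed to a clean
     * temp entry, so the entry is emptied ~30s after that push. */
    if (ent->is_tmp && !ent->work_armed) {
        ent->work_armed = true;
        ent->fire_at = time(NULL) + CLEANUP_DELAY;
    }
}

static void maybe_run_cleanup(struct cache_ent_model *ent, time_t now)
{
    if (ent->is_tmp && ent->work_armed && now >= ent->fire_at) {
        printf("destroying %d temp mkeys\n", ent->stored);
        ent->stored = 0;
        ent->work_armed = false;        /* re-armed by the next push */
    }
}

int main(void)
{
    struct cache_ent_model ent = { .is_tmp = true };
    time_t start = time(NULL);

    push_mkey(&ent);                    /* arms the delayed cleanup */
    push_mkey(&ent);                    /* does not re-arm or extend it */
    maybe_run_cleanup(&ent, start + 5);  /* too early, nothing happens */
    maybe_run_cleanup(&ent, start + 31); /* 30s elapsed, entry emptied */
    return 0;
}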
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/e4fa4bb03bebf20dceae320f26816cd2dde23a26.1725362530.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit de8f847a5114ff7cfcdfc114af8485c431dec703
Author: Yishai Hadas <yishaih@nvidia.com>
Date: Thu Aug 1 15:05:16 2024 +0300
RDMA/mlx5: Add support for DMABUF MR registrations with Data-direct
Add support for DMABUF MR registrations with Data-direct device.
When userspace registers a DMABUF MR with the data direct bit
set, the algorithm below is followed.
1) Obtain a pinned DMABUF umem from the IB core using the user input
parameters (FD, offset, length) and the DMA PF device. The DMA PF
device is needed to allow the IOMMU to enable the DMA PF to access the
user buffer over PCI.
2) Create a KSM MKEY by setting its entries according to the user buffer
VA to IOVA mapping, with the MKEY being the data direct device-crossed
MKEY. This KSM MKEY is umrable and will be used as part of the MR cache.
The PD for creating it is the internal device 'data direct' kernel one.
3) Create a crossing MKEY that points to the KSM MKEY using the crossing
access mode.
4) Manage the KSM MKEY by adding it to a list of 'data direct' MKEYs
managed on the mlx5_ib device.
5) Return the crossing MKEY to the user, created with its supplied PD.
Upon DMA PF unbind flow, the driver will revoke the KSM entries.
The final deregistration will occur under the hood once the application
deregisters its MKEY.
Notes:
- This version supports only the PINNED UMEM mode, so there is no
dependency on ODP.
- The IOVA supplied by the application must be system page aligned due to
HW translations of KSM.
- The crossing MKEY will not be umrable or part of the MR cache, as we
cannot change its crossed (i.e. KSM) MKEY over UMR.
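The five steps above can be summarized in a hedged control-flow sketch. Every function below (obtain_pinned_dmabuf_umem, create_ksm_mkey, create_crossing_mkey, track_data_direct_mkey) is a hypothetical placeholder for the driver flow, not an actual kernel symbol.

/* Hedged control-flow sketch of the data-direct DMABUF registration steps. */
#include <stdio.h>

struct umem  { int fd; };               /* pinned DMABUF umem */
struct mkey  { unsigned int id; };

static struct umem obtain_pinned_dmabuf_umem(int fd, long off, long len)
{
    /* Step 1: pin the DMABUF via the DMA PF so the IOMMU allows the
     * data-direct device to reach the user buffer over PCI. */
    return (struct umem){ .fd = fd };
}

static struct mkey create_ksm_mkey(const struct umem *umem)
{
    /* Step 2: KSM mkey on the internal 'data direct' PD, entries set
     * from the VA -> IOVA mapping; umrable and cacheable. */
    return (struct mkey){ .id = 0x100 };
}

static struct mkey create_crossing_mkey(const struct mkey *ksm, int user_pd)
{
    /* Step 3: crossing mkey on the user's PD pointing at the KSM mkey. */
    return (struct mkey){ .id = 0x200 };
}

static void track_data_direct_mkey(const struct mkey *ksm)
{
    /* Step 4: remember the KSM mkey so it can be revoked on PF unbind. */
}

int main(void)
{
    struct umem umem = obtain_pinned_dmabuf_umem(3, 0, 4096);
    struct mkey ksm = create_ksm_mkey(&umem);
    struct mkey crossing = create_crossing_mkey(&ksm, /* user PD */ 7);

    track_data_direct_mkey(&ksm);
    /* Step 5: the crossing mkey is what gets returned to userspace. */
    printf("returned mkey 0x%x\n", crossing.id);
    return 0;
}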
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/1f99d8020ed540d9702b9e2252a145a439609ba6.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 2e8e631d7a41e3a4edc94f3c9dd5cb32c2aa539e
Author: Yishai Hadas <yishaih@nvidia.com>
Date: Thu Aug 1 15:05:12 2024 +0300
RDMA/mlx5: Add the initialization flow to utilize the 'data direct' device
Add the NET device initialization flow to utilize the 'data
direct' device.
When a NET mlx5_ib device is capable of 'data direct', the following
sequence of actions will occur:
- Find its affiliated 'data direct' VUID via a firmware command.
- Create its own private PD and 'data direct' mkey.
- Register to be notified when its 'data direct' driver is probed or removed.
The DMA device of the affiliated 'data direct' device, including the
private PD and the 'data direct' mkey, will be used later during MR
registrations that request the data direct functionality.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/b11fa87b2a65bce4db8d40341bb6cee490fa4d06.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 6910e3660d86c1a5654f742a40181d2c9154f26f
Author: Yishai Hadas <yishaih@nvidia.com>
Date: Thu Aug 1 15:05:11 2024 +0300
RDMA/mlx5: Introduce the 'data direct' driver
Introduce the 'data direct' driver for a ConnectX-8 Data Direct device.
The 'data direct' driver functions as the affiliated DMA device for one
or more capable mlx5_ib devices. This DMA device, as the name suggests,
is used exclusively for DMA operations. It can be considered a DMA engine
managed by a PF/VF, lacking network capabilities and having minimal overall
capabilities.
Consequently, the DMA NIC PF will not be exposed to or directly used by
software applications. The driver will not have any direct interface or
interaction with the firmware (no command interface, no capabilities,
etc.). It will operate solely over PCI to enable its DMA functionality.
Registration and un-registration of the driver are handled as part of
the mlx5_ib initialization and exit processes, as the mlx5_ib devices
will effectively be its clients.
The driver will serve as the DMA device for accessing another PCI device
to achieve optimal performance (both on the same NUMA node, P2P access,
etc.).
Upon probing, it will read its VUID over PCI to handle mlx5_ib device
registrations with the same VUID.
Upon removal, it will notify its clients to allow them to clean up the
resources that were mmaped with its DMA device.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/b77edecfd476c3f445da96ab6aef499ae47b2829.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
JIRA: https://issues.redhat.com/browse/RHEL-52874
Upstream-status: v6.11-rc1
commit 026a425990af6969a15b57d6d7fa0138a7e21958
Author: Mark Zhang <markzhang@nvidia.com>
Date: Sun Jun 16 19:08:39 2024 +0300
RDMA/mlx5: Support plane device and driver APIs to add and delete it
This patch supports driver APIs "add_sub_dev" and "del_sub_dev", to
add and delete a plane device respectively.
An mlx5 plane device is an RDMA SMI device; it provides the SMI capability
through user MAD for its parent, the logical multi-plane aggregated
device. For a plane port:
- It supports QP0 only;
- When adding a plane device, all plane ports are added;
- For some commands like mad_ifc, both plane_index and native portnum
are needed;
- When querying or modifying a plane port context, the native portnum
must be used, as the query/modify_hca_vport_context command doesn't
support plane port.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/e933cd0562aece181f8657af2ca0f5b387d0f14e.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
JIRA: https://issues.redhat.com/browse/RHEL-52874
Upstream-status: v6.11-rc1
commit 2a5db20fa532198639671713c6213f96ff285b85
Author: Mark Zhang <markzhang@nvidia.com>
Date: Sun Jun 16 19:08:35 2024 +0300
RDMA/mlx5: Add support to multi-plane device and port
When multi-plane is supported, a logical port, which is an aggregation of
multiple physical plane ports, is exposed for data transmission.
Compared with a normal mlx5 IB port, this logical port supports all
functionalities except Subnet Management.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/7e37c06c9cb243be9ac79930cd17053903785b95.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1
commit 589b844f1bf04850d9fabcaa2e943325dc6768b4
Author: Akiva Goldberger <agoldberger@nvidia.com>
Date: Thu Jun 27 21:23:50 2024 +0300
RDMA/mlx5: Send UAR page index as ioctl attribute
Add the UAR page index as a driver ioctl attribute to increase the number of
supported indices, previously limited to 16 bits by the mlx5_ib_create_cq
struct.
Link: https://lore.kernel.org/r/0e18b34d7ec3b1ae02d694b0d545aed7413c0ef7.1719512393.git.leon@kernel.org
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1
commit a4e540119be565f47c305f295ed43f8e0bc3f5c3
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date: Thu Jun 13 21:01:42 2024 +0300
RDMA/mlx5: Set mkeys for dmabuf at PAGE_SIZE
Set the mkey for dmabuf at PAGE_SIZE to support any SGL
after a move operation.
ib_umem_find_best_pgsz returns 0 on error, so it is
incorrect to check the returned page_size against PAGE_SIZE.
Fixes: 90da7dc820 ("RDMA/mlx5: Support dma-buf based userspace memory region")
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/1e2289b9133e89f273a4e68d459057d032cbc2ce.1718301631.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1
commit 5895e70f2e6e8dc67b551ca554d6fcde0a7f0467
Author: Jianbo Liu <jianbol@nvidia.com>
Date: Mon Jun 3 13:26:39 2024 +0300
IB/mlx5: Allocate resources just before first QP/SRQ is created
Previously, all IB dev resources were initialized on driver load. As
they are not always used, move the initialization to the time when
they are needed.
To be more specific, move PD (p0) and CQ (c0) initialization to the
time when the first SRQ is created, and move SRQ (s0 and s1)
initialization to the time the first QP is created. To avoid concurrent
creations, two new mutexes are also added.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Link: https://lore.kernel.org/r/98c3e53a8cc0bdfeb6dec6e5bb8b037d78ab00d8.1717409369.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1
commit 638420115cc4ad6c3a2683bf46a052b505abb202
Author: Jianbo Liu <jianbol@nvidia.com>
Date: Mon Jun 3 13:26:38 2024 +0300
IB/mlx5: Create UMR QP just before first reg_mr occurs
The UMR QP is not used in some cases, so move the QP and its CQ creation from
the driver load flow to the time the first reg_mr occurs, that is, when MR
interfaces are first called.
The initialization of dev->umrc.pd and dev->umrc.lock is still done at
driver load because the pd is needed for mlx5_mkey_cache_init and the lock
is reused to protect against concurrent creation.
When testing 4GB memory registration latency with rtool [1] and 8
threads in parallel, a minor performance degradation (<5% for
the max latency) is seen for the first reg_mr with this change.
Link: https://github.com/paravmellanox/rtool [1]
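A minimal userspace sketch of the deferred-creation pattern described above, using a pthread mutex in place of the driver's umrc.lock; the struct and function names are illustrative, not the mlx5_ib code.

/* Userspace sketch of create-on-first-use guarded by a lock, mirroring the
 * idea of building the UMR QP/CQ only when reg_mr is first called. */
#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>

struct umr_ctx_model {
    pthread_mutex_t lock;   /* allocated at "driver load", like umrc.lock */
    bool ready;             /* set once the QP/CQ equivalents exist */
};

static int umr_resources_create(struct umr_ctx_model *c)
{
    pthread_mutex_lock(&c->lock);
    if (!c->ready) {                    /* only the first caller pays the cost */
        printf("creating UMR CQ and QP\n");
        c->ready = true;
    }
    pthread_mutex_unlock(&c->lock);
    return 0;
}

static int reg_mr(struct umr_ctx_model *c)
{
    if (umr_resources_create(c))        /* no-op after the first call */
        return -1;
    printf("registering MR\n");
    return 0;
}

int main(void)
{
    struct umr_ctx_model ctx = { .lock = PTHREAD_MUTEX_INITIALIZER };

    reg_mr(&ctx);   /* first call creates the UMR resources */
    reg_mr(&ctx);   /* later calls skip straight to registration */
    return 0;
}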
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Link: https://lore.kernel.org/r/55d3c4f8a542fd974d8a4c5816eccfb318a59b38.1717409369.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1
commit d98995b4bf981519dde4af0a081c393d62474039
Author: Jianbo Liu <jianbol@nvidia.com>
Date: Mon Jun 3 13:26:37 2024 +0300
net/mlx5: Reimplement write combining test
The write combining test was previously added in the mlx5_ib driver. It
opens a UD QP, posts NOP WQEs, and uses the BlueFlame doorbell. When
BlueFlame is used, WQEs get written directly to a PCI BAR of the
device (in addition to memory) so that the device handles them without
having to access memory.
In this test, the WQEs written in memory are different from the ones
written to the BlueFlame which request CQE update. By checking the
completion reports posted on CQ, we can know if BlueFlame succeeds or
not. The write combining must be supported if BlueFlame succeeds as
its register is written using write combining.
This patch reimplements the test in the same way, but using a pair of
SQ and CQ only. It is moved to mlx5_core as a general feature used by
both mlx5_core and mlx5_ib.
Besides, save the write combining test result of the PCI function, so that
its thousands of child functions, such as SFs, can query it without paying
the time and resource penalty themselves. The test function is called
only after failing to get the cached result. With this enhancement,
all thousands of SFs of the PF attached to the same driver no longer need
to perform the WC check explicitly, as it has already been done in the system.
This saves several commands per SF, thereby speeding up SF creation and
also saving a completion EQ creation.
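The cached-probe pattern described above can be sketched in a self-contained userspace C model: the parent PCI function runs the expensive probe once, and its child functions reuse the cached result. Names and the probe itself are illustrative, not the mlx5_core code.

/* Userspace model of caching an expensive probe result on the parent PCI
 * function so its child functions (SFs) can reuse it. */
#include <stdio.h>

enum wc_state { WC_UNKNOWN, WC_SUPPORTED, WC_UNSUPPORTED };

struct pci_func_model {
    enum wc_state wc;           /* cached result, shared with children */
};

static enum wc_state expensive_wc_probe(void)
{
    /* Stands in for the SQ/CQ + BlueFlame doorbell test: post a WQE via
     * the WC-mapped BAR and check whether its CQE shows up. */
    printf("running write-combining probe\n");
    return WC_SUPPORTED;
}

static enum wc_state query_wc(struct pci_func_model *parent)
{
    if (parent->wc == WC_UNKNOWN)       /* probe once, on a cache miss */
        parent->wc = expensive_wc_probe();
    return parent->wc;                  /* SFs hit the cache */
}

int main(void)
{
    struct pci_func_model pf = { .wc = WC_UNKNOWN };

    query_wc(&pf);  /* PF: performs the probe */
    query_wc(&pf);  /* SF #1: cached, no probe */
    query_wc(&pf);  /* SF #2: cached, no probe */
    return 0;
}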
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/4ff5a8cc4c5b5b0d98397baa45a5019bcdbf096e.1717409369.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56245
commit 3aa73c6b795b9aaaf933f3c95495d85fc0de39e3
Author: Yishai Hadas <yishaih@nvidia.com>
Date: Thu Aug 1 15:05:15 2024 +0300
RDMA: Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API
Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API instead of
udata.
This enables passing some new ioctl attributes to the drivers, as will
be introduced in the next patches for mlx5 driver.
Change the involved drivers accordingly.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/9a25b2fc02443f7c36c2d93499ae25252b6afd40.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
Conflicts:
Drop hunks for nonexistent drivers.
commit dd6d7f8574d7f8b6a0bf1aeef0b285d2706b8c2a
Author: Akiva Goldberger <agoldberger@nvidia.com>
Date: Thu Jun 27 21:23:49 2024 +0300
RDMA: Pass entire uverbs attr bundle to create cq function
Changes the create_cq verb signature by sending the entire uverbs attr
bundle as a parameter. This allows drivers to send driver specific attrs
through ioctl for the create_cq verb and access them in their driver
specific code.
Also adds a new enum value for driver specific ioctl attributes for
methods already supporting UHW.
Link: https://lore.kernel.org/r/ed147343987c0d43fd391c1b2f85e2f425747387.1719512393.git.leon@kernel.org
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc1
commit 8c1185fef68cc603b954fece2a434c9f851d6a86
Author: Or Har-Toov <ohartoov@nvidia.com>
Date: Wed Apr 3 13:36:00 2024 +0300
RDMA/mlx5: Change check for cacheable mkeys
umem can be NULL for user application mkeys in some cases. Therefore,
umem can't be used to check whether the mkey is cacheable, so the check is
changed to use a flag that indicates it. Also make sure that
all mkeys which are not returned to the cache will be destroyed.
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Link: https://lore.kernel.org/r/2690bc5c6896bcb937f89af16a1ff0343a7ab3d0.1712140377.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc1
commit 0611a8e8b475fc5230b9a24d29c8397aaab20b63
Author: Or Har-Toov <ohartoov@nvidia.com>
Date: Wed Apr 3 13:35:59 2024 +0300
RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent
As some mkeys can't be modified with UMR due to some UMR limitations,
like the size of translation that can be updated, not all user mkeys can
be cached.
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Link: https://lore.kernel.org/r/f2742dd934ed73b2d32c66afb8e91b823063880c.1712140377.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-24466
Upstream-status: v6.7-rc1
Conflicts:
- drivers/infiniband/hw/mlx5/mr.c
Due to commit 78e2a0dd38 (tags/kernel-5.14.0-417.el9) which
is an OOO backport of
a53e215f9007 RDMA/mlx5: Fix mkey cache WQ flush (v6.7-rc1)
-> Adjust context
commit 57e7071683ef6148c9f5ea0ba84598d2ba681375
Author: Shay Drory <shayd@nvidia.com>
Date: Thu Sep 21 11:07:16 2023 +0300
RDMA/mlx5: Implement mkeys management via LIFO queue
Currently, mkeys are managed via an xarray. This implementation leads to
degradation in cases where many MRs are unregistered in parallel, due to the
xarray internal implementation; for example, deregistration of 1M MRs via 64
threads takes ~15% more time [1].
Hence, implement mkey management via a LIFO queue, which solves the
degradation.
[1]
2.8us in kernel v5.19 compared to 3.2us in kernel v6.4
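As a hedged illustration of the LIFO idea, the self-contained C sketch below keeps a per-entry stack of mkeys: push on dereg, pop on reg, both O(1) and touching only the top of the stack. The names and fixed capacity are illustrative, not the mlx5_ib implementation.

/* Userspace sketch of per-entry LIFO mkey management. */
#include <stdio.h>
#include <stdint.h>

#define QUEUE_CAP 8

struct mkey_lifo {
    uint32_t mkeys[QUEUE_CAP];
    int top;                    /* number of stored mkeys */
};

static int lifo_push(struct mkey_lifo *q, uint32_t mkey)    /* dereg MR */
{
    if (q->top == QUEUE_CAP)
        return -1;              /* full: caller destroys the mkey */
    q->mkeys[q->top++] = mkey;
    return 0;
}

static int lifo_pop(struct mkey_lifo *q, uint32_t *mkey)    /* reg MR */
{
    if (q->top == 0)
        return -1;              /* empty: caller creates a new mkey */
    *mkey = q->mkeys[--q->top];
    return 0;
}

int main(void)
{
    struct mkey_lifo q = { .top = 0 };
    uint32_t mkey;

    lifo_push(&q, 0xa1);        /* MRs deregistered... */
    lifo_push(&q, 0xa2);
    if (!lifo_pop(&q, &mkey))   /* ...and their mkeys reused LIFO */
        printf("reusing mkey 0x%x\n", mkey);
    return 0;
}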
Signed-off-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/fde3d4cfab0f32f0ccb231cd113298256e1502c5.1695283384.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-22227
Upstream-status: v6.6-rc1
commit 58dbd6428a6819e55a3c52ec60126b5d00804a38
Author: Patrisious Haddad <phaddad@nvidia.com>
Date: Thu Apr 13 12:04:59 2023 +0300
RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion
Add RoCE MACsec rules when a gid is added for the MACsec netdevice and
handle their cleanup when the gid is removed or the MACsec SA is deleted.
Also support alias IP for the MACsec device, as long as we don't have
more IPs than the gid table can hold.
In addition, handle the case where a gid is added but there are still no
SAs added for the MACsec device, so the rules are added later on when
the SAs are added.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Amir Tzin <atzin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-22227
Upstream-status: v6.6-rc1
commit 758ce14aee825f8f3ca8f76c9991c108094cae8b
Author: Patrisious Haddad <phaddad@nvidia.com>
Date: Tue May 3 08:37:48 2022 +0300
RDMA/mlx5: Implement MACsec gid addition and deletion
Handle the MACsec IP ambiguity issue: mlx5 HW can't support
programming both the MACsec and the physical gid when they have the same
IP address, because it wouldn't know where to steer the traffic.
Hence, in such a case we delete the physical gid from the HW gid table,
which would then cause all traffic sent over it to fail, and we'll only
be able to send traffic over the MACsec gid.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Amir Tzin <atzin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-22227
Upstream-status: v6.5-rc1
commit 2ecfd946169e7f56534db2a5f6935858be3005ba
Author: Leon Romanovsky <leon@kernel.org>
Date: Mon Jun 5 13:14:05 2023 +0300
RDMA/mlx5: Reduce QP table exposure
driver.h is a common header for the whole mlx5 code base, but struct
mlx5_qp_table is used only in the mlx5_ib driver. So move that struct
to be under the sole responsibility of mlx5_ib.
Link: https://lore.kernel.org/r/bec0dc1158e795813b135d1143147977f26bf668.1685953497.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Amir Tzin <atzin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.6-rc5
commit c99a7457e5bb873914a74307ba2df85f6799203b
Author: Leon Romanovsky <leon@kernel.org>
Date: Thu Sep 28 20:20:47 2023 +0300
RDMA/mlx5: Remove not-used cache disable flag
During execution of mlx5_mkey_cache_cleanup(), there is a guarantee
that MRs are not registered and/or destroyed. It means that we don't
need the newly introduced cache disable flag.
Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.4-rc7
commit 617f5db1a626f18d5cbb7c7faf7bf8f9ea12be78
Author: Mark Bloch <mbloch@nvidia.com>
Date: Mon Jun 5 13:33:26 2023 +0300
RDMA/mlx5: Fix affinity assignment
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.
However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.
The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.
To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.
Fixes: 802dcc7fc5 ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.4-rc7
commit e1f4a52ac171dd863fe89055e749ef5e0a0bc5ce
Author: Mark Bloch <mbloch@nvidia.com>
Date: Mon Jun 5 13:33:18 2023 +0300
RDMA/mlx5: Create an indirect flow table for steering anchor
A misbehaved user can create a steering anchor that points to a kernel
flow table and then destroy the anchor without freeing the associated
STC. This creates a problem as the kernel can't destroy the flow
table since there is still a reference to it. As a result, this can
exhaust all available flow table resources, preventing other users from
using the RDMA device.
To prevent this problem, a solution is implemented where a special flow
table with two steering rules is created when a user creates a steering
anchor for the first time. The rules include one that drops all traffic
and another that points to the kernel flow table. If the steering anchor
is destroyed, only the rule pointing to the kernel's flow table is removed.
Any traffic reaching the special flow table after that is dropped.
Since the special flow table is not destroyed when the steering anchor is
destroyed, any issues are prevented from occurring. The remaining resources
are only destroyed when the RDMA device is destroyed, which happens after
all DEVX objects are freed, including the STCs, thus mitigating the issue.
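The special flow table described above can be modeled with a hedged userspace C sketch: one rule forwarding to the kernel table and a catch-all drop, where destroying the anchor removes only the forward rule. All names are illustrative, not the mlx5 steering API.

/* Userspace model of the special anchor flow table. */
#include <stdio.h>
#include <stdbool.h>

struct special_ft_model {
    bool fwd_to_kernel;     /* rule 1: goto kernel flow table */
    /* rule 2: drop-all is always present, modeled implicitly below */
};

static const char *classify(const struct special_ft_model *ft)
{
    return ft->fwd_to_kernel ? "forwarded to kernel table" : "dropped";
}

static void destroy_anchor(struct special_ft_model *ft)
{
    /* Only the forward rule goes away; the table itself lives on until
     * the RDMA device is destroyed. */
    ft->fwd_to_kernel = false;
}

int main(void)
{
    struct special_ft_model ft = { .fwd_to_kernel = true };

    printf("before destroy: %s\n", classify(&ft));
    destroy_anchor(&ft);
    printf("after destroy:  %s\n", classify(&ft));
    return 0;
}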
Fixes: 0c6ab0ca9a66 ("RDMA/mlx5: Expose steering anchor to userspace")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/b4a88a871d651fa4e8f98d552553c1cfe9ba2cd6.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 66fb1d5df6ace316a4a6e2c31e13fc123ea2b644
Author: Edward Srouji <edwards@nvidia.com>
Date: Thu Feb 16 11:13:45 2023 +0200
IB/mlx5: Extend debug control for CC parameters
This patch adds rtt_resp_dscp to the current debug controllability of
congestion control (CC) parameters.
rtt_resp_dscp can be read or written through debugfs.
If set, its value overwrites the DSCP of the generated RTT response.
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/1dcc3440ee53c688f19f579a051ded81a2aaa70a.1676538714.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 594cac11ab6a1be8022a3c96d181dde7cfb0b8cf
Author: Or Har-Toov <ohartoov@nvidia.com>
Date: Tue Jan 17 15:14:52 2023 +0200
RDMA/mlx5: Use query_special_contexts for mkeys
Use query_special_contexts to get the correct value of mkeys such as
null_mkey, terminate_scatter_list_mkey and dump_fill_mkey, as FW will
change them in certain configurations.
Link: https://lore.kernel.org/r/000236f0a9487d48809f87bcc3620a3964b2d3d3.1673960981.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit dca55da0a15717dde509d17163946e951bad56c4
Author: Jiri Pirko <jiri@nvidia.com>
Date: Tue Nov 1 15:36:01 2022 +0100
RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister
When removing a network namespace with mlx5 devlink instance being in
it, following callchain is performed:
cleanup_net (takes down_read(&pernet_ops_rwsem)
devlink_pernet_pre_exit()
devlink_reload()
mlx5_devlink_reload_down()
mlx5_unload_one_devl_locked()
mlx5_detach_device()
del_adev()
mlx5r_remove()
__mlx5_ib_remove()
mlx5_ib_roce_cleanup()
mlx5_remove_netdev_notifier()
unregister_netdevice_notifier (takes down_write(&pernet_ops_rwsem)
This deadlocks.
Resolve this by converting to register_netdevice_notifier_dev_net(),
which does not take pernet_ops_rwsem and moves the notifier block around
according to the netdev it takes as an argument.
Use the previously introduced netdev added/removed events to track the uplink
netdev to be used for register_netdevice_notifier_dev_net() purposes.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 85f9e38a5ac7d397f9bb5e57901b2d6af4dcc3b9
Author: Leon Romanovsky <leon@kernel.org>
Date: Thu Feb 2 11:03:07 2023 +0200
RDMA/mlx5: Remove impossible check of mkey cache cleanup failure
mlx5_mkey_cache_cleanup() can't fail and can be changed to be void.
Link: https://lore.kernel.org/r/1acd9528995d083114e7dec2a2afc59436406583.1675328463.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 627122280c878cf5d3cda2d2c5a0a8f6a7e35cb7
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:07 2023 +0200
RDMA/mlx5: Add work to remove temporary entries from the cache
The non-cache mkeys are stored in the cache only to shorten application
restart time. Don't store them longer than needed.
Configure cache entries that store non-cache MRs as temporary entries. If
30 seconds have passed and no user reclaimed the temporarily cached mkeys,
an asynchronous work will destroy the mkey entries.
Link: https://lore.kernel.org/r/20230125222807.6921-7-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit dd1b913fb0d0e3e6d55e92d2319d954474dd66ac
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:06 2023 +0200
RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
Currently, when dereging an MR, if the mkey doesn't belong to a cache
entry, it will be destroyed. As a result, the restart of applications
with many non-cached mkeys is not efficient since all the mkeys are
destroyed and then recreated. This process takes a long time (for 100,000
MRs, it is ~20 seconds for dereg and ~28 seconds for re-reg).
To shorten the restart runtime, insert all cacheable mkeys into the cache.
If there is no entry fitting the mkey properties, create a temporary
entry that fits it.
After a predetermined timeout, the cache entries will shrink to the
initial high limit.
The mkeys will still be in the cache when consuming them again after an
application restart. Therefore, the registration will be much faster
(for 100,000 MRs, it is ~4 seconds for dereg and ~5 seconds for re-reg).
The temporary cache entries created to store the non-cache mkeys are not
exposed through sysfs like the default cache entries.
Link: https://lore.kernel.org/r/20230125222807.6921-6-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 73d09b2fe8336f5f37935e46418666ddbcd3c343
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:05 2023 +0200
RDMA/mlx5: Introduce mlx5r_cache_rb_key
Switch from using the mkey order to using the new struct as the key to the
RB tree of cache entries.
The key is all the mkey properties that UMR operations can't modify.
Use this key to define the cache entries and to search for and create cache
mkeys.
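A hedged sketch of the keyed-cache idea follows: a key struct built from the mkey properties that UMR cannot modify, plus a comparator suitable for an ordered (RB-tree-like) lookup. The field set here is illustrative, not the exact mlx5r_cache_rb_key.

/* Sketch of a cache key and comparator for an ordered cache lookup. */
#include <stdio.h>
#include <string.h>

struct cache_key_model {
    unsigned int access_mode;   /* e.g. MTT vs. KSM */
    unsigned int access_flags;  /* remote/local access bits */
    unsigned int ndescs;        /* number of translation entries */
};

/* <0, 0, >0 ordering, as an rb-tree insert/search helper would need */
static int cache_key_cmp(const struct cache_key_model *a,
                         const struct cache_key_model *b)
{
    return memcmp(a, b, sizeof(*a));
}

int main(void)
{
    struct cache_key_model want = { .access_mode = 0, .access_flags = 0xf,
                                    .ndescs = 256 };
    struct cache_key_model ent  = { .access_mode = 0, .access_flags = 0xf,
                                    .ndescs = 256 };

    printf("entry %s\n", cache_key_cmp(&want, &ent) == 0 ?
           "matches" : "does not match");
    return 0;
}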
Link: https://lore.kernel.org/r/20230125222807.6921-5-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit b9584517832858a0f78d6851d09b697a829514cd
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:04 2023 +0200
RDMA/mlx5: Change the cache structure to an RB-tree
Currently, the cache structure is a static linear array. Therefore, its
size is limited to the number of entries in it and is not expandable. The
entries are dedicated to mkeys of size 2^x and no access_flags. Mkeys with
different properties are not cacheable.
In this patch, we change the cache structure to an RB-tree. This will
allow extending the cache to support more entries with different mkey
properties.
Link: https://lore.kernel.org/r/20230125222807.6921-4-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit a2a88b8e22d1b202225d0e40b02ad068afab2ccb
Author: Aharon Landau <aharonl@nvidia.com>
Date: Thu Jan 26 00:28:02 2023 +0200
RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries
mkc.log_page_size can be changed using UMR. Therefore, don't treat it as a
cache entry property.
Remove it from struct mlx5_cache_ent.
All cache mkeys will be created with default PAGE_SHIFT, and updated with
the needed page_shift using UMR when passing them to a user.
Link: https://lore.kernel.org/r/20230125222807.6921-2-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165355
Upstream-status: v6.1-rc1
commit a83bb5df2ac604ab418fbe0a8720f55de46652eb
Author: Liu, Changcheng <jerrliu@nvidia.com>
Date: Wed Sep 7 16:36:26 2022 -0700
RDMA/mlx5: Don't set tx affinity when lag is in hash mode
In hash mode, without setting tx affinity explicitly, the port select
flow table decides which port is used for the traffic.
If port_select_flow_table_bypass capability is supported and tx affinity
is set explicitly for QP/TIS, they will be added into the explicit affinity
table in FW to check which port is used for the traffic.
1. The overloaded explicit affinity table may affect performance.
To avoid this, do not set tx affinity explicitly by default.
2. The packets of the same flow need to be transmitted on the same port.
Because the packets of the same flow use different QPs in the slow & fast
paths, tx affinity shouldn't be set explicitly for these QPs.
Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1940
Upstream-status: v6.1
Bugzilla: https://bugzilla.redhat.com/2123401
Tested: cuda pyverbs tests passed.
Add support for DMABUF FDs when creating a devx umem in the RDMA mlx5 driver.
This allows applications to create work queues directly on GPU memory where
the GPU fully controls the data flow out of the RDMA NIC.
Signed-off-by: Kamal Heib <kheib@redhat.com>
Approved-by: Íñigo Huguet <ihuguet@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2123401
commit 72b2f7608a59727e7c2e5b11cff2749c2c080fac
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Thu Sep 1 11:20:56 2022 -0300
RDMA/mlx5: Enable ATS support for MRs and umems
For mlx5, if ATS is enabled in the PCI config, then the device will use ATS
requests for only certain DMA operations. This has to be opted in by the
SW side based on the mkey or umem settings.
ATS slows down the PCI performance, so it should only be set in cases when
it is needed. All of these cases revolve around optimizing PCI P2P
transfers and avoiding bad cases where the bus just doesn't work.
Link: https://lore.kernel.org/r/4-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc5
commit 9b7d4be967f16f79a2283b2338709fcc750313ee
Author: Maor Gottlieb <maorg@nvidia.com>
Date: Mon Aug 29 12:02:29 2022 +0300
RDMA/mlx5: Fix UMR cleanup on error flow of driver init
The cited commit removed the checks of whether the resources were created
from the UMR cleanup flow. This could lead to a null-ptr-deref
in case of a failure in the mlx5_ib_stage_ib_reg_init stage.
Fix it by adding a new state to the UMR that indicates whether the resources
were created, and check it in the UMR cleanup flow before
destroying the resources.
Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c")
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/4cfa61386cf202e9ce330e8d228ce3b25a36326e.1661763459.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 0113780870b1597ae49f30abfa4957c239f913d3
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:11 2022 +0300
RDMA/mlx5: Rename the mkey cache variables and functions
After replacing the MR cache with an Mkey cache, rename the variables and
functions to fit the new meaning.
Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 6b7533869523ae58e2b914551305b0e47cbeb247
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:10 2022 +0300
RDMA/mlx5: Store in the cache mkeys instead of mrs
Currently, the driver stores the mlx5_ib_mr struct in the cache entries,
although the only use of the cached MR is the mkey. Store only the mkey in
the cache.
Link: https://lore.kernel.org/r/20220726071911.122765-5-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 19591f134c59703dfc272356808e6fe2037d0d40
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:09 2022 +0300
RDMA/mlx5: Store the number of in_use cache mkeys instead of total_mrs
total_mrs is used only to calculate the number of mkeys currently in
use. To simplify things, replace it with a new member called "in_use" and
directly store the number of mkeys currently in use.
Link: https://lore.kernel.org/r/20220726071911.122765-4-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 86457a92df1bebdcd8e20afa286427e4b525aa08
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:08 2022 +0300
RDMA/mlx5: Replace cache list with Xarray
The Xarray allows us to store the cached mkeys in a memory-efficient way.
Entries are reserved in the Xarray using xa_cmpxchg before calling the
upcoming callbacks to avoid allocations in interrupt context. The
xa_cmpxchg can sleep when using GFP_KERNEL, so we call it in a loop to
ensure one reserved entry for each process trying to reserve.
Link: https://lore.kernel.org/r/20220726071911.122765-3-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 17ae355926ed1832449d52748334b8fa799301f1
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:07 2022 +0300
RDMA/mlx5: Replace ent->lock with xa_lock
In the next patch, ent->list will be replaced with an xarray. The xarray
uses an internal lock to protect the indexes. Use it to protect all the
entry fields, and get rid of ent->lock.
Link: https://lore.kernel.org/r/20220726071911.122765-2-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 0c6ab0ca9a662d4ca9742d97156bac0d3067d72d
Author: Mark Bloch <mbloch@nvidia.com>
Date: Sun Jul 3 13:54:07 2022 -0700
RDMA/mlx5: Expose steering anchor to userspace
Expose a steering anchor per priority to allow users to re-inject
packets back into the default NIC pipeline for additional processing.
MLX5_IB_METHOD_STEERING_ANCHOR_CREATE returns a flow table ID which
a user can use to re-inject packets at a specific priority.
A FTE (flow table entry) can be created and the flow table ID
used as a destination.
When a packet is taken into an RDMA-controlled steering domain (like
software steering) there may be a need to insert the packet back into
the default NIC pipeline. This exposes a flow table ID to the user that can
be used as a destination in a flow table entry.
With this new method priorities that are exposed to users via
MLX5_IB_METHOD_FLOW_MATCHER_CREATE can be reached from a non-zero UID.
As user-created flow tables (via RDMA DEVX) are created with a non-zero UID,
it's impossible to point to a NIC core flow table (core driver flow tables
are created with a UID value of zero) from userspace.
Create flow tables that are exposed to users with the shared UID, this
allows users to point to default NIC flow tables.
Steering loops are prevented at FW level as FW enforces that no flow
table at level X can point to a table at level lower than X.
Link: https://lore.kernel.org/all/20220703205407.110890-6-saeed@kernel.org/
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 158e71bb69e368b8b33e8b7c4ac8c111da0c1ae2
Author: Aharon Landau <aharonl@nvidia.com>
Date: Sun May 15 07:19:53 2022 +0300
RDMA/mlx5: Add a umr recovery flow
When a UMR fails, the UMR QP state changes to an error state. Therefore,
all further UMR operations will fail too.
Add a recovery flow to the UMR QP, and repost the flushed WQEs.
Link: https://lore.kernel.org/r/6cc24816cca049bd8541317f5e41d3ac659445d3.1652588303.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112940
Upstream-status: v5.19-rc1
commit 34a30d7635a8e37275a7b63bec09035ed762969b
Author: Mark Bloch <mbloch@nvidia.com>
Date: Tue Mar 1 15:42:01 2022 +0000
net/mlx5: Lag, expose number of lag ports
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>