JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit cef7dde8836ab09a3bfe96ada4f18ef2496eacc9
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Mon Sep 9 13:04:57 2024 +0300
net/mlx5: Expand mkey page size to support 6 bits
Protect the usage of the 6th bit with the relevant capability to ensure
we are using the new page sizes with FW that supports the bit extension.
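As a rough illustration, the gating can be thought of as choosing the field
mask by capability (a standalone sketch; the macro and parameter names are
assumptions, not the actual mlx5 definitions):

    /* Sketch: widen the mkey log_page_size field from 5 to 6 bits only
     * when FW reports support for the bit extension. Names are
     * illustrative, not the actual mlx5_ifc definitions. */
    #include <stdbool.h>
    #include <stdint.h>

    #define LOG_PAGE_SIZE_MASK_5BIT 0x1f
    #define LOG_PAGE_SIZE_MASK_6BIT 0x3f

    static uint8_t mkey_log_page_size(uint8_t log_page_size, bool cap_6bit)
    {
            uint8_t mask = cap_6bit ? LOG_PAGE_SIZE_MASK_6BIT
                                    : LOG_PAGE_SIZE_MASK_5BIT;

            return log_page_size & mask;
    }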
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-2-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 7ebb00cea49db641b458edef0ede389f7004821d
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Tue Sep 3 14:24:50 2024 +0300
RDMA/mlx5: Fix MR cache temp entries cleanup
Fix the cleanup of the temp cache entries that are dynamically created
in the MR cache.
The cleanup of the temp cache entries is currently scheduled only when a
new entry is created. Since in the cleanup of the entries only the mkeys
are destroyed and the cache entry stays in the cache, subsequent
registrations might reuse the entry and it will eventually be filled with
new mkeys without cleanup ever getting scheduled again.
On workloads that register and deregister MRs with a wide range of
properties we see the cache ends up holding many cache entries, each
holding the max number of mkeys that were ever used through it.
Additionally, as the cleanup work is scheduled to run over the whole
cache, any mkey that is returned to the cache after the cleanup was
scheduled will be held for less than the intended 30 seconds timeout.
Solve both issues by dropping the existing remove_ent_work and reusing
the existing per-entry work to also handle the temp entries cleanup.
Schedule the work to run with a 30 seconds delay every time we push an
mkey to a clean temp entry.
This ensures the cleanup runs on each entry only 30 seconds after the
first mkey was pushed to an empty entry.
As we already distinguish between persistent and temp entries when
scheduling the cache_work_func, it is not scheduled in any other flows
for the temp entries.
Another benefit of moving to a per-entry cleanup is that we are no longer
required to hold the rb_tree mutex, thus enabling other flows to run
concurrently.
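A kernel-style sketch of the scheduling rule described above (struct fields
and the store_mkey() helper are assumptions, not the actual mlx5 code):

    /* Sketch: arm the per-entry cleanup 30 seconds after the first mkey
     * lands in an empty temp entry; later pushes don't re-arm it. */
    static void push_to_temp_ent(struct mlx5_cache_ent *ent, u32 mkey)
    {
            bool was_empty = ent->stored == 0;

            store_mkey(ent, mkey);          /* hypothetical helper */
            if (ent->is_tmp && was_empty)
                    queue_delayed_work(ent->dev->cache.wq, &ent->dwork,
                                       msecs_to_jiffies(30 * 1000));
    }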
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/e4fa4bb03bebf20dceae320f26816cd2dde23a26.1725362530.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit ee6d57a2e13d11ce9050cfc3e3b69ef707a44a63
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Tue Sep 3 14:24:49 2024 +0300
RDMA/mlx5: Limit usage of over-sized mkeys from the MR cache
When searching the MR cache for suitable cache entries, don't use mkeys
larger than twice the size required for the MR.
This should ensure the usage of mkeys closer to the minimal required size
and reduce memory waste.
On driver init we create entries for mkeys with clear attributes and
powers of 2 sizes from 4 to the max supported size.
This solves the issue for anyone using mkeys that fit these
requirements.
In the use case where an MR is registered with different attributes,
like an access flag we can't UMR, we'll create a new cache entry to store
it upon dereg.
Without this fix, any later registration with the same attributes and a
smaller size will use the newly created cache entry and its mkeys,
disregarding the memory waste of using mkeys larger than required.
For example, one worst-case scenario is registering and deregistering a
1GB mkey with ATS enabled, which causes the creation of a new cache entry
to hold that type of mkeys. A user registering a 4k MR with ATS will end
up using the new cache entry and an mkey that can support a 1GB MR, thus
wasting ~250,000x more memory in HW than actually needed.
Additionally, allow all small registrations to use the smallest size
cache entry that is initialized on driver load, even if its size is
larger than twice the required size.
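A standalone sketch of the resulting entry-fit rule (the minimum-entry
constant and names are assumptions):

    /* Sketch: an entry is usable if its mkeys are at most twice the
     * required size, or if it is the smallest entry created on driver
     * load (small registrations may always use that one). */
    #include <stdbool.h>
    #include <stddef.h>

    #define SMALLEST_INIT_ENT_NDESCS 4   /* assumed driver-load minimum */

    static bool ent_fits_req(size_t ent_ndescs, size_t req_ndescs)
    {
            if (ent_ndescs <= SMALLEST_INIT_ENT_NDESCS)
                    return true;

            return ent_ndescs <= 2 * req_ndescs;
    }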
Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/8ba3a6e3748aace2026de8b83da03aba084f78f4.1725362530.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 6f5cd6ac9a4201e4ba6f10b76a9da8044d6e38b0
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Tue Sep 3 14:24:48 2024 +0300
RDMA/mlx5: Fix counter update on MR cache mkey creation
After an mkey is created, update the counter for pending mkeys before
rescheduling the work that is filling the cache.
Rescheduling the work with a full MR cache entry and a wrong 'pending'
counter will cause us to miss disabling the fill_to_high_water flag.
Thus leaving the cache full but with an indication that it still needs
to be filled up to its full size (2 * limit).
The next time an mkey is taken from the cache, we'll unnecessarily
continue the process of filling the cache to its full size.
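A kernel-style sketch of the corrected ordering (counter and field names
are assumptions; the update-before-reschedule idea is the point):

    /* Sketch: fold the finished mkey into the counters before testing
     * whether more fill work is needed, so the high-water check sees a
     * consistent state. */
    static void on_cache_mkey_created(struct mlx5_cache_ent *ent)
    {
            ent->pending--;                 /* update counter first */
            ent->stored++;

            if (ent->stored + ent->pending >= 2 * ent->limit)
                    ent->fill_to_high_water = false;   /* cache is full */
            else
                    queue_work(ent->dev->cache.wq, &ent->work);
    }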
Fixes: 57e7071683ef ("RDMA/mlx5: Implement mkeys management via LIFO queue")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/0f44f462ba22e45f72cb3d0ec6a748634086b8d0.1725362530.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 30e6bd8d3b5639f8f4261e5e6c0917ce264b8dc2
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Tue Sep 3 14:24:47 2024 +0300
RDMA/mlx5: Drop redundant work canceling from clean_keys()
The canceling of delayed work in clean_keys() is a leftover from years
back and was added to prevent races in the cleanup process of the MR cache.
The cleanup process was rewritten a few years ago and the canceling of
delayed work and flushing of workqueue was added before the call to
clean_keys().
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/943d21f5a9dba7b98a3e1d531e3561ffe9745d71.1725362530.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit 34efda1735a179cd233479a99f09728825748ea1
Author: Maher Sanalla <msanalla@nvidia.com>
Date: Mon Sep 2 13:37:03 2024 +0300
RDMA/mlx5: Enable ATS when allocating kernel MRs
When creating kernel MRs, it is not definitive whether they will be used
for peer-to-peer transactions or for other use cases, since address
mapping is performed only after the MR is created.
Since peer-to-peer transactions benefit significantly from ATS
performance-wise, enable ATS on newly-allocated kernel MRs when
supported.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Gal Shalom <galshalom@nvidia.com>
Link: https://patch.msgid.link/fafd4c9f14cf438d2882d88649c2947e1d05d0b4.1725273403.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.12-rc1
commit de8f847a5114ff7cfcdfc114af8485c431dec703
Author: Yishai Hadas <yishaih@nvidia.com>
Date: Thu Aug 1 15:05:16 2024 +0300
RDMA/mlx5: Add support for DMABUF MR registrations with Data-direct
Add support for DMABUF MR registrations with Data-direct device.
When userspace registers a DMABUF MR with the data direct bit set, the
algorithm below is followed.
1) Obtain a pinned DMABUF umem from the IB core using the user input
parameters (FD, offset, length) and the DMA PF device. The DMA PF
device is needed to allow the IOMMU to enable the DMA PF to access the
user buffer over PCI.
2) Create a KSM MKEY by setting its entries according to the user buffer
VA to IOVA mapping, with the MKEY being the data direct device-crossed
MKEY. This KSM MKEY is umrable and will be used as part of the MR cache.
The PD for creating it is the internal device 'data direct' kernel one.
3) Create a crossing MKEY that points to the KSM MKEY using the crossing
access mode.
4) Manage the KSM MKEY by adding it to a list of 'data direct' MKEYs
managed on the mlx5_ib device.
5) Return the crossing MKEY to the user, created with its supplied PD.
Upon DMA PF unbind flow, the driver will revoke the KSM entries.
The final deregistration will occur under the hood once the application
deregisters its MKEY.
Notes:
- This version supports only the PINNED UMEM mode, so there is no
dependency on ODP.
- The IOVA supplied by the application must be system page aligned due to
HW translations of KSM.
- The crossing MKEY will not be umrable or part of the MR cache, as we
cannot change its crossed (i.e. KSM) MKEY over UMR.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/1f99d8020ed540d9702b9e2252a145a439609ba6.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-52869
Upstream-status: v6.11-rc1
commit 638420115cc4ad6c3a2683bf46a052b505abb202
Author: Jianbo Liu <jianbol@nvidia.com>
Date: Mon Jun 3 13:26:38 2024 +0300
IB/mlx5: Create UMR QP just before first reg_mr occurs
The UMR QP is not used in some cases, so move the QP and its CQ creation
from the driver load flow to the time the first reg_mr occurs, that is,
when MR interfaces are first called.
The initialization of dev->umrc.pd and dev->umrc.lock is still done at
driver load because the pd is needed for mlx5_mkey_cache_init and the
lock is reused to protect against concurrent creation.
When testing 4GB memory registration latency with rtool [1] and 8
threads in parallel, a minor performance degradation (<5% for the max
latency) is seen for the first reg_mr with this change.
Link: https://github.com/paravmellanox/rtool [1]
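A kernel-style sketch of the lazy creation (field and helper names,
including create_umr_cq_and_qp(), are assumptions):

    /* Sketch: create the UMR QP/CQ on the first reg_mr, using the lock
     * initialized at driver load to serialize concurrent creators. */
    static int mlx5r_umr_resource_get(struct mlx5_ib_dev *dev)
    {
            int err = 0;

            if (READ_ONCE(dev->umrc.qp))       /* fast path */
                    return 0;

            mutex_lock(&dev->umrc.lock);
            if (!dev->umrc.qp)                 /* re-check under lock */
                    err = create_umr_cq_and_qp(dev);  /* hypothetical */
            mutex_unlock(&dev->umrc.lock);

            return err;
    }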
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Link: https://lore.kernel.org/r/55d3c4f8a542fd974d8a4c5816eccfb318a59b38.1717409369.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56245
commit 3aa73c6b795b9aaaf933f3c95495d85fc0de39e3
Author: Yishai Hadas <yishaih@nvidia.com>
Date: Thu Aug 1 15:05:15 2024 +0300
RDMA: Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API
Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API instead of
udata.
This enables passing some new ioctl attributes to the drivers, as will
be introduced in the next patches for mlx5 driver.
Change the involved drivers accordingly.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/9a25b2fc02443f7c36c2d93499ae25252b6afd40.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc5
commit 2e4c02fdecf2f6f55cefe48cb82d93fa4f8e2204
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue May 28 15:52:54 2024 +0300
RDMA/mlx5: Ensure created mkeys always have a populated rb_key
cachable and mmkey.rb_key together are used by mlx5_revoke_mr() to put the
MR/mkey back into the cache. In all cases they should be set correctly.
alloc_cacheable_mr() was setting cachable but not filling rb_key,
resulting in cache_ent_find_and_store() bucketing them all into a 0 length
entry.
implicit_get_child_mr()/mlx5_ib_alloc_implicit_mr() failed to set cachable
or rb_key at all, so the cache was not working at all for implicit ODP.
Cc: stable@vger.kernel.org
Fixes: 8c1185fef68c ("RDMA/mlx5: Change check for cacheable mkeys")
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7778c02dfa0999a30d6746c79a23dd7140a9c729.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc5
commit f637040c3339a2ed8c12d65ad03f9552386e2fe7
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue May 28 15:52:53 2024 +0300
RDMA/mlx5: Follow rb_key.ats when creating new mkeys
When a cache ent already exists but doesn't have any mkeys in it, the
cache will automatically create new mkeys based on the specification in
ent->rb_key.
ent->ats was missed when creating the new mkeys, and so
ma_translation_mode was not being set even though the ent requires it.
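In code terms the fix amounts to roughly the following one-liner; MLX5_SET
and the ma_translation_mode field are real mlx5_ifc interfaces, while the
surrounding entry layout is assumed:

    /* Sketch: carry the entry's ats bit into the new mkey context so
     * ma_translation_mode matches what the key demands. */
    MLX5_SET(mkc, mkc, ma_translation_mode, !!ent->rb_key.ats);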
Cc: stable@vger.kernel.org
Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/7c5613458ecb89fbe5606b7aa4c8d990bdea5b9a.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc5
commit c1eb2512596fb3542357bb6c34c286f5e0374538
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Tue May 28 15:52:52 2024 +0300
RDMA/mlx5: Remove extra unlock on error path
The commit below lifted the locking out of this function but left this
error-path unlock behind, resulting in unbalanced locking. Remove the
missed unlock too.
Cc: stable@vger.kernel.org
Fixes: 627122280c87 ("RDMA/mlx5: Add work to remove temporary entries from the cache")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/78090c210c750f47219b95248f9f782f34548bb1.1716900410.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc1
commit 2ca7e93bc963d9ec2f5c24d117176851454967af
Author: Or Har-Toov <ohartoov@nvidia.com>
Date: Wed Apr 3 13:36:01 2024 +0300
RDMA/mlx5: Adding remote atomic access flag to updatable flags
Currently IB_ACCESS_REMOTE_ATOMIC is blocked from being updated via UMR
although in some cases it should be possible. These cases are checked in
mlx5r_umr_can_reconfig function.
Fixes: ef3642c4f5 ("RDMA/mlx5: Fix error unwinds for rereg_mr")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Link: https://lore.kernel.org/r/24dac73e2fa48cb806f33a932d97f3e402a5ea2c.1712140377.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-45365
Upstream-status: v6.10-rc1
commit 8c1185fef68cc603b954fece2a434c9f851d6a86
Author: Or Har-Toov <ohartoov@nvidia.com>
Date: Wed Apr 3 13:36:00 2024 +0300
RDMA/mlx5: Change check for cacheable mkeys
umem can be NULL for user application mkeys in some cases. Therefore
umem can't be used to check whether the mkey is cacheable; check a flag
that indicates it instead. Also make sure that all mkeys which are not
returned to the cache are destroyed.
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Link: https://lore.kernel.org/r/2690bc5c6896bcb937f89af16a1ff0343a7ab3d0.1712140377.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-24466
Upstream-status: v6.8-rc1
commit a429ec96c07f3020af12029acefc46f42ff5c91c
Author: Shun Hao <shunh@nvidia.com>
Date: Wed Dec 6 16:01:35 2023 +0200
RDMA/mlx5: Support handling of SW encap ICM area
New type for this ICM area, now the user can allocate/deallocate
the new type of SW encap ICM memory, to store the encap header data
which are managed by SW.
Signed-off-by: Shun Hao <shunh@nvidia.com>
Link: https://lore.kernel.org/r/546fe43fc700240709e30acf7713ec6834d652bd.1701871118.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-24466
Upstream-status: v6.7-rc1
Conflicts:
- drivers/infiniband/hw/mlx5/mr.c
Due to commit 78e2a0dd38 (tags/kernel-5.14.0-417.el9) which
is an OOO backport of
a53e215f9007 RDMA/mlx5: Fix mkey cache WQ flush (v6.7-rc1)
-> Adjust context
commit 57e7071683ef6148c9f5ea0ba84598d2ba681375
Author: Shay Drory <shayd@nvidia.com>
Date: Thu Sep 21 11:07:16 2023 +0300
RDMA/mlx5: Implement mkeys management via LIFO queue
Currently, mkeys are managed via an xarray. This implementation leads to
degradation in cases where many MRs are unregistered in parallel, due to
the xarray internal implementation; for example, deregistering 1M MRs via
64 threads takes ~15% more time [1].
Hence, implement mkey management via a LIFO queue, which solves the
degradation.
[1]
2.8us in kernel v5.19 compared to 3.2us in kernel v6.4
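A standalone sketch of the LIFO idea (not the actual mlx5 structures):

    /* Sketch: push/pop touch only the top index, avoiding the xarray's
     * internal node management under parallel deregistration. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct mkey_lifo {
            uint32_t *mkeys;
            size_t top;          /* number of stored mkeys */
            size_t capacity;
    };

    static bool mkey_push(struct mkey_lifo *q, uint32_t mkey)
    {
            if (q->top == q->capacity)
                    return false;
            q->mkeys[q->top++] = mkey;
            return true;
    }

    static bool mkey_pop(struct mkey_lifo *q, uint32_t *mkey)
    {
            if (!q->top)
                    return false;
            *mkey = q->mkeys[--q->top];
            return true;
    }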
Signed-off-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/fde3d4cfab0f32f0ccb231cd113298256e1502c5.1695283384.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-24466
Upstream-status: v6.6-rc1
commit 52b4bdd28c861e7331543f4b5a0853b80c9fd3fa
Author: Yuanyuan Zhong <yzhong@purestorage.com>
Date: Thu Jun 29 15:32:48 2023 -0600
RDMA/mlx5: align MR mem allocation size to power-of-two
The MR memory allocation requests extra bytes to guarantee that there
is enough space to find the memory aligned to MLX5_UMR_ALIGN.
For power-of-two sizes, the alignment can be guaranteed by kmalloc()
according to commit 59bb47985c ("mm, sl[aou]b: guarantee natural
alignment for kmalloc(power-of-two)").
So if target alignment is power-of-two and adding the extra bytes
crosses a power-of-two boundary, use the next power-of-two as the
allocation size.
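A standalone sketch of the sizing rule (the rounding helper is
illustrative; kmalloc guarantees natural alignment only for power-of-two
sizes, so a power-of-two request needs no extra slack):

    /* Sketch: if the alignment slack pushes the request across a
     * power-of-two boundary, allocate the next power of two instead, so
     * kmalloc's natural alignment covers MLX5_UMR_ALIGN on its own. */
    #include <stddef.h>

    static size_t next_pow2(size_t n)
    {
            size_t p = 1;

            while (p < n)
                    p <<= 1;
            return p;
    }

    static size_t mr_descs_alloc_size(size_t size, size_t align)
    {
            size_t padded = size + align - 1;   /* worst-case slack */

            if (padded > next_pow2(size))       /* crosses a boundary */
                    return next_pow2(padded);   /* pow2 => aligned */
            return padded;
    }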
Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
Link: https://lore.kernel.org/r/20230629213248.3184245-2-yzhong@purestorage.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.7-rc1
Conflicts:
- drivers/infiniband/hw/mlx5/mr.c
Context diff due to commit not ported yet in contrast to upstream ordering.
57e7071683ef ("RDMA/mlx5: Implement mkeys management via LIFO queue")
commit a53e215f90079f617360439b1b6284820731e34c
Author: Moshe Shemesh <moshe@nvidia.com>
Date: Wed Oct 25 20:49:59 2023 +0300
RDMA/mlx5: Fix mkey cache WQ flush
The cited patch tries to ensure there are no pending works on the mkey
cache workqueue by disabling the addition of new works and calling
flush_workqueue().
But this workqueue also has delayed works which might still be waiting
out their delay time before being queued.
Add cancel_delayed_work() for the delayed works that are waiting to be
queued, and then flush_workqueue() will flush all works which are
already queued and running.
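A kernel-style sketch of the resulting teardown ordering
(cancel_delayed_work() and flush_workqueue() are the real APIs; the
iteration and field names are assumptions):

    /* Sketch: cancel the delayed works still waiting on their timers,
     * then flush whatever is already queued or running. */
    static void mkey_cache_drain(struct mlx5_mkey_cache *cache)
    {
            struct mlx5_cache_ent *ent;

            list_for_each_entry(ent, &cache->ent_list, list)
                    cancel_delayed_work(&ent->dwork);

            flush_workqueue(cache->wq);
    }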
Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
Link: https://lore.kernel.org/r/b8722f14e7ed81452f791764a26d2ed4cfa11478.1698256179.git.leon@kernel.org
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.6-rc5
commit c99a7457e5bb873914a74307ba2df85f6799203b
Author: Leon Romanovsky <leon@kernel.org>
Date: Thu Sep 28 20:20:47 2023 +0300
RDMA/mlx5: Remove not-used cache disable flag
During execution of mlx5_mkey_cache_cleanup(), there is a guarantee
that MRs are not registered and/or destroyed. It means that we don't
need the newly introduced cache disable flag.
Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.6-rc5
commit 4f14c6c0213e1def48f0f887d35f44095416c67d
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Wed Sep 20 13:01:54 2023 +0300
RDMA/mlx5: Fix assigning access flags to cache mkeys
After the change to use dynamic cache structure, new cache entries
can be added and the mkey allocation can no longer assume that all
mkeys created for the cache have access_flags equal to zero.
Example of a flow that exposes the issue:
A user registers MR with RO on a HCA that cannot UMR RO and the mkey is
created outside of the cache. When the user deregisters the MR, a new
cache entry is created to store mkeys with RO.
Later, the user registers 2 MRs with RO. The first MR is reused from the
new cache entry. When we try to get the second mkey from the cache, we
see the entry is empty, so we go to the MR cache mkey allocation flow,
which would have allocated an mkey with no access flags, resulting in
the user getting an MR without RO.
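A kernel-style sketch of the idea (MLX5_SET and IB_ACCESS_RELAXED_ORDERING
are real interfaces; the helper and entry layout are assumptions):

    /* Sketch: derive the cache mkey's access bits from the entry's key
     * instead of assuming zero, so dynamically created entries (e.g.
     * RO) produce matching mkeys. */
    static void set_cache_mkc_access(struct mlx5_cache_ent *ent, void *mkc)
    {
            int access_flags = ent->rb_key.access_flags;

            MLX5_SET(mkc, mkc, relaxed_ordering_read,
                     !!(access_flags & IB_ACCESS_RELAXED_ORDERING));
    }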
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/8a802700b82def3ace3f77cd7a9ad9d734af87e7.1695203958.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.6-rc1
commit d3c2245754220b0fd4c6868e2fe48741a734be58
Author: Rohit Chavan <roheetchavan@gmail.com>
Date: Tue Aug 22 17:34:51 2023 +0530
RDMA/mlx5: Fix trailing */ formatting in block comment
Resolved a formatting issue where the trailing */ in a block comment
was placed on a same line instead of separate line.
Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230822120451.8215-1-roheetchavan@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.4-rc1
commit bd4ba605c4a92b46ab414626a4f969a19103f97a
Author: Avihai Horon <avihaih@nvidia.com>
Date: Mon Apr 10 16:07:53 2023 +0300
RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
According to the PCIe spec, the Enable Relaxed Ordering value in the VF's
PCI config space is wired to 0 and the PF relaxed ordering (RO) setting
should be applied to the VF. In QEMU (and maybe others), when assigning VFs,
the RO bit in PCI config space is not emulated properly and is always
set to 0.
Therefore, pcie_relaxed_ordering_enabled() always returns 0 for VFs and
VMs and thus MKeys can't be created with RO read even if the PF supports
it.
pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when
creating an MKey with relaxed ordering (RO) enabled when the driver's
relaxed_ordering_read_pci_enabled HCA capability is out of sync with FW.
With the new relaxed_ordering_read capability this can't happen, as it's
set regardless of RO value in PCI config space and thus can't change
during runtime.
Hence, to allow RO read in VFs and VMs, use the new HCA capability
relaxed_ordering_read without checking pcie_relaxed_ordering_enabled().
The old capability checks are kept for backward compatibility with older
FWs.
Allowing RO in VFs and VMs is valuable since it can greatly improve
performance on some setups. For example, testing throughput of a VF on
an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
improvement.
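A kernel-style sketch of the capability decision (MLX5_CAP_GEN and
pcie_relaxed_ordering_enabled() are the real interfaces; the helper itself
is an assumption):

    /* Sketch: trust the new cap on its own; the old pci-enabled cap
     * still needs the PCI config space check on older FW. */
    static bool ro_read_supported(struct mlx5_core_dev *mdev)
    {
            if (MLX5_CAP_GEN(mdev, relaxed_ordering_read))
                    return true;   /* set regardless of PCI config space */

            return MLX5_CAP_GEN(mdev, relaxed_ordering_read_pci_enabled) &&
                   pcie_relaxed_ordering_enabled(mdev->pdev);
    }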
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Link: https://lore.kernel.org/r/e7048640d66c341a8fa0465e099926e7989184bc.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.4-rc1
Conflicts:
- include/linux/mlx5/mlx5_ifc.h
Context diff due to commit already ported in contrast to upstream order.
7a9770f1bfea ("net/mlx5: Handle sync reset unload event")
commit ccbbfe0682f2fff1e157413c30092dd27c50e20e
Author: Avihai Horon <avihaih@nvidia.com>
Date: Mon Apr 10 16:07:52 2023 +0300
net/mlx5: Update relaxed ordering read HCA capabilities
Rename existing HCA capability relaxed_ordering_read to
relaxed_ordering_read_pci_enabled. This is in accordance with recent PRM
change to better describe the capability, as it's set only if both the
device supports relaxed ordering (RO) read and RO is enabled in PCI
config space.
In addition, add new HCA capability relaxed_ordering_read which is set
if the device supports RO read, regardless of RO in PCI config space.
This will be used in the following patch to allow RO in VFs and VMs.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/caa0002fd8135086357dfcc368e2f5cc73b08480.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-882
Upstream-status: v6.4-rc1
commit ed4b0661cce119870edb1994fd06c9cbc1dc05c3
Author: Avihai Horon <avihaih@nvidia.com>
Date: Mon Apr 10 16:07:50 2023 +0300
RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when
creating an MKey with relaxed ordering (RO) enabled when the driver's
relaxed_ordering_{read,write} HCA capabilities are out of sync with FW.
While this can happen with relaxed_ordering_read, it can't happen with
relaxed_ordering_write as it's set if the device supports RO write,
regardless of RO in PCI config space, and thus can't change during
runtime.
Therefore, drop the pcie_relaxed_ordering_enabled() check for
relaxed_ordering_write while keeping it for relaxed_ordering_read.
Doing so will also allow the usage of RO write in VFs and VMs (where RO
in PCI config space is not reported/emulated properly).
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/7e8f55e31572c1702d69cae015a395d3a824a38a.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 8e6e49ccf1a0f2b3257394dc8610bb6d48859d3f
Author: Dan Carpenter <error27@gmail.com>
Date: Mon Feb 6 17:40:35 2023 +0300
RDMA/mlx5: Check reg_create() create for errors
The reg_create() can fail. Check for errors before dereferencing it.
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y+ERYy4wN0LsKsm+@kili
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 85f9e38a5ac7d397f9bb5e57901b2d6af4dcc3b9
Author: Leon Romanovsky <leon@kernel.org>
Date: Thu Feb 2 11:03:07 2023 +0200
RDMA/mlx5: Remove impossible check of mkey cache cleanup failure
mlx5_mkey_cache_cleanup() can't fail and can be changed to be void.
Link: https://lore.kernel.org/r/1acd9528995d083114e7dec2a2afc59436406583.1675328463.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 828cf5936bea2438c21a3a6c303b34a2a1f6c3c2
Author: Leon Romanovsky <leon@kernel.org>
Date: Thu Feb 2 11:03:06 2023 +0200
RDMA/mlx5: Fix MR cache debugfs error in IB representors mode
Block MR cache debugfs creation for the IB representor flow, as the MR
cache shouldn't be used at all in that mode. As part of this change, add
missing debugfs cleanup in the error path too.
This change fixes the following debugfs errors:
bond0: (slave enp8s0f1): Enslaving as a backup interface with an up link
mlx5_core 0000:08:00.0: lag map: port 1:1 port 2:1
mlx5_core 0000:08:00.0: shared_fdb:1 mode:queue_affinity
mlx5_core 0000:08:00.0: Operation mode is single FDB
debugfs: Directory '2' with parent '/' already present!
...
debugfs: Directory '22' with parent '/' already present!
Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/482a78c54acbcfa1742a0e06a452546428900ffa.1675328463.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 627122280c878cf5d3cda2d2c5a0a8f6a7e35cb7
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:07 2023 +0200
RDMA/mlx5: Add work to remove temporary entries from the cache
The non-cache mkeys are stored in the cache only to shorten restarting
application time. Don't store them longer than needed.
Configure cache entries that store non-cache MRs as temporary entries. If
30 seconds have passed and no user reclaimed the temporarily cached mkeys,
an asynchronous work will destroy the mkeys entries.
Link: https://lore.kernel.org/r/20230125222807.6921-7-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit dd1b913fb0d0e3e6d55e92d2319d954474dd66ac
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:06 2023 +0200
RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
Currently, when deregistering an MR, if the mkey doesn't belong to a cache
entry, it will be destroyed. As a result, the restart of applications
with many non-cached mkeys is not efficient since all the mkeys are
destroyed and then recreated. This process takes a long time (for 100,000
MRs, it is ~20 seconds for dereg and ~28 seconds for re-reg).
To shorten the restart runtime, insert all cacheable mkeys into the
cache. If no existing entry fits the mkey properties, create a temporary
entry that fits them.
After a predetermined timeout, the cache entries will shrink to the
initial high limit.
The mkeys will still be in the cache when consuming them again after an
application restart. Therefore, the registration will be much faster
(for 100,000 MRs, it is ~4 seconds for dereg and ~5 seconds for re-reg).
The temporary cache entries created to store the non-cache mkeys are not
exposed through sysfs like the default cache entries.
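A kernel-style sketch of the dereg flow (all helper names are
hypothetical):

    /* Sketch: cacheable mkeys are parked in a fitting cache entry,
     * creating a temporary one when none matches; anything else is
     * destroyed as before. */
    static int revoke_mr_to_cache(struct mlx5_ib_dev *dev,
                                  struct mlx5_ib_mr *mr)
    {
            struct mlx5_cache_ent *ent;

            if (!mr->mmkey.cacheable)
                    return destroy_mkey(dev, mr);

            ent = mkey_cache_ent_find(dev, &mr->mmkey.rb_key);
            if (!ent)
                    ent = mkey_cache_ent_create_tmp(dev, &mr->mmkey.rb_key);

            return push_mkey(ent, mr->mmkey.key);
    }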
Link: https://lore.kernel.org/r/20230125222807.6921-6-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit 73d09b2fe8336f5f37935e46418666ddbcd3c343
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:05 2023 +0200
RDMA/mlx5: Introduce mlx5r_cache_rb_key
Switch from using the mkey order to using the new struct as the key to the
RB tree of cache entries.
The key is all the mkey properties that UMR operations can't modify.
Use this key to define the cache entries and to search for and create
cache mkeys.
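A sketch of the key and its comparator (the exact field set is an
assumption):

    /* Sketch: the key gathers the properties UMR can't change; a
     * field-by-field compare gives the RB-tree its total order. */
    #include <stddef.h>

    struct mlx5r_cache_rb_key {
            unsigned int access_mode;
            unsigned int access_flags;
            unsigned int ats;
            size_t ndescs;
    };

    static int cache_rb_key_cmp(const struct mlx5r_cache_rb_key *a,
                                const struct mlx5r_cache_rb_key *b)
    {
            if (a->access_mode != b->access_mode)
                    return a->access_mode < b->access_mode ? -1 : 1;
            if (a->access_flags != b->access_flags)
                    return a->access_flags < b->access_flags ? -1 : 1;
            if (a->ats != b->ats)
                    return a->ats < b->ats ? -1 : 1;
            if (a->ndescs != b->ndescs)
                    return a->ndescs < b->ndescs ? -1 : 1;
            return 0;
    }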
Link: https://lore.kernel.org/r/20230125222807.6921-5-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit b9584517832858a0f78d6851d09b697a829514cd
Author: Michael Guralnik <michaelgur@nvidia.com>
Date: Thu Jan 26 00:28:04 2023 +0200
RDMA/mlx5: Change the cache structure to an RB-tree
Currently, the cache structure is a static linear array. Therefore, its
size is limited to the number of entries in it and is not expandable. The
entries are dedicated to mkeys of size 2^x and no access_flags. Mkeys with
different properties are not cacheable.
In this patch, we change the cache structure to an RB-tree. This will
allow extending the cache to support more entries with different mkey
properties.
Link: https://lore.kernel.org/r/20230125222807.6921-4-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165364
Upstream-status: v6.3-rc1
commit a2a88b8e22d1b202225d0e40b02ad068afab2ccb
Author: Aharon Landau <aharonl@nvidia.com>
Date: Thu Jan 26 00:28:02 2023 +0200
RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries
mkc.log_page_size can be changed using UMR. Therefore, don't treat it as a
cache entry property.
Remove it from struct mlx5_cache_ent.
All cache mkeys will be created with default PAGE_SHIFT, and updated with
the needed page_shift using UMR when passing them to a user.
Link: https://lore.kernel.org/r/20230125222807.6921-2-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165363
Upstream-status: v6.2-rc1
commit 6978837ce42f8bea85041fc08c854f4e28852b3e
Author: Li Zhijian <lizhijian@fujitsu.com>
Date: Sat Dec 3 11:37:14 2022 +0800
RDMA/mlx5: no need to kfree NULL pointer
The goto label 'free', where 'in' is kfree'd, is not needed, though
kfree(NULL) is safe. Return the error code directly to simplify the code.

    free:
            kfree(in);
            return err;
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/20221203033714.25870-1-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2165355
Upstream-status: v6.1-rc1
commit ca7ef7adad979648da5006152320caa71b746134
Author: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Date: Tue Aug 23 02:51:31 2022 +0000
IB/mlx5: Remove duplicate header inclusion related to ODP
rdma/ib_umem.h and rdma/ib_verbs.h are included by rdma/ib_umem_odp.h.
This patch removes the redundant entries.
Link: https://lore.kernel.org/r/20220823025131.862811-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1940
Upstream-status: v6.1
Bugzilla: https://bugzilla.redhat.com/2123401
Tested: cuda pyverbs tests passed.
Add support for DMABUF FDs when creating a devx umem in the RDMA mlx5 driver.
This allows applications to create work queues directly on GPU memory where
the GPU fully controls the data flow out of the RDMA NIC.
Signed-off-by: Kamal Heib <kheib@redhat.com>
Approved-by: Íñigo Huguet <ihuguet@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2123401
commit 72b2f7608a59727e7c2e5b11cff2749c2c080fac
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Thu Sep 1 11:20:56 2022 -0300
RDMA/mlx5: Enable ATS support for MRs and umems
For mlx5 if ATS is enabled in the PCI config then the device will use ATS
requests for only certain DMA operations. This has to be opted into by
the SW side based on the mkey or umem settings.
ATS slows down the PCI performance, so it should only be set in cases when
it is needed. All of these cases revolve around optimizing PCI P2P
transfers and avoiding bad cases where the bus just doesn't work.
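A kernel-style sketch of the opt-in rule (the cap and field names are
assumptions):

    /* Sketch: request ATS only when the device supports it and the
     * mapping may target peer memory (dmabuf); plain host memory keeps
     * ATS off to avoid the PCI slowdown. */
    static bool umem_wants_ats(struct mlx5_ib_dev *dev, struct ib_umem *umem)
    {
            if (!MLX5_CAP_GEN(dev->mdev, ats))
                    return false;

            return umem->is_dmabuf;
    }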
Link: https://lore.kernel.org/r/4-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 0113780870b1597ae49f30abfa4957c239f913d3
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:11 2022 +0300
RDMA/mlx5: Rename the mkey cache variables and functions
After replacing the MR cache with an Mkey cache, rename the variables and
functions to fit the new meaning.
Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 6b7533869523ae58e2b914551305b0e47cbeb247
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:10 2022 +0300
RDMA/mlx5: Store in the cache mkeys instead of mrs
Currently, the driver stores mlx5_ib_mr struct in the cache entries,
although the only use of the cached MR is the mkey. Store only the mkey in
the cache.
Link: https://lore.kernel.org/r/20220726071911.122765-5-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 19591f134c59703dfc272356808e6fe2037d0d40
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:09 2022 +0300
RDMA/mlx5: Store the number of in_use cache mkeys instead of total_mrs
total_mrs is used only to calculate the number of mkeys currently in
use. To simplify things, replace it with a new member called "in_use" and
directly store the number of mkeys currently in use.
Link: https://lore.kernel.org/r/20220726071911.122765-4-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 86457a92df1bebdcd8e20afa286427e4b525aa08
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:08 2022 +0300
RDMA/mlx5: Replace cache list with Xarray
The Xarray allows us to store the cached mkeys in a memory-efficient
way.
Entries are reserved in the Xarray using xa_cmpxchg before calling the
upcoming callbacks, to avoid allocations in interrupt context. The
xa_cmpxchg can sleep when using GFP_KERNEL, so we call it in a loop to
ensure one reserved entry for each process trying to reserve.
Link: https://lore.kernel.org/r/20220726071911.122765-3-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit 17ae355926ed1832449d52748334b8fa799301f1
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Jul 26 10:19:07 2022 +0300
RDMA/mlx5: Replace ent->lock with xa_lock
In the next patch, ent->list will be replaced with an xarray. The xarray
uses an internal lock to protect the indexes. Use it to protect all the
entry fields, and get rid of ent->lock.
Link: https://lore.kernel.org/r/20220726071911.122765-2-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112947
Upstream-status: v6.0-rc1
commit a6492af3805ae3d9fe872545aa4797971b4e2a33
Author: Yevgeny Kliteynik <kliteyn@nvidia.com>
Date: Tue Jun 7 15:47:45 2022 +0300
RDMA/mlx5: Support handling of modify-header pattern ICM area
Add support for allocating/deallocating and registering an MR of the new
type of ICM area. Support exists only for devices that support sw_owner_v2.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112940
Upstream-status: v5.19-rc1
commit 636bdbfc9996567af1a3ed89ecf92ea5028a8a89
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Apr 12 10:24:06 2022 +0300
RDMA/mlx5: Use mlx5_umr_post_send_wait() to update xlt
Move mlx5_ib_update_mr_pas logic to umr.c, and use
mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait().
Since it is the last use of mlx5_ib_post_send_wait(), remove it.
Link: https://lore.kernel.org/r/55a4972f156aba3592a2fc9bcb33e2059acf295f.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112940
Upstream-status: v5.19-rc1
commit 916adb491e84bc8b130618e4969c1d196525abf2
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Apr 12 10:24:04 2022 +0300
RDMA/mlx5: Move creation and free of translation tables to umr.c
The only use of the translation tables is to update the mkey translation
by a UMR operation. Move the responsibility of creating and freeing them
to umr.c
Link: https://lore.kernel.org/r/1d93f1381be82a22aaf1168cdbdfb227eac1ce62.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2112940
Upstream-status: v5.19-rc1
commit 33e8aa8e049811de87cd1c16a2ead85e0c9f9606
Author: Aharon Landau <aharonl@nvidia.com>
Date: Tue Apr 12 10:24:02 2022 +0300
RDMA/mlx5: Use mlx5_umr_post_send_wait() to revoke MRs
Move the revoke_mr logic to umr.c, and use mlx5_umr_post_send_wait()
instead of mlx5_ib_post_send_wait().
In the new implementation, do not zero out the access flags. Before
reusing the MR, we will update it to the required access.
Link: https://lore.kernel.org/r/63717dfdaf6007f81b3e6dbf598f5bf3875ce86f.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mohammad Kabat <mkabat@redhat.com>