JIRA: https://issues.redhat.com/browse/RHEL-77880
commit 7566752e4d7d7fc0186531aa800068a7243f95c1
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date: Thu Oct 31 11:31:14 2024 +0200
RDMA/nldev: Add IB device and net device rename events
Implement event sending for IB device rename and IB device
port associated netdevice rename.
In iproute2, rdma monitor displays the IB device name, port
and the netdevice name when displaying event info. Since
users can modiy these names, we track and notify on renaming
events.
Note: In order to receive netdevice rename events, drivers
must use the ib_device_set_netdev() API when attaching net
devices to IB devices.
$ rdma monitor
$ rmmod mlx5_ib
[UNREGISTER] dev 1 rocep8s0f1
[UNREGISTER] dev 0 rocep8s0f0
$ modprobe mlx5_ib
[REGISTER] dev 2 mlx5_0
[NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2
[REGISTER] dev 3 mlx5_1
[NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3
[RENAME] dev 2 rocep8s0f0
[RENAME] dev 3 rocep8s0f1
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev
[UNREGISTER] dev 2 rocep8s0f0
[REGISTER] dev 4 mlx5_0
[NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2
[RENAME] dev 4 rdmap8s0f0
$ echo 4 > /sys/class/net/eth2/device/sriov_numvfs
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7
[REGISTER] dev 5 mlx5_0
[NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8
[REGISTER] dev 6 mlx5_1
[NETDEV_ATTACH] dev 6 mlx5_1 port 1 netdev 12 eth9
[RENAME] dev 5 rocep8s0f0v0
[RENAME] dev 6 rocep8s0f0v1
[REGISTER] dev 7 mlx5_0
[NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10
[RENAME] dev 7 rocep8s0f0v2
[REGISTER] dev 8 mlx5_0
[NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11
[RENAME] dev 8 rocep8s0f0v3
$ ip link set eth2 name myeth2
[NETDEV_RENAME] netdev 4 myeth2
$ ip link set eth1 name myeth1
** no events received, because eth1 is not attached to
an IB device **
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Link: https://patch.msgid.link/093c978ef2766fd3ab4ff8798eeb68f2f11582f6.1730367038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56245
commit 6ff57a2ea7c2911f80457a5a3a5b4370756ad475
Author: Qianqiang Liu <qianqiang.liu@163.com>
Date: Fri Sep 27 22:06:13 2024 +0800
RDMA/nldev: Fix NULL pointer dereferences issue in rdma_nl_notify_event
nlmsg_put() may return a NULL pointer assigned to nlh, which will later
be dereferenced in nlmsg_end().
Fixes: 9cbed5aab5ae ("RDMA/nldev: Add support for RDMA monitoring")
Link: https://patch.msgid.link/r/Zva71Yf3F94uxi5A@iZbp1asjb3cy8ks0srf007Z
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56245
commit 7acad3c442df6d5158c5b732a7a0ccf3a01d9b30
Author: Nathan Chancellor <nathan@kernel.org>
Date: Mon Sep 16 06:24:34 2024 -0700
RDMA/nldev: Add missing break in rdma_nl_notify_err_msg()
Clang warns (or errors with CONFIG_WERROR=y):
drivers/infiniband/core/nldev.c:2795:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
2795 | default:
| ^
Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.
Fixes: 9cbed5aab5ae ("RDMA/nldev: Add support for RDMA monitoring")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20240916-rdma-fix-clang-fallthrough-nl_notify_err_msg-v1-1-89de6a7423f1@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56245
commit 12fb1153c53bf9b53e299c9775b84fa7838640f7
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date: Mon Sep 9 20:30:25 2024 +0300
RDMA/nldev: Expose whether RDMA monitoring is supported
Extend the "rdma sys" command to display whether RDMA
monitoring is supported.
RDMA monitoring is not supported in mlx4 because it does
not use the ib_device_set_netdev() API, which sends the
RDMA events.
Example output for kernel where monitoring is supported:
$ rdma sys show
netns shared privileged-qkey off monitor on copy-on-fork on
Example output for kernel where monitoring is not supported:
$ rdma sys show
netns shared privileged-qkey off monitor off copy-on-fork on
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-8-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56245
commit 9cbed5aab5aeea420d0aa945733bf608449d44fb
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date: Mon Sep 9 20:30:24 2024 +0300
RDMA/nldev: Add support for RDMA monitoring
Introduce a new netlink command to allow rdma event monitoring.
The rdma events supported now are IB device
registration/unregistration and net device attachment/detachment.
Example output of rdma monitor and the commands which trigger
the events:
$ rdma monitor
$ rmmod mlx5_ib
[UNREGISTER] dev 1 rocep8s0f1
[UNREGISTER] dev 0 rocep8s0f0
$ modprobe mlx5_ib
[REGISTER] dev 2 mlx5_0
[NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2
[REGISTER] dev 3 mlx5_1
[NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev
[UNREGISTER] dev 2 rocep8s0f0
[REGISTER] dev 4 mlx5_0
[NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2
$ echo 4 > /sys/class/net/eth2/device/sriov_numvfs
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6
[NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7
[REGISTER] dev 5 mlx5_0
[NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8
[REGISTER] dev 6 mlx5_0
[NETDEV_ATTACH] dev 6 mlx5_0 port 1 netdev 12 eth9
[REGISTER] dev 7 mlx5_0
[NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10
[REGISTER] dev 8 mlx5_0
[NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11
$ echo 0 > /sys/class/net/eth2/device/sriov_numvfs
[UNREGISTER] dev 5 rocep8s0f0v0
[UNREGISTER] dev 6 rocep8s0f0v1
[UNREGISTER] dev 7 rocep8s0f0v2
[UNREGISTER] dev 8 rocep8s0f0v3
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 2
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 3
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 4
[NETDEV_DETACH] dev 4 rdmap8s0f0 port 5
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-7-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56245
commit df6d27a30970158466b632c82da09a9b24c30f4b
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date: Tue Jul 30 12:17:25 2024 +0300
RDMA/nldev: Enhance netlink message parsing and validation
Use strict parsing validation for set commands, and liberal
validation for get commands. Additionally, remove all usage of
nlmsg_parse_depricate().
Strict parsing validation fails when encountering unrecognized
attributes in the Netlink message, while liberal parsing
validation ignores them.
In 57d7a8fd904c ("rdma: Add an option to display driver-specific QPs in the rdma tool")
in iproute2, the attribute RDMA_NLDEV_ATTR_DRIVER_DETAILS
was added. This cause backwards compatibility issues when using
the rdma tool with the new attribute and an older kernel which does
recognize this attribute.
In this case, the command "rdma stat show mr" would fail, because the
new rdma tool would fill the netlink message with the new attribute and
the older kernel would fail as it used strict parsing and did not
recognize the new attribute.
In general, strict validation is appropriate for set commands as they
modify the system, while liberal validation is suitable for get
commands which only query system information.
Replace all uses of nlmsg_parse_deprecated() with __nlmsg_parse(),
using the NL_VALIDATE_LIBERAL flag.
The nlmsg_parse_deprecated() function internally calls
__nlmsg_parse() with the NL_VALIDATE_LIBERAL flag, but its name
is confusing.
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/f633a979a49db090d05c24a3ba83d30727bb777b.1722331020.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
Conflicts:
Drop the mlx5 hunks for now due to multiple missing changes in mlx5.
commit af48f95492dc1af36d9636a750ec492035c0ed7d
Author: Mark Zhang <markzhang@nvidia.com>
Date: Mon Jul 1 15:40:48 2024 +0300
RDMA/core: Introduce "name_assign_type" for an IB device
The name_assign_type indicates how the name is provided. Currently
these types are supported:
- RDMA_NAME_ASSIGN_TYPE_UNKNOWN: Unknown or not set;
- RDMA_NAME_ASSIGN_TYPE_USER: Name is provided by the user; The
user-created sub device, rxe and siw device has this type.
When filling nl device info, it is set in the new attribute
RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE. User-space tools like udev
"rdma_rename" could check this attribute to determine if this
device needs to be renamed or not.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/522591bef9a369cc8e5dcb77787e017bffee37fe.1719837610.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
commit 294424839b5ec2ecd17f4c8409796846b2b8dd31
Author: Mark Zhang <markzhang@nvidia.com>
Date: Sun Jun 16 19:08:41 2024 +0300
RDMA/nldev: Add support to dump device type and parent device if exists
If a device has a specific type or a parent device, dump them as well.
Example:
$ rdma dev show smi1
3: smi1: node_type ca fw 20.38.1002 node_guid 9803:9b03:009f:d5ef sys_image_guid 9803:9b03:009f:d5ee type smi parent ibp8s0f1
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/4c022e3e34b5de1254a3b367d502a362cdd0c53a.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
commit 060c642b2ab8b40b39f9db99c1d14c7d19ba507f
Author: Mark Zhang <markzhang@nvidia.com>
Date: Sun Jun 16 19:08:40 2024 +0300
RDMA/nldev: Add support to add/delete a sub IB device through netlink
Add new netlink commands and attributes to support adding and deleting
a sub IB device with admin privilege.
Examples:
$ rdma dev add smi1 type SMI parent ibp8s0f1
$ rdma dev del smi1
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/77cbf1b36359642be8a8d8c5c2f4e585b544282f.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
commit e18fa0bbcedf82aaa1db27079ef6a43e11367592
Author: Chiara Meiohas <cmeiohas@nvidia.com>
Date: Tue Apr 16 15:03:50 2024 +0300
RDMA/core: Add an option to display driver-specific QPs in the rdmatool
Utilize the -dd flag (driver-specific details) in the rdmatool
to view driver-specific QPs which are not exposed yet.
Add the netlink attribute to mark request to convey driver details and
use it to return QP subtype as a string.
$ rdma resource show qp link ibp8s0f1
link ibp8s0f1/1 lqpn 360 type UD state RTS sq-psn 0 comm [mlx5_ib]
link ibp8s0f1/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp8s0f1/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
$ rdma resource show qp link ibp8s0f1 -dd
link ibp8s0f1/1 lqpn 360 type UD state RTS sq-psn 0 comm [mlx5_ib]
link ibp8s0f1/1 lqpn 465 type DRIVER subtype REG_UMR state RTS sq-psn 0 comm [mlx5_ib]
link ibp8s0f1/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp8s0f1/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
$ rdma resource show
0: ibp8s0f0: pd 3 cq 4 qp 3 cm_id 0 mr 0 ctx 0 srq 2
1: ibp8s0f1: pd 3 cq 4 qp 3 cm_id 0 mr 0 ctx 0 srq 2
$ rdma resource show -dd
0: ibp8s0f0: pd 3 cq 4 qp 4 cm_id 0 mr 0 ctx 0 srq 2
1: ibp8s0f1: pd 3 cq 4 qp 4 cm_id 0 mr 0 ctx 0 srq 2
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Link: https://lore.kernel.org/r/2607bb3ddec3cae3443c2ea19e9f700825d20a98.1713268997.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
commit 7a1c2abf9a2be7d969b25e8d65567933335ca88e
Author: Yang Li <yang.lee@linux.alibaba.com>
Date: Tue Oct 24 08:38:15 2023 +0800
RDMA/core: Remove NULL check before dev_{put, hold}
The call netdev_{put, hold} of dev_{put, hold} will check NULL,
so there is no need to check before using dev_{put, hold},
remove it to silence the warning:
./drivers/infiniband/core/nldev.c:375:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7047
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231024003815.89742-1-yang.lee@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
commit 465d6b42f1a3b855c06da1d4d3b09907d261af69
Author: Patrisious Haddad <phaddad@nvidia.com>
Date: Mon Oct 9 13:43:58 2023 +0300
RDMA/core: Add support to set privileged QKEY parameter
Add netlink command that enables/disables privileged QKEY by default.
It is disabled by default, since according to IB spec only privileged
users are allowed to use privileged QKEY.
According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."
Using rdma tool,
$rdma system set privileged-qkey on
When enabled non-privileged users would be able to use
controlled QKEYs which are considered privileged.
Using rdma tool,
$rdma system set privileged-qkey off
When disabled only privileged users would be able to use
controlled QKEYs.
You can also use the command below to check the parameter state:
$rdma system show
netns shared privileged-qkey off copy-on-fork on
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/90398be70a9d23d2aa9d0f9fd11d2c264c1be534.1696848201.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56247
commit aebf8145e11a29a77dac15ee041a190676fac05f
Author: wenglianfa <wenglianfa@huawei.com>
Date: Mon Sep 18 21:11:09 2023 +0800
RDMA/core: Add support to dump SRQ resource in RAW format
Add support to dump SRQ resource in raw format. It enable drivers to
return the entire device specific SRQ context without setting each
field separately.
Example:
$ rdma res show srq -r
dev hns3 149000...
$ rdma res show srq -j -r
[{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}]
Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Link: https://lore.kernel.org/r/20230918131110.3987498-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2168936
commit fc8f93ad3e5485d45c992233c96acd902992dfc4
Author: Mark Zhang <markzhang@nvidia.com>
Date: Mon Nov 28 13:52:46 2022 +0200
RDMA/nldev: Fix failure to send large messages
Return "-EMSGSIZE" instead of "-EINVAL" when filling a QP entry, so that
new SKBs will be allocated if there's not enough room in current SKB.
Fixes: 65959522f8 ("RDMA: Add support to dump resource tracker in RAW format")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/b5e9c62f6b8369acab5648b661bf539cbceeffdc.1669636336.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2168936
commit 67e6272d53386f9708f91c4d0015c4a1c470eef5
Author: Or Har-Toov <ohartoov@nvidia.com>
Date: Mon Nov 28 13:52:45 2022 +0200
RDMA/nldev: Add NULL check to silence false warnings
Using nlmsg_put causes static analysis tools to many
false positives of not checking the return value of nlmsg_put.
In all uses in nldev.c, payload parameter is 0 so NULL will never
be returned. So let's add useless checks to silence the warnings.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/bd924da89d5b4f5291a4a01d9b5ae47c0a9b6a3f.1669636336.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2168936
commit ea5ef136e215fdef35f14010bc51fcd6686e6922
Author: Yuan Can <yuancan@huawei.com>
Date: Sat Nov 26 04:34:10 2022 +0000
RDMA/nldev: Add checks for nla_nest_start() in fill_stat_counter_qps()
As the nla_nest_start() may fail with NULL returned, the return value needs
to be checked.
Fixes: c4ffee7c9b ("RDMA/netlink: Implement counter dumpit calback")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20221126043410.85632-1-yuancan@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2168936
commit ecacb3751f254572af0009b9501e2cdc83a30b6a
Author: Mark Zhang <markzhang@nvidia.com>
Date: Mon Nov 7 10:51:36 2022 +0200
RDMA/nldev: Return "-EAGAIN" if the cm_id isn't from expected port
When filling a cm_id entry, return "-EAGAIN" instead of 0 if the cm_id
doesn'the have the same port as requested, otherwise an incomplete entry
may be returned, which causes "rdam res show cm_id" to return an error.
For example on a machine with two rdma devices with "rping -C 1 -v -s"
running background, the "rdma" command fails:
$ rdma -V
rdma utility, iproute2-5.19.0
$ rdma res show cm_id
link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 28056 comm rping src-addr 0.0.0.0:7174
error: Protocol not available
While with this fix it succeeds:
$ rdma res show cm_id
link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174
link mlx5_1/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174
Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/a08e898cdac5e28428eb749a99d9d981571b8ea7.1667810736.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120662
commit e945c653c8e972d1b81a88e474d79f801b60213a
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Mon Apr 4 12:26:42 2022 -0300
RDMA: Split kernel-only global device caps from uverbs device caps
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.
Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2056772
commit 87e0eacb176f9500c2063d140c0a1d7fa51ab8a5
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed Mar 16 11:39:48 2022 +0300
RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit()
This code checks "index" for an upper bound but it does not check for
negatives. Change the type to unsigned to prevent underflows.
Fixes: 3c3c1f141639 ("RDMA/nldev: Allow optional-counter status configuration through RDMA netlink")
Link: https://lore.kernel.org/r/20220316083948.GC30941@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2056770
commit 3c3c1f1416392382faa0238e76a70d7810aab2ef
Author: Aharon Landau <aharonl@nvidia.com>
Date: Fri Oct 8 15:24:35 2021 +0300
RDMA/nldev: Allow optional-counter status configuration through RDMA netlink
Provide an option to allow users to enable/disable optional counters
through RDMA netlink. Limiting it to users with ADMIN capability only.
Examples:
1. Enable optional counters cc_rx_ce_pkts and cc_rx_cnp_pkts (and
disable all others):
$ sudo rdma statistic set link rocep8s0f0/1 optional-counters \
cc_rx_ce_pkts,cc_rx_cnp_pkts
2. Remove all optional counters:
$ sudo rdma statistic unset link rocep8s0f0/1 optional-counters
Link: https://lore.kernel.org/r/20211008122439.166063-10-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2056770
commit 822cf785ac6d9120386f59964d6d029f3f04a8e3
Author: Aharon Landau <aharonl@nvidia.com>
Date: Fri Oct 8 15:24:34 2021 +0300
RDMA/nldev: Split nldev_stat_set_mode_doit out of nldev_stat_set_doit
In order to allow expansion of the set command with more set options, take
the set mode out of the main set function.
Link: https://lore.kernel.org/r/20211008122439.166063-9-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2056770
commit 7301d0a9834c7f1f0c91c1f0a46c7b191b1fd0da
Author: Aharon Landau <aharonl@nvidia.com>
Date: Fri Oct 8 15:24:33 2021 +0300
RDMA/nldev: Add support to get status of all counters
This patch adds the ability to get the name, index and status of all
counters for each link through RDMA netlink. This can be used for
user-space to get the current optional-counter mode.
Examples:
$ rdma statistic mode
link rocep8s0f0/1 optional-counters cc_rx_ce_pkts
$ rdma statistic mode supported
link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
link rocep8s0f1/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
Link: https://lore.kernel.org/r/20211008122439.166063-8-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2056770
commit 0dc89684605e8b60af645988d3e0d80a57b6e937
Author: Aharon Landau <aharonl@nvidia.com>
Date: Fri Oct 8 15:24:31 2021 +0300
RDMA/counter: Add an is_disabled field in struct rdma_hw_stats
Add a bitmap in rdma_hw_stat structure, with each bit indicates whether
the corresponding counter is currently disabled or not. By default
hwcounters are enabled.
Link: https://lore.kernel.org/r/20211008122439.166063-6-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2056770
commit 13f30b0fa0a9fa4f713edbb262f2e451886ce242
Author: Aharon Landau <aharonl@nvidia.com>
Date: Fri Oct 8 15:24:29 2021 +0300
RDMA/counter: Add a descriptor in struct rdma_hw_stats
Add a counter statistic descriptor structure in rdma_hw_stats. In addition
to the counter name, more meta-information will be added. This code
extension is needed for optional-counter support in the following patches.
Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kamal Heib <kheib@redhat.com>
It is much saner to store a pointer to the kobject structure that contains
the cannonical stats pointer than to copy the stats pointers into a public
structure.
Future patches will require the sysfs pointer for other purposes.
Link: https://lore.kernel.org/r/f90551dfd296cde1cb507bbef27cca9891d19871.1623427137.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This is being used to implement both the port and device global stats,
which is causing some confusion in the drivers. For instance EFA and i40iw
both seem to be misusing the device stats.
Split it into two ops so drivers that don't support one or the other can
leave the op NULL'd, making the calling code a little simpler to
understand.
Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com
Tested-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The new attribute indicates that the kernel copies DMA pages on fork,
hence libibverbs' fork support through madvise and MADV_DONTFORK is not
needed.
The introduced attribute is always reported as supported since the kernel
has the patch that added the copy-on-fork behavior. This allows the
userspace library to identify older vs newer kernel versions. Extra care
should be taken when backporting this patch as it relies on the fact that
the copy-on-fork patch is merged, hence no check for support is added.
Don't backport this patch unless you also have the following series:
commit 70e806e4e6 ("mm: Do early cow for pinned pages during fork() for
ptes") and commit 4eae4efa2c ("hugetlb: do early cow when page pinned on
src mm").
Fixes: 70e806e4e6 ("mm: Do early cow for pinned pages during fork() for ptes")
Fixes: 4eae4efa2c ("hugetlb: do early cow when page pinned on src mm")
Link: https://lore.kernel.org/r/20210418121025.66849-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add QP numbers that are associated with the SRQ to the SRQ information.
The QPs are displayed in a range form.
Sample output:
$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon
$ rdma res show srq lqpn 126-141
dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon
$ rdma res show srq lqpn 127
dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon
Link: https://lore.kernel.org/r/79a4bd4caec2248fd9583cccc26786af8e4414fc.1618753110.git.leonro@nvidia.com
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Extend the RDMA nldev return a SRQ information, like SRQ number, SRQ type,
PD number, CQ number and process ID that created that SRQ.
Sample output:
$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC pdn 4 pid 3586 comm ibv_srq_pingpon
Link: https://lore.kernel.org/r/322f9210b95812799190dd4a0fb92f3a3bba0333.1618753110.git.leonro@nvidia.com
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Extend the RDMA nldev return a context information, like ctx number and
process ID that created that context. This functionality is helpful to
find orphan contexts that are not closed for some reason.
Sample output:
$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong
$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong
Link: https://lore.kernel.org/r/5c956acfeac4e9d532988575f3da7d64cb449374.1618753110.git.leonro@nvidia.com
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.
This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs. HW/Spec defined values must also not be changed.
With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.
When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.
The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely
Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.
While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),
Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The bounded counter can't be reconfigured to be in auto mode, in attempt
to do it, the user will get an error, but without any hint why. Update
nldev interface to return an error message through extack mechanism.
Link: https://lore.kernel.org/r/20201230130240.180737-1-leon@kernel.org
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When dumping QPs bound to a counter, raw QPs should be allowed to dump
without the CAP_NET_RAW privilege. This is consistent with what "rdma res
show qp" does.
Fixes: c4ffee7c9b ("RDMA/netlink: Implement counter dumpit calback")
Link: https://lore.kernel.org/r/20200727095828.496195-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
With the "PID" category QPs have same PID will be bound to same counter;
If this category is not set then QPs have different PIDs will be bound
to same counter.
This is implemented for 2 reasons:
1. The counter is a limited resource, while there may be dozens of
applications, each of which creates several types of QPs, which means
it may doesn't have enough counter.
2. The system administrator needs all QPs created by all applications
with same type bound to one counter.
The counter name and PID is only make sense when "PID" category are
configured.
This category can also be used in combine with others, e.g. QP type.
Link: https://lore.kernel.org/r/20200702082933.424537-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add support to get resource dump in raw format. It enable drivers to
return the entire device specific QP/CQ/MR context without a need from the
driver to set each field separately.
The raw query returns only the device specific data, general data is still
returned by using the existing queries.
Example:
$ rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]
Link: https://lore.kernel.org/r/20200623113043.1228482-9-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to avoid double multiplexing of the resource when it is a cm id,
add a dedicated callback function. In addition remove fill_res_entry which
is not used anymore.
Link: https://lore.kernel.org/r/20200623113043.1228482-8-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to avoid double multiplexing of the resource when it is a QP, add
a dedicated callback function.
Link: https://lore.kernel.org/r/20200623113043.1228482-7-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to avoid double multiplexing of the resource when it is a CQ, add
a dedicated callback function.
Link: https://lore.kernel.org/r/20200623113043.1228482-6-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to avoid double multiplexing of the resource when it is a MR, add
a dedicated callback function.
Link: https://lore.kernel.org/r/20200623113043.1228482-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
None of the drivers implement it, remove it.
Link: https://lore.kernel.org/r/20200623113043.1228482-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Do not decrease the reference count of resource tracker object twice in
the error flow of res_get_common_doit.
Fixes: c5dfe0ea6f ("RDMA/nldev: Add resource tracker doit callback")
Link: https://lore.kernel.org/r/20200507062942.98305-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>