Commit Graph

180 Commits

Author SHA1 Message Date
Patrick Talbert 98f52f1680 Merge: CNB96: netlink/devlink: update devlink & netlink to the v6.12
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5861

JIRA: https://issues.redhat.com/browse/RHEL-57756
Depends: !5257
Depends: !5851
Signed-off-by: Petr Oros <poros@redhat.com>

Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
2024-12-30 07:30:10 -05:00
Petr Oros b9ff5c853a netlink: add nlmsg_consume() and use it in devlink compat
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 8e69b3459ca1ed4f6f7bd0b0a11962ddb3e7d34a
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Apr 3 13:22:59 2024 -0700

    netlink: add nlmsg_consume() and use it in devlink compat

    devlink_compat_running_version() sticks out when running
    netdevsim tests and watching dropped skbs. Add nlmsg_consume()
    for cases where we want to free a netlink skb but it is expected,
    rather than a drop. af_netlink code uses consume_skb() directly,
    which is fine, but some may prefer the symmetry of nlmsg_new() /
    nlmsg_consume().

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:53 +01:00
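The difference the commit above draws between freeing an skb as a drop and consuming it as expected can be modelled in userspace. The sketch below is not kernel code: `struct fake_skb` and every `fake_*` function are hypothetical stand-ins, and the `drops_reported` counter only imitates what a drop monitor would observe.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for a socket buffer. */
struct fake_skb {
	size_t len;
};

/* Counts frees that a drop monitor would report as packet drops. */
static unsigned int drops_reported;

static struct fake_skb *fake_nlmsg_new(size_t payload)
{
	struct fake_skb *skb = calloc(1, sizeof(*skb));

	if (skb)
		skb->len = payload;
	return skb;
}

/* Models kfree_skb(): the buffer is released and counted as a drop. */
static void fake_kfree_skb(struct fake_skb *skb)
{
	if (!skb)
		return;
	drops_reported++;
	free(skb);
}

/* Models consume_skb(): the buffer is released, no drop is counted. */
static void fake_consume_skb(struct fake_skb *skb)
{
	free(skb);
}

/* Free path for a message that really was discarded. */
static void fake_nlmsg_free(struct fake_skb *skb)
{
	fake_kfree_skb(skb);
}

/* Free path for a message that was used as intended. */
static void fake_nlmsg_consume(struct fake_skb *skb)
{
	fake_consume_skb(skb);
}
```

In this model, a caller that has finished with a reply would use `fake_nlmsg_consume()`, so the drop counter stays untouched, and would keep `fake_nlmsg_free()` for genuine error paths.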
Ivan Vecera f66c136a18 devlink: Constify the 'table_ops' parameter of devl_dpipe_table_register()
JIRA: https://issues.redhat.com/browse/RHEL-67125

commit 82dc29b9737edf2d13561ebcf6212c0b88c41129
Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Sun Jun 2 16:18:52 2024 +0200

    devlink: Constify the 'table_ops' parameter of devl_dpipe_table_register()

    "struct devlink_dpipe_table_ops" only contains some function pointers.

    Update "struct devlink_dpipe_table" and the 'table_ops' parameter of
    devl_dpipe_table_register() so that structures in drivers can be
    constified.

    Constifying these structures will move some data to a read-only section, so
    increase overall security.

    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-12-03 15:46:16 +01:00
Petr Oros 053ad10203 devlink: use kvzalloc() to allocate devlink instance resources
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit 730fffce4fd2eb7a0be2d0b6cd7e55e9194d76d5
Author: Jian Wen <wenjianhn@gmail.com>
Date:   Wed Mar 27 16:21:28 2024 +0800

    devlink: use kvzalloc() to allocate devlink instance resources

    During live migration of a virtual machine, the SR-IOV VF needs to be
    re-registered. It may fail when the memory is badly fragmented.

    The related log is as follows.

        kernel: hv_netvsc 6045bdaa-c0d1-6045-bdaa-c0d16045bdaa eth0: VF slot 1 added
    ...
        kernel: kworker/0:0: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
        kernel: CPU: 0 PID: 24006 Comm: kworker/0:0 Tainted: G            E     5.4...x86_64 #1
        kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
        kernel: Workqueue: events work_for_cpu_fn
        kernel: Call Trace:
        kernel: dump_stack+0x8b/0xc8
        kernel: warn_alloc+0xff/0x170
        kernel: __alloc_pages_slowpath+0x92c/0xb2b
        kernel: ? get_page_from_freelist+0x1d4/0x1140
        kernel: __alloc_pages_nodemask+0x2f9/0x320
        kernel: alloc_pages_current+0x6a/0xb0
        kernel: kmalloc_order+0x1e/0x70
        kernel: kmalloc_order_trace+0x26/0xb0
        kernel: ? __switch_to_asm+0x34/0x70
        kernel: __kmalloc+0x276/0x280
        kernel: ? _raw_spin_unlock_irqrestore+0x1e/0x40
        kernel: devlink_alloc+0x29/0x110
        kernel: mlx5_devlink_alloc+0x1a/0x20 [mlx5_core]
        kernel: init_one+0x1d/0x650 [mlx5_core]
        kernel: local_pci_probe+0x46/0x90
        kernel: work_for_cpu_fn+0x1a/0x30
        kernel: process_one_work+0x16d/0x390
        kernel: worker_thread+0x1d3/0x3f0
        kernel: kthread+0x105/0x140
        kernel: ? max_active_store+0x80/0x80
        kernel: ? kthread_bind+0x20/0x20
        kernel: ret_from_fork+0x3a/0x50

    Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
    Link: https://lore.kernel.org/r/20240327082128.942818-1-wenjian1@xiaomi.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:46 +01:00
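The "order:7" in the allocation failure quoted above means the page allocator was asked for 2^7 physically contiguous pages. The arithmetic can be sketched as follows; the 4096-byte page size is an assumption (the usual x86_64 value), and `order_to_bytes()` is an illustrative helper, not a kernel function.

```c
#include <stddef.h>

/* Bytes covered by a page allocation of the given order: page_size << order. */
static size_t order_to_bytes(unsigned int order, size_t page_size)
{
	return page_size << order;
}
```

Under that assumption, order 7 is 128 contiguous pages, or 512 KiB, which is hard to find on a badly fragmented machine. kvzalloc() can fall back to vmalloc-backed memory, which does not have to be physically contiguous.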
Petr Oros d1450ffb82 devlink: fix port new reply cmd type
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit 78a2f5e6c15d8dcbd6495bb9635c7cb89235dfc5
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Mar 18 10:19:08 2024 +0100

    devlink: fix port new reply cmd type

    Due to a c&p error, port new reply fills-up cmd with a wrong value,
    different from any other existing port command replies and notifications.

    Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.

    Skimmed through devlink userspace implementations, none of them cares
    about this cmd value.

    Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
    Closes: https://lore.kernel.org/all/ZfZcDxGV3tSy4qsV@cy-server/
    Fixes: cd76dcd68d ("devlink: Support add and delete devlink port")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Parav Pandit <parav@nvidia.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Link: https://lore.kernel.org/r/20240318091908.2736542-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:46 +01:00
CKI Backport Bot b0ecf5c2ec devlink: Support setting max_io_eqs
JIRA: https://issues.redhat.com/browse/RHEL-64903

commit 5af3e3876d567fb79a355bec1cb48e432d69b4fb
Author: Parav Pandit <parav@nvidia.com>
Date:   Sat Apr 6 04:05:37 2024 +0300

    devlink: Support setting max_io_eqs

    Many devices send event notifications for the IO queues,
    such as tx and rx queues, through event queues.

    Enable a privileged owner, such as a hypervisor PF, to set the number
    of IO event queues for the VF and SF during the provisioning stage.

    example:
    Get maximum IO event queues of the VF device::

      $ devlink port show pci/0000:06:00.0/2
      pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
          function:
              hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10

    Set maximum IO event queues of the VF device::

      $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32

      $ devlink port show pci/0000:06:00.0/2
      pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
          function:
              hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Shay Drory <shayd@nvidia.com>
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-10-25 09:01:55 +00:00
Rado Vrbovsky 40945cb730 Merge: CNB96: net/ethtool: rebase to v6.11
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5197

JIRA: https://issues.redhat.com/browse/RHEL-57750  
Depends: !5196

This updates the ethtool subsystem to v6.11. At the end of this series, the only remaining diffs from v6.11 are the RH_KABI_RESERVES in struct ethtool_ops, as shown by:  
`git diff v6.11 -- net/ethtool include/linux/ethtool.h include/uapi/linux/ethtool{,_netlink}.h Documentation/netlink/specs/ethtool.yaml Documentation/networking/ethtool-netlink.rst tools/net/ynl/ethtool.py`

Omitted-Fix: 9dbad38336a9 ("eth: bnxt: populate defaults in the RSS context struct")
 - bnxt has not been converted to .create_rxfh_context yet.
   This will be in a driver update later.

Omitted-Fix: cdc90f75387c ("pse-core: Conditionally set current limit during PI regulator registration")  
Omitted-Fix: 326f442784c2 ("net: pse-pd: pse_core: Fix pse regulator type")
 - All changes to pse_core.c omitted in the series.

Omitted-Fix: 2fa809b90617 ("net: pse-pd: Kconfig: Add missing Regulator API dependency")
 - Irrelevant. CONFIG_PSE_CONTROLLER is disabled.

Omitted-Fix: 93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null")
 - Contained in the merge conflict resolution backported in "net: ethtool: pse-pd: Fix possible null-deref".

Omitted-Fix: dda3529d2e84 ("net: pse-pd: Fix enabled status mismatch")
 - Irrelevant. CONFIG_PSE_CONTROLLER is disabled.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-19 08:11:42 +00:00
Michal Schmidt 8e7994801b netlink: introduce type-checking attribute iteration
JIRA: https://issues.redhat.com/browse/RHEL-57750

commit e8058a49e67fe7bc7e4a0308851a3ca3a6d2e45d
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Thu Mar 28 20:31:45 2024 +0100

    netlink: introduce type-checking attribute iteration

    There are, especially with multi-attr arrays, many cases
    of needing to iterate all attributes of a specific type
    in a netlink message or a nested attribute. Add specific
    macros to support that case.

    Also convert many instances using this spatch:

        @@
        iterator nla_for_each_attr;
        iterator name nla_for_each_attr_type;
        identifier nla;
        expression head, len, rem;
        expression ATTR;
        type T;
        identifier x;
        @@
        -nla_for_each_attr(nla, head, len, rem)
        +nla_for_each_attr_type(nla, ATTR, head, len, rem)
         {
        <... T x; ...>
        -if (nla_type(nla) == ATTR) {
         ...
        -}
         }

        @@
        identifier nla;
        iterator nla_for_each_nested;
        iterator name nla_for_each_nested_type;
        expression attr, rem;
        expression ATTR;
        type T;
        identifier x;
        @@
        -nla_for_each_nested(nla, attr, rem)
        +nla_for_each_nested_type(nla, ATTR, attr, rem)
         {
        <... T x; ...>
        -if (nla_type(nla) == ATTR) {
         ...
        -}
         }

        @@
        iterator nla_for_each_attr;
        iterator name nla_for_each_attr_type;
        identifier nla;
        expression head, len, rem;
        expression ATTR;
        type T;
        identifier x;
        @@
        -nla_for_each_attr(nla, head, len, rem)
        +nla_for_each_attr_type(nla, ATTR, head, len, rem)
         {
        <... T x; ...>
        -if (nla_type(nla) != ATTR) continue;
         ...
         }

        @@
        identifier nla;
        iterator nla_for_each_nested;
        iterator name nla_for_each_nested_type;
        expression attr, rem;
        expression ATTR;
        type T;
        identifier x;
        @@
        -nla_for_each_nested(nla, attr, rem)
        +nla_for_each_nested_type(nla, ATTR, attr, rem)
         {
        <... T x; ...>
        -if (nla_type(nla) != ATTR) continue;
         ...
         }

    Although I had to undo one bad change this made, and
    I also adjusted some other code for whitespace and to
    use direct variable initialization now.

    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Conflicts:
	drivers/net/ethernet/netronome/nfp/nfp_net_common.c
	- The driver lacks .ndo_bridge_setlink implementation in RHEL 9.
	net/core/bpf_sk_storage.c
	- Missing commit bcc29b7f5af6 ("bpf: Add length check for
	  SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing")

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2024-10-01 12:19:13 +02:00
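The transformation performed by the semantic patch above can be illustrated with a simplified userspace model. The sketch assumes that the type-checking iterator is the plain iterator followed by a type test, which is what the spatch rules imply; `struct demo_attr` and the `demo_*` macros are hypothetical and use a flat array instead of real netlink attribute framing.

```c
#include <stddef.h>

/* Hypothetical, simplified attribute: real attributes are length-prefixed. */
struct demo_attr {
	int type;
	int value;
};

/* Plain iteration over every attribute in the array. */
#define demo_for_each_attr(pos, head, count, i) \
	for ((i) = 0, (pos) = (head); (i) < (count); (i)++, (pos)++)

/* Type-checking iteration: the loop body only runs for matching types. */
#define demo_for_each_attr_type(pos, want, head, count, i) \
	demo_for_each_attr(pos, head, count, i) \
		if ((pos)->type == (want))

/* Old style: iterate everything and skip unwanted types by hand. */
static int sum_old(const struct demo_attr *head, size_t count, int want)
{
	const struct demo_attr *pos;
	size_t i;
	int sum = 0;

	demo_for_each_attr(pos, head, count, i) {
		if (pos->type != want)
			continue;
		sum += pos->value;
	}
	return sum;
}

/* New style: the type filter lives in the iterator itself. */
static int sum_new(const struct demo_attr *head, size_t count, int want)
{
	const struct demo_attr *pos;
	size_t i;
	int sum = 0;

	demo_for_each_attr_type(pos, want, head, count, i)
		sum += pos->value;
	return sum;
}
```

Both functions visit the same attributes; the second simply drops the hand-written filter, which is the pattern the conversion removes at each call site.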
Petr Oros f1cea59299 devlink: extend devlink_param *set pointer
JIRA: https://issues.redhat.com/browse/RHEL-59901

Conflicts:
- drivers/crypto/marvell/octeontx2/otx2_cpt_devlink.c - chunk omitted
  due to missing 82f89f1aa6ca33 ("crypto: octeontx2 - add devlink option
  to set t106 mode")
- drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c - chunk omitted
  due to missing dd784287863345 ("octeontx2-af: Add new devlink param to
  configure maximum usable NIX block LFs")
- Unmerged path net/dsa/devlink.c

Upstream commit(s):
commit 5625ca5640caa3fb797f155601d56379d260d6ba
Author: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Date:   Fri Apr 19 04:08:49 2024 -0400

    devlink: extend devlink_param *set pointer

    Extend devlink_param *set function pointer to take extack as a param.
    Sometimes a set function needs to pass information to the end user.
    It is more appropriate to use netlink for that than to print a
    message to dmesg.

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-09-25 15:38:32 +02:00
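The idea of the commit above, a setter that can report why it failed through an extended-ack object instead of the kernel log, can be sketched in userspace. Everything here is hypothetical: `struct demo_extack` stands in for the netlink extended ack, and `struct demo_param`, `demo_set_queues()` and the 1..64 range are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the netlink extended ack. */
struct demo_extack {
	const char *msg;
};

struct demo_param {
	const char *name;
	/* The setter receives extack so it can explain a failure to the caller. */
	int (*set)(unsigned int value, struct demo_extack *extack);
};

/* Example setter: rejects out-of-range values and says why. */
static int demo_set_queues(unsigned int value, struct demo_extack *extack)
{
	if (value == 0 || value > 64) {
		if (extack)
			extack->msg = "value must be between 1 and 64";
		return -1;
	}
	return 0;
}

/* Core code forwards extack to whichever setter the driver provided. */
static int demo_param_set(const struct demo_param *param, unsigned int value,
			  struct demo_extack *extack)
{
	return param->set(value, extack);
}
```

The message travels back with the reply to the requester, so the user who issued the command sees it directly rather than having to search the log.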
Petr Oros 850da3338c devlink: Fix devlink parallel commands processing
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit d7d75124965aee23e5e4421d78376545cf070b0a
Author: Shay Drory <shayd@nvidia.com>
Date:   Tue Mar 12 12:52:38 2024 +0200

    devlink: Fix devlink parallel commands processing

    Commit 870c7ad4a52b ("devlink: protect devlink->dev by the instance
    lock") added devlink instance locking inside a loop that iterates over
    all the registered devlink instances on the machine in the pre-doit
    phase. This can lead to serialization of devlink commands over
    different devlink instances.

    For example: While the first devlink instance is executing firmware
    flash, all commands to other devlink instances on the machine are
    forced to wait until the first devlink finishes.

    Therefore, in the pre-doit phase, take the devlink instance lock only
    for the devlink instance the command is targeting. Devlink layer is
    taking a reference on the devlink instance, ensuring the devlink->dev
    pointer is valid. This reference taking was introduced by commit
    a380687200e0 ("devlink: take device reference for devlink object").
    Without this commit, it would not be safe to access devlink->dev
    lockless.

    Fixes: 870c7ad4a52b ("devlink: protect devlink->dev by the instance lock")
    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-30 12:36:41 +02:00
Petr Oros 50682536c3 devlink: Fix length of eswitch inline-mode
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 8f4cd89bf10607de08231d6d91a73dd63336808e
Author: William Tu <witu@nvidia.com>
Date:   Sun Mar 10 18:45:47 2024 +0200

    devlink: Fix length of eswitch inline-mode

    Set eswitch inline-mode to be u8, not u16. Otherwise, the errors below occur:

    $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev \
      inline-mode network
        Error: Attribute failed policy validation.
        kernel answers: Numerical result out of range
        netlink: 'devlink': attribute type 26 has an invalid length.

    Fixes: f2f9dd164db0 ("netlink: specs: devlink: add the remaining command to generate complete split_ops")
    Signed-off-by: William Tu <witu@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240310164547.35219-1-witu@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
Petr Oros 7664348519 devlink: fix port dump cmd type
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 61c43780e9444123410cd48c2483e01d2b8f75e8
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Tue Feb 20 08:52:45 2024 +0100

    devlink: fix port dump cmd type

    Unlike other commands, due to a c&p error, port dump fills-up cmd with
    wrong value, different from port-get request cmd, port-get doit reply
    and port notification.

    Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.

    Skimmed through devlink userspace implementations, none of them cares
    about this cmd value. Only ynl, for which, this is actually a fix, as it
    expects doit and dumpit ops rsp_value to be the same.

    Omit the fixes tag, even though this is a fix, better to target this for
    next release.

    Fixes: bfcd3a4661 ("Introduce devlink infrastructure")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
    Link: https://lore.kernel.org/r/20240220075245.75416-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
Petr Oros b365673e2a devlink: fix possible use-after-free and memory leaks in devlink_init()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit def689fc26b9a9622d2e2cb0c4933dd3b1c8071c
Author: Vasiliy Kovalev <kovalev@altlinux.org>
Date:   Thu Feb 15 23:34:00 2024 +0300

    devlink: fix possible use-after-free and memory leaks in devlink_init()

    The pernet operations structure for the subsystem must be registered
    before registering the generic netlink family.

    Make an unregister in case of unsuccessful registration.

    Fixes: 687125b5799c ("devlink: split out core code")
    Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
    Link: https://lore.kernel.org/r/20240215203400.29976-1-kovalev@altlinux.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
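The ordering and unwinding described in the commit above follow the usual register-then-unwind pattern. A minimal userspace sketch, with hypothetical stub functions and a `fail_family` switch to simulate a failed registration:

```c
/* Hypothetical state for the stubs below. */
static int pernet_registered;
static int family_registered;
static int fail_family; /* set to 1 to simulate a failing family registration */

static int demo_register_pernet(void)
{
	pernet_registered = 1;
	return 0;
}

static void demo_unregister_pernet(void)
{
	pernet_registered = 0;
}

static int demo_register_family(void)
{
	if (fail_family)
		return -1;
	family_registered = 1;
	return 0;
}

static int demo_init(void)
{
	int err;

	/* Per-netns operations are registered before the netlink family. */
	err = demo_register_pernet();
	if (err)
		return err;

	err = demo_register_family();
	if (err)
		goto err_unregister_pernet;

	return 0;

err_unregister_pernet:
	/* Undo the earlier step so a failed init leaves nothing behind. */
	demo_unregister_pernet();
	return err;
}
```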
Petr Oros 427a344d82 devlink: avoid potential loop in devlink_rel_nested_in_notify_work()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 58086721b7781c3e35b19c9b78c8f5a791070ba3
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Feb 5 18:11:14 2024 +0100

    devlink: avoid potential loop in devlink_rel_nested_in_notify_work()

    In case devlink_rel_nested_in_notify_work() cannot take the devlink
    lock mutex, the work is rescheduled. Convert the work to delayed work
    and, in case of a reschedule, do it a jiffy later to avoid potential
    looping.

    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240205171114.338679-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
Petr Oros 96b1b1004d devlink: Fix referring to hw_addr attribute during state validation
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 1a89e24f8bfd3e3562d69709c9d9cd185ded869b
Author: Parav Pandit <parav@nvidia.com>
Date:   Mon Jan 29 21:10:59 2024 +0200

    devlink: Fix referring to hw_addr attribute during state validation

    When port function state change is requested, and when the driver
    does not support it, it refers to the hw address attribute instead
    of state attribute. Seems like a copy paste error.

    Fix it by referring to the port function state attribute.

    Fixes: c0bea69d1ca7 ("devlink: Validate port function request")
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240129191059.129030-1-parav@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
Petr Oros 298f4a3ca9 devlink: extend multicast filtering by port index
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit ded6f77c05b113001d449cf2cc810e090f20ec4a
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:30:01 2023 +0100

    devlink: extend multicast filtering by port index

    Expose the previously introduced notification multicast messages
    filtering infrastructure and allow the user to select messages using
    port index.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
Petr Oros d32acff1d1 devlink: add a command to set notification filter and use it for multicasts
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 13b127d2578432e1e521310b69944c5a1b30679c
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:30:00 2023 +0100

    devlink: add a command to set notification filter and use it for multicasts

    Currently a user listening on a socket for devlink notifications
    always gets all messages for all existing instances, even if they are
    interested in only one of them. That may cause unnecessary overhead
    on setups with thousands of instances present.

    User is currently able to narrow down the devlink objects replies
    to dump commands by specifying select attributes.

    Allow a similar approach for notifications. Introduce a new devlink
    command, NOTIFY_FILTER_SET, to which the user passes the select
    attributes. Store these per-socket and use them for filtering messages
    during multicast send.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
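The per-socket filtering described in the commit above can be sketched as a match function applied to each listener before a notification is delivered. This is a userspace model under assumptions: the filter fields (bus name, device name, optional port index) and all `demo_*` names are hypothetical, chosen to resemble a devlink handle such as pci/0000:06:00.0/2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-socket filter; a NULL name means "match anything". */
struct demo_filter {
	const char *bus_name;
	const char *dev_name;
	bool port_index_valid;
	unsigned int port_index;
};

/* Hypothetical description of the object a notification is about. */
struct demo_obj {
	const char *bus_name;
	const char *dev_name;
	bool port_index_valid;
	unsigned int port_index;
};

/* Returns true when the notification should be delivered to this socket. */
static bool demo_filter_match(const struct demo_filter *f,
			      const struct demo_obj *o)
{
	if (!f)
		return true; /* no filter set: deliver everything */
	if (f->bus_name && strcmp(f->bus_name, o->bus_name) != 0)
		return false;
	if (f->dev_name && strcmp(f->dev_name, o->dev_name) != 0)
		return false;
	if (f->port_index_valid &&
	    (!o->port_index_valid || f->port_index != o->port_index))
		return false;
	return true;
}
```

A socket with no filter keeps receiving everything, so in this model listeners that never set a filter see no change in behaviour.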
Petr Oros 8b6e99b243 devlink: introduce a helper for netlink multicast send
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 5648de0b1f2b68bffce9bdd49a276607b9a3e3d4
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:29:56 2023 +0100

    devlink: introduce a helper for netlink multicast send

    Introduce a helper devlink_nl_notify_send() so each object notification
    function does not have to call genlmsg_multicast_netns() with the same
    arguments.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
Petr Oros 9c60d75257 devlink: send notifications only if there are listeners
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit cddbff470e3318834af518168d3a917b6e975062
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:29:55 2023 +0100

    devlink: send notifications only if there are listeners

    Introduce devlink_nl_notify_need() helper and use it to check at the
    beginning of notification functions to avoid overhead of composing
    notification messages in case nobody listens.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
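The early check introduced by the commit above amounts to returning before any message is built when there is no one to receive it. A minimal userspace sketch, in which the `listeners` counter and the `demo_*` functions are hypothetical stand-ins:

```c
#include <stdbool.h>

static unsigned int listeners;      /* hypothetical count of subscribed sockets */
static unsigned int messages_built; /* how many notifications were composed */

/* Models the "is anyone listening?" helper. */
static bool demo_notify_need(void)
{
	return listeners > 0;
}

static void demo_port_notify(void)
{
	/* Bail out before doing any work when nobody would receive the message. */
	if (!demo_notify_need())
		return;

	messages_built++; /* stands in for allocating and filling the message */
}
```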
Petr Oros 987f28e0e7 devlink: introduce __devl_is_registered() helper and use it instead of xa_get_mark()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 11280ddeae238e3ea27d153794472cfca5e8d121
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:29:54 2023 +0100

    devlink: introduce __devl_is_registered() helper and use it instead of xa_get_mark()

    Introduce __devl_is_registered() which does not assert on devlink
    instance lock and use it in notifications which may be called
    without devlink instance lock held.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:09 +02:00
Petr Oros 6975febb78 devlink: use devl_is_registered() helper instead xa_get_mark()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 337ad364c48a0db7cedb5abb8d5e9163792fd596
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:29:53 2023 +0100

    devlink: use devl_is_registered() helper instead xa_get_mark()

    Instead of checking the xarray mark directly using xa_get_mark() helper
    use devl_is_registered() helper which wraps it up. Note that there are
    couple more users of xa_get_mark() left which are going to be handled
    by the next patch.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:09 +02:00
Petr Oros 1226dc0bc1 devlink: warn about existing entities during reload-reinit
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 9b2348e2d6c94146f50b68d7d2067146e7339ac5
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Tue Nov 28 12:52:55 2023 +0100

    devlink: warn about existing entities during reload-reinit

    During reload-reinit, all entities except for params, resources, regions
    and health reporter should be removed and re-added. Add a warning to
    be triggered in case the driver behaves differently.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:06 +02:00
Petr Oros 084fb5b094 devlink: Add device lock assert in reload operation
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 527a07e176eab0f61b1beec9e29b99c9a5ec219f
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:15 2023 +0100

    devlink: Add device lock assert in reload operation

    Add an assert to verify that the device lock is always held throughout
    reload operations.

    Tested the following flows with netdevsim and mlxsw while lockdep is
    enabled:

    netdevsim:

     # echo "10 1" > /sys/bus/netdevsim/new_device
     # devlink dev reload netdevsim/netdevsim10
     # ip netns add bla
     # devlink dev reload netdevsim/netdevsim10 netns bla
     # ip netns del bla
     # echo 10 > /sys/bus/netdevsim/del_device

    mlxsw:

     # devlink dev reload pci/0000:01:00.0
     # ip netns add bla
     # devlink dev reload pci/0000:01:00.0 netns bla
     # ip netns del bla
     # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
     # echo 1 > /sys/bus/pci/rescan

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:06 +02:00
Petr Oros 5a324ef43d devlink: Acquire device lock during reload command
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit bf6b200bc80d18480f8d0fb61e185bb0587e633c
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:14 2023 +0100

    devlink: Acquire device lock during reload command

    Device drivers register with devlink from their probe routines (under
    the device lock) by acquiring the devlink instance lock and calling
    devl_register().

    Drivers that support a devlink reload usually implement the
    reload_{down, up}() operations in a similar fashion to their remove and
    probe routines, respectively.

    However, while the remove and probe routines are invoked with the device
    lock held, the reload operations are only invoked with the devlink
    instance lock held. It is therefore impossible for drivers to acquire
    the device lock from their reload operations, as this would result in
    lock inversion.

    The motivating use case for invoking the reload operations with the
    device lock held is in mlxsw which needs to trigger a PCI reset as part
    of the reload. The driver cannot call pci_reset_function() as this
    function acquires the device lock. Instead, it needs to call
    __pci_reset_function_locked which expects the device lock to be held.

    To that end, adjust devlink to always acquire the device lock before the
    devlink instance lock when performing a reload.

    Do that when reload is explicitly triggered by user space by specifying
    the 'DEVLINK_NL_FLAG_NEED_DEV_LOCK' flag in the pre_doit and post_doit
    operations of the reload command.

    A previous patch already handled the case where reload is invoked as
    part of netns dismantle.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:06 +02:00
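The lock ordering described in the commit above, device lock first and devlink instance lock second, driven by a per-command flag, can be modelled in userspace. The sketch uses plain booleans instead of real locks, `DEMO_FLAG_NEED_DEV_LOCK` stands in for 'DEVLINK_NL_FLAG_NEED_DEV_LOCK', and releasing in reverse order in the post_doit step is an assumption of this sketch.

```c
#include <stdbool.h>

#define DEMO_FLAG_NEED_DEV_LOCK 0x1u /* models DEVLINK_NL_FLAG_NEED_DEV_LOCK */

struct demo_dev {
	bool dev_locked;      /* models the device lock */
	bool instance_locked; /* models the devlink instance lock */
};

/* Runs before the command handler: device lock first, then instance lock. */
static void demo_pre_doit(struct demo_dev *d, unsigned int flags)
{
	if (flags & DEMO_FLAG_NEED_DEV_LOCK)
		d->dev_locked = true;
	d->instance_locked = true;
}

/* Runs after the command handler: release in the reverse order (assumed). */
static void demo_post_doit(struct demo_dev *d, unsigned int flags)
{
	d->instance_locked = false;
	if (flags & DEMO_FLAG_NEED_DEV_LOCK)
		d->dev_locked = false;
}
```

Only a command that sets the flag, reload in the commit above, pays for the extra lock; commands without it take the instance lock alone, as before.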
Petr Oros 33afa4c3ac devlink: Allow taking device lock in pre_doit operations
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit d32c38256db30a2d55b849e2c77342bc70d58c6e
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:13 2023 +0100

    devlink: Allow taking device lock in pre_doit operations

    Introduce a new private flag ('DEVLINK_NL_FLAG_NEED_DEV_LOCK') to allow
    netlink commands to specify that they need to acquire the device lock in
    their pre_doit operation and release it in their post_doit operation.

    The reload command will use this flag in the subsequent patch.

    No functional changes intended.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:05 +02:00
Petr Oros 9a27a3c46b devlink: Enable the use of private flags in post_doit operations
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit c8d0a7d6152bec970552786b77626f4b4c562f4d
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:12 2023 +0100

    devlink: Enable the use of private flags in post_doit operations

    Currently, private flags (e.g., 'DEVLINK_NL_FLAG_NEED_PORT') are only
    used in pre_doit operations, but a subsequent patch will need to
    conditionally lock and unlock the device lock in pre and post doit
    operations, respectively.

    As a preparation, enable the use of private flags in post_doit
    operations in a similar fashion to how it is done for pre_doit
    operations.

    No functional changes intended.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:05 +02:00
Petr Oros c3569b4b6e devlink: Acquire device lock during netns dismantle
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit e21c52d7814f5768f05224e773644629fe124af2
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:11 2023 +0100

    devlink: Acquire device lock during netns dismantle

    Device drivers register with devlink from their probe routines (under
    the device lock) by acquiring the devlink instance lock and calling
    devl_register().

    Drivers that support a devlink reload usually implement the
    reload_{down, up}() operations in a similar fashion to their remove and
    probe routines, respectively.

    However, while the remove and probe routines are invoked with the device
    lock held, the reload operations are only invoked with the devlink
    instance lock held. It is therefore impossible for drivers to acquire
    the device lock from their reload operations, as this would result in
    lock inversion.

    The motivating use case for invoking the reload operations with the
    device lock held is in mlxsw which needs to trigger a PCI reset as part
    of the reload. The driver cannot call pci_reset_function() as this
    function acquires the device lock. Instead, it needs to call
    __pci_reset_function_locked which expects the device lock to be held.

    To that end, adjust devlink to always acquire the device lock before the
    devlink instance lock when performing a reload.

    For now, only do that when reload is triggered as part of netns
    dismantle. Subsequent patches will handle the case where reload is
    explicitly triggered by user space.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:05 +02:00
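
The ordering rule from the "Acquire device lock during netns dismantle" entry above (device lock first, then the devlink instance lock) can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct lock_ctx`, `lock_ctx_acquire()`, and the rank values are illustration-only names and are not kernel APIs. The sketch only models why a reload callback that already runs under the instance lock cannot safely take the device lock.

```c
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum { RANK_DEVICE = 1, RANK_DEVLINK = 2 };

/* Hypothetical per-task record of the locks currently held, in order. */
struct lock_ctx {
	int held[4];
	int depth;
};

/* Returns false when taking 'rank' now would invert the documented order. */
bool lock_ctx_acquire(struct lock_ctx *c, int rank)
{
	if (c->depth == 4)
		return false;           /* sketch limit reached */
	if (c->depth > 0 && c->held[c->depth - 1] >= rank)
		return false;           /* lock inversion */
	c->held[c->depth++] = rank;
	return true;
}

/* Drops the most recently taken lock, if any. */
void lock_ctx_release(struct lock_ctx *c)
{
	if (c->depth > 0)
		c->depth--;
}
```

In this model, a path that holds only `RANK_DEVLINK` fails when it asks for `RANK_DEVICE`, while a path that takes `RANK_DEVICE` first can then take `RANK_DEVLINK`, which is the order the commit message says devlink adopts for reload.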
Petr Oros 2d07429340 devlink: Move private netlink flags to C file
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 526dd6d7877b80b1f56d87156b65b8227c69d59f
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:10 2023 +0100

    devlink: Move private netlink flags to C file

    The flags are not used outside of the C file so move them there.

    Suggested-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:05 +02:00
Petr Oros 968587c19b netlink: specs: devlink: add forgotten port function caps enum values
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 05f0431bb90f2ee3657e7fc2678f11a1f9b778b7
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Oct 30 17:17:50 2023 +0100

    netlink: specs: devlink: add forgotten port function caps enum values

    Add two enum values that the blamed commit omitted.

    Fixes: f2f9dd164db0 ("netlink: specs: devlink: add the remaining command to generate complete split_ops")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20231030161750.110420-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:05 +02:00
Petr Oros 33978784bf devlink: remove netlink small_ops
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit cebe7306073d4afeb24886f9063417e559fa2e22
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Oct 21 13:27:11 2023 +0200

    devlink: remove netlink small_ops

    All commands are now covered by generated split_ops. Remove the
    small_ops entirely, along with the unified devlink netlink policy array.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20231021112711.660606-11-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:04 +02:00
Petr Oros 00ffe14631 devlink: remove duplicated netlink callback prototypes
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 15c80e7a53d28aeb7354ef6d79d0ff55452e53f1
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Oct 21 13:27:10 2023 +0200

    devlink: remove duplicated netlink callback prototypes

    The prototypes are now generated, remove the old ones.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20231021112711.660606-10-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:04 +02:00
Petr Oros d1e25cb9f7 netlink: specs: devlink: add the remaining command to generate complete split_ops
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit f2f9dd164db079161a834c8698c68a94a50b4168
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Oct 21 13:27:09 2023 +0200

    netlink: specs: devlink: add the remaining command to generate complete split_ops

    Currently, some of the commands are not described in the devlink yaml
    file and are manually filled in net/devlink/netlink.c in small_ops. To
    make them all part of split_ops, add definitions of the rest of the
    commands along with the needed attributes and enums.

    Note that this focuses on the kernel side. The requests are fully
    described in order to generate split_op along with policies.
    A follow-up will describe the replies in order to make the userspace
    helpers complete.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20231021112711.660606-9-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:04 +02:00
Petr Oros 73146979d4 devlink: rename netlink callback to be aligned with the generated ones
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 53590934ba9549c55c57a32e2a6980139af00345
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Oct 21 13:27:08 2023 +0200

    devlink: rename netlink callback to be aligned with the generated ones

    All remaining doit and dumpit netlink callback functions are going to be
    used by generated split ops, which expect a certain name format. Rename
    the callbacks to align with the generated names.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20231021112711.660606-8-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:04 +02:00
Petr Oros bdfa8467f8 devlink: convert most of devlink_fmsg_*() to return void
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 0050629cd36a58b568ac0aebeeca60bd2fde3d6d
Author: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Date:   Wed Oct 18 22:26:47 2023 +0200

    devlink: convert most of devlink_fmsg_*() to return void

    Since struct devlink_fmsg now retains the error (see the 1st patch of
    this series), there is no longer a need to keep returning it from each
    call.

    This is a separate commit to allow per-driver conversion to stop using
    those return values.

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:03 +02:00
Petr Oros 4e4f30bace devlink: retain error in struct devlink_fmsg
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit db80d3b2558fcc6d18fbcb1452cdf6df65cec151
Author: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Date:   Wed Oct 18 22:26:37 2023 +0200

    devlink: retain error in struct devlink_fmsg

    Retain error value in struct devlink_fmsg, to relieve drivers from
    checking it after each call.
    Note that fmsg is an in-memory builder/buffer of a formatted message,
    so it is not the case that a half-baked message was sent somewhere.

    The following scheme can be found in multiple drivers:
      err = devlink_fmsg_obj_nest_start(fmsg);
      if (err)
            return err;
      err = devlink_fmsg_string_pair_put(fmsg, "src", src);
      if (err)
            return err;
      err = devlink_fmsg_something(fmsg, foo, bar);
      if (err)
            return err;
      // and so on...
      err = devlink_fmsg_obj_nest_end(fmsg);

    With the error-retaining API, that translates to:
      devlink_fmsg_obj_nest_start(fmsg);
      devlink_fmsg_string_pair_put(fmsg, "src", src);
      devlink_fmsg_something(fmsg, foo, bar);
      // and so on...
      devlink_fmsg_obj_nest_end(fmsg);

    This means we check for an error only when it is time to send.

    Possible error scenarios are developer error (API misuse) and memory
    exhaustion; both are good candidates for choosing readability over the
    fastest possible exit.

    Note that this patch keeps returning errors to allow per-driver
    conversion to the new API, but those return values are no longer needed
    at this point.

    This commit itself is an illustration of benefits for the dev-user,
    more of it will be in separate commits of the series.

    Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:02 +02:00
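
The error-retaining scheme described in the "retain error in struct devlink_fmsg" entry above can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the kernel implementation: `struct fmsg_sketch` and the `fmsg_sketch_*()` helpers are hypothetical names, and the fixed 64-byte buffer stands in for whatever storage the real builder uses.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a message builder; not struct devlink_fmsg. */
struct fmsg_sketch {
	char buf[64];   /* in-memory buffer being built */
	size_t len;     /* bytes used so far, excluding the terminator */
	int err;        /* first error seen, 0 if none */
};

void fmsg_sketch_init(struct fmsg_sketch *m)
{
	memset(m, 0, sizeof(*m));
}

/* Appends a string. Returns void because the error stays in the builder. */
void fmsg_sketch_put(struct fmsg_sketch *m, const char *s)
{
	size_t n;

	if (m->err)                 /* an earlier call failed: do nothing */
		return;
	n = strlen(s);
	if (n > sizeof(m->buf) - 1 - m->len) {
		m->err = -EMSGSIZE;     /* remember only the first failure */
		return;
	}
	memcpy(m->buf + m->len, s, n);
	m->len += n;
	m->buf[m->len] = '\0';
}

/* The single check the caller performs when it is time to "send". */
int fmsg_sketch_finish(const struct fmsg_sketch *m)
{
	return m->err;
}
```

A caller can chain any number of `fmsg_sketch_put()` calls and inspect `fmsg_sketch_finish()` once at the end; after the first failure, later calls leave the buffer untouched.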
Petr Oros 133febd0b9 devlink: document devlink_rel_nested_in_notify() function
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 5d77371e8c85abbe0f9fab7dacf3bc2c3214ada5
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Oct 13 14:10:29 2023 +0200

    devlink: document devlink_rel_nested_in_notify() function

    Add a documentation for devlink_rel_nested_in_notify() describing the
    devlink instance locking consequences.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:02 +02:00
Petr Oros 42ce217377 devlink: don't take instance lock for nested handle put
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit b5f4e371336a62a48f6ae51abb8366e968a8f88f
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Oct 13 14:10:26 2023 +0200

    devlink: don't take instance lock for nested handle put

    Lockdep reports the following issue:

    WARNING: possible circular locking dependency detected
    ------------------------------------------------------
    devlink/8191 is trying to acquire lock:
    ffff88813f32c250 (&devlink->lock_key#14){+.+.}-{3:3}, at: devlink_rel_devlink_handle_put+0x11e/0x2d0

                               but task is already holding lock:
    ffffffff8511eca8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20

                               which lock already depends on the new lock.

                               the existing dependency chain (in reverse order) is:

                               -> #3 (rtnl_mutex){+.+.}-{3:3}:
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           register_netdevice_notifier_net+0x13/0x30
           mlx5_lag_add_mdev+0x51c/0xa00 [mlx5_core]
           mlx5_load+0x222/0xc70 [mlx5_core]
           mlx5_init_one_devl_locked+0x4a0/0x1310 [mlx5_core]
           mlx5_init_one+0x3b/0x60 [mlx5_core]
           probe_one+0x786/0xd00 [mlx5_core]
           local_pci_probe+0xd7/0x180
           pci_device_probe+0x231/0x720
           really_probe+0x1e4/0xb60
           __driver_probe_device+0x261/0x470
           driver_probe_device+0x49/0x130
           __driver_attach+0x215/0x4c0
           bus_for_each_dev+0xf0/0x170
           bus_add_driver+0x21d/0x590
           driver_register+0x133/0x460
           vdpa_match_remove+0x89/0xc0 [vdpa]
           do_one_initcall+0xc4/0x360
           do_init_module+0x22d/0x760
           load_module+0x51d7/0x6750
           init_module_from_file+0xd2/0x130
           idempotent_init_module+0x326/0x5a0
           __x64_sys_finit_module+0xc1/0x130
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               -> #2 (mlx5_intf_mutex){+.+.}-{3:3}:
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           mlx5_register_device+0x3e/0xd0 [mlx5_core]
           mlx5_init_one_devl_locked+0x8fa/0x1310 [mlx5_core]
           mlx5_devlink_reload_up+0x147/0x170 [mlx5_core]
           devlink_reload+0x203/0x380
           devlink_nl_cmd_reload+0xb84/0x10e0
           genl_family_rcv_msg_doit+0x1cc/0x2a0
           genl_rcv_msg+0x3c9/0x670
           netlink_rcv_skb+0x12c/0x360
           genl_rcv+0x24/0x40
           netlink_unicast+0x435/0x6f0
           netlink_sendmsg+0x7a0/0xc70
           sock_sendmsg+0xc5/0x190
           __sys_sendto+0x1c8/0x290
           __x64_sys_sendto+0xdc/0x1b0
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               -> #1 (&dev->lock_key#8){+.+.}-{3:3}:
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           mlx5_init_one_devl_locked+0x45/0x1310 [mlx5_core]
           mlx5_devlink_reload_up+0x147/0x170 [mlx5_core]
           devlink_reload+0x203/0x380
           devlink_nl_cmd_reload+0xb84/0x10e0
           genl_family_rcv_msg_doit+0x1cc/0x2a0
           genl_rcv_msg+0x3c9/0x670
           netlink_rcv_skb+0x12c/0x360
           genl_rcv+0x24/0x40
           netlink_unicast+0x435/0x6f0
           netlink_sendmsg+0x7a0/0xc70
           sock_sendmsg+0xc5/0x190
           __sys_sendto+0x1c8/0x290
           __x64_sys_sendto+0xdc/0x1b0
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               -> #0 (&devlink->lock_key#14){+.+.}-{3:3}:
           check_prev_add+0x1af/0x2300
           __lock_acquire+0x31d7/0x4eb0
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           devlink_rel_devlink_handle_put+0x11e/0x2d0
           devlink_nl_port_fill+0xddf/0x1b00
           devlink_port_notify+0xb5/0x220
           __devlink_port_type_set+0x151/0x510
           devlink_port_netdevice_event+0x17c/0x220
           notifier_call_chain+0x97/0x240
           unregister_netdevice_many_notify+0x876/0x1790
           unregister_netdevice_queue+0x274/0x350
           unregister_netdev+0x18/0x20
           mlx5e_vport_rep_unload+0xc5/0x1c0 [mlx5_core]
           __esw_offloads_unload_rep+0xd8/0x130 [mlx5_core]
           mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core]
           mlx5_esw_offloads_unload_rep+0x85/0xc0 [mlx5_core]
           mlx5_eswitch_unload_sf_vport+0x41/0x90 [mlx5_core]
           mlx5_devlink_sf_port_del+0x120/0x280 [mlx5_core]
           genl_family_rcv_msg_doit+0x1cc/0x2a0
           genl_rcv_msg+0x3c9/0x670
           netlink_rcv_skb+0x12c/0x360
           genl_rcv+0x24/0x40
           netlink_unicast+0x435/0x6f0
           netlink_sendmsg+0x7a0/0xc70
           sock_sendmsg+0xc5/0x190
           __sys_sendto+0x1c8/0x290
           __x64_sys_sendto+0xdc/0x1b0
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               other info that might help us debug this:

    Chain exists of:
                                 &devlink->lock_key#14 --> mlx5_intf_mutex --> rtnl_mutex

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(rtnl_mutex);
                                   lock(mlx5_intf_mutex);
                                   lock(rtnl_mutex);
      lock(&devlink->lock_key#14);

    The problem is taking the devlink instance lock of the nested instance
    when RTNL is already held.

    To fix this, don't take the devlink instance lock when putting the
    nested handle. Instead, rely on the preparations done by the previous
    two patches to be able to access the device pointer and obtain the netns
    id without the devlink instance lock held.

    Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:02 +02:00
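
The circular dependency in the lockdep report above can be reproduced with a toy lock-order graph. This is only a sketch of the general idea (record "held before wanted" edges and refuse an edge that would close a cycle); it is not how lockdep is implemented, and `struct order_graph`, `order_graph_add()`, and the `L_*` identifiers are hypothetical names chosen for this illustration.

```c
#include <stdbool.h>

/* The three lock classes named in the "Chain exists of" line of the report. */
enum lock_id { L_DEVLINK, L_MLX5_INTF, L_RTNL, L_COUNT };

/* edge[a][b] is true when lock b was seen being taken while a was held. */
struct order_graph {
	bool edge[L_COUNT][L_COUNT];
};

/* Depth-first search: can 'to' be reached from 'from' along recorded edges? */
static bool reaches(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < L_COUNT; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/* Records "held, then wanted". Returns false if that would close a cycle. */
bool order_graph_add(struct order_graph *g, int held, int wanted)
{
	bool seen[L_COUNT] = { false };

	if (reaches(g, wanted, held, seen))
		return false;           /* possible circular locking dependency */
	g->edge[held][wanted] = true;
	return true;
}
```

Recording devlink lock before `mlx5_intf_mutex` and `mlx5_intf_mutex` before `rtnl_mutex` succeeds; the later attempt to take the devlink lock while holding `rtnl_mutex` is the edge the sketch rejects, which corresponds to the acquisition the fix removes.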
Petr Oros 615a9d349b devlink: take device reference for devlink object
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit a380687200e0f7f0e00d745796fd8b8ea4bcb746
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Oct 13 14:10:25 2023 +0200

    devlink: take device reference for devlink object

    In preparation for allowing access to the device pointer without the
    devlink instance lock held, make sure the device pointer is usable until
    devlink_release() is called.

    Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:02 +02:00
Petr Oros 05f6585afe devlink: call peernet2id_alloc() with net pointer under RCU read lock
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit c503bc7df602257e9d03851654a347649a33f3c3
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Oct 13 14:10:24 2023 +0200

    devlink: call peernet2id_alloc() with net pointer under RCU read lock

    peernet2id_alloc() can be called locklessly with a peer net pointer
    obtained in an RCU critical section, and it makes sure to return the ns
    ID if the net namespace is not being removed concurrently. Benefit from
    the read_pnet_rcu() helper addition: use it to obtain the net pointer
    under the RCU read lock and pass it to peernet2id_alloc() to get the
    ns ID.

    Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:01 +02:00
Petr Oros a27d3bd029 devlink: introduce possibility to expose info about nested devlinks
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit c5e1bf8a51cfe5060e91c7533098e329c0118f6d
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:42 2023 +0200

    devlink: introduce possibility to expose info about nested devlinks

    In mlx5, there is a devlink instance created for PCI device. Also, one
    separate devlink instance is created for auxiliary device that
    represents the netdev of uplink port. This relation is currently
    invisible to the devlink user.

    Benefit from the rel infrastructure and allow for nested devlink
    instance to set the relationship for the nested-in devlink instance.
    Note that there may be many nested instances, therefore use xarray to
    hold the list of rel_indexes for individual nested instances.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
Petr Oros 058e7740f4 devlink: convert linecard nested devlink to new rel infrastructure
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 9473bc0119e7e7630d7c1c7c3816c290a6f3ae19
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:41 2023 +0200

    devlink: convert linecard nested devlink to new rel infrastructure

    Benefit from the newly introduced rel infrastructure, treat the linecard
    nested devlink instances in the same way as port function instances.
    Convert the code to use the rel infrastructure.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
Petr Oros 35f8d5366c devlink: expose peer SF devlink instance
JIRA: https://issues.redhat.com/browse/RHEL-30145

Conflicts:
- adjusted context conflict due to 2b022471f4 ("net: add reserved
  fields to devlink_port")

Upstream commit(s):
commit 0b7a2721e36c11313f8b0f251a508d25a872cd28
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:39 2023 +0200

    devlink: expose peer SF devlink instance

    Introduce a new helper devl_port_fn_devlink_set() to be used by driver
    assigning a devlink instance to the peer devlink port function.

    Expose this to user over new netlink attribute nested under port
    function nest to expose devlink handle related to the port function.

    This is particularly helpful for user to understand the relationship
    between devlink instances created for SFs and the port functions
    they belong to.

    Note that the caller of devlink_port_notify() needs to hold the devlink
    instance lock; put an assertion in devl_port_fn_devlink_set() to make
    this requirement explicit. Also note the limitation that this assignment
    is only allowed for registered objects.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
Petr Oros ddad7fb46e devlink: introduce object and nested devlink relationship infra
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit c137743bce02b18c1537d4681aa515f7b80bf0a8
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:38 2023 +0200

    devlink: introduce object and nested devlink relationship infra

    It is a bit tricky to maintain the relationship between devlink objects
    and nested devlink instances due to the following aspects:

    1) Locking. It is necessary to lock the devlink instance that contains
       the object first, and only after that to lock the nested instance.
    2) Lifetimes. Objects (e.g. devlink port) may be removed before
       the nested devlink instance.
    3) Notifications. If nested instance changes (e.g. gets
       registered/unregistered) the nested-in object needs to send
       appropriate notifications.

    Resolve this by introducing an xarray that holds 1:1 relationships
    between devlink object and related nested devlink instance.
    Use that xarray index to get the object/nested devlink instance on
    the other side.

    Provide necessary helpers:
    devlink_rel_nested_in_add/clear() to add and clear the relationship.
    devlink_rel_nested_in_notify() to call the nested-in object to send
            notifications during nested instance register/unregister/netns
            change.
    devlink_rel_devlink_handle_put() to be used by nested-in object fill
            function to fill the nested handle.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
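
The index-based 1:1 relationship described in the "introduce object and nested devlink relationship infra" entry above can be sketched as a small table that both sides reach through an index instead of a direct pointer. This is an assumption-laden illustration: the kernel uses an xarray, whereas this sketch substitutes a fixed-size array, and `struct rel_table`, `rel_add()`, `rel_lookup()`, and `rel_clear()` are hypothetical names rather than the devlink_rel_*() helpers.

```c
#include <stddef.h>

#define REL_SLOTS 8

/* One relationship: an object on one side, a nested instance on the other. */
struct rel_entry {
	int in_use;
	int nested_in_obj;   /* hypothetical id of the object, e.g. a port */
	int nested_devlink;  /* hypothetical id of the nested instance */
};

/* Fixed-size stand-in for the xarray mentioned in the commit message. */
struct rel_table {
	struct rel_entry slot[REL_SLOTS];
};

/* Stores a relationship and returns its index, or -1 if the table is full. */
int rel_add(struct rel_table *t, int obj, int nested)
{
	for (int i = 0; i < REL_SLOTS; i++) {
		if (!t->slot[i].in_use) {
			t->slot[i].in_use = 1;
			t->slot[i].nested_in_obj = obj;
			t->slot[i].nested_devlink = nested;
			return i;
		}
	}
	return -1;
}

/* Either side resolves the other through the index; NULL once cleared. */
const struct rel_entry *rel_lookup(const struct rel_table *t, int index)
{
	if (index < 0 || index >= REL_SLOTS || !t->slot[index].in_use)
		return NULL;
	return &t->slot[index];
}

/* Drops the relationship, e.g. when one side goes away first. */
void rel_clear(struct rel_table *t, int index)
{
	if (index >= 0 && index < REL_SLOTS)
		t->slot[index].in_use = 0;
}
```

Because each side keeps only the index, a lookup after `rel_clear()` returns NULL rather than a stale pointer, which is one way to read the "Lifetimes" concern listed in the commit message.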
Petr Oros d85d2b504b devlink: extend devlink_nl_put_nested_handle() with attrtype arg
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 1c2197c47a93d0ea36e73e437271c7cbcc0e1ceb
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:37 2023 +0200

    devlink: extend devlink_nl_put_nested_handle() with attrtype arg

    As the next patch is going to call this helper to fill another type of
    nested attribute, pass the attribute type as a function argument.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
Petr Oros bd8ba7a254 devlink: move devlink_nl_put_nested_handle() into netlink.c
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit af1f1400af02e5a069d86ae7001b563c99395ea2
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:36 2023 +0200

    devlink: move devlink_nl_put_nested_handle() into netlink.c

    As the next patch is going to call this helper from outside linecard.c,
    move it to netlink.c.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:15:59 +02:00
Petr Oros 4633ef815e devlink: put netnsid to nested handle
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit ad99637ac92dc18b979e6fa26eb440f38c0c6b55
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:35 2023 +0200

    devlink: put netnsid to nested handle

    If netns of devlink instance and nested devlink instance differs,
    put netnsid attr to indicate that.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:15:59 +02:00
Petr Oros a85eade209 devlink: move linecard struct into linecard.c
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit d0b7e990f760ec9a614fbe5f89a5cede4335a7bb
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:32 2023 +0200

    devlink: move linecard struct into linecard.c

    Instead of exposing the linecard struct, expose a simple helper to get
    the linecard index, which is all that is needed outside linecard.c. Move
    the linecard struct to linecard.c and keep it private, similar to the
    rest of the devlink objects.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:15:59 +02:00
Ivan Vecera 46ef5b0dfd devlink: move devlink_notify_register/unregister() to dev.c
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 71179ac5c21185171556bc438d5f22d566948d7f
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Aug 28 08:16:57 2023 +0200

    devlink: move devlink_notify_register/unregister() to dev.c

    Finally, move the last bits out of leftover.c, the
    devlink_notify_register/unregister() functions, to dev.c.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230828061657.300667-16-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:34 +02:00
Ivan Vecera b82a843753 devlink: move small_ops definition into netlink.c
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 29a390d17748d93f9e6bc6fb0e09d89571aa25f6
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Aug 28 08:16:56 2023 +0200

    devlink: move small_ops definition into netlink.c

    Move the generic netlink small_ops definition where they are consumed,
    into netlink.c

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230828061657.300667-15-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:34 +02:00
Ivan Vecera 94ec273900 devlink: move tracepoint definitions into core.c
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 890c556674377c0abba4ab91ff6f1962175d578c
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Aug 28 08:16:55 2023 +0200

    devlink: move tracepoint definitions into core.c

    Move the remaining tracepoint definitions to the most suitable file,
    core.c.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230828061657.300667-14-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:34 +02:00