26 Commits

Author SHA1 Message Date
Petr Oros 053ad10203 devlink: use kvzalloc() to allocate devlink instance resources
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit 730fffce4fd2eb7a0be2d0b6cd7e55e9194d76d5
Author: Jian Wen <wenjianhn@gmail.com>
Date:   Wed Mar 27 16:21:28 2024 +0800

    devlink: use kvzalloc() to allocate devlink instance resources

    During live migration of a virtual machine, the SR-IOV VF needs to be
    re-registered. This may fail when memory is badly fragmented.

    The related log is as follows.

        kernel: hv_netvsc 6045bdaa-c0d1-6045-bdaa-c0d16045bdaa eth0: VF slot 1 added
    ...
        kernel: kworker/0:0: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
        kernel: CPU: 0 PID: 24006 Comm: kworker/0:0 Tainted: G            E     5.4...x86_64 #1
        kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
        kernel: Workqueue: events work_for_cpu_fn
        kernel: Call Trace:
        kernel: dump_stack+0x8b/0xc8
        kernel: warn_alloc+0xff/0x170
        kernel: __alloc_pages_slowpath+0x92c/0xb2b
        kernel: ? get_page_from_freelist+0x1d4/0x1140
        kernel: __alloc_pages_nodemask+0x2f9/0x320
        kernel: alloc_pages_current+0x6a/0xb0
        kernel: kmalloc_order+0x1e/0x70
        kernel: kmalloc_order_trace+0x26/0xb0
        kernel: ? __switch_to_asm+0x34/0x70
        kernel: __kmalloc+0x276/0x280
        kernel: ? _raw_spin_unlock_irqrestore+0x1e/0x40
        kernel: devlink_alloc+0x29/0x110
        kernel: mlx5_devlink_alloc+0x1a/0x20 [mlx5_core]
        kernel: init_one+0x1d/0x650 [mlx5_core]
        kernel: local_pci_probe+0x46/0x90
        kernel: work_for_cpu_fn+0x1a/0x30
        kernel: process_one_work+0x16d/0x390
        kernel: worker_thread+0x1d3/0x3f0
        kernel: kthread+0x105/0x140
        kernel: ? max_active_store+0x80/0x80
        kernel: ? kthread_bind+0x20/0x20
        kernel: ret_from_fork+0x3a/0x50

    Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
    Link: https://lore.kernel.org/r/20240327082128.942818-1-wenjian1@xiaomi.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:46 +01:00
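
The `order:7` in the allocation failure above means the page allocator was asked for 2^7 physically contiguous pages. The following is a minimal userspace sketch of that arithmetic, assuming 4 KiB pages (the log does not state the page size). Both helpers are hypothetical names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest order with (page_size << order) >= size. */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(page_size != 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Bytes of physically contiguous memory that an order-N request needs. */
static size_t bytes_for_order(unsigned int order, size_t page_size)
{
	return page_size << order;
}
```

Under the 4 KiB assumption, an order-7 request needs 512 KiB of contiguous physical memory, which a fragmented system may be unable to supply. kvzalloc() tries the kmalloc path first and falls back to vmalloc-backed memory, which only has to be virtually contiguous.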
Petr Oros b365673e2a devlink: fix possible use-after-free and memory leaks in devlink_init()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit def689fc26b9a9622d2e2cb0c4933dd3b1c8071c
Author: Vasiliy Kovalev <kovalev@altlinux.org>
Date:   Thu Feb 15 23:34:00 2024 +0300

    devlink: fix possible use-after-free and memory leaks in devlink_init()

    The pernet operations structure for the subsystem must be registered
    before registering the generic netlink family.

    Unregister what was already registered if a later registration fails.

    Fixes: 687125b5799c ("devlink: split out core code")
    Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
    Link: https://lore.kernel.org/r/20240215203400.29976-1-kovalev@altlinux.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
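
The ordering and unwind rule described in the commit above can be sketched in userspace as follows. This is an illustration only: every `fake_*` function and variable is a hypothetical stub standing in for the real pernet and generic netlink registrations, and the actual devlink_init() registers more than two things.

```c
#include <assert.h>
#include <stdbool.h>

/* Stubs standing in for the real registrations; all names are hypothetical. */
static bool pernet_registered;
static bool family_registered;
static int family_register_result; /* set to a negative errno to inject failure */

static int fake_register_pernet(void)
{
	pernet_registered = true;
	return 0;
}

static void fake_unregister_pernet(void)
{
	pernet_registered = false;
}

static int fake_register_family(void)
{
	assert(pernet_registered); /* ordering rule: pernet ops come first */
	if (family_register_result)
		return family_register_result;
	family_registered = true;
	return 0;
}

/* Register pernet ops first, then the family; undo the first step on failure. */
static int fake_subsys_init(void)
{
	int err;

	err = fake_register_pernet();
	if (err)
		return err;

	err = fake_register_family();
	if (err)
		goto out_unreg_pernet;

	return 0;

out_unreg_pernet:
	fake_unregister_pernet();
	return err;
}
```

The goto-based unwind releases resources in the reverse order of acquisition, so a failed init leaves nothing registered behind.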
Petr Oros 427a344d82 devlink: avoid potential loop in devlink_rel_nested_in_notify_work()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 58086721b7781c3e35b19c9b78c8f5a791070ba3
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Feb 5 18:11:14 2024 +0100

    devlink: avoid potential loop in devlink_rel_nested_in_notify_work()

    devlink_rel_nested_in_notify_work() reschedules itself in case it can
    not take the devlink lock mutex. Convert the work to delayed work and,
    in case of reschedule, do it one jiffy later to avoid potential looping.

    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240205171114.338679-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
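
A minimal userspace sketch of the retry idea in the commit above, assuming a single-threaded model: when the trylock fails, the work item asks to be rerun after a delay of one tick rather than being requeued immediately. `fake_lock`, `fake_notify_work()` and `WORK_DONE` are hypothetical names, and the boolean flag is only a stand-in for a real mutex.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_lock {
	bool held;
};

static bool fake_trylock(struct fake_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void fake_unlock(struct fake_lock *l)
{
	assert(l->held);
	l->held = false;
}

#define WORK_DONE (-1)

/* One run of the work item: returns WORK_DONE, or the delay in ticks before rerun. */
static int fake_notify_work(struct fake_lock *instance_lock, int *notifications)
{
	if (!fake_trylock(instance_lock))
		return 1; /* retry one tick later instead of requeueing at once */

	(*notifications)++;
	fake_unlock(instance_lock);
	return WORK_DONE;
}
```

Delaying the retry gives the current lock holder a chance to make progress before the work runs again.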
Petr Oros c3569b4b6e devlink: Acquire device lock during netns dismantle
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit e21c52d7814f5768f05224e773644629fe124af2
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:11 2023 +0100

    devlink: Acquire device lock during netns dismantle

    Device drivers register with devlink from their probe routines (under
    the device lock) by acquiring the devlink instance lock and calling
    devl_register().

    Drivers that support a devlink reload usually implement the
    reload_{down, up}() operations in a similar fashion to their remove and
    probe routines, respectively.

    However, while the remove and probe routines are invoked with the device
    lock held, the reload operations are only invoked with the devlink
    instance lock held. It is therefore impossible for drivers to acquire
    the device lock from their reload operations, as this would result in
    lock inversion.

    The motivating use case for invoking the reload operations with the
    device lock held is in mlxsw which needs to trigger a PCI reset as part
    of the reload. The driver cannot call pci_reset_function() as this
    function acquires the device lock. Instead, it needs to call
    __pci_reset_function_locked(), which expects the device lock to be held.

    To that end, adjust devlink to always acquire the device lock before the
    devlink instance lock when performing a reload.

    For now, only do that when reload is triggered as part of netns
    dismantle. Subsequent patches will handle the case where reload is
    explicitly triggered by user space.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:05 +02:00
Petr Oros 133febd0b9 devlink: document devlink_rel_nested_in_notify() function
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 5d77371e8c85abbe0f9fab7dacf3bc2c3214ada5
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Oct 13 14:10:29 2023 +0200

    devlink: document devlink_rel_nested_in_notify() function

    Add documentation for devlink_rel_nested_in_notify() describing the
    devlink instance locking consequences.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:02 +02:00
Petr Oros 42ce217377 devlink: don't take instance lock for nested handle put
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit b5f4e371336a62a48f6ae51abb8366e968a8f88f
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Oct 13 14:10:26 2023 +0200

    devlink: don't take instance lock for nested handle put

    Lockdep reports the following issue:

    WARNING: possible circular locking dependency detected
    ------------------------------------------------------
    devlink/8191 is trying to acquire lock:
    ffff88813f32c250 (&devlink->lock_key#14){+.+.}-{3:3}, at: devlink_rel_devlink_handle_put+0x11e/0x2d0

                               but task is already holding lock:
    ffffffff8511eca8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20

                               which lock already depends on the new lock.

                               the existing dependency chain (in reverse order) is:

                               -> #3 (rtnl_mutex){+.+.}-{3:3}:
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           register_netdevice_notifier_net+0x13/0x30
           mlx5_lag_add_mdev+0x51c/0xa00 [mlx5_core]
           mlx5_load+0x222/0xc70 [mlx5_core]
           mlx5_init_one_devl_locked+0x4a0/0x1310 [mlx5_core]
           mlx5_init_one+0x3b/0x60 [mlx5_core]
           probe_one+0x786/0xd00 [mlx5_core]
           local_pci_probe+0xd7/0x180
           pci_device_probe+0x231/0x720
           really_probe+0x1e4/0xb60
           __driver_probe_device+0x261/0x470
           driver_probe_device+0x49/0x130
           __driver_attach+0x215/0x4c0
           bus_for_each_dev+0xf0/0x170
           bus_add_driver+0x21d/0x590
           driver_register+0x133/0x460
           vdpa_match_remove+0x89/0xc0 [vdpa]
           do_one_initcall+0xc4/0x360
           do_init_module+0x22d/0x760
           load_module+0x51d7/0x6750
           init_module_from_file+0xd2/0x130
           idempotent_init_module+0x326/0x5a0
           __x64_sys_finit_module+0xc1/0x130
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               -> #2 (mlx5_intf_mutex){+.+.}-{3:3}:
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           mlx5_register_device+0x3e/0xd0 [mlx5_core]
           mlx5_init_one_devl_locked+0x8fa/0x1310 [mlx5_core]
           mlx5_devlink_reload_up+0x147/0x170 [mlx5_core]
           devlink_reload+0x203/0x380
           devlink_nl_cmd_reload+0xb84/0x10e0
           genl_family_rcv_msg_doit+0x1cc/0x2a0
           genl_rcv_msg+0x3c9/0x670
           netlink_rcv_skb+0x12c/0x360
           genl_rcv+0x24/0x40
           netlink_unicast+0x435/0x6f0
           netlink_sendmsg+0x7a0/0xc70
           sock_sendmsg+0xc5/0x190
           __sys_sendto+0x1c8/0x290
           __x64_sys_sendto+0xdc/0x1b0
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               -> #1 (&dev->lock_key#8){+.+.}-{3:3}:
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           mlx5_init_one_devl_locked+0x45/0x1310 [mlx5_core]
           mlx5_devlink_reload_up+0x147/0x170 [mlx5_core]
           devlink_reload+0x203/0x380
           devlink_nl_cmd_reload+0xb84/0x10e0
           genl_family_rcv_msg_doit+0x1cc/0x2a0
           genl_rcv_msg+0x3c9/0x670
           netlink_rcv_skb+0x12c/0x360
           genl_rcv+0x24/0x40
           netlink_unicast+0x435/0x6f0
           netlink_sendmsg+0x7a0/0xc70
           sock_sendmsg+0xc5/0x190
           __sys_sendto+0x1c8/0x290
           __x64_sys_sendto+0xdc/0x1b0
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               -> #0 (&devlink->lock_key#14){+.+.}-{3:3}:
           check_prev_add+0x1af/0x2300
           __lock_acquire+0x31d7/0x4eb0
           lock_acquire+0x1c3/0x500
           __mutex_lock+0x14c/0x1b20
           devlink_rel_devlink_handle_put+0x11e/0x2d0
           devlink_nl_port_fill+0xddf/0x1b00
           devlink_port_notify+0xb5/0x220
           __devlink_port_type_set+0x151/0x510
           devlink_port_netdevice_event+0x17c/0x220
           notifier_call_chain+0x97/0x240
           unregister_netdevice_many_notify+0x876/0x1790
           unregister_netdevice_queue+0x274/0x350
           unregister_netdev+0x18/0x20
           mlx5e_vport_rep_unload+0xc5/0x1c0 [mlx5_core]
           __esw_offloads_unload_rep+0xd8/0x130 [mlx5_core]
           mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core]
           mlx5_esw_offloads_unload_rep+0x85/0xc0 [mlx5_core]
           mlx5_eswitch_unload_sf_vport+0x41/0x90 [mlx5_core]
           mlx5_devlink_sf_port_del+0x120/0x280 [mlx5_core]
           genl_family_rcv_msg_doit+0x1cc/0x2a0
           genl_rcv_msg+0x3c9/0x670
           netlink_rcv_skb+0x12c/0x360
           genl_rcv+0x24/0x40
           netlink_unicast+0x435/0x6f0
           netlink_sendmsg+0x7a0/0xc70
           sock_sendmsg+0xc5/0x190
           __sys_sendto+0x1c8/0x290
           __x64_sys_sendto+0xdc/0x1b0
           do_syscall_64+0x3d/0x90
           entry_SYSCALL_64_after_hwframe+0x46/0xb0

                               other info that might help us debug this:

    Chain exists of:
                                 &devlink->lock_key#14 --> mlx5_intf_mutex --> rtnl_mutex

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(rtnl_mutex);
                                   lock(mlx5_intf_mutex);
                                   lock(rtnl_mutex);
      lock(&devlink->lock_key#14);

    The problem is taking the devlink instance lock of the nested instance
    when RTNL is already held.

    To fix this, don't take the devlink instance lock when putting the nested
    handle. Instead, rely on the preparations done by the previous two patches
    to be able to access the device pointer and obtain the netns id without
    the devlink instance lock held.

    Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:02 +02:00
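
The circular dependency in the report above can be illustrated with a much-simplified model of a lock dependency graph. This is not how lockdep is implemented; it only shows why adding "devlink instance lock taken under rtnl_mutex" closes a cycle once the chain devlink lock -> mlx5_intf_mutex -> rtnl_mutex has been recorded. The enum names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEVLINK_LOCK, INTF_MUTEX, RTNL_MUTEX, NR_LOCKS };

/* dep[a][b] is true when lock b has been taken while lock a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' via recorded dependencies? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Record "took 'inner' while holding 'outer'"; refuse if that closes a cycle. */
static bool add_dependency(int outer, int inner)
{
	bool seen[NR_LOCKS] = { false };

	assert(outer != inner);
	if (reachable(inner, outer, seen))
		return false; /* inner already leads back to outer: possible deadlock */
	dep[outer][inner] = true;
	return true;
}
```

In this model, recording DEVLINK_LOCK -> INTF_MUTEX and INTF_MUTEX -> RTNL_MUTEX succeeds, while a later RTNL_MUTEX -> DEVLINK_LOCK is rejected, which mirrors the scenario the fix avoids by not taking the instance lock at all.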
Petr Oros 615a9d349b devlink: take device reference for devlink object
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit a380687200e0f7f0e00d745796fd8b8ea4bcb746
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Oct 13 14:10:25 2023 +0200

    devlink: take device reference for devlink object

    In preparation for allowing access to the device pointer without the
    devlink instance lock held, make sure the device pointer is usable until
    devlink_release() is called.

    Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:02 +02:00
Petr Oros a27d3bd029 devlink: introduce possibility to expose info about nested devlinks
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit c5e1bf8a51cfe5060e91c7533098e329c0118f6d
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:42 2023 +0200

    devlink: introduce possibility to expose info about nested devlinks

    In mlx5, there is a devlink instance created for PCI device. Also, one
    separate devlink instance is created for auxiliary device that
    represents the netdev of uplink port. This relation is currently
    invisible to the devlink user.

    Benefit from the rel infrastructure and allow for nested devlink
    instance to set the relationship for the nested-in devlink instance.
    Note that there may be many nested instances, therefore use xarray to
    hold the list of rel_indexes for individual nested instances.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
Petr Oros ddad7fb46e devlink: introduce object and nested devlink relationship infra
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit c137743bce02b18c1537d4681aa515f7b80bf0a8
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Sep 13 09:12:38 2023 +0200

    devlink: introduce object and nested devlink relationship infra

    It is a bit tricky to maintain the relationship between devlink objects
    and nested devlink instances due to the following aspects:

    1) Locking. It is necessary to lock the devlink instance that contains
       the object first, only after that to lock the nested instance.
    2) Lifetimes. Objects (e.g. devlink port) may be removed before
       the nested devlink instance.
    3) Notifications. If nested instance changes (e.g. gets
       registered/unregistered) the nested-in object needs to send
       appropriate notifications.

    Resolve this by introducing an xarray that holds 1:1 relationships
    between devlink object and related nested devlink instance.
    Use that xarray index to get the object/nested devlink instance on
    the other side.

    Provide necessary helpers:
    devlink_rel_nested_in_add/clear() to add and clear the relationship.
    devlink_rel_nested_in_notify() to call the nested-in object to send
            notifications during nested instance register/unregister/netns
            change.
    devlink_rel_devlink_handle_put() to be used by nested-in object fill
            function to fill the nested handle.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
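
The index-based relationship described in the commit above can be sketched as follows. This is an assumption-laden illustration: a fixed-size array stands in for the kernel xarray, there is no locking or notification, and `fake_rel` and the `fake_rel_*()` helpers are hypothetical names rather than the devlink_rel_*() API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELS 8

struct fake_rel {
	int in_use;
	void *nested_in_obj;   /* e.g. a port that contains the nested instance */
	void *nested_instance; /* the nested instance itself */
};

static struct fake_rel rels[MAX_RELS];

/* Allocate a relationship slot; returns its index or -1 when the table is full. */
static int fake_rel_add(void *obj, void *nested)
{
	for (int i = 0; i < MAX_RELS; i++) {
		if (!rels[i].in_use) {
			rels[i] = (struct fake_rel){ 1, obj, nested };
			return i;
		}
	}
	return -1;
}

/* Either side reaches the other through the index, never a cached pointer. */
static void *fake_rel_nested_get(int index)
{
	assert(index >= 0 && index < MAX_RELS);
	return rels[index].in_use ? rels[index].nested_instance : NULL;
}

static void fake_rel_clear(int index)
{
	assert(index >= 0 && index < MAX_RELS);
	rels[index] = (struct fake_rel){ 0, NULL, NULL };
}
```

Because each side stores only an index, a lookup after the relationship is cleared yields NULL instead of a dangling pointer, which is the lifetime property the commit message is after.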
Ivan Vecera 94ec273900 devlink: move tracepoint definitions into core.c
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 890c556674377c0abba4ab91ff6f1962175d578c
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Aug 28 08:16:55 2023 +0200

    devlink: move tracepoint definitions into core.c

    Move remaining tracepoint definitions to the most suitable file, core.c.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230828061657.300667-14-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:34 +02:00
Davide Caratti b026a36e8c devlink: Fix crash with CONFIG_NET_NS=n
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2219326
Upstream Status: net.git commit d6352dae0903

commit d6352dae0903fe8beae4c007dc320e9e9f1fed45
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Mon May 15 19:29:25 2023 +0300

    devlink: Fix crash with CONFIG_NET_NS=n

    '__net_initdata' becomes a no-op with CONFIG_NET_NS=y, but when this
    option is disabled it becomes '__initdata', which means the data can be
    freed after the initialization phase. This annotation is obviously
    incorrect for the devlink net device notifier block which is still
    registered after the initialization phase [1].

    Fix this crash by removing the '__net_initdata' annotation.

    [1]
    general protection fault, probably for non-canonical address 0xcccccccccccccccc: 0000 [#1] PREEMPT SMP
    CPU: 3 PID: 117 Comm: (udev-worker) Not tainted 6.4.0-rc1-custom-gdf0acdc59b09 #64
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
    RIP: 0010:notifier_call_chain+0x58/0xc0
    [...]
    Call Trace:
     <TASK>
     dev_set_mac_address+0x85/0x120
     dev_set_mac_address_user+0x30/0x50
     do_setlink+0x219/0x1270
     rtnl_setlink+0xf7/0x1a0
     rtnetlink_rcv_msg+0x142/0x390
     netlink_rcv_skb+0x58/0x100
     netlink_unicast+0x188/0x270
     netlink_sendmsg+0x214/0x470
     __sys_sendto+0x12f/0x1a0
     __x64_sys_sendto+0x24/0x30
     do_syscall_64+0x38/0x80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Fixes: e93c9378e33f ("devlink: change per-devlink netdev notifier to static one")
    Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Closes: https://lore.kernel.org/netdev/600ddf9e-589a-2aa0-7b69-a438f833ca10@samsung.com/
    Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230515162925.1144416-1-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-07-03 10:57:40 +02:00
Davide Caratti 82ef3982f6 devlink: change per-devlink netdev notifier to static one
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2219326
Upstream Status: net.git commit e93c9378e33f

commit e93c9378e33f68b61ea9318580d841caa22fb9ea
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed May 10 16:46:21 2023 +0200

    devlink: change per-devlink netdev notifier to static one

    The commit 565b4824c39f ("devlink: change port event netdev notifier
    from per-net to global") changed original per-net notifier to be
    per-devlink instance. That fixed the issue of netdev uninit events
    not being received if the netdev moved to a different namespace.
    That worked fine in -net tree.

    However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
    separate devlink instance for ethernet auxiliary device") and
    commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
    case of PCI device suspend") were merged, a deadlock was introduced
    when removing a namespace with devlink instance with another nested
    instance.

    Here is an example of the bad flow resulting in a deadlock with mlx5:
    net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem)) ->
    devlink_pernet_pre_exit() -> devlink_reload() ->
    mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
    mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
    mlx5e_destroy_devlink() -> devlink_free() ->
    unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem))

    Steps to reproduce:
    $ modprobe mlx5_core
    $ ip netns add ns1
    $ devlink dev reload pci/0000:08:00.0 netns ns1
    $ ip netns del ns1

    Resolve this by converting the notifier from per-devlink instance to
    a static one registered during init phase and leaving it registered
    forever. Use this notifier for all devlink port instances created
    later on.

    Note that a tree needs this fix only in case all of the commits cited
    in the Fixes tags are present.

    Reported-by: Moshe Shemesh <moshe@nvidia.com>
    Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
    Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
    Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230510144621.932017-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-07-03 10:57:40 +02:00
Petr Oros 98edafcd32 devlink: convert param list to xarray
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit a72e17b4523223015d3b3fd79bac2b065d6d09a9
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Feb 10 11:01:29 2023 +0100

    devlink: convert param list to xarray

    Lose the linked list for params and use xarray instead.

    Note that this is required to eventually make it possible to call
    devl_param_driverinit_value_get() without holding the instance lock.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-04 11:12:26 +02:00
Petr Oros 2a336c7173 devlink: change port event netdev notifier from per-net to global
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Conflicts:
-  adjusted conflict which was resolved in upstream 8697a258ae2470
   ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Upstream commit(s):
commit 565b4824c39fa335cba2028a09d7beb7112f3c9a
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Mon Feb 6 10:41:51 2023 +0100

    devlink: change port event netdev notifier from per-net to global

    Currently only the network namespace of devlink instance is monitored
    for port events. If a netdev is moved to a different namespace and then
    unregistered, NETDEV_PRE_UNINIT is missed, which triggers the
    following WARN_ON in devl_port_unregister():
    WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);

    Fix this by changing the netdev notifier from per-net to global so no
    event is missed.

    Fixes: 02a68a47eade ("net: devlink: track netdev with devlink_port assigned")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20230206094151.2557264-1-jiri@resnulli.us
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-04 11:12:25 +02:00
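
Why a per-net notifier misses the event while a global one does not can be shown with a tiny model. This is a sketch under stated assumptions: namespaces are plain integers, and `fake_notifier`, `fake_dispatch()` and `FAKE_NS_GLOBAL` are hypothetical names, not the kernel notifier API.

```c
#include <assert.h>

#define FAKE_NS_GLOBAL (-1)

struct fake_notifier {
	int netns;       /* namespace watched, or FAKE_NS_GLOBAL for all */
	int events_seen;
};

/* Deliver one event raised in event_netns to the notifier if it is in scope. */
static void fake_dispatch(struct fake_notifier *nb, int event_netns)
{
	assert(event_netns >= 0);
	if (nb->netns == FAKE_NS_GLOBAL || nb->netns == event_netns)
		nb->events_seen++;
}
```

A notifier bound to namespace 0 never sees an event raised in namespace 1, so a netdev that moved away before being unregistered goes unnoticed; a global notifier sees the event regardless of where the netdev ended up.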
Petr Oros 2adb8b0b2c devlink: remove devlink features
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Conflicts:
-  Unmerged, because file missing in rhel:
    drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
    drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c

Upstream commit(s):
commit fb8421a94c5613fee86e192bab0892ecb1d56e4c
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Fri Jan 27 16:50:42 2023 +0100

    devlink: remove devlink features

    Devlink features were introduced to disallow devlink reload calls of
    userspace before the devlink was fully initialized. The reason for this
    workaround was the fact that devlink reload was originally called
    without devlink instance lock held.

    However, with recent changes that converted devlink reload to be
    performed under devlink instance lock, this is redundant so remove
    devlink features entirely.

    Note that mlx5 used this to enable devlink reload conditionally only
    when device didn't act as multi port slave. Move the multi port check
    into mlx5_devlink_reload_down() callback alongside the other
    checks preventing the device from reload in certain states.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-04 11:12:22 +02:00
Petr Oros 74ecb62cab devlink: remove reporters_lock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 1dea3b4e4c52f4bed64d1c527d548e82ccaea15a
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Jan 18 16:21:09 2023 +0100

    devlink: remove reporters_lock

    Similar to other devlink objects, rely on devlink instance lock
    and remove object specific reporters_lock.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-04 11:12:18 +02:00
Petr Oros 2147f2b647 devlink: remove linecards lock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 5cc9049cb9021a46ad5711a946eb3ded47eed0de
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Wed Jan 18 16:21:04 2023 +0100

    devlink: remove linecards lock

    Similar to other devlink objects, convert the linecards list to be
    protected by devlink instance lock. Alongside that, rename the
    create/destroy() functions to devl_* to indicate the devlink instance
    lock needs to be held while calling them.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:09 +02:00
Petr Oros b2f2d5c458 devlink: keep the instance mutex alive until references are gone
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 93e71edfd90ca7e07a3645167f1e8e4504d4e8ee
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Jan 10 20:29:08 2023 -0800

    devlink: keep the instance mutex alive until references are gone

    The reference needs to keep the instance memory around, but also
    the instance lock must remain valid. Users will take the lock,
    check registration status and release the lock. mutex_destroy()
    etc. belong in the same place as the freeing of the memory.

    Unfortunately lockdep_unregister_key() sleeps so we need
    to switch to an rcu_work.

    Note that the problem is a bit hard to repro, because
    devlink_pernet_pre_exit() iterates over registered instances.
    AFAIU the instances must get devlink_free()d concurrently with
    the namespace getting deleted for the problem to occur.

    Reported-by: syzbot+d94d214ea473e218fc89@syzkaller.appspotmail.com
    Reported-by: syzbot+9f0dd863b87113935acf@syzkaller.appspotmail.com
    Fixes: 9053637e0da7 ("devlink: remove the registration guarantee of references")
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20230111042908.988199-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:08 +02:00
Petr Oros c61e6c6ebd devlink: don't require setting features before registration
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 6ef8f7da92750c3c25755fac39b561fff2d47378
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jan 5 22:33:59 2023 -0800

    devlink: don't require setting features before registration

    Requiring devlink_set_features() to be run before devlink is
    registered is overzealous. devlink_set_features() itself is
    a leftover from old workarounds which were trying to prevent
    initiating reload before probe was complete.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:08 +02:00
Petr Oros f3646fb5d8 devlink: remove the registration guarantee of references
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 9053637e0da783efdb37bbfea6a27b856c0228d7
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jan 5 22:33:58 2023 -0800

    devlink: remove the registration guarantee of references

    The objective of exposing the devlink instance locks to
    drivers was to let them use these locks to prevent user space
    from accessing the device before it's fully initialized.
    This is difficult because devlink_unregister() waits for all
    references to be released, meaning that devlink_unregister()
    can't itself be called under the instance lock.

    To avoid this issue devlink_register() was moved after subobject
    registration a while ago. Unfortunately the netdev paths get
    a hold of the devlink instances _before_ they are registered.
    Ideally netdev should wait for devlink init to finish (synchronizing
    on the instance lock). This can't work because we don't know if the
    instance will _ever_ be registered (in case of failures it may not).
    The other option of returning an error until devlink_register()
    is called is unappealing (user space would get a notification that
    the netdev exists but would have to wait an arbitrary amount of time
    before accessing some of its attributes).

    Weaken the guarantees of the devlink references.

    Holding a reference will now only guarantee that the memory
    of the object is around. Another way of looking at it is that
    the reference now protects the object not its "registered" status.
    Use devlink instance lock to synchronize unregistration.

    This implies that releasing of the "main" reference of the devlink
    instance moves from devlink_unregister() to devlink_free().

    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:08 +02:00
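
The weakened guarantee described in the commit above (a reference keeps the memory alive, while the "registered" status must be checked under the instance lock) can be sketched in userspace as follows. This is a single-threaded illustration: the refcount is a plain int, a boolean stands in for the instance mutex, and `fake_instance` and the `fake_*()` helpers are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_instance {
	int refs;        /* plain int: this sketch is single-threaded */
	bool locked;     /* stand-in for the instance mutex */
	bool registered;
};

static struct fake_instance *fake_alloc(void)
{
	struct fake_instance *inst = calloc(1, sizeof(*inst));

	if (inst)
		inst->refs = 1; /* the "main" reference */
	return inst;
}

static void fake_get(struct fake_instance *inst)
{
	assert(inst->refs > 0);
	inst->refs++;
}

/* Drop one reference; returns true when it was the last and memory is freed. */
static bool fake_put(struct fake_instance *inst)
{
	assert(inst->refs > 0);
	if (--inst->refs > 0)
		return false;
	free(inst);
	return true;
}

/* A reference holder must still check registration under the lock. */
static int fake_user_op(struct fake_instance *inst)
{
	int ret;

	assert(inst->refs > 0 && !inst->locked);
	inst->locked = true;
	ret = inst->registered ? 0 : -ENODEV;
	inst->locked = false;
	return ret;
}
```

A caller that took its reference before registration, or still holds it after unregistration, gets -ENODEV from `fake_user_op()` but can safely touch the object until its own `fake_put()`.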
Petr Oros b753c1904f devlink: always check if the devlink instance is registered
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit ed539ba614a079ea696b92beef1eafec66f831a4
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jan 5 22:33:57 2023 -0800

    devlink: always check if the devlink instance is registered

    Always check under the instance lock whether the devlink instance
    is still / already registered.

    This is a no-op for the most part, as the unregistration path currently
    waits for all references. On the init path, however, we may temporarily
    open up a race with netdev code, if netdevs are registered before the
    devlink instance. This is temporary, the next change fixes it, and this
    commit has been split out for the ease of review.

    Note that in case of iterating over sub-objects which have their
    own lock (regions and line cards) we assume an implicit dependency
    between those objects existing and devlink unregistration.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:07 +02:00
Petr Oros 71ee533c99 devlink: update the code in netns move to latest helpers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 7a54a5195b2a877a972ec21a5ca415c1fc2aec61
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jan 5 22:33:55 2023 -0800

    devlink: update the code in netns move to latest helpers

    devlink_pernet_pre_exit() is the only obvious place which takes
    the instance lock without using the devl_ helpers. Update the code
    and move the error print after releasing the reference
    (having unlock and put together feels slightly more idiomatic).

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:07 +02:00
Petr Oros 34f653acb7 devlink: bump the instance index directly when iterating
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit d772781964415c63759572b917e21c4f7ec08d9f
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jan 5 22:33:54 2023 -0800

    devlink: bump the instance index directly when iterating

    xa_find_after() is designed to handle multi-index entries correctly.
    If an xarray has two entries, one which spans indexes 0-3 and one at
    index 4, xa_find_after(0) will return the entry at index 4.

    Having to juggle the two callbacks, however, is unnecessary in case
    of the devlink xarray, as there is a 1:1 relationship between
    entries and indexes.

    Always use xa_find() and increment the index manually.
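
The iteration pattern can be sketched in userspace roughly as follows. This is an illustration only: struct toy_xa and toy_find() are hypothetical stand-ins loosely modeled on xa_find(), and they assume single-index entries, as is the case for the devlink xarray.

```c
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical stand-in for an xarray holding single-index entries only. */
struct toy_xa {
	void *slot[TOY_SLOTS];
};

/*
 * Loosely modeled on xa_find(): return the first entry at an index
 * >= *indexp and <= max, and store the index where it was found.
 */
static void *toy_find(struct toy_xa *xa, unsigned long *indexp,
		      unsigned long max)
{
	unsigned long i;

	for (i = *indexp; i <= max && i < TOY_SLOTS; i++) {
		if (xa->slot[i]) {
			*indexp = i;
			return xa->slot[i];
		}
	}
	return NULL;
}

/* Walk all entries with a single helper, bumping the index manually. */
static unsigned int toy_count(struct toy_xa *xa)
{
	unsigned long index = 0;
	unsigned int n = 0;

	while (toy_find(xa, &index, TOY_SLOTS - 1)) {
		n++;
		index++;	/* step past the entry just visited */
	}
	return n;
}
```

Because every entry occupies exactly one index, incrementing the index after each hit is enough to reach the next entry, so no separate "find after" helper is needed.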

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:07 +02:00
Petr Oros 1627d8b8a6 devlink: restart dump based on devlink instance ids (simple)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 731d69a6bd13b7c0cdbd3607edfa681269d54828
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Jan 4 20:05:26 2023 -0800

    devlink: restart dump based on devlink instance ids (simple)

    xarray gives each devlink instance an id and allows us to restart
    the walk based on that id quite neatly. This is nice both from the
    perspective of code brevity and from the stability of the dump
    (devlink instances disappearing from before the resumption point
    will not cause inconsistent dumps).

    This patch takes care of simple cases where state->idx counts
    devlink instances only.
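
A rough userspace sketch of resuming a dump by instance id, assuming a fixed-size presence table in place of the xarray. The names toy_dump(), struct toy_dump_state, and TOY_MAX_ID are hypothetical; only the idea that state->idx holds an instance id comes from the text above.

```c
#include <stddef.h>

#define TOY_MAX_ID 8

/* Hypothetical dump state: idx is the instance id to resume from. */
struct toy_dump_state {
	unsigned long idx;
};

/*
 * Emit up to 'budget' instance ids into 'out', starting at state->idx.
 * present[id] != 0 means an instance with that id currently exists.
 * On return, state->idx is the id at which the next call should resume.
 */
static size_t toy_dump(const int *present, struct toy_dump_state *state,
		       unsigned long *out, size_t budget)
{
	unsigned long id;
	size_t n = 0;

	for (id = state->idx; id < TOY_MAX_ID; id++) {
		if (!present[id])
			continue;
		if (n == budget)
			break;	/* out of room: resume at this id */
		out[n++] = id;
	}
	state->idx = id;
	return n;
}
```

Since the resumption point is an id rather than a count of instances already visited, removing an instance with a lower id between two calls neither skips nor repeats any remaining instance.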

    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:06 +02:00
Petr Oros 356ec9f5dc devlink: drop the filter argument from devlinks_xa_find_get
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 8861c0933c78e3631fe752feadc0d2a6e5eab1e1
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Jan 4 20:05:24 2023 -0800

    devlink: drop the filter argument from devlinks_xa_find_get

    Looks like devlinks_xa_find_get() was intended to get the mark
    from the @filter argument. It doesn't actually use @filter, passing
    DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other
    than registered is unlikely so drop @filter argument completely.

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:06 +02:00
Petr Oros ab7cd16bd7 devlink: split out core code
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172273

Upstream commit(s):
commit 687125b5799cd5120437fa455cfccbe8537916ff
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Jan 4 20:05:19 2023 -0800

    devlink: split out core code

    Move core code into a separate file. It's spread around the main
    file which makes refactoring and figuring out how devlink works
    harder.

    Move out the xarray and the most central devlink instance code,
    such as locking, ref counting, alloc, register, etc. Leave port
    stuff in leftover.c; if we want to move port code it'd probably
    be to its own file.

    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-04-03 14:06:05 +02:00