Commit Graph

275 Commits

Kamal Heib 1eb0b6fbb0 IB/cm: Rework sending DREQ when destroying a cm_id
JIRA: https://issues.redhat.com/browse/RHEL-77880

commit fc0856c3a32576fb21c494f38b9c6c8dc3bf58ab
Author: Sean Hefty <shefty@nvidia.com>
Date:   Wed Nov 13 13:12:56 2024 +0200

    IB/cm: Rework sending DREQ when destroying a cm_id

    A DREQ is sent in 2 situations:

      1. When requested by the user.
         This DREQ has to wait for a DREP, which will be routed to the user.

      2. When the cm_id is destroyed.
         This DREQ is generated by the CM to notify the peer that the
         connection has been destroyed.

    In the latter case, any DREP that is received will be discarded.
    There's no need to hold a reference on the cm_id.  Today, both
    situations are covered by the same function: cm_send_dreq_locked().
    When invoked in the cm_id destroy path, the cm_id reference would be
    held until the DREQ completes, blocking the destruction.  Because it
    could take several seconds to minutes before the DREQ receives a DREP,
    the destroy call posts a send for the DREQ then immediately cancels the
    MAD.  However, cancellation is not immediate in the MAD layer.  There
    could still be a delay before the MAD layer returns the DREQ to the CM.
    Moreover, the only guarantee is that the DREQ will be sent at most once.

    Introduce a separate flow for sending a DREQ when destroying the cm_id.
    The new flow will not hold a reference on the cm_id, allowing it to be
    cleaned up immediately.  The cancellation trick is no longer needed.
    The MAD layer will send the DREQ exactly once.

    Signed-off-by: Sean Hefty <shefty@nvidia.com>
    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
    Link: https://patch.msgid.link/a288a098b8e0550305755fd4a7937431699317f4.1731495873.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2025-02-04 14:40:53 -05:00
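
The shape of the new flow can be sketched as follows. This is an illustrative sketch, not the actual cm.c code: cm_send_dreq_and_forget() and the simplified allocator signatures are assumptions; only the ownership difference matters.

    /* Sketch of the two DREQ paths; names and signatures are assumptions. */

    /* User-requested DREQ: must wait for a DREP, so pin the cm_id. */
    static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv)
    {
            struct ib_mad_send_buf *msg;

            msg = cm_alloc_priv_msg(cm_id_priv);    /* takes a cm_id reference */
            if (IS_ERR(msg))
                    return PTR_ERR(msg);
            cm_format_dreq(msg, cm_id_priv);
            return ib_post_send_mad(msg, NULL);     /* retried until DREP or timeout */
    }

    /* Destroy-path DREQ: fire-and-forget, no reference held, so
     * cm_destroy_id() can finish without waiting on the MAD layer. */
    static void cm_send_dreq_and_forget(struct cm_id_private *cm_id_priv)
    {
            struct ib_mad_send_buf *msg;

            msg = cm_alloc_msg(cm_id_priv);         /* no cm_id reference */
            if (IS_ERR(msg))
                    return;
            cm_format_dreq(msg, cm_id_priv);
            msg->retries = 0;                       /* sent exactly once, never retried */
            if (ib_post_send_mad(msg, NULL))
                    cm_free_msg(msg);
    }
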
Kamal Heib 8f1932ac4e IB/cm: Do not hold reference on cm_id unless needed
JIRA: https://issues.redhat.com/browse/RHEL-77880

commit 1e5159219076ddb2e44338c667c83fd1bd43dfef
Author: Sean Hefty <shefty@nvidia.com>
Date:   Wed Nov 13 13:12:55 2024 +0200

    IB/cm: Do not hold reference on cm_id unless needed

    Typically, when the CM sends a MAD it bumps a reference count
    on the associated cm_id.  There are some exceptions, such
    as when the MAD is a direct response to a receive MAD.  For
    example, the CM may generate an MRA in response to a duplicate
    REQ.  But, in general, if a MAD may be sent as a result of
    the user invoking an API call (e.g. ib_send_cm_rep(),
    ib_send_cm_rtu(), etc.), a reference is taken on the cm_id.

    This reference is necessary if the MAD requires a response.
    The reference allows routing a response MAD back to the
    cm_id, or, if no response is received, allows updating the
    cm_id state to reflect the failure.

    For MADs which do not generate a response from the
    target, however, there's no need to hold a reference on the cm_id.
    Such MADs will not be retried by the MAD layer and their
    completions do not change the state of the cm_id.

    There are 2 internal calls used to allocate MADs which take
    a reference on the cm_id: cm_alloc_msg() and cm_alloc_priv_msg().
    The latter calls the former.  It turns out that all other places
    where cm_alloc_msg() is called are for MADs that do not generate
    a response from the target: sending an RTU, DREP, REJ, MRA, or
    SIDR REP.  In all of these cases, there's no need to hold a
    reference on the cm_id.

    The benefit of dropping unneeded references is that it allows
    destruction of the cm_id to proceed immediately.  Currently,
    the cm_destroy_id() call blocks as long as there's a reference
    held on the cm_id.  Worse, is that cm_destroy_id() will send
    MADs, which it then needs to complete.  Sending the MADs is
    beneficial, as they notify the peer that a connection is
    being destroyed.  However, since the MADs hold a reference
    on the cm_id, they block destruction and cannot be retried.

    Move cm_id referencing from cm_alloc_msg() to cm_alloc_priv_msg().
    The latter should hold a reference on the cm_id in all cases but
    one, which will be handled in a separate patch.  cm_alloc_priv_msg()
    is used when sending a REQ, REP, DREQ, and SIDR REQ, all of which
    require a response.

    Also, merge common code into cm_alloc_priv_msg() and combine the
    freeing of all messages which do not need a response.

    Signed-off-by: Sean Hefty <shefty@nvidia.com>
    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
    Link: https://patch.msgid.link/1f0f96acace72790ecf89087fc765dead960189e.1731495873.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2025-02-04 14:40:53 -05:00
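
A minimal sketch of where the reference now lives, assuming simplified signatures (the real allocators take more parameters):

    struct ib_mad_send_buf *cm_alloc_msg(struct cm_id_private *cm_id_priv);

    static struct ib_mad_send_buf *
    cm_alloc_priv_msg(struct cm_id_private *cm_id_priv)
    {
            struct ib_mad_send_buf *msg;

            msg = cm_alloc_msg(cm_id_priv);         /* no longer takes a reference */
            if (IS_ERR(msg))
                    return msg;
            refcount_inc(&cm_id_priv->refcount);    /* held until the response or
                                                     * timeout completes */
            msg->context[0] = cm_id_priv;
            return msg;
    }
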
Kamal Heib aa8773072d IB/cm: Explicitly mark if a response MAD is a retransmission
JIRA: https://issues.redhat.com/browse/RHEL-77880

commit 0492458750c9fbd69cfc7baddd3ddcac77f2a0c8
Author: Sean Hefty <shefty@nvidia.com>
Date:   Wed Nov 13 13:12:54 2024 +0200

    IB/cm: Explicitly mark if a response MAD is a retransmission

    In several situations the CM may send a reply to a received MAD
    without the reply being directly linked with a cm_id.  For
    example, it may send a REJ in response to a REQ which does not
    match a listener.  Or, it may send a DREP in response to a DREQ
    if the cm_id has already been destroyed.  This can happen if the
    original DREP was lost and the DREQ was retried.

    When such a response MAD completes, it updates a counter tracking
    how many MADs were retried.  However, not all response MADs issued
    directly by the CM may be retries.  The REJ mentioned in the example
    above is such a case.  To distinguish responses which were retries
    from those that were not, the send_handler performs the following
    check: the response is a retry if it is not associated with a cm_id
    and it is not a REJ message.

    Replace this indirect method of checking if a response is a retry
    with an explicit check.  Note that these retries are generated
    directly by the CM, rather than retried by the MAD layer.

    This change will be needed by later changes which would otherwise
    break the indirect check.

    Signed-off-by: Sean Hefty <shefty@nvidia.com>
    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
    Link: https://patch.msgid.link/1ee6e2a68f8de1992b9da23aa1d7e3f9f25e0036.1731495873.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2025-02-04 14:40:53 -05:00
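
A hedged sketch of the explicit check; the context slot and flag are assumptions, not the patch's actual field names:

    static atomic_long_t cm_retry_counter;

    static void cm_send_handler(struct ib_mad_agent *mad_agent,
                                struct ib_mad_send_wc *mad_send_wc)
    {
            struct ib_mad_send_buf *msg = mad_send_wc->send_buf;
            bool is_retransmit = (unsigned long)msg->context[1]; /* set explicitly
                                                                  * by the sender */

            if (is_retransmit)
                    atomic_long_inc(&cm_retry_counter);
            /* ...instead of inferring: !cm_id_priv && attr_id != CM_REJ_ATTR_ID */
    }
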
Kamal Heib e3f67f8fe4 RDMA/cm: Print the old state when cm_destroy_id gets timeout
JIRA: https://issues.redhat.com/browse/RHEL-56247

commit b68e1acb5834ed1a2ad42d9d002815a8bae7c0b6
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Fri Mar 22 13:20:49 2024 +0200

    RDMA/cm: Print the old state when cm_destroy_id gets timeout

    The old state is helpful for debugging, as the current state is always
    IB_CM_IDLE when timeout happens.

    Fixes: 96d9cbe2f2ff ("RDMA/cm: add timeout to cm_destroy_id wait")
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Link: https://lore.kernel.org/r/20240322112049.2022994-1-markzhang@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2024-10-07 11:55:52 -04:00
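
The idea, sketched with simplified details: capture the state before teardown begins and report both states on timeout.

    static void cm_destroy_id_wait_timeout(struct ib_cm_id *cm_id,
                                           enum ib_cm_state old_state)
    {
            struct cm_id_private *cm_id_priv;

            cm_id_priv = container_of(cm_id, struct cm_id_private, id);
            pr_err("%s: cm_id=%p timed out. state %d -> %d, refcnt=%d\n",
                   __func__, cm_id, old_state, cm_id->state,
                   refcount_read(&cm_id_priv->refcount));
    }
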
Kamal Heib ae741fa68f RDMA/cm: add timeout to cm_destroy_id wait
JIRA: https://issues.redhat.com/browse/RHEL-56247

commit 96d9cbe2f2ff7abde021bac75eafaceabe9a51fa
Author: Manjunath Patil <manjunath.b.patil@oracle.com>
Date:   Fri Mar 8 22:33:23 2024 -0800

    RDMA/cm: add timeout to cm_destroy_id wait

    Add a timeout to cm_destroy_id, so that userspace can trigger any data
    collection that would help in analyzing the cause of delays in
    destroying the cm_id.

    A new noinline function helps dtrace/eBPF programs hook onto it.
    Existing functionality isn't changed, except that a probe-able new
    function is triggered at every timeout interval.

    We have seen cases where CM messages get stuck in the MAD layer
    (either due to a software bug or a faulty HCA), leading to the cm_id
    getting stuck in the following call stack. This patch helps in
    resolving such issues faster.

    kernel: ... INFO: task XXXX:56778 blocked for more than 120 seconds.
    ...
            Call Trace:
            __schedule+0x2bc/0x895
            schedule+0x36/0x7c
            schedule_timeout+0x1f6/0x31f
            ? __slab_free+0x19c/0x2ba
            wait_for_completion+0x12b/0x18a
            ? wake_up_q+0x80/0x73
            cm_destroy_id+0x345/0x610 [ib_cm]
            ib_destroy_cm_id+0x10/0x20 [ib_cm]
            rdma_destroy_id+0xa8/0x300 [rdma_cm]
            ucma_destroy_id+0x13e/0x190 [rdma_ucm]
            ucma_write+0xe0/0x160 [rdma_ucm]
            __vfs_write+0x3a/0x16d
            vfs_write+0xb2/0x1a1
            ? syscall_trace_enter+0x1ce/0x2b8
            SyS_write+0x5c/0xd3
            do_syscall_64+0x79/0x1b9
            entry_SYSCALL_64_after_hwframe+0x16d/0x0

    Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
    Link: https://lore.kernel.org/r/20240309063323.458102-1-manjunath.b.patil@oracle.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2024-10-07 11:55:52 -04:00
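
A sketch of the wait loop (the completion field name and timeout value are assumptions): the destroy path waits in bounded intervals and calls the noinline helper on each expiry, so a dtrace/eBPF probe can fire there.

    #define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs; illustrative value */

    static void cm_destroy_id_wait(struct ib_cm_id *cm_id,
                                   struct cm_id_private *cm_id_priv)
    {
            unsigned long ret;

            do {
                    ret = wait_for_completion_timeout(&cm_id_priv->comp,
                                    msecs_to_jiffies(CM_DESTROY_ID_WAIT_TIMEOUT));
                    if (!ret)       /* timeout happened: probe-able hook */
                            cm_destroy_id_wait_timeout(cm_id);
            } while (!ret);
    }
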
Kamal Heib cbc763e407 RDMA/cm: Trace icm_send_rej event before the cm state is reset
JIRA: https://issues.redhat.com/browse/RHEL-956

commit bd9de1badac7e4ff6780365d4aa38983f5e2a436
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Thu Mar 30 10:23:51 2023 +0300

    RDMA/cm: Trace icm_send_rej event before the cm state is reset

    Trace the icm_send_rej event before the cm state is reset to idle, so
    that the correct cm state will be logged. For example, when an incoming
    request is
    rejected, the old trace log was:
        icm_send_rej: local_id=961102742 remote_id=3829151631 state=IDLE reason=REJ_CONSUMER_DEFINED
    With this patch:
        icm_send_rej: local_id=312971016 remote_id=3778819983 state=MRA_REQ_SENT reason=REJ_CONSUMER_DEFINED

    Fixes: 8dc105befe ("RDMA/cm: Add tracepoints to track MAD send operations")
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Link: https://lore.kernel.org/r/20230330072351.481200-1-markzhang@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2023-09-05 10:56:05 -04:00
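
A sketch of the reordering, with a simplified tracepoint signature; the helper name is an assumption:

    static void cm_reject_and_trace(struct cm_id_private *cm_id_priv,
                                    enum ib_cm_rej_reason reason)
    {
            trace_icm_send_rej(&cm_id_priv->id, reason); /* state still e.g. MRA_REQ_SENT */
            cm_reset_to_idle(cm_id_priv);                /* state becomes IDLE */
            /* ... format and post the REJ MAD ... */
    }
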
Kamal Heib 3a946fbe4f RDMA/cm: Make QP FLUSHABLE for supported device
Bugzilla: https://bugzilla.redhat.com/2168936

commit 8b4d379b399d19f4c803e565bfe13f07b66b5ad7
Author: Li Zhijian <lizhijian@fujitsu.com>
Date:   Tue Dec 6 21:02:00 2022 +0800

    RDMA/cm: Make QP FLUSHABLE for supported device

    Similar to the RDMA and Atomic QP attributes enabled by default in the
    CM, enable the FLUSH attribute for supported devices. This gives
    applications built with the rdma_create_ep and rdma_accept APIs the
    FLUSH QP attribute natively, so that users can request FLUSH operations
    more simply.

    Note that a FLUSH operation requires FLUSH to be supported by the
    device (HCA), the memory region (MR), and the QP at the same time, so
    it's safe to enable the FLUSH QP attribute by default here.

    The FLUSH attribute can be disabled via the modify_qp() interface.

    Link: https://lore.kernel.org/r/20221206130201.30986-10-lizhijian@fujitsu.com
    Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2023-03-31 14:16:05 -04:00
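
A hedged sketch of enabling the attribute when the CM fills QP attributes; the capability and access-flag names follow the FLUSH series but should be checked against the tree:

    static void cm_set_flushable(struct cm_id_private *cm_id_priv,
                                 struct ib_qp_attr *qp_attr)
    {
            struct ib_device *dev = cm_id_priv->id.device;
            u64 flush_caps = dev->attrs.device_cap_flags &
                             (IB_DEVICE_FLUSH_GLOBAL | IB_DEVICE_FLUSH_PERSISTENT);

            if (flush_caps)         /* only when the HCA supports FLUSH */
                    qp_attr->qp_access_flags |= IB_ACCESS_FLUSH_GLOBAL |
                                                IB_ACCESS_FLUSH_PERSISTENT;
    }
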
Kamal Heib 6b2dbc867c RDMA/cm: Use DLID from inbound/outbound PathRecords as the datapath DLID
Bugzilla: https://bugzilla.redhat.com/2168933

commit eb8336dbe373edd1ad6061c543e4ba6ea60f6cc9
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Thu Sep 8 13:09:03 2022 +0300

    RDMA/cm: Use DLID from inbound/outbound PathRecords as the datapath DLID

    In inter-subnet cases, when inbound/outbound PRs are available,
    outbound_PR.dlid is used as the requestor's datapath DLID and
    inbound_PR.dlid is used as the responder's DLID. The inbound_PR.dlid
    is passed to the responder side in the "ConnectReq.Primary_Local_Port_LID"
    field. With this solution, the PERMISSIVE_LID is no longer used in the
    Primary Local LID field.

    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Link: https://lore.kernel.org/r/b3f6cac685bce9dde37c610be82e2c19d9e51d9e.1662631201.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2023-03-31 14:15:53 -04:00
Kamal Heib 4b1b6a329d IB/cm: Refactor cm_insert_listen() and cm_find_listen()
Bugzilla: https://bugzilla.redhat.com/2168933

commit 637ff8ea00a20dd731110c9cdbef0e41c050607d
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Fri Aug 19 12:08:59 2022 +0300

    IB/cm: Refactor cm_insert_listen() and cm_find_listen()

    Move the device and service_id match code at the top of
    cm_insert_listen() and cm_find_listen() into the final else branch.

    Link: https://lore.kernel.org/r/20220819090859.957943-4-markzhang@nvidia.com
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2023-03-31 14:15:52 -04:00
Kamal Heib 70489d6e06 IB/cm: remove cm_id_priv->id.service_mask and service_mask parameter of cm_init_listen()
Bugzilla: https://bugzilla.redhat.com/2168933

commit a461b746c5768b9b3001045cff2d508346f5f789
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Fri Aug 19 12:08:58 2022 +0300

    IB/cm: remove cm_id_priv->id.service_mask and service_mask parameter of cm_init_listen()

    The service_mask is always ~cpu_to_be64(0), so the result is always
    a NOP when it is &'d with a service_id. Remove it for simplicity.

    Link: https://lore.kernel.org/r/20220819090859.957943-3-markzhang@nvidia.com
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2023-03-31 14:15:52 -04:00
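
Why the removed mask was a NOP, as a tiny sketch (function name is illustrative):

    static bool cm_service_matches(__be64 service_id, __be64 listen_id)
    {
            __be64 service_mask = ~cpu_to_be64(0);  /* all 64 bits set */

            /* (x & all-ones) == x for any x, so the & can simply be dropped: */
            return (service_id & service_mask) == listen_id;
    }
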
Kamal Heib db5159b827 IB/cm: Remove the service_mask parameter from ib_cm_listen()
Bugzilla: https://bugzilla.redhat.com/2168933

commit 91a3f14ec953f3224215dc867001b9a201785740
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Fri Aug 19 12:08:57 2022 +0300

    IB/cm: Remove the service_mask parameter from ib_cm_listen()

    Remove the service_mask parameter of ib_cm_listen(), as all callers
    use 0.

    Link: https://lore.kernel.org/r/20220819090859.957943-2-markzhang@nvidia.com
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2023-03-31 14:15:52 -04:00
Kamal Heib 7b4315962c RDMA/cm: Use SLID in the work completion as the DLID in responder side
Bugzilla: https://bugzilla.redhat.com/2120668

commit b7d95040c13f61a4a6a859c5355faf583eff9658
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Thu Sep 8 13:09:02 2022 +0300

    RDMA/cm: Use SLID in the work completion as the DLID in responder side

    The responder should always use WC's SLID as the dlid, to follow the
    IB SPEC section "13.5.4.2 COMMON RESPONSE ACTIONS":
    A responder always takes the following actions in constructing a
    response packet:
    - The SLID of the received packet is used as the DLID in the response
      packet.

    Fixes: ac3a949fb2 ("IB/CM: Set appropriate slid and dlid when handling CM request")
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Link: https://lore.kernel.org/r/cd17c240231e059d2fc07c17dfe555d548b917eb.1662631201.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2022-11-29 11:40:49 -05:00
Kamal Heib 8b6458b791 RDMA/cm: Fix memory leak in ib_cm_insert_listen
Bugzilla: http://bugzilla.redhat.com/2097326

commit 2990f223ffa7bb25422956b9f79f9176a5b38346
Author: Miaoqian Lin <linmq006@gmail.com>
Date:   Tue Jun 21 09:25:44 2022 +0400

    RDMA/cm: Fix memory leak in ib_cm_insert_listen

    cm_alloc_id_priv() allocates resources for the cm_id_priv. When
    cm_init_listen() fails, it doesn't free them, leading to a memory leak.

    Add the missing error unwind.

    Fixes: 98f67156a8 ("RDMA/cm: Simplify establishing a listen cm_id")
    Link: https://lore.kernel.org/r/20220621052546.4821-1-linmq006@gmail.com
    Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2022-07-27 08:55:15 -04:00
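
The missing unwind, sketched against the function's known shape (arguments and surrounding logic simplified):

    struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
                                         ib_cm_handler cm_handler,
                                         __be64 service_id)
    {
            struct cm_id_private *cm_id_priv;
            int err;

            cm_id_priv = cm_alloc_id_priv(device, cm_handler, NULL);
            if (IS_ERR(cm_id_priv))
                    return ERR_CAST(cm_id_priv);

            err = cm_init_listen(cm_id_priv, service_id, 0);
            if (err) {
                    ib_destroy_cm_id(&cm_id_priv->id);      /* the missing unwind */
                    return ERR_PTR(err);
            }
            /* ... insert into the listen rb-tree and return ... */
            return &cm_id_priv->id;
    }
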
Kamal Heib 3fbfc06953 IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD
Bugzilla: http://bugzilla.redhat.com/2056772

commit 107dd7beba403a363adfeb3ffe3734fe38a05cce
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Mon Apr 4 11:58:05 2022 +0300

    IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD

    On the passive side, when the disconnectReq event comes while the
    current state is MRA_REP_RCVD, the MAD needs to be cancelled before
    entering the DREQ_RCVD and TIMEWAIT states; otherwise destroy_id may
    block until this MAD reaches its timeout.

    Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
    Link: https://lore.kernel.org/r/75261c00c1d82128b1d981af9ff46e994186e621.1649062436.git.leonro@nvidia.com
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2022-05-10 11:45:11 +03:00
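
The fix adds MRA_REP_RCVD to the states that cancel the outstanding MAD; a sketch of the relevant switch in the DREQ handler (surrounding code elided):

    switch (cm_id_priv->id.state) {
    case IB_CM_REP_SENT:
    case IB_CM_DREQ_SENT:
    case IB_CM_MRA_REP_RCVD:                /* the newly covered state */
            ib_cancel_mad(cm_id_priv->msg); /* don't wait for the MAD timeout */
            break;
    default:
            break;
    }
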
Kamal Heib f57fff0e9b IB/cm: Release previously acquired reference counter in the cm_id_priv
Bugzilla: http://bugzilla.redhat.com/2056771

commit b856101a1774b5f1c8c99e8dfdef802856520732
Author: Mark Zhang <markzhang@nvidia.com>
Date:   Wed Jan 19 10:37:55 2022 +0200

    IB/cm: Release previously acquired reference counter in the cm_id_priv

    In the failure flow, the acquired reference counter was not released,
    and the following error was reported:

      drivers/infiniband/core/cm.c:3373 cm_lap_handler() warn: inconsistent
                            refcounting 'cm_id_priv->refcount.refs.counter':

    Fixes: 7345201c39 ("IB/cm: Improve the calling of cm_init_av_for_lap and cm_init_av_by_path")
    Link: https://lore.kernel.org/r/7615f23bbb5c5b66d03f6fa13e1c99d51dae6916.1642581448.git.leonro@nvidia.com
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Signed-off-by: Kamal Heib <kheib@redhat.com>
2022-03-23 20:02:42 -04:00
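
The pattern, as a sketch with hypothetical helper names (cm_lap_init_av_sketch() is not the real function): every error exit taken after cm_acquire_id() succeeds must pair it with cm_deref_id().

    static int cm_lap_handler_sketch(struct cm_id_private *cm_id_priv)
    {
            int ret;

            ret = cm_lap_init_av_sketch(cm_id_priv);        /* may fail */
            if (ret) {
                    cm_deref_id(cm_id_priv);                /* previously missing */
                    return ret;
            }
            /* ... normal LAP processing; existing paths already deref ... */
            cm_deref_id(cm_id_priv);
            return 0;
    }
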
Wenpeng Liang 3cea7b4a7d RDMA/core: Fix incorrect print format specifier
There are some '%u' specifiers used for 'int' and '%d' used for
'unsigned int'; they should be fixed.

Link: https://lore.kernel.org/r/1623325232-30900-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:38:30 -03:00
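
For reference, the correct pairing looks like this (a standalone illustration, not the patched code):

    static void show_matched_specifiers(void)
    {
            int ret = -EINVAL;
            unsigned int count = 42;

            pr_info("ret=%d count=%u\n", ret, count); /* %d <-> int, %u <-> unsigned int */
    }
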
Jason Gunthorpe 526a12c8c5 RDMA/cm: Use an attribute_group on the ib_port_attribute instead of kobj's
This code is trying to attach a list of counters grouped into 4 groups to
the ib_port sysfs. Instead of creating a bunch of kobjects, simply express
everything naturally as an ib_port_attribute and add a single
attribute_groups list.

Remove all the naked kobject manipulations.

Link: https://lore.kernel.org/r/0d5a7241ee0fe66622de04fcbaafaf6a791d5c7c.1623427137.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 20:58:31 -03:00
Jason Gunthorpe bf0480a2df IB/cm: Remove dgid from the cm_id_priv av
It turns out this is only being used to store the LID for SIDR mode to
search the RB tree for request de-duplication. Store the LID value
directly and don't pretend it is a GID.

Link: https://lore.kernel.org/r/2e7c87b6f662c90c642fc1838e363ad3e6ef14a4.1623236345.git.leonro@nvidia.com
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10 09:39:27 -03:00
Mark Zhang 76039ac909 IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock
During cm_dev deregistration in cm_remove_one(), the cm_device and
cm_ports will be freed; after that they should not be accessed. The
mad_agent needs to be protected as well.

This patch adds a cm_device kref to protect cm_dev and cm_ports, and a
mad_agent_lock spinlock to protect mad_agent.

Link: https://lore.kernel.org/r/501ba7a2ff203dccd0e6755d3f93329772adce52.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
Mark Zhang 7345201c39 IB/cm: Improve the calling of cm_init_av_for_lap and cm_init_av_by_path
The cm_init_av_for_lap() and cm_init_av_by_path() function calls have the
following issues:

1. Both of them might sleep and should not be called under spinlock.
2. The access of cm_id_priv->av should be under cm_id_priv->lock, which
   means it can't be initialized directly.

This patch splits the calling of the two functions into two parts: the
first initializes an AV outside of the spinlock, and the second copies the
AV to cm_id_priv->av under the spinlock.

Fixes: e1444b5a16 ("IB/cm: Fix automatic path migration support")
Link: https://lore.kernel.org/r/038fb8ad932869b4548b0c7708cab7f76af06f18.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
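
A minimal sketch of the two-phase pattern, assuming simplified signatures: build the AV in a local while sleeping is still allowed, then publish it under the lock.

    static int cm_set_av_by_path_sketch(struct cm_id_private *cm_id_priv,
                                        struct sa_path_rec *path)
    {
            struct cm_av av = {};
            unsigned long flags;
            int ret;

            ret = cm_init_av_by_path(path, NULL, &av);      /* may sleep: no lock */
            if (ret)
                    return ret;

            spin_lock_irqsave(&cm_id_priv->lock, flags);
            cm_id_priv->av = av;                            /* publish under lock */
            spin_unlock_irqrestore(&cm_id_priv->lock, flags);
            return 0;
    }
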
Mark Zhang 70076a414e IB/cm: Simplify ib_cancel_mad() and ib_modify_mad() calls
The mad_agent parameter is redundant since the struct ib_mad_send_buf
already has a pointer to it.

Link: https://lore.kernel.org/r/0987c784b25f7bfa72f78691f50cff066de587e1.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
Mark Zhang 3595c398f6 Revert "IB/cm: Mark stale CM id's whenever the mad agent was unregistered"
This reverts commit 9db0ff53cb, which wasn't
a full fix and still causes the following panic:

panic @ time 1605623870.843, thread 0xfffffeb63b552000: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe811a94e000
    time = 1605623870
    cpuid = 9, TSC = 0xb7937acc1b6
    Panic occurred in module kernel loaded at 0xffffffff80200000:Stack: --------------------------------------------------
    kernel:vm_fault+0x19da
    kernel:vm_fault_trap+0x6e
    kernel:trap_pfault+0x1f1
    kernel:trap+0x31e
    kernel:cm_destroy_id+0x38c
    kernel:rdma_destroy_id+0x127
    kernel:sdp_shutdown_task+0x3ae
    kernel:taskqueue_run_locked+0x10b
    kernel:taskqueue_thread_loop+0x87
    kernel:fork_exit+0x83

Link: https://lore.kernel.org/r/4346449a7cdacc7a4eedc89cb1b42d8434ec9814.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
Jason Gunthorpe efafae6717 IB/cm: Tidy remaining cm_msg free paths
Now that all the free paths are explicit, cm_free_msg() will only be called
for msgs allocated with cm_alloc_msg(), so we can assume the context is
set. Place it after the allocation function it is paired with, for clarity.

Also remove a bogus NULL assignment in one place after a cancel. This does
nothing other than prevent completions from becoming events, but changing
the state already did that.

Link: https://lore.kernel.org/r/082fd3552be0d1a2c19b1c4cefb5f3f0e3e68e82.1622629024.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
Jason Gunthorpe c1cf6d9f74 IB/cm: Call the correct message free functions in cm_send_handler()
There are now three destroy functions for the cm_msg, and all places
except the general send completion handler use the correct function.

Fix cm_send_handler() to detect which kind of message is being completed
and destroy it using the correct function with the correct locking.

Link: https://lore.kernel.org/r/62a507195b8db85bb11228d0c6e7fa944204bf12.1622629024.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
Jason Gunthorpe 4b4e586ebe IB/cm: Split cm_alloc_msg()
This is being used with two quite different flows: one attaches the
message to the priv and the other does not.

Ensure the message attach is consistently done under the spinlock and
ensure that the free on error always detaches the message from the
cm_id_priv, also always under lock.

This makes read/write to the cm_id_priv->msg consistently locked and
consistently NULL'd when the message is freed, even in all error paths.

Link: https://lore.kernel.org/r/f692b8c89eecb34fd82244f317e478bea6c97688.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
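
A sketch of the invariant (the _sketch helper names are hypothetical): attach and detach both happen under cm_id_priv->lock, so readers of ->msg always see either a valid message or NULL.

    static struct ib_mad_send_buf *
    cm_alloc_priv_msg_sketch(struct cm_id_private *cm_id_priv)
    {
            struct ib_mad_send_buf *msg;

            lockdep_assert_held(&cm_id_priv->lock);
            msg = cm_alloc_msg(cm_id_priv);
            if (!IS_ERR(msg))
                    cm_id_priv->msg = msg;          /* attach under lock */
            return msg;
    }

    static void cm_free_priv_msg_sketch(struct cm_id_private *cm_id_priv)
    {
            lockdep_assert_held(&cm_id_priv->lock);
            cm_free_msg(cm_id_priv->msg);
            cm_id_priv->msg = NULL;                 /* detach under lock */
    }
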
Jason Gunthorpe 96376a4095 IB/cm: Pair cm_alloc_response_msg() with a cm_free_response_msg()
This is not a functional change, but it helps make the purpose of all the
cm_free_msg() calls clearer. In this case a response msg has a NULL
context[0], and is never placed in cm_id_priv->msg.

Link: https://lore.kernel.org/r/5cd53163be7df0a94f0d4ef7294546bc674fb74a.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
Håkon Bugge 65d4801ae4 RDMA/core: Unify RoCE check and re-factor code
In cm_req_handler(), unify the check for RoCE and refactor to avoid
one test.

Link: https://lore.kernel.org/r/1617705423-15570-1-git-send-email-haakon.bugge@oracle.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Fixes: 8f97486024 ("IB/cm: Reduce dependency on gid attribute ndev check")
Fixes: 194f64a3ca ("RDMA/core: Fix corrupted SL on passive side")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-19 12:56:53 -03:00
Wenpeng Liang 26caea5fda RDMA/core: Correct format of block comments
Block comments should not use a trailing */ on a separate line and every
line of a block comment should start with an '*'.

Link: https://lore.kernel.org/r/1617783353-48249-7-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12 14:56:51 -03:00
Wenpeng Liang b6eb7011f5 RDMA/core: Correct format of braces
Do following cleanups about braces:

- Add the necessary braces to maintain context alignment.
- Fix the open '{' that is not on the same line as "switch".
- Remove braces that are not necessary for single statement blocks.
- Fix "else" that doesn't follow close brace '}'.

Link: https://lore.kernel.org/r/1617783353-48249-6-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12 14:56:51 -03:00
Wenpeng Liang f681967ae7 RDMA/core: Remove redundant spaces
Space is not required after '(', before ')', before ',', or between '*'
and the symbol name in a definition.

Link: https://lore.kernel.org/r/1617783353-48249-5-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12 14:56:48 -03:00
Wenpeng Liang 9516b8f9ec RDMA/core: Add necessary spaces
Space is required before '(' of switch statements and around '='.

Link: https://lore.kernel.org/r/1617783353-48249-4-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12 14:52:22 -03:00
Håkon Bugge 194f64a3ca RDMA/core: Fix corrupted SL on passive side
On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
Subnet Local is zero.

In cm_req_handler(), the cm_process_routed_req() function is called. Since
the Primary Subnet Local value is zero in the request, and since this is
RoCE (Primary Local LID is permissive), the following statement will be
executed:

      IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);

This corrupts SL in req_msg if it was different from zero. In other words,
a request to setup a connection using an SL != zero, will not be honored,
and a connection using SL zero will be created instead.

Fix this by not calling cm_process_routed_req() on RoCE systems;
cm_process_routed_req() is only for IB anyhow.

Fixes: 3971c9f6db ("IB/cm: Add interim support for routed paths")
Link: https://lore.kernel.org/r/1616420132-31005-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01 14:47:24 -03:00
Mark Bloch 1fb7f8973f RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of an RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8, so keep those places in order not to break
UAPIs. HW/Spec defined values must also not be changed.

With the switch to u32 we can now support devices with more than 255
ports. U32_MAX is reserved to make the control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon, this seems like a non-issue.

When a device with more than 255 ports is created, uverbs will report the
RDMA device as having 255 ports, as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8, and all applications that rely on
verbs won't be able to cope with this change. At this stage, we are
extending only the interfaces that use the vendor channel.

Once the limitation is lifted, mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device that
exposes itself as a RAW Ethernet only device, CM/MAD/IPoIB and other ULPs
aren't affected by this change, and their sysfs interfaces exposed to
userspace can remain unchanged.

While here, clean up some alignment issues and remove unneeded sanity
checks (mainly in rdmavt).

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 09:31:21 -03:00
Saeed Mahameed 221384df61 RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
ib_send_cm_sidr_rep() {
        spin_lock_irqsave()
        cm_send_sidr_rep_locked() {
                ...
                spin_lock_irq()
                ...
                spin_unlock_irq() <--- this will enable interrupts
        }
        spin_unlock_irqrestore()
}

spin_unlock_irqrestore() expects interrupts to be disabled but the
internal spin_unlock_irq() will always enable hard interrupts.

Fix this by replacing the internal spin_{lock,unlock}_irq() with
irqsave/restore variants.

It fixes the following kernel trace:

 raw_local_irq_restore() called with IRQs enabled
 WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20

 Call Trace:
  _raw_spin_unlock_irqrestore+0x4e/0x50
  ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm]
  cma_send_sidr_rep+0xa1/0x160 [rdma_cm]
  rdma_accept+0x25e/0x350 [rdma_cm]
  ucma_accept+0x132/0x1cc [rdma_ucm]
  ucma_write+0xbf/0x140 [rdma_ucm]
  vfs_write+0xc1/0x340
  ksys_write+0xb3/0xe0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 87c4c774cb ("RDMA/cm: Protect access to remote_sidr_table")
Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.org
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01 14:43:16 -04:00
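
A sketch of the fixed inner function (name abbreviated, arguments elided): a helper called inside an irqsave region must itself use the irqsave/irqrestore variants.

    static void cm_send_sidr_rep_locked_sketch(struct cm_id_private *cm_id_priv)
    {
            unsigned long flags;

            spin_lock_irqsave(&cm.lock, flags);      /* was: spin_lock_irq() */
            /* ... update remote_sidr_table ... */
            spin_unlock_irqrestore(&cm.lock, flags); /* was: spin_unlock_irq(),
                                                      * which re-enabled IRQs */
    }
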
Parav Pandit 131be26750 IB/cm: Avoid a loop when device has 255 ports
When an RDMA device has 255 ports, the loop iterator i overflows, causing
the cm_add_one() port iterator to loop infinitely. Use the core-provided
port iterator to avoid the infinite loop.

Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20210127150010.1876121-9-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-02 15:10:31 -04:00
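
The overflow can be demonstrated in plain C (a standalone illustration, not the kernel loop): with 255 ports, a u8 iterator wraps from 255 to 0 and the condition never becomes false.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint8_t num_ports = 255;

            for (uint8_t i = 1; i <= num_ports; i++) {
                    if (i == 0) {   /* i++ wrapped 255 -> 0; 0 <= 255 stays true */
                            puts("iterator wrapped: the loop never terminates");
                            break;  /* break only so this demo exits */
                    }
            }
            return 0;
    }
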
Linus Torvalds 009bd55dfc RDMA 5.11 pull request
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A smaller set of patches, nothing stands out as being particularly
  major this cycle. The biggest item would be the new HIP09 HW support
  from HNS, otherwise it was pretty quiet for new work here:

   - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw,
     cxgb4, mlx4 and mlx5

   - Bug fixes and polishing for the new rts ULP

   - Cleanup of uverbs checking for allowed driver operations

   - Use sysfs_emit all over the place

   - Lots of bug fixes and clarity improvements for hns

   - hip09 support for hns

   - NDR and 50/100Gb signaling rates

   - Remove dma_virt_ops and go back to using the IB DMA wrappers

   - mlx5 optimizations for contiguous DMA regions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (147 commits)
  RDMA/cma: Don't overwrite sgid_attr after device is released
  RDMA/mlx5: Fix MR cache memory leak
  RDMA/rxe: Use acquire/release for memory ordering
  RDMA/hns: Simplify AEQE process for different types of queue
  RDMA/hns: Fix inaccurate prints
  RDMA/hns: Fix incorrect symbol types
  RDMA/hns: Clear redundant variable initialization
  RDMA/hns: Fix coding style issues
  RDMA/hns: Remove unnecessary access right set during INIT2INIT
  RDMA/hns: WARN_ON if get a reserved sl from users
  RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
  RDMA/hns: Do shift on traffic class when using RoCEv2
  RDMA/hns: Normalization the judgment of some features
  RDMA/hns: Limit the length of data copied between kernel and userspace
  RDMA/mlx4: Remove bogus dev_base_lock usage
  RDMA/uverbs: Fix incorrect variable type
  RDMA/core: Do not indicate device ready when device enablement fails
  RDMA/core: Clean up cq pool mechanism
  RDMA/core: Update kernel documentation for ib_create_named_qp()
  MAINTAINERS: SOFT-ROCE: Change Zhu Yanjun's email address
  ...
2020-12-16 13:42:26 -08:00
Leon Romanovsky 340b940ea0 RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait
If cm_create_timewait_info() fails, the timewait_info pointer will contain
an error value and will be used in cm_remove_remote() later.

  general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
  CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:cm_remove_remote.isra.0+0x24/0x170 drivers/infiniband/core/cm.c:978
  Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00
  RSP: 0018:ffff888013127918 EFLAGS: 00010006
  RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000
  RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4
  RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d
  R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70
  R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c
  FS:  00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   cm_destroy_id+0x189/0x15b0 drivers/infiniband/core/cm.c:1155
   cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline]
   rdma_connect_locked+0x1100/0x17c0 drivers/infiniband/core/cma.c:4107
   rdma_connect+0x2a/0x40 drivers/infiniband/core/cma.c:4140
   ucma_connect+0x277/0x340 drivers/infiniband/core/ucma.c:1069
   ucma_write+0x236/0x2f0 drivers/infiniband/core/ucma.c:1724
   vfs_write+0x220/0x830 fs/read_write.c:603
   ksys_write+0x1df/0x240 fs/read_write.c:658
   do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20201204064205.145795-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reported-by: Amit Matityahu <mitm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-09 15:51:35 -04:00
Mauro Carvalho Chehab 2988ca08ba IB: Fix kernel-doc markups
Some functions have different names between their prototypes and the
kernel-doc markup.

Others need to be fixed, as kernel-doc markups should use this format:
        identifier - description

Link: https://lore.kernel.org/r/78b98c41a5a0f4c0106433d305b143028a4168b0.1606823973.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 15:45:00 -04:00
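
The expected form, shown on a hypothetical ib_example_fn():

    /**
     * ib_example_fn - short description in the "identifier - description" form
     * @dev: device the operation applies to
     *
     * The identifier must match the name of the function being documented;
     * mismatches are what this patch corrects.
     */
    void ib_example_fn(struct ib_device *dev);
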
Jason Gunthorpe bf3b7b7ba9 Merge branch 'for-rc' into rdma.git
From https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

The rc RDMA branch is needed due to dependencies on the next patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-17 15:20:26 -04:00
Jason Gunthorpe eb73060b97 RDMA/cm: Make the local_id_table xarray non-irq
The xarray is never mutated from an IRQ handler, only from work queues
under a spinlock_irq. Thus there is no reason for it to be an IRQ type
xarray.

This was copied over from the original IDR code, but the recent rework put
the xarray inside another spinlock_irq which will unbalance the unlocking.

Fixes: c206f8bad1 ("RDMA/cm: Make it clearer how concurrency works in cm_req_handler()")
Link: https://lore.kernel.org/r/0-v1-808b6da3bd3f+1857-cm_xarray_no_irq_jgg@nvidia.com
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:31:27 -04:00
Joe Perches e28bf1f03b RDMA: Convert various random sprintf sysfs _show uses to sysfs_emit
Manual changes for sysfs_emit as cocci scripts can't easily convert them.

Link: https://lore.kernel.org/r/ecde7791467cddb570c6f6d2c908ffbab9145cac.1602122880.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-30 21:03:52 -03:00
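
The target pattern for a converted _show handler, as a minimal sketch with a hypothetical attribute; sysfs_emit() knows the PAGE_SIZE bound of the sysfs buffer, which raw sprintf() does not.

    static ssize_t example_show(struct device *dev,
                                struct device_attribute *attr, char *buf)
    {
            return sysfs_emit(buf, "%d\n", 42);     /* was: sprintf(buf, ...) */
    }
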
Jason Gunthorpe 6989aa62d3 Linux 5.9-rc3
Merge tag 'v5.9-rc3' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:28:12 -03:00
Chuck Lever 8dc105befe RDMA/cm: Add tracepoints to track MAD send operations
Surface the operation of MAD exchanges during connection
establishment. Some samples:

[root@klimt ~]# trace-cmd report -F ib_cma
cpus=4
     kworker/0:4-123   [000]    60.677388: icm_send_rep:         local_id=1965336542 remote_id=1096195961 state=REQ_RCVD lap_state=LAP_UNINIT
   kworker/u8:11-391   [002]    60.678808: icm_send_req:         local_id=1982113758 remote_id=0 state=IDLE lap_state=LAP_UNINIT
     kworker/0:4-123   [000]    60.679652: icm_send_rtu:         local_id=1982113758 remote_id=1079418745 state=REP_RCVD lap_state=LAP_UNINIT
            nfsd-1954  [001]    60.691350: icm_send_rep:         local_id=1998890974 remote_id=1129750393 state=MRA_REQ_SENT lap_state=LAP_UNINIT
            nfsd-1954  [003]    62.017931: icm_send_drep:        local_id=1998890974 remote_id=1129750393 state=TIMEWAIT lap_state=LAP_UNINIT

Link: https://lore.kernel.org/r/159767240197.2968.12048458026453596018.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 19:41:41 -03:00
Chuck Lever 75874b3d50 RDMA/cm: Replace pr_debug() call sites with tracepoints
In the interest of converging on a common instrumentation infrastructure,
modernize the pr_debug() call sites added by commit 119bf81793 ("IB/cm:
Add debug prints to ib_cm"). The new tracepoints appear in a new "ib_cma"
subsystem.

The conversion is somewhat mechanical. Someone more familiar with the
semantics of the recorded information might suggest additional data
capture.

Some benefits include:

- Tracepoints enable "always on" reporting of these errors
- The error records are structured and compact
- Tracepoints provide hooks for eBPF scripts

Sample output:

            nfsd-1954  [003]    62.017901: icm_dreq_skipped:     local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT

Link: https://lore.kernel.org/r/159767239665.2968.10613294222688696646.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 19:41:41 -03:00
Gustavo A. R. Silva df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough [1]. Also, remove fall-through
markings where they are unnecessary.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
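
The conversion in miniature (an illustrative function, not from the patch):

    static void example_cleanup(int state)
    {
            switch (state) {
            case 1:
                    pr_debug("first step\n");
                    fallthrough;    /* replaces a "fall through" comment */
            case 2:
                    pr_debug("second step\n");
                    break;
            default:
                    break;
            }
    }
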
Jason Gunthorpe c0f4979e90 RDMA/cm: Remove unused cm_class
Previous commits removed all references to the /sys/class/infiniband_cm/
directory represented by the cm_class symbol. Remove the directory and
cm_class.

Fixes: a1a8e4a85c ("rdma: Delete the ib_ucm module")
Link: https://lore.kernel.org/r/0-v1-90096a98c476+205-remove_cm_leftovers_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:43:07 -03:00
Maor Gottlieb 87c4c774cb RDMA/cm: Protect access to remote_sidr_table
cm.lock must be held while accessing remote_sidr_table. This fixes the
below NULL pointer dereference.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP PTI
  CPU: 2 PID: 7288 Comm: udaddy Not tainted 5.7.0_for_upstream_perf_2020_06_09_15_14_20_38 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:rb_erase+0x10d/0x360
  Code: 00 00 00 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 74 48 f6 02 01 75 af 48 8b 7a 10 48 89 c1 48 83 c9 01 48 89 78 08 48 89 42 10 <48> 89 0f 48 8b 08 48 89 0a 48 83 e1 fc 48 89 10 0f 84 b1 00 00 00
  RSP: 0018:ffffc90000f77c30 EFLAGS: 00010086
  RAX: ffff8883df27d458 RBX: ffff8883df27da58 RCX: ffff8883df27d459
  RDX: ffff8883d183fa58 RSI: ffffffffa01e8d00 RDI: 0000000000000000
  RBP: ffff8883d62ac800 R08: 0000000000000000 R09: 00000000000000ce
  R10: 000000000000000a R11: 0000000000000000 R12: ffff8883df27da00
  R13: ffffc90000f77c98 R14: 0000000000000130 R15: 0000000000000000
  FS:  00007f009f877740(0000) GS:ffff8883f1a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000003d467e003 CR4: 0000000000160ee0
  Call Trace:
   cm_send_sidr_rep_locked+0x15a/0x1a0 [ib_cm]
   ib_send_cm_sidr_rep+0x2b/0x50 [ib_cm]
   cma_send_sidr_rep+0x8b/0xe0 [rdma_cm]
   __rdma_accept+0x21d/0x2b0 [rdma_cm]
   ? ucma_get_ctx+0x2b/0xe0 [rdma_ucm]
   ? _copy_from_user+0x30/0x60
   ucma_accept+0x13e/0x1e0 [rdma_ucm]
   ucma_write+0xb4/0x130 [rdma_ucm]
   vfs_write+0xad/0x1a0
   ksys_write+0x9d/0xb0
   do_syscall_64+0x48/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f009ef60924
  Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 2a ef 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
  RSP: 002b:00007fff843edf38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000055743042e1d0 RCX: 00007f009ef60924
  RDX: 0000000000000130 RSI: 00007fff843edf40 RDI: 0000000000000003
  RBP: 00007fff843ee0e0 R08: 0000000000000000 R09: 0000557430433090
  R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007fff843edf40 R14: 000000000000038c R15: 00000000ffffff00
  CR2: 0000000000000000

Fixes: 6a8824a74b ("RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock")
Link: https://lore.kernel.org/r/20200716105519.1424266-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:58:53 -03:00
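
A hedged sketch of the locking rule (the rb-node field name is an assumption): the rb_erase() that crashed must run with cm.lock held.

    static void cm_remove_sidr_entry_sketch(struct cm_id_private *cm_id_priv)
    {
            unsigned long flags;

            spin_lock_irqsave(&cm.lock, flags);     /* protects the rb-tree */
            if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
                    rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
            spin_unlock_irqrestore(&cm.lock, flags);
    }
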
Leon Romanovsky 1ea7c546b8 RDMA/core: Annotate CMA unlock helper routine
Fix the following sparse error by adding an annotation to
cm_queue_work_unlock() indicating that it releases the cm_id_priv->lock.

 drivers/infiniband/core/cm.c:936:24: warning: context imbalance in
 'cm_queue_work_unlock' - unexpected unlock

Fixes: e83f195aa4 ("RDMA/cm: Pull duplicated code into cm_queue_work_unlock()")
Link: https://lore.kernel.org/r/20200611130045.1994026-1-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:34:42 -03:00
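
The annotation sparse wants, sketched on the function's signature (body elided):

    static void cm_queue_work_unlock(struct cm_id_private *cm_id_priv,
                                     struct cm_work *work)
            __releases(&cm_id_priv->lock)
    {
            /* ... hand the work off ... */
            spin_unlock_irq(&cm_id_priv->lock);     /* lock was taken by the caller */
    }
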
Ka-Cheong Poon fba97dc7fc RDMA/cm: Spurious WARNING triggered in cm_destroy_id()
If the cm_id state is IB_CM_REP_SENT when cm_destroy_id() is called, it
calls cm_send_rej_locked().

In cm_send_rej_locked(), it calls cm_enter_timewait() and the state is
changed to IB_CM_TIMEWAIT.

Now back to cm_destroy_id(), it breaks from the switch statement, and the
next call is WARN_ON(cm_id->state != IB_CM_IDLE).

This triggers a spurious warning. Instead, the code should goto retest
after returning from cm_send_rej_locked() to move the state to IDLE.

Fixes: 67b3c8dcea ("RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy")
Link: https://lore.kernel.org/r/1591191218-9446-1-git-send-email-ka-cheong.poon@oracle.com
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:48:18 -03:00
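
The corrected control flow, as a sketch with simplified signatures (cm_send_rej_locked() takes more arguments in the real code):

    static void cm_destroy_id_sketch(struct ib_cm_id *cm_id,
                                     struct cm_id_private *cm_id_priv)
    {
    retest:
            switch (cm_id->state) {
            case IB_CM_REP_SENT:
                    cm_send_rej_locked(cm_id_priv, IB_CM_REJ_TIMEOUT);
                    goto retest;    /* state is now IB_CM_TIMEWAIT */
            case IB_CM_TIMEWAIT:
                    /* ... existing teardown that ends in IB_CM_IDLE ... */
                    break;
            default:
                    break;
            }
            WARN_ON(cm_id->state != IB_CM_IDLE);
    }
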
Leon Romanovsky a20652e175 RDMA/cm: Send and receive ECE parameter over the wire
ECE parameters are exchanged through REQ->REP/SIDR_REP messages; this
patch adds the data to be provided to the other side of the CMID
communication channel.

Link: https://lore.kernel.org/r/20200526103304.196371-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:05:05 -03:00