Commit Graph

230 Commits

Author SHA1 Message Date
Tobias Huschle 5e9ec62245 net/smc: fix deadlock triggered by cancel_delayed_work_syn()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit 13085e1b5cab8ad802904d72e6a6dae85ae0cd20
Author: Wenjia Zhang <wenjia@linux.ibm.com>
Date:   Mon Mar 13 11:08:28 2023 +0100

    net/smc: fix deadlock triggered by cancel_delayed_work_syn()

    The following LOCKDEP was detected:
                    Workqueue: events smc_lgr_free_work [smc]
                    WARNING: possible circular locking dependency detected
                    6.1.0-20221027.rc2.git8.56bc5b569087.300.fc36.s390x+debug #1 Not tainted
                    ------------------------------------------------------
                    kworker/3:0/176251 is trying to acquire lock:
                    00000000f1467148 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0},
                            at: __flush_workqueue+0x7a/0x4f0
                    but task is already holding lock:
                    0000037fffe97dc8 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0},
                            at: process_one_work+0x232/0x730
                    which lock already depends on the new lock.
                    the existing dependency chain (in reverse order) is:
                    -> #4 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           __flush_work+0x76/0xf0
                           __cancel_work_timer+0x170/0x220
                           __smc_lgr_terminate.part.0+0x34/0x1c0 [smc]
                           smc_connect_rdma+0x15e/0x418 [smc]
                           __smc_connect+0x234/0x480 [smc]
                           smc_connect+0x1d6/0x230 [smc]
                           __sys_connect+0x90/0xc0
                           __do_sys_socketcall+0x186/0x370
                           __do_syscall+0x1da/0x208
                           system_call+0x82/0xb0
                    -> #3 (smc_client_lgr_pending){+.+.}-{3:3}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           __mutex_lock+0x96/0x8e8
                           mutex_lock_nested+0x32/0x40
                           smc_connect_rdma+0xa4/0x418 [smc]
                           __smc_connect+0x234/0x480 [smc]
                           smc_connect+0x1d6/0x230 [smc]
                           __sys_connect+0x90/0xc0
                           __do_sys_socketcall+0x186/0x370
                           __do_syscall+0x1da/0x208
                           system_call+0x82/0xb0
                    -> #2 (sk_lock-AF_SMC){+.+.}-{0:0}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           lock_sock_nested+0x46/0xa8
                           smc_tx_work+0x34/0x50 [smc]
                           process_one_work+0x30c/0x730
                           worker_thread+0x62/0x420
                           kthread+0x138/0x150
                           __ret_from_fork+0x3c/0x58
                           ret_from_fork+0xa/0x40
                    -> #1 ((work_completion)(&(&smc->conn.tx_work)->work)){+.+.}-{0:0}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           process_one_work+0x2bc/0x730
                           worker_thread+0x62/0x420
                           kthread+0x138/0x150
                           __ret_from_fork+0x3c/0x58
                           ret_from_fork+0xa/0x40
                    -> #0 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}:
                           check_prev_add+0xd8/0xe88
                           validate_chain+0x70c/0xb20
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           __flush_workqueue+0xaa/0x4f0
                           drain_workqueue+0xaa/0x158
                           destroy_workqueue+0x44/0x2d8
                           smc_lgr_free+0x9e/0xf8 [smc]
                           process_one_work+0x30c/0x730
                           worker_thread+0x62/0x420
                           kthread+0x138/0x150
                           __ret_from_fork+0x3c/0x58
                           ret_from_fork+0xa/0x40
                    other info that might help us debug this:
                    Chain exists of:
                      (wq_completion)smc_tx_wq-00000000#2
                      --> smc_client_lgr_pending
                      --> (work_completion)(&(&lgr->free_work)->work)
                     Possible unsafe locking scenario:
                           CPU0                    CPU1
                           ----                    ----
                      lock((work_completion)(&(&lgr->free_work)->work));
                                       lock(smc_client_lgr_pending);
                                       lock((work_completion)
                                            (&(&lgr->free_work)->work));
                      lock((wq_completion)smc_tx_wq-00000000#2);
                     *** DEADLOCK ***
                    2 locks held by kworker/3:0/176251:
                     #0: 0000000080183548
                            ((wq_completion)events){+.+.}-{0:0},
                                    at: process_one_work+0x232/0x730
                     #1: 0000037fffe97dc8
                            ((work_completion)
                             (&(&lgr->free_work)->work)){+.+.}-{0:0},
                                    at: process_one_work+0x232/0x730
                    stack backtrace:
                    CPU: 3 PID: 176251 Comm: kworker/3:0 Not tainted
                    Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
                    Call Trace:
                     [<000000002983c3e4>] dump_stack_lvl+0xac/0x100
                     [<0000000028b477ae>] check_noncircular+0x13e/0x160
                     [<0000000028b48808>] check_prev_add+0xd8/0xe88
                     [<0000000028b49cc4>] validate_chain+0x70c/0xb20
                     [<0000000028b4bd26>] __lock_acquire+0x58e/0xbd8
                     [<0000000028b4cf6a>] lock_acquire.part.0+0xe2/0x248
                     [<0000000028b4d17c>] lock_acquire+0xac/0x1c8
                     [<0000000028addaaa>] __flush_workqueue+0xaa/0x4f0
                     [<0000000028addf9a>] drain_workqueue+0xaa/0x158
                     [<0000000028ae303c>] destroy_workqueue+0x44/0x2d8
                     [<000003ff8029af26>] smc_lgr_free+0x9e/0xf8 [smc]
                     [<0000000028adf3d4>] process_one_work+0x30c/0x730
                     [<0000000028adf85a>] worker_thread+0x62/0x420
                     [<0000000028aeac50>] kthread+0x138/0x150
                     [<0000000028a63914>] __ret_from_fork+0x3c/0x58
                     [<00000000298503da>] ret_from_fork+0xa/0x40
                    INFO: lockdep is turned off.
    ===================================================================

    This deadlock occurs because cancel_delayed_work_sync() waits for
    the work (&lgr->free_work) to finish, while &lgr->free_work waits
    for the work on lgr->tx_wq, which needs sk_lock-AF_SMC, a lock that
    is already in use under the mutex.

    The solution is to use cancel_delayed_work() instead, which only
    cancels a pending work item and does not wait for it to finish.

    Fixes: a52bcc919b ("net/smc: improve termination processing")
    Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
    Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
    Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
    Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:21 +00:00
Tobias Huschle 56673cb1f5 net/smc: fix application data exception
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit 475f9ff63ee8c296aa46c6e9e9ad9bdd301c6bdf
Author: D. Wythe <alibuda@linux.alibaba.com>
Date:   Thu Feb 16 14:39:05 2023 +0800

    net/smc: fix application data exception

    The following exception occasionally occurs in the wrk benchmark
    test:

    Running 10s test @ http://11.213.45.6:80
      8 threads and 64 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     3.72ms   13.94ms 245.33ms   94.17%
        Req/Sec     1.96k   713.67     5.41k    75.16%
      155262 requests in 10.10s, 23.10MB read
    Non-2xx or 3xx responses: 3

    The error is an HTTP 400 error, which is a serious exception in our
    test because it means the application data was corrupted.

    Consider the following scenario:

    CPU0                            CPU1

    buf_desc->used = 0;
                                    cmpxchg(buf_desc->used, 0, 1)
                                    deal_with(buf_desc)

    memset(buf_desc->cpu_addr,0);

    This will cause the data received by a victim connection to be cleared,
    thus triggering an HTTP 400 error in the server.

    This patch swaps the order of clearing used and the memset, and adds
    a barrier to ensure memory consistency.
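The corrected ordering can be sketched in userspace C11. This is only an illustration, not the kernel code: struct buf_desc, buf_release() and buf_try_get() are hypothetical stand-ins, and a release store is assumed to play the role of the barrier the patch adds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's buffer descriptor. */
struct buf_desc {
    atomic_int used;        /* 0 = free, 1 = owned by a connection */
    unsigned char data[64]; /* stand-in for the memory behind cpu_addr */
};

/* Release path: clear the payload first, then publish the buffer as free. */
void buf_release(struct buf_desc *b)
{
    memset(b->data, 0, sizeof(b->data));
    /* The release store cannot be reordered before the memset above. */
    atomic_store_explicit(&b->used, 0, memory_order_release);
}

/* Reuse path: claim the buffer only if it is currently free. */
bool buf_try_get(struct buf_desc *b)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(&b->used, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}
```

With this order, a thread that wins the compare-and-swap can no longer have its freshly received data wiped by a late memset from the previous owner.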

    Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:20 +00:00
Tobias Huschle 34f0cea532 net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit aff7bfed9097435ea38de919befbe2d7771a3e87
Author: D. Wythe <alibuda@linux.alibaba.com>
Date:   Thu Feb 2 16:26:42 2023 +0800

    net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore

    It's clear that rmbs_lock and sndbufs_lock are meant to protect the
    rmbs list or the sndbufs list.

    During connection establishment, smc_buf_get_slot() will always
    be invoked, and it only performs read semantics on the rmbs list and
    the sndbufs list.

    Based on the above considerations, we replace the mutex with an
    rw_semaphore. Only smc_buf_get_slot() uses down_read(), which allows
    smc_buf_get_slot() to run concurrently; all other parts use
    down_write() to keep exclusive semantics.

    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:19 +00:00
Tobias Huschle f20361505b net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit f6421014e88983c5bb7a25c71c01ae6278a01df9
Author: D. Wythe <alibuda@linux.alibaba.com>
Date:   Thu Feb 2 16:26:40 2023 +0800

    net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse()

    Following is part of Off-CPU graph during frequent SMC-R short-lived
    processing:

    process_one_work                                (51.19%)
    smc_close_passive_work                  (28.36%)
            smcr_buf_unuse                          (28.34%)
            rwsem_down_write_slowpath               (28.22%)

    smc_listen_work                         (22.83%)
            smc_clc_wait_msg                        (1.84%)
            smc_buf_create                          (20.45%)
                    smcr_buf_map_usable_links
                    rwsem_down_write_slowpath       (20.43%)
            smcr_lgr_reg_rmbs                       (0.53%)
                    rwsem_down_write_slowpath       (0.43%)
                    smc_llc_do_confirm_rkey         (0.08%)

    We can clearly see that during connection establishment, the
    connections spend their waiting time not on IO, but on llc_conf_mutex.

    More importantly, the core critical sections (smcr_buf_unuse() and
    smc_buf_create()) only perform read semantics on links, so we can
    easily switch them to the read side of the semaphore.

    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:18 +00:00
Tobias Huschle d6259c44a9 net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit b5dd4d6981717f7e2682c0419fe832328c7441cf
Author: D. Wythe <alibuda@linux.alibaba.com>
Date:   Thu Feb 2 16:26:39 2023 +0800

    net/smc: llc_conf_mutex refactor, replace it with rw_semaphore

    llc_conf_mutex was used to protect links and link related configurations
    in the same link group, for example, add or delete links. However,
    in most cases, the protected critical area has only read semantics and
    with no write semantics at all, such as obtaining a usable link or an
    available rmb_desc.

    This patch is a simple code refactoring: replace the mutex with an
    rw_semaphore, replace mutex_lock with down_write, and replace
    mutex_unlock with up_write.

    Theoretically, this replacement is equivalent, but after this patch,
    we can distinguish lock granularity according to different semantics
    of critical areas.

    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:18 +00:00
Tobias Huschle f0afc0ee2a net/smc: De-tangle ism and smc device initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit 8c81ba20349daf9f7e58bb05a0c12f4b71813a30
Author: Stefan Raspl <raspl@linux.ibm.com>
Date:   Mon Jan 23 19:17:52 2023 +0100

    net/smc: De-tangle ism and smc device initialization

    The struct device for ISM devices was part of struct smcd_dev. Move to
    struct ism_dev, provide a new API call in struct smcd_ops, and convert
    existing SMCD code accordingly.
    Furthermore, remove struct smcd_dev from struct ism_dev.
    This is the final part of a bigger overhaul of the interfaces between SMC
    and ISM.

    Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
    Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
    Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:18 +00:00
Tobias Huschle a45c725bf0 net/smc: Separate SMC-D and ISM APIs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit 9de4df7b6be1cfca500f8ba21137d53eec45418a
Author: Stefan Raspl <raspl@linux.ibm.com>
Date:   Mon Jan 23 19:17:50 2023 +0100

    net/smc: Separate SMC-D and ISM APIs

    We separate the code implementing the struct smcd_ops API in the ISM
    device driver from the functions that may be used by other exploiters of
    ISM devices.
    Note: We start out small, and don't offer the whole breadth of the ISM
    device for public use, as many functions are specific to or likely only
    ever used in the context of SMC-D.
    This is the third part of a bigger overhaul of the interfaces between SMC
    and ISM.

    Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
    Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
    Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:17 +00:00
Tobias Huschle c2eb1d5eaa net/smc: Register SMC-D as ISM client
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit 8747716f3942a610efdd12e3655df47269c268ac
Author: Stefan Raspl <raspl@linux.ibm.com>
Date:   Mon Jan 23 19:17:49 2023 +0100

    net/smc: Register SMC-D as ISM client

    Register the smc module with the new ism device driver API.
    This is the second part of a bigger overhaul of the interfaces between SMC
    and ISM.

    Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
    Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
    Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:16 +00:00
Tobias Huschle 31759541c7 net/smc: Fix an error code in smc_lgr_create()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit bdee15e8c58b450ad736a2b62ef8c7a12548b704
Author: Dan Carpenter <error27@gmail.com>
Date:   Fri Oct 14 12:34:36 2022 +0300

    net/smc: Fix an error code in smc_lgr_create()

    If smc_wr_alloc_lgr_mem() fails then return an error code.  Don't return
    success.
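The class of bug fixed here can be sketched as follows. The names lgr_create() and alloc_wr_mem() are hypothetical stand-ins rather than the kernel functions; the point is only that the return code of the failing call must be captured before jumping to the cleanup label, otherwise the function returns the initial 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a link group. */
struct lgr {
    void *wr_mem;
};

/* Hypothetical allocator; 'fail' forces the error path for demonstration. */
int alloc_wr_mem(struct lgr *l, int fail)
{
    if (fail)
        return -ENOMEM;
    l->wr_mem = malloc(16);
    return l->wr_mem ? 0 : -ENOMEM;
}

int lgr_create(struct lgr **out, int fail_alloc)
{
    struct lgr *l = calloc(1, sizeof(*l));
    int rc = 0;

    if (!l)
        return -ENOMEM;

    /* Store the error in rc; a bare "if (alloc(...)) goto" would leave rc == 0. */
    rc = alloc_wr_mem(l, fail_alloc);
    if (rc)
        goto free_lgr;

    *out = l;
    return 0;

free_lgr:
    free(l);
    return rc;
}
```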

    Fixes: 8799e310fb3f ("net/smc: add v2 support to the work request layer")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:14 +00:00
Tobias Huschle e00604b9a9 net/smc: Stop the CLC flow if no link to map buffers on
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit e738455b2c6dcdab03e45d97de36476f93f557d2
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Tue Sep 20 14:43:09 2022 +0800

    net/smc: Stop the CLC flow if no link to map buffers on

    There might be a potential race between SMC-R buffer map and
    link group termination.

    smc_smcr_terminate_all()     | smc_connect_rdma()
    --------------------------------------------------------------
                                 | smc_conn_create()
    for links in smcibdev        |
            schedule links down  |
                                 | smc_buf_create()
                                 |  \- smcr_buf_map_usable_links()
                                 |      \- no usable links found,
                                 |         (rmb->mr = NULL)
                                 |
                                 | smc_clc_send_confirm()
                                 |  \- access conn->rmb_desc->mr[]->rkey
                                 |     (panic)

    During reboot and IB device module removal, all links are set
    down and no usable links remain in the link groups. In this situation
    smcr_buf_map_usable_links() should return an error and stop the
    CLC flow from accessing the uninitialized mr.
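A minimal sketch of the idea, assuming a fixed-size link array: count how many links were actually mapped and report an error when the count is zero, so the caller stops before touching unmapped state. The types, the MAX_LINKS value and the choice of -EINVAL are illustrative assumptions, not taken from the patch text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 3 /* illustrative value */

/* Hypothetical, simplified link and buffer descriptor. */
struct link {
    bool usable;
};

struct buf_desc {
    bool mapped[MAX_LINKS];
};

/* Map the buffer on every usable link; fail if there was none to map on. */
int buf_map_usable_links(const struct link *links, struct buf_desc *buf)
{
    size_t mapped = 0;

    for (size_t i = 0; i < MAX_LINKS; i++) {
        if (!links[i].usable)
            continue;
        buf->mapped[i] = true;
        mapped++;
    }
    /* Without this check the caller would proceed with nothing mapped. */
    return mapped ? 0 : -EINVAL;
}
```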

    Fixes: b9247544c1 ("net/smc: convert static link ID instances to support multiple links")
    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:13 +00:00
Tobias Huschle 91bd156268 net/smc: Fix possible access to freed memory in link clear
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit e9b1a4f867ae9c1dbd1d71cd09cbdb3239fb4968
Author: Yacan Liu <liuyacan@corp.netease.com>
Date:   Tue Sep 6 21:01:39 2022 +0800

    net/smc: Fix possible access to freed memory in link clear

    After modifying the QP to the Error state, all RX WRs are completed
    with a WC in IB_WC_WR_FLUSH_ERR status. The current implementation
    does not wait for this to finish, but destroys the QP and frees the
    link group directly. So there is a risk of accessing freed memory in
    tasklet context.

    Here is a crash example:

     BUG: unable to handle page fault for address: ffffffff8f220860
     #PF: supervisor write access in kernel mode
     #PF: error_code(0x0002) - not-present page
     PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060
     Oops: 0002 [#1] SMP PTI
     CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S         OE     5.10.0-0607+ #23
     Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018
     RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0
     Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
     RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086
     RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000
     RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00
     RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b
     R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010
     R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040
     FS:  0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     Call Trace:
      <IRQ>
      _raw_spin_lock_irqsave+0x30/0x40
      mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib]
      smc_wr_rx_tasklet_fn+0x56/0xa0 [smc]
      tasklet_action_common.isra.21+0x66/0x100
      __do_softirq+0xd5/0x29c
      asm_call_irq_on_stack+0x12/0x20
      </IRQ>
      do_softirq_own_stack+0x37/0x40
      irq_exit_rcu+0x9d/0xa0
      sysvec_call_function_single+0x34/0x80
      asm_sysvec_call_function_single+0x12/0x20

    Fixes: bd4ad57718 ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR")
    Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
    Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:12 +00:00
Tobias Huschle 2797114541 net/smc: Extend SMC-R link group netlink attribute
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit ddefb2d205539418f3c3851a3e06fac9624f257d
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Thu Jul 14 17:44:05 2022 +0800

    net/smc: Extend SMC-R link group netlink attribute

    Extend SMC-R link group netlink attribute SMC_GEN_LGR_SMCR.
    Introduce SMC_NLA_LGR_R_BUF_TYPE to show the buffer type of
    SMC-R link group.

    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:10 +00:00
Tobias Huschle bddfce67c1 net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit b8d199451c99b3796b840c350eb74b830c5c869b
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Thu Jul 14 17:44:04 2022 +0800

    net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R

    On long-running enterprise production servers, high-order contiguous
    memory pages are usually very rare and in most cases we can only get
    fragmented pages.

    When replacing TCP with SMC-R in such production scenarios, attempting
    to allocate high-order physically contiguous sndbufs and RMBs may result
    in frequent memory compaction, which can cause unexpected hangs
    and further stability risks.

    So this patch aims to allow SMC-R link groups to use virtually
    contiguous sndbufs and RMBs to avoid the potential issues mentioned
    above. Whether to use physically or virtually contiguous buffers can
    be set by sysctl smcr_buf_type.

    Note that using virtually contiguous buffers brings an acceptable
    performance regression, which can be divided into two main parts:

    1) regression in data path, which is brought by additional address
       translation of sndbuf by RNIC in Tx. But in general, translating
       address through MTT is fast.

       Taking 256KB sndbuf and RMB as an example, the comparisons in qperf
       latency and bandwidth test with physically and virtually contiguous
       buffers are as follows:

    - client:
      smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2\
      -t 5 -vu tcp_{bw|lat}
    - server:
      smc_run taskset -c <cpu> qperf

       [latency]
       msgsize              tcp            smcr        smcr-use-virt-buf
       1               11.17 us         7.56 us         7.51 us (-0.67%)
       2               10.65 us         7.74 us         7.56 us (-2.31%)
       4               11.11 us         7.52 us         7.59 us ( 0.84%)
       8               10.83 us         7.55 us         7.51 us (-0.48%)
       16              11.21 us         7.46 us         7.51 us ( 0.71%)
       32              10.65 us         7.53 us         7.58 us ( 0.61%)
       64              10.95 us         7.74 us         7.80 us ( 0.76%)
       128             11.14 us         7.83 us         7.87 us ( 0.47%)
       256             10.97 us         7.94 us         7.92 us (-0.28%)
       512             11.23 us         7.94 us         8.20 us ( 3.25%)
       1024            11.60 us         8.12 us         8.20 us ( 0.96%)
       2048            14.04 us         8.30 us         8.51 us ( 2.49%)
       4096            16.88 us         9.13 us         9.07 us (-0.64%)
       8192            22.50 us        10.56 us        11.22 us ( 6.26%)
       16384           28.99 us        12.88 us        13.83 us ( 7.37%)
       32768           40.13 us        16.76 us        16.95 us ( 1.16%)
       65536           68.70 us        24.68 us        24.85 us ( 0.68%)
       [bandwidth]
       msgsize                tcp              smcr          smcr-use-virt-buf
       1                1.65 MB/s         1.59 MB/s         1.53 MB/s (-3.88%)
       2                3.32 MB/s         3.17 MB/s         3.08 MB/s (-2.67%)
       4                6.66 MB/s         6.33 MB/s         6.09 MB/s (-3.85%)
       8               13.67 MB/s        13.45 MB/s        11.97 MB/s (-10.99%)
       16              25.36 MB/s        27.15 MB/s        24.16 MB/s (-11.01%)
       32              48.22 MB/s        54.24 MB/s        49.41 MB/s (-8.89%)
       64             106.79 MB/s       107.32 MB/s        99.05 MB/s (-7.71%)
       128            210.21 MB/s       202.46 MB/s       201.02 MB/s (-0.71%)
       256            400.81 MB/s       416.81 MB/s       393.52 MB/s (-5.59%)
       512            746.49 MB/s       834.12 MB/s       809.99 MB/s (-2.89%)
       1024          1292.33 MB/s      1641.96 MB/s      1571.82 MB/s (-4.27%)
       2048          2007.64 MB/s      2760.44 MB/s      2717.68 MB/s (-1.55%)
       4096          2665.17 MB/s      4157.44 MB/s      4070.76 MB/s (-2.09%)
       8192          3159.72 MB/s      4361.57 MB/s      4270.65 MB/s (-2.08%)
       16384         4186.70 MB/s      4574.13 MB/s      4501.17 MB/s (-1.60%)
       32768         4093.21 MB/s      4487.42 MB/s      4322.43 MB/s (-3.68%)
       65536         4057.14 MB/s      4735.61 MB/s      4555.17 MB/s (-3.81%)

    2) regression in buffer initialization and destruction path, which is
       brought by additional MR operations of sndbufs. But thanks to the link
       group buffer reuse mechanism, the impact of this kind of regression
       decreases as the number of buffer reuses increases.

       Taking 256KB sndbuf and RMB as an example, the latencies of some key
       SMC-R buffer-related functions obtained by bpftrace are as follows:

       Function                         Phys-bufs           Virt-bufs
       smcr_new_buf_create()             67154 ns            79164 ns
       smc_ib_buf_map_sg()                 525 ns              928 ns
       smc_ib_get_memory_region()       162294 ns           161191 ns
       smc_wr_reg_send()                  9957 ns             9635 ns
       smc_ib_put_memory_region()       203548 ns           198374 ns
       smc_ib_buf_unmap_sg()               508 ns             1158 ns

    ------------
    Test environment notes:
    1. The above tests ran on 2 VMs within the same host.
    2. The NIC is a ConnectX-4Lx, using SR-IOV and passing through 2 VFs to
       each VM respectively.
    3. The VMs' vCPUs are bound to different physical CPUs, and the bound
       physical CPUs are isolated by the `isolcpus=xxx` cmdline.
    4. The NICs' queue number is set to 1.

    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:09 +00:00
Tobias Huschle 78e54713f6 net/smc: Use sysctl-specified types of buffers in new link group
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit b984f370ed5182d180f92dbf14bdf847ff6ccc04
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Thu Jul 14 17:44:03 2022 +0800

    net/smc: Use sysctl-specified types of buffers in new link group

    This patch introduces a new SMC-R specific element buf_type
    in struct smc_link_group, for recording the value of sysctl
    smcr_buf_type when link group is created.

    Newly created link groups will create and reuse buffers of the
    type specified by buf_type.

    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:09 +00:00
Tobias Huschle fdcc3bb7fc net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit 0ef69e788411cba2af017db731a9fc62d255e9ac
Author: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Date:   Thu Jul 14 17:44:01 2022 +0800

    net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu

    Some CPUs, such as Xeon, can guarantee DMA cache coherency, so
    there is no need to use the dma sync APIs to flush the cache on
    such CPUs. To avoid calling the dma sync APIs on the IO path, use
    dma_need_sync() to check whether an smc_buf_desc needs dma sync
    when the smc_buf_desc is created.
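
    The idea of deciding once, at buffer creation, whether syncing is
    needed can be sketched as follows. This is a user-space illustration
    only: struct buf_desc, platform_needs_sync() and the sync_calls
    counter are hypothetical stand-ins, not the kernel's smc_buf_desc or
    the real dma_need_sync() interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer descriptor (not smc_buf_desc). */
struct buf_desc {
    void  *cpu_addr;
    size_t len;
    bool   need_sync;   /* decided once, when the buffer is created */
};

/* Counts how many simulated cache flushes were issued. */
static unsigned long sync_calls;

/* Hypothetical probe; in the kernel this role is played by dma_need_sync(). */
static bool platform_needs_sync(bool coherent_platform)
{
    return !coherent_platform;
}

/* Creation path: query the platform once and cache the answer. */
static void buf_desc_init(struct buf_desc *d, void *addr, size_t len,
                          bool coherent_platform)
{
    d->cpu_addr = addr;
    d->len = len;
    d->need_sync = platform_needs_sync(coherent_platform);
}

/* IO path: only a flag test on cache-coherent platforms. */
static void buf_sync_for_device(const struct buf_desc *d)
{
    if (!d->need_sync)
        return;
    sync_calls++;   /* a real implementation would flush caches here */
}
```

    With this shape, the per-message cost on a coherent platform is a
    single branch rather than a call into the DMA layer.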

    Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:08 +00:00
Tobias Huschle 3e71ec8595 net/smc: remove redundant dma sync ops
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099
Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: by IBM
Build-Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145
Conflicts: None
commit 6d52e2de6415b7a035b3e8dc4ccffd0da25bbfb9
Author: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Date:   Thu Jul 14 17:44:00 2022 +0800

    net/smc: remove redundant dma sync ops

    smc_ib_sync_sg_for_cpu/device are the ops used for dma memory cache
    consistency. SMC sndbufs are dma buffers that the CPU writes data to
    and the PCIe device reads data from. So for sndbufs,
    smc_ib_sync_sg_for_device is needed and smc_ib_sync_sg_for_cpu is
    redundant, as the PCIe device will not write to the buffers. SMC rmbs
    are dma buffers that the PCIe device writes data to and the CPU reads
    data from. So for rmbs, smc_ib_sync_sg_for_cpu is needed and
    smc_ib_sync_sg_for_device is redundant, as the CPU will not write to
    the buffers.

    Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2023-05-26 09:39:08 +00:00
Tobias Huschle 4f48bb24b1 [s390] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 4940a1fdf31c39f0806ac831cde333134862030b
Author: D. Wythe <alibuda@linux.alibaba.com>
Date:   Wed Mar 2 21:25:12 2022 +0800

    net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server

    The cause of SMC_CLC_DECL_ERR_REGRMB on the server is very clear:
    whether a new SMC connection can be accepted depends not only on
    the limit of conn nums, but also on the available rtoken entries.
    Since the rtoken release is triggered by the peer, while conn nums
    is decreased locally, many things can happen in this time
    difference.

    The only thing that needs to be mentioned is that, now that all
    connection creations are completely protected by the
    smc_server_lgr_pending lock, it is enough to check only the
    available entries in rtokens_used_mask.

    Fixes: cd6851f303 ("smc: remote memory buffers (RMBs)")
    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:46 +02:00
Tobias Huschle 222f9445ef [s390] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 0537f0a2151375dcf90c1bbfda6a0aaf57164e89
Author: D. Wythe <alibuda@linux.alibaba.com>
Date:   Wed Mar 2 21:25:11 2022 +0800

    net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client

    The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB in the
    client is the following execution sequence:

    Server Conn A:           Server Conn B:                 Client Conn B:

    smc_lgr_unregister_conn
                            smc_lgr_register_conn
                            smc_clc_send_accept     ->
                                                            smc_rtoken_add
    smcr_buf_unuse
                    ->              Client Conn A:
                                    smc_rtoken_delete

    smc_lgr_unregister_conn() makes the current link available to be
    assigned to a new incoming connection while smcr_buf_unuse() has not
    executed yet, which means that smc_rtoken_add() may fail because of
    insufficient rtoken_entry. Reversing their execution order avoids
    this problem.

    Fixes: 3e034725c0 ("net/smc: common functions for RMBs and send buffers")
    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:46 +02:00
Tobias Huschle c53e70df91 [s390] net/smc: correct settings of RMB window update limit
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 6bf536eb5c8ca011d1ff57b5c5f7c57ceac06a37
Author: Dust Li <dust.li@linux.alibaba.com>
Date:   Tue Mar 1 17:44:00 2022 +0800

    net/smc: correct settings of RMB window update limit

    rmbe_update_limit is used to limit how frequently receive window
    updates are announced. RFC7609 requests a minimal increase in the
    window size of 10% of the receive buffer space. But the current
    implementation used:

      min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2)

    and SOCK_MIN_SNDBUF / 2 == 2304 bytes, which is almost
    always less than 10% of the receive buffer space.

    This causes the receiver to always send a CDC message to
    update its consumer cursor when it consumes more than 2K
    of data. As a result, we may encounter something like
    "TCP silly window syndrome" when sending 2.5~8K messages.

    This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2).

    With this patch and SMC autocorking enabled, qperf 2K/4K/8K
    tcp_bw test shows 45%/75%/40% increase in throughput respectively.
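
    The difference between the two formulas can be checked with a small
    stand-alone sketch. The helper names limit_old() and limit_new() are
    hypothetical, and the constant 2304 is simply the value quoted above
    for SOCK_MIN_SNDBUF / 2; the real value depends on the kernel build.

```c
/* Value quoted in the commit message for SOCK_MIN_SNDBUF / 2 (assumption:
 * the real constant depends on the kernel configuration). */
#define HALF_MIN_SNDBUF 2304

/* Old behaviour: min(rmbe_size / 10, SOCK_MIN_SNDBUF / 2). */
static int limit_old(int rmbe_size)
{
    int tenth = rmbe_size / 10;

    return tenth < HALF_MIN_SNDBUF ? tenth : HALF_MIN_SNDBUF;
}

/* New behaviour: max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2). */
static int limit_new(int rmbe_size)
{
    int tenth = rmbe_size / 10;

    return tenth > HALF_MIN_SNDBUF ? tenth : HALF_MIN_SNDBUF;
}
```

    For a 64 KiB RMB (65536 bytes), limit_old() yields 2304 while
    limit_new() yields 6553, so the window update is announced only
    after roughly 10% of the buffer has been consumed.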

    Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:44 +02:00
Tobias Huschle ea839fb15e [s390] net/smc: Fix hung_task when removing SMC-R devices
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 56d99e81ecbc997a5f984684d0eeb583992b2072
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Sun Jan 16 15:43:42 2022 +0800

    net/smc: Fix hung_task when removing SMC-R devices

    A hung_task is observed when removing SMC-R devices. Suppose that
    a link group has two active links (lnk_A, lnk_B) associated with two
    different SMC-R devices (dev_A, dev_B). When dev_A is removed, the
    link group will be removed from smc_lgr_list and added to
    lgr_linkdown_list. lnk_A will be cleared and smcibdev(A)->lnk_cnt
    will reach zero. However, when dev_B is removed afterwards, the link
    group can't be found in smc_lgr_list and lnk_B won't be cleared,
    so smcibdev->lnk_cnt never reaches zero, which causes a hung_task.

    This patch fixes this issue by restoring the implementation of
    smc_smcr_terminate_all() to what it was before commit 349d43127dac
    ("net/smc: fix kernel panic caused by race of smc_sock"). The original
    implementation also satisfies the intention of making sure the QP is
    destroyed earlier than the CQ, because we will always wait for
    smcibdev->lnk_cnt to reach zero, which guarantees the QP has been
    destroyed.

    Fixes: 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock")
    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:36 +02:00
Tobias Huschle e4a60a1b3c [s390] net/smc: Resolve the race between SMC-R link access and clear
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 20c9398d3309d170300d67643b851fd26783af24
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Thu Jan 13 16:36:42 2022 +0800

    net/smc: Resolve the race between SMC-R link access and clear

    We encountered some crashes caused by the race between SMC-R
    link access and link clear, which is triggered by abnormal link
    group termination, such as a port error.

    Here is an example of this kind of crash:

     BUG: kernel NULL pointer dereference, address: 0000000000000000
     Workqueue: smc_hs_wq smc_listen_work [smc]
     RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
     Call Trace:
      <TASK>
      ? __smc_buf_create+0x75a/0x950 [smc]
      smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
      smc_listen_work+0xf72/0x1230 [smc]
      ? process_one_work+0x25c/0x600
      process_one_work+0x25c/0x600
      worker_thread+0x4f/0x3a0
      ? process_one_work+0x600/0x600
      kthread+0x15d/0x1a0
      ? set_kthread_struct+0x40/0x40
      ret_from_fork+0x1f/0x30
      </TASK>

    smc_listen_work()                     __smc_lgr_terminate()
    ---------------------------------------------------------------
                                        | smc_lgr_free()
                                        |  |- smcr_link_clear()
                                        |      |- memset(lnk, 0)
    smc_listen_rdma_reg()               |
     |- smcr_lgr_reg_rmbs()             |
         |- smc_llc_flow_initiate()     |
             |- access lnk->lgr (panic) |

    These crashes are similarly caused by clearing SMC-R link
    resources while some functions are still accessing them.
    This patch tries to fix the issue by introducing a reference
    count for SMC-R links and ensuring that the sensitive resources
    of links won't be cleared until the reference count reaches zero.

    The operations on the SMC-R link reference count can be summarized
    as follows:

    object          [hold or initialized as 1]         [put]
    --------------------------------------------------------------------
    links           smcr_link_init()                   smcr_link_clear()
    connections     smc_conn_create()                  smc_conn_free()

    In this way, the clearing of SMC-R links happens later than the
    freeing of all the smc connections above it, thus avoiding
    unsafe references to SMC-R links.
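
    The hold/put scheme in the table above can be illustrated with a
    minimal user-space sketch using C11 atomics. The struct and the
    link_init(), link_hold() and link_put() helpers are hypothetical
    stand-ins for illustration; they are not the kernel implementation,
    which may use different primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an SMC-R link. */
struct link {
    atomic_int refcnt;
    bool       cleared;         /* set once the resources are wiped */
    char       resources[32];   /* placeholder for sensitive state */
};

/* Corresponds to "initialized as 1" for links in the table. */
static void link_init(struct link *l)
{
    memset(l->resources, 0xab, sizeof(l->resources));
    l->cleared = false;
    atomic_init(&l->refcnt, 1);
}

/* Corresponds to a connection taking a reference on creation. */
static void link_hold(struct link *l)
{
    atomic_fetch_add(&l->refcnt, 1);
}

/* Drops one reference; resources are wiped only by the last put. */
static void link_put(struct link *l)
{
    if (atomic_fetch_sub(&l->refcnt, 1) == 1) {
        memset(l->resources, 0, sizeof(l->resources));
        l->cleared = true;
    }
}
```

    In this sketch, a put from the link-clear path leaves the resources
    intact as long as a connection still holds a reference; only the
    final put wipes them.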

    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:35 +02:00
Tobias Huschle 8b399872fb [s390] net/smc: Introduce a new conn->lgr validity check helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit ea89c6c0983c39702a4a52ccaa4702e0cb71179b
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Thu Jan 13 16:36:41 2022 +0800

    net/smc: Introduce a new conn->lgr validity check helper

    It is no longer suitable to identify whether an smc connection
    is registered in a link group by checking if conn->lgr
    is NULL, because conn->lgr won't be reset even when the connection
    is unregistered from a link group.

    So this patch introduces a new helper, smc_conn_lgr_valid(), and
    replaces all the checks of conn->lgr in the original implementation
    with the new helper to judge whether conn->lgr is valid to use.
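
    A rough sketch of why a NULL check alone is no longer sufficient is
    shown below. The message does not say which condition the helper
    tests, so the `registered` flag and all names here are assumptions
    made purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct lgr  { int id; };
struct conn {
    struct lgr *lgr;
    bool        registered;   /* assumed marker of a live registration */
};

static void conn_register(struct conn *c, struct lgr *g)
{
    c->lgr = g;
    c->registered = true;
}

/* The pointer is intentionally left in place after unregistering. */
static void conn_unregister(struct conn *c)
{
    c->registered = false;
}

/* Validity needs more than a non-NULL pointer. */
static bool conn_lgr_valid(const struct conn *c)
{
    return c->lgr != NULL && c->registered;
}
```

    After conn_unregister(), c->lgr is still non-NULL, so only the
    helper correctly reports that the link group must not be used.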

    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:35 +02:00
Tobias Huschle b141c34e82 [s390] net/smc: Resolve the race between link group access and termination
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 61f434b0280ed65495831f1b6e1a5c21a90f47c6
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Thu Jan 13 16:36:40 2022 +0800

    net/smc: Resolve the race between link group access and termination

    We encountered some crashes caused by the race between the access
    and the termination of link groups.

    Here are some of the panic stacks we met:

    1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()

     BUG: kernel NULL pointer dereference, address: 00000000000002f0
     Workqueue: smc_hs_wq smc_listen_work [smc]
     RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
     Call Trace:
      <TASK>
      ? smc_clc_send_accept+0x45/0xa0 [smc]
      ? smc_clc_send_accept+0x45/0xa0 [smc]
      smc_listen_work+0x783/0x1220 [smc]
      ? finish_task_switch+0xc4/0x2e0
      ? process_one_work+0x1ad/0x3c0
      process_one_work+0x1ad/0x3c0
      worker_thread+0x4c/0x390
      ? rescuer_thread+0x320/0x320
      kthread+0x149/0x190
      ? set_kthread_struct+0x40/0x40
      ret_from_fork+0x1f/0x30
      </TASK>

    smc_listen_work()                abnormal case like port error
    ---------------------------------------------------------------
                                    | __smc_lgr_terminate()
                                    |  |- smc_conn_kill()
                                    |      |- smc_lgr_unregister_conn()
                                    |          |- set conn->lgr = NULL
    smc_clc_wait_msg()              |
     |- access conn->lgr (panic)    |

    2) Race between smc_setsockopt() and __smc_lgr_terminate()

     BUG: kernel NULL pointer dereference, address: 00000000000002e8
     RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
     Call Trace:
      <TASK>
      __sys_setsockopt+0xfc/0x190
      __x64_sys_setsockopt+0x20/0x30
      do_syscall_64+0x34/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae
      </TASK>

    smc_setsockopt()                 abnormal case like port error
    --------------------------------------------------------------
                                    | __smc_lgr_terminate()
                                    |  |- smc_conn_kill()
                                    |      |- smc_lgr_unregister_conn()
                                    |          |- set conn->lgr = NULL
    mod_delayed_work()              |
     |- access conn->lgr (panic)    |

    There are some other panic places, and they have a similar
    cause to the ones described above: accessing the link
    group after termination, thus getting a NULL pointer or an invalid
    resource.

    Currently, there seems to be no synchronization between
    link group access and a sudden termination of it. This patch
    tries to fix this by introducing a reference count for the link
    group and not freeing the link group until the reference count is
    zero.

    A link group might be referred to by links or smc connections. So
    the operations on the link group reference count can be summarized
    as follows:

    object          [hold or initialized as 1]       [put]
    -------------------------------------------------------------------
    link group      smc_lgr_create()                 smc_lgr_free()
    connections     smc_conn_create()                smc_conn_free()
    links           smcr_link_init()                 smcr_link_clear()

    In this way, we extend the life cycle of the link group and
    ensure it is longer than the life cycle of the connections and links
    above it, thus avoiding invalid access to the link group after its
    termination.

    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:34 +02:00
Tobias Huschle 393f03c09b [s390] net/smc: Reset conn->lgr when link group registration fails
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 36595d8ad46d9e4c41cc7c48c4405b7c3322deac
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Thu Jan 6 20:42:08 2022 +0800

    net/smc: Reset conn->lgr when link group registration fails

    SMC connections might fail to be registered in a link group because
    no usable link can be found during creation. As a result,
    smc_conn_create() will return a failure and most resources related
    to the connection won't be applied or initialized, such as
    conn->abort_work or conn->lnk.

    If smc_conn_free() is invoked later, it will try to access the
    uninitialized resources related to the connection, thus causing
    a warning or crash.

    This patch tries to fix this by resetting conn->lgr to NULL if an
    abnormal exit occurs in smc_lgr_register_conn(), thus avoiding the
    access to uninitialized resources in smc_conn_free().

    Meanwhile, the newly created link group should be terminated if smc
    connections can't be registered in it. So smc_lgr_cleanup_early() is
    modified to take care of the link group only and is invoked by
    smc_conn_create() to terminate an unusable link group. The call to
    smc_conn_free() is moved out of smc_lgr_cleanup_early() into
    smc_conn_abort().

    Fixes: 56bc3b2094 ("net/smc: assign link to a new connection")
    Suggested-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Acked-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:34 +02:00
Tobias Huschle ec18bd7be6 [s390] net/smc: Print net namespace in log
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit de2fea7b39bfa1ee9db8726f7b71d54fec385d80
Author: Tony Lu <tonylu@linux.alibaba.com>
Date:   Tue Dec 28 21:06:11 2021 +0800

    net/smc: Print net namespace in log

    This adds the net namespace ID to the kernel log; net_cookie is unique
    in the whole system. It is useful in container environments.

    Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:32 +02:00
Tobias Huschle b1dbd093ff [s390] net/smc: Add netlink net namespace support
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 79d39fc503b43b566feae5bc9a57dfcffdf41bd1
Author: Tony Lu <tonylu@linux.alibaba.com>
Date:   Tue Dec 28 21:06:10 2021 +0800

    net/smc: Add netlink net namespace support

    This adds the net namespace ID to the diag of the linkgroup, which
    helps us distinguish different namespaces; net_cookie is unique in
    the whole system.

    Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:32 +02:00
Tobias Huschle 42012cf975 [s390] net/smc: Introduce net namespace support for linkgroup
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 0237a3a683e4844ddc52782d83d439d6192e11f9
Author: Tony Lu <tonylu@linux.alibaba.com>
Date:   Tue Dec 28 21:06:09 2021 +0800

    net/smc: Introduce net namespace support for linkgroup

    Currently, rdma devices support exclusive net namespace isolation;
    however, the linkgroup is not aware of and does not support the ibdev
    net namespace. Applications in containers don't want to share the
    NICs if rdma exclusive mode is enabled. Every net namespace should
    have its own linkgroups.

    This patch introduces a new field, net, for the linkgroup, which
    stands for the ibdev net namespace in the linkgroup. The net in the
    linkgroup is initialized with the net namespace of the link's ibdev.
    The net of the linkgroup is compared with that of the sock or ibdev
    before the linkgroup is chosen; if none matches, a new one is created
    in the current net namespace. If rdma net namespace exclusive mode
    is not enabled, it behaves as before.

    Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:31 +02:00
Tobias Huschle 5da4f5e2ca [s390] net/smc: fix kernel panic caused by race of smc_sock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 349d43127dac00c15231e8ffbcaabd70f7b0e544
Author: Dust Li <dust.li@linux.alibaba.com>
Date:   Tue Dec 28 17:03:25 2021 +0800

    net/smc: fix kernel panic caused by race of smc_sock

    A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
    but smc_release() has already freed it.

    [ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
    [ 4570.696048] #PF: supervisor write access in kernel mode
    [ 4570.696728] #PF: error_code(0x0002) - not-present page
    [ 4570.697401] PGD 0 P4D 0
    [ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
    [ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
    [ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
    [ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
    <...>
    [ 4570.711446] Call Trace:
    [ 4570.711746]  <IRQ>
    [ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
    [ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
    [ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
    [ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
    [ 4570.714083]  __do_softirq+0x123/0x2f4
    [ 4570.714521]  irq_exit_rcu+0xc4/0xf0
    [ 4570.714934]  common_interrupt+0xba/0xe0

    Though smc_cdc_tx_handler() checked the existence of smc connection,
    smc_release() may have already dismissed and released the smc socket
    before smc_cdc_tx_handler() further visits it.

    smc_cdc_tx_handler()           |smc_release()
    if (!conn)                     |
                                   |
                                   |smc_cdc_tx_dismiss_slots()
                                   |      smc_cdc_tx_dismisser()
                                   |
                                   |sock_put(&smc->sk) <- last sock_put,
                                   |                      smc_sock freed
    bh_lock_sock(&smc->sk) (panic) |

    To make sure we won't receive any CDC messages after we free the
    smc_sock, add a refcount on the smc_connection for inflight CDC
    messages (posted to the QP but without the related CQE received yet),
    and don't release the smc_connection until all the inflight CDC
    messages have completed, whether they succeeded or failed.

    Using a refcount on CDC messages brings another problem: when the link
    is going to be destroyed, smcr_link_clear() will reset the QP, which
    then removes all the pending CQEs related to the QP in the CQ. To make
    sure all the CQEs will always come back so the refcount on the
    smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
    by smc_ib_modify_qp_error().
    Also remove the timeout in smc_wr_tx_wait_no_pending_sends(), since we
    need to wait for all pending WQEs to be done, or we may encounter a
    use-after-free when handling CQEs.

    For the IB device removal routine, we need to wait for all the QPs on
    that device to be destroyed before we can destroy the CQs on the
    device, or the refcount on the smc_connection won't reach 0 and the
    smc_sock cannot be released.

    Fixes: 5f08318f61 ("smc: connection data control (CDC)")
    Reported-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:31 +02:00
Tobias Huschle 8d04415218 [s390] net/smc: don't send CDC/LLC message if link not ready
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 90cee52f2e780345d3629e278291aea5ac74f40f
Author: Dust Li <dust.li@linux.alibaba.com>
Date:   Tue Dec 28 17:03:24 2021 +0800

    net/smc: don't send CDC/LLC message if link not ready

    We found that smc_llc_send_link_delete_all() sometimes waits
    for the 2s timeout when testing with RDMA link up/down.
    It is possible that when a smc_link is in ACTIVATING state,
    the underlying QP is still in RESET or RTR state, which
    cannot send any messages out.

    smc_llc_send_link_delete_all() uses smc_link_usable() to
    check whether the link is usable. If the QP is still in
    RESET or RTR state but the smc_link is in ACTIVATING, this
    LLC message will always fail without any CQE entering the
    CQ, and we will always wait 2s before timing out.

    Since we cannot send any messages through the QP before
    the QP enters RTS, add a wrapper, smc_link_sendable(),
    which checks the state of the QP along with the link state,
    and replace smc_link_usable() with smc_link_sendable()
    in all LLC & CDC message sending routines.
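
    The relationship between the two checks can be sketched as below.
    The enums, the struct and the helper names are simplified,
    hypothetical stand-ins chosen for illustration; they do not mirror
    the kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified state sets. */
enum link_state { LNK_UNUSED, LNK_INACTIVE, LNK_ACTIVATING, LNK_ACTIVE };
enum qp_state   { QP_RESET, QP_INIT, QP_RTR, QP_RTS };

struct link {
    enum link_state state;
    enum qp_state   qp;
};

/* Link-level check only: an ACTIVATING link already counts as usable. */
static bool link_usable(const struct link *l)
{
    return l->state == LNK_ACTIVATING || l->state == LNK_ACTIVE;
}

/* Stricter check for send paths: the QP must also have reached RTS. */
static bool link_sendable(const struct link *l)
{
    return link_usable(l) && l->qp == QP_RTS;
}
```

    A link in ACTIVATING state whose QP is still in RTR passes
    link_usable() but fails link_sendable(), which is the case that
    previously ran into the 2s timeout.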

    Fixes: 5f08318f61 ("smc: connection data control (CDC)")
    Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:30 +02:00
Tobias Huschle e3ba7dd79c [s390] net/smc: Clear memory when release and reuse buffer
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 1c5526968e270e4efccfa1da21d211a4915cdeda
Author: Tony Lu <tonylu@linux.alibaba.com>
Date:   Fri Dec 3 12:33:31 2021 +0100

    net/smc: Clear memory when release and reuse buffer

    Currently, buffers are cleared when smc connections are created and
    buffers are reused. This slows down the establishment of new
    connections. In most cases, applications want to establish
    connections as quickly as possible.

    This patch moves memset() from the connection creation path to the
    release and buffer unuse path, trading slower release for faster
    connection establishment.
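
    The trade-off can be pictured with a small sketch in which the
    buffer is wiped when it is given back rather than when it is handed
    out. The struct and the buf_acquire() and buf_release() helpers are
    hypothetical; the sketch also assumes a buffer is zeroed when first
    allocated.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical reusable buffer; assumed zeroed at first allocation. */
struct buf {
    char data[64];
    bool used;
};

/* Acquire path (connection setup): no memset, the buffer is clean. */
static char *buf_acquire(struct buf *b)
{
    b->used = true;
    return b->data;
}

/* Release path: pay the clearing cost here, off the setup path. */
static void buf_release(struct buf *b)
{
    memset(b->data, 0, sizeof(b->data));
    b->used = false;
}
```

    A reused buffer is therefore always clean when the next connection
    acquires it, while connection setup no longer pays for the clearing.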

    Test environments:
    - CPU Intel Xeon Platinum 8 core, mem 32 GiB, nic Mellanox CX4
    - socket sndbuf / rcvbuf: 16384 / 131072 bytes
    - w/o first round, 5 rounds, avg, 100 conns batch per round
    - smc_buf_create() use bpftrace kprobe, introduces extra latency

    Latency benchmarks for smc_buf_create():
      w/o patch : 19040.0 ns
      w/  patch :  1932.6 ns
      ratio :        10.2% (-89.8%)

    Latency benchmarks for socket create and connect:
      w/o patch :   143.3 us
      w/  patch :   102.2 us
      ratio :        71.3% (-28.7%)

    The latency of establishing connections is reduced by 28.7%.

    Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
    Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211203113331.2818873-1-kgraul@linux.ibm.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:29 +02:00
Tobias Huschle 675b2d7d0b [s390] net/smc: fix wrong list_del in smc_lgr_cleanup_early
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 789b6cc2a5f9123b9c549b886fdc47c865cfe0ba
Author: Dust Li <dust.li@linux.alibaba.com>
Date:   Wed Dec 1 11:02:30 2021 +0800

    net/smc: fix wrong list_del in smc_lgr_cleanup_early

    smc_lgr_cleanup_early() was meant to delete the link
    group from the link group list, but it deleted
    the list head by mistake.

    This may cause memory corruption since we didn't
    remove the real link group from the list and later
    memset the link group structure.
    We got a list corruption panic when testing:

    [  231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000
    [  231.278222] ------------[ cut here ]------------
    [  231.278726] kernel BUG at lib/list_debug.c:53!
    [  231.279326] invalid opcode: 0000 [#1] SMP NOPTI
    [  231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435
    [  231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
    [  231.281248] Workqueue: events smc_link_down_work
    [  231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90
    [  231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c
    60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f>
    0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc
    [  231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292
    [  231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000
    [  231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040
    [  231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001
    [  231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001
    [  231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003
    [  231.288337] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
    [  231.289160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0
    [  231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  231.291940] Call Trace:
    [  231.292211]  smc_lgr_terminate_sched+0x53/0xa0
    [  231.292677]  smc_switch_conns+0x75/0x6b0
    [  231.293085]  ? update_load_avg+0x1a6/0x590
    [  231.293517]  ? ttwu_do_wakeup+0x17/0x150
    [  231.293907]  ? update_load_avg+0x1a6/0x590
    [  231.294317]  ? newidle_balance+0xca/0x3d0
    [  231.294716]  smcr_link_down+0x50/0x1a0
    [  231.295090]  ? __wake_up_common_lock+0x77/0x90
    [  231.295534]  smc_link_down_work+0x46/0x60
    [  231.295933]  process_one_work+0x18b/0x350

    Fixes: a0a62ee15a ("net/smc: separate locks for SMCD and SMCR link group lists")
    Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
    Acked-by: Karsten Graul <kgraul@linux.ibm.com>
    Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:28 +02:00
Tobias Huschle 6052fdc4d7 [s390] net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 587acad41f1bc48e16f42bb2aca63bf323380be8
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Wed Nov 24 13:32:37 2021 +0100

    net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()

    Coverity reports a possible NULL dereferencing problem:

    in smc_vlan_by_tcpsk():
    6. returned_null: netdev_lower_get_next returns NULL (checked 29 out of 30 times).
    7. var_assigned: Assigning: ndev = NULL return value from netdev_lower_get_next.
    1623                ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower);
    CID 1468509 (#1 of 1): Dereference null return value (NULL_RETURNS)
    8. dereference: Dereferencing a pointer that might be NULL ndev when calling is_vlan_dev.
    1624                if (is_vlan_dev(ndev)) {

    Remove the manual implementation and use netdev_walk_all_lower_dev() to
    iterate over the lower devices. While on it remove an obsolete function
    parameter comment.

    Fixes: cb9d43f677 ("net/smc: determine vlan_id of stacked net_device")
    Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:27 +02:00
Tobias Huschle 5f0eac90e8 [s390] net/smc: Make sure the link_id is unique
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit cf4f5530bb55ef7d5a91036b26676643b80b1616
Author: Wen Gu <guwen@linux.alibaba.com>
Date:   Mon Nov 15 17:45:07 2021 +0800

    net/smc: Make sure the link_id is unique

    The link_id is supposed to be unique, but smcr_next_link_id() doesn't
    skip the used link_id as expected. So the patch fixes this.

    Fixes: 026c381fb4 ("net/smc: introduce link_idx for link group array")
    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
    Acked-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:26 +02:00
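
The uniqueness requirement described in the link_id commit above can be illustrated with a small user-space sketch. This is not the kernel code: the `demo_*` types and names and the `LINKS_MAX` limit are hypothetical stand-ins, and the sketch only assumes that an ID counter wraps at 8 bits, that 0 is not handed out, and that an ID held by an active link must be skipped.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINKS_MAX 3 /* hypothetical per-group link limit, for illustration only */

struct demo_link {
	bool    in_use;  /* true while the link is active */
	uint8_t link_id; /* ID currently assigned to the link */
};

struct demo_lgr {
	uint8_t next_link_id;            /* last ID handed out */
	struct demo_link lnk[LINKS_MAX]; /* links of this group */
};

/* Return true if any active link in the group already holds this ID. */
static bool demo_id_in_use(const struct demo_lgr *lgr, uint8_t id)
{
	for (int i = 0; i < LINKS_MAX; i++)
		if (lgr->lnk[i].in_use && lgr->lnk[i].link_id == id)
			return true;
	return false;
}

/* Advance the counter until it lands on a nonzero ID no active link uses. */
static uint8_t demo_next_link_id(struct demo_lgr *lgr)
{
	uint8_t id;

	do {
		id = ++lgr->next_link_id;         /* wraps from 255 to 0 */
		if (id == 0)
			id = ++lgr->next_link_id; /* never hand out 0 */
	} while (demo_id_in_use(lgr, id));
	return id;
}
```

The point of the sketch is that the "is this ID taken" check restarts the whole candidate selection, rather than only moving on to the next array slot. With at most `LINKS_MAX` IDs in use out of 255 candidates, the loop always terminates.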
Tobias Huschle 06e251007a [s390] net/smc: Introduce tracepoint for smcr link down
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit a3a0e81b6fd55745e100735c7667cd99a0650811
Author: Tony Lu <tonylu@linux.alibaba.com>
Date:   Mon Nov 1 15:39:16 2021 +0800

    net/smc: Introduce tracepoint for smcr link down

    The SMC-R link down event is important for diagnosing link issues and
    should be tracked, especially in single-NIC mode, where it means the
    upper-layer connection is shut down. The tracepoint helps to find the
    direct link-down reason in time: it reports not only that the counter
    increased, but also the location of the code that triggered the event.

    Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
    Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:24 +02:00
Tobias Huschle d97f87ca20 [s390] net/smc: add netlink support for SMC-Rv2
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit b0539f5eddc2eefd24378bda3ee9cbbca916f58d
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Sat Oct 16 11:37:51 2021 +0200

    net/smc: add netlink support for SMC-Rv2

    Implement the netlink support for SMC-Rv2 related attributes that are
    provided to user space.

    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:22 +02:00
Tobias Huschle ddd334253b [s390] net/smc: add v2 support to the work request layer
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 8799e310fb3f15759824a78b6b93d7e6d5def067
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Sat Oct 16 11:37:49 2021 +0200

    net/smc: add v2 support to the work request layer

    In the work request layer define one large v2 buffer for each link group
    that is used to transmit and receive large LLC control messages.
    Add the completion queue handling for this buffer.

    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:21 +02:00
Tobias Huschle fea676e354 [s390] net/smc: retrieve v2 gid from IB device
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 24fb68111d4509524b483b2577f1b20a24f5fdfd
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Sat Oct 16 11:37:48 2021 +0200

    net/smc: retrieve v2 gid from IB device

    In smc_ib.c, scan for RoCE devices that support UDP encapsulation.
    Find an eligible device and check that there is a route to the
    remote peer.

    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:21 +02:00
Tobias Huschle 4c2e548f6e [s390] net/smc: add listen processing for SMC-Rv2
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit e49300a6bf6218c835403545e9356141a6340181
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Sat Oct 16 11:37:46 2021 +0200

    net/smc: add listen processing for SMC-Rv2

    Implement the server side of the SMC-Rv2 processing. Process incoming
    CLC messages, find eligible devices and check for a valid route to the
    remote peer.

    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:20 +02:00
Tobias Huschle 64bd390c97 [s390] net/smc: keep static copy of system EID
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 11a26c59fc510091facd0d80236ac848da844830
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Tue Sep 14 10:35:06 2021 +0200

    net/smc: keep static copy of system EID

    The system EID is retrieved using a registered ISM device each time
    when needed. This adds some unnecessary complexity at all places where
    the system EID is needed, but no ISM device is at hand.
    Simplify the code and save the system EID in a static variable in
    smc_ism.c.

    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com>
    Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:18 +02:00
Tobias Huschle 5902e79404 [s390] net/smc: Allow SMC-D 1MB DMB allocations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294
Upstream Status: https://github.com/torvalds/linux.git
Tested: by IBM
Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016
Conflicts: None

commit 67161779a9ea926fccee8de047ae66cbd3482b91
Author: Stefan Raspl <raspl@linux.ibm.com>
Date:   Mon Aug 9 10:10:14 2021 +0200

    net/smc: Allow SMC-D 1MB DMB allocations

    Commit a3fe3d01bd ("net/smc: introduce sg-logic for RMBs") introduced
    a restriction for RMB allocations as used by SMC-R. However, SMC-D does
    not use scatter-gather lists to back its DMBs, yet it was limited by
    this restriction, still.
    This patch exempts SMC, but limits allocations to the maximum RMB/DMB
    size respectively.

    Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
    Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Tobias Huschle <thuschle@redhat.com>
2022-06-15 06:47:18 +02:00
Mete Durlu fa4d5ce7c8 [s390] net/smc: improved fix wait on already cleared link
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1869652
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Build Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=40812265
Tested: by IBM
Conflicts: None

commit 95f7f3e7dc6bd2e735cb5de11734ea2222b1e05a
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Thu Oct 7 16:14:40 2021 +0200

    net/smc: improved fix wait on already cleared link

    Commit 8f3d65c166 ("net/smc: fix wait on already cleared link")
    introduced link refcounting to avoid waits on already cleared links.
    This patch extends and improves the refcounting to cover all
    remaining possible cases for this kind of error situation.

    Fixes: 15e1b99aad ("net/smc: no WR buffer wait for terminating link group")
    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Mete Durlu <mdurlu@redhat.com>
2021-11-03 06:03:22 -04:00
Mete Durlu 0b4a66178f [s390] net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1869652
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Build Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=40812265
Tested: by IBM
Conflicts: None

commit a18cee4791b1123d0a6579a7c89f4b87e48abe03
Author: Karsten Graul <kgraul@linux.ibm.com>
Date:   Mon Sep 20 21:18:15 2021 +0200

    net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work

    The abort_work is scheduled when a connection was detected to be
    out-of-sync after a link failure. The work calls smc_conn_kill(),
    which calls smc_close_active_abort() and that might end up calling
    smc_close_cancel_work().
    smc_close_cancel_work() cancels any pending close_work and tx_work but
    needs to release the sock_lock before and acquires the sock_lock again
    afterwards. So when the sock_lock was NOT acquired before then it may
    be held after the abort_work completes. That's why the sock_lock is
    acquired before the call to smc_conn_kill() in __smc_lgr_terminate(),
    but this is missing in smc_conn_abort_work().

    Fix that by acquiring the sock_lock first and release it after the
    call to smc_conn_kill().

    Fixes: b286a0651e ("net/smc: handle incoming CDC validation message")
    Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Mete Durlu <mdurlu@redhat.com>
2021-11-03 06:02:59 -04:00
Guvenc Gulce 64513d269e net/smc: Correct smc link connection counter in case of smc client
SMC clients may be assigned to a different link after the initial
connection between two peers was established. In such a case,
the connection counter was not correctly set.

Update the connection counter correctly when a smc client connection
is assigned to a different smc link.

Fixes: 07d51580ff ("net/smc: Add connection counters for links")
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-09 10:46:59 +01:00
Guvenc Gulce 194730a9be net/smc: Make SMC statistics network namespace aware
Make the gathered SMC statistics network namespace aware: collect a
separate set of statistics for each namespace.

Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:54:02 -07:00
Guvenc Gulce e0e4b8fa53 net/smc: Add SMC statistics support
Add the ability to collect SMC statistics information. Per-cpu
variables are used to collect the statistic information for better
performance and for reducing concurrency pitfalls. The code that is
collecting statistic data is implemented in macros to increase code
reuse and readability.

Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 12:54:02 -07:00
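
The approach named in the statistics commit above (per-CPU counters updated through macros) can be sketched in plain user-space C. Everything here is an assumption made for illustration: the `demo_*` names, the slot count, and the two counters are hypothetical, and a fixed array indexed by a CPU number stands in for real per-CPU variables.

```c
#include <stdint.h>

#define DEMO_NR_SLOTS 4 /* hypothetical number of CPUs */

struct demo_stats {
	uint64_t tx_cnt; /* hypothetical counter: sends */
	uint64_t rx_cnt; /* hypothetical counter: receives */
};

/* One private slot per CPU, so writers never touch each other's data. */
static struct demo_stats demo_percpu[DEMO_NR_SLOTS];

/* Increment one named counter in the slot owned by the given CPU. */
#define DEMO_STAT_INC(cpu, field) (demo_percpu[(cpu)].field++)

/* Sum one named counter across all slots when the value is reported. */
#define DEMO_STAT_SUM(field, out)                          \
	do {                                               \
		(out) = 0;                                 \
		for (int i_ = 0; i_ < DEMO_NR_SLOTS; i_++) \
			(out) += demo_percpu[i_].field;    \
	} while (0)
```

Because the field name is a macro argument, one pair of macros serves every counter in the structure, which is the kind of reuse the commit message refers to. The cost of summing is paid only on the rare read path.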
Karsten Graul f8e0a68bab net/smc: avoid possible duplicate dmb unregistration
smc_lgr_cleanup() calls smcd_unregister_all_dmbs() as part of the link
group termination process. This is a leftover from the times when
smc_lgr_cleanup() scheduled a worker to actually free the link group.
Nowadays smc_lgr_cleanup() directly calls smc_lgr_free() without any
delay so an earlier dmb unregistration is no longer needed.
So remove smcd_unregister_all_dmbs() and the call to it.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03 13:54:49 -07:00
Guvenc Gulce 8a44653689 net/smc: use memcpy instead of snprintf to avoid out of bounds read
Using snprintf() to convert strings that are not null-terminated into
null-terminated strings may cause an out-of-bounds read in the source
string. Therefore use memcpy() and terminate the target string with a
null byte afterwards.

Fixes: a3db10efcc ("net/smc: Add support for obtaining SMCR device list")
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-12 20:22:01 -08:00
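
The memcpy() pattern from the commit above can be shown as a minimal user-space sketch. The field width `DEMO_NAME_LEN` and the helper `demo_copy_field()` are hypothetical and not taken from the SMC code. The sketch assumes a fixed-width source field that may use all of its bytes and therefore carries no terminator.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_LEN 8 /* hypothetical width of a fixed-size, unterminated field */

/*
 * Copy a fixed-width field that may lack a trailing NUL into dst.
 * dst must provide DEMO_NAME_LEN + 1 bytes. Exactly DEMO_NAME_LEN bytes
 * are read from src, so the read cannot run past the end of the field.
 */
static void demo_copy_field(char dst[DEMO_NAME_LEN + 1],
			    const char src[DEMO_NAME_LEN])
{
	memcpy(dst, src, DEMO_NAME_LEN);
	dst[DEMO_NAME_LEN] = '\0'; /* terminate the target explicitly */
}
```

The length of the read is fixed by the caller here instead of depending on where a terminator happens to appear in the source, which is the property the commit message is after.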
Jakub Kicinski 25fe2c9c4c smc: fix out of bound access in smc_nl_get_sys_info()
smc_clc_get_hostname() sets the host pointer to a buffer
which is not NULL-terminated (see smc_clc_init()).

Reported-by: syzbot+f4708c391121cfc58396@syzkaller.appspotmail.com
Fixes: 099b990bd1 ("net/smc: Add support for obtaining system information")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-12 20:22:01 -08:00
Guvenc Gulce a3db10efcc net/smc: Add support for obtaining SMCR device list
Deliver SMCR device information via netlink based
diagnostic interface.

Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01 17:56:13 -08:00
Guvenc Gulce 8f9dde4bf2 net/smc: Add SMC-D Linkgroup diagnostic support
Deliver SMCD Linkgroup information via netlink based
diagnostic interface.

Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01 17:56:13 -08:00