Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Tobias Huschle	5e9ec62245	net/smc: fix deadlock triggered by cancel_delayed_work_syn() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit 13085e1b5cab8ad802904d72e6a6dae85ae0cd20 Author: Wenjia Zhang <wenjia@linux.ibm.com> Date: Mon Mar 13 11:08:28 2023 +0100 net/smc: fix deadlock triggered by cancel_delayed_work_syn() The following LOCKDEP was detected: Workqueue: events smc_lgr_free_work [smc] WARNING: possible circular locking dependency detected 6.1.0-20221027.rc2.git8.56bc5b569087.300.fc36.s390x+debug #1 Not tainted ------------------------------------------------------ kworker/3:0/176251 is trying to acquire lock: 00000000f1467148 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}, at: __flush_workqueue+0x7a/0x4f0 but task is already holding lock: 0000037fffe97dc8 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x232/0x730 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #4 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __flush_work+0x76/0xf0 __cancel_work_timer+0x170/0x220 __smc_lgr_terminate.part.0+0x34/0x1c0 [smc] smc_connect_rdma+0x15e/0x418 [smc] __smc_connect+0x234/0x480 [smc] smc_connect+0x1d6/0x230 [smc] __sys_connect+0x90/0xc0 __do_sys_socketcall+0x186/0x370 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #3 (smc_client_lgr_pending){+.+.}-{3:3}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __mutex_lock+0x96/0x8e8 mutex_lock_nested+0x32/0x40 smc_connect_rdma+0xa4/0x418 [smc] __smc_connect+0x234/0x480 [smc] smc_connect+0x1d6/0x230 [smc] __sys_connect+0x90/0xc0 __do_sys_socketcall+0x186/0x370 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #2 (sk_lock-AF_SMC){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 lock_sock_nested+0x46/0xa8 smc_tx_work+0x34/0x50 [smc] process_one_work+0x30c/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 -> #1 ((work_completion)(&(&smc->conn.tx_work)->work)){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 process_one_work+0x2bc/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 -> #0 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}: check_prev_add+0xd8/0xe88 validate_chain+0x70c/0xb20 __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __flush_workqueue+0xaa/0x4f0 drain_workqueue+0xaa/0x158 destroy_workqueue+0x44/0x2d8 smc_lgr_free+0x9e/0xf8 [smc] process_one_work+0x30c/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 other info that might help us debug this: Chain exists of: (wq_completion)smc_tx_wq-00000000#2 --> smc_client_lgr_pending --> 
(work_completion)(&(&lgr->free_work)->work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&lgr->free_work)->work)); lock(smc_client_lgr_pending); lock((work_completion) (&(&lgr->free_work)->work)); lock((wq_completion)smc_tx_wq-00000000#2); * DEADLOCK * 2 locks held by kworker/3:0/176251: #0: 0000000080183548 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x232/0x730 #1: 0000037fffe97dc8 ((work_completion) (&(&lgr->free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x232/0x730 stack backtrace: CPU: 3 PID: 176251 Comm: kworker/3:0 Not tainted Hardware name: IBM 8561 T01 701 (z/VM 7.2.0) Call Trace: [<000000002983c3e4>] dump_stack_lvl+0xac/0x100 [<0000000028b477ae>] check_noncircular+0x13e/0x160 [<0000000028b48808>] check_prev_add+0xd8/0xe88 [<0000000028b49cc4>] validate_chain+0x70c/0xb20 [<0000000028b4bd26>] __lock_acquire+0x58e/0xbd8 [<0000000028b4cf6a>] lock_acquire.part.0+0xe2/0x248 [<0000000028b4d17c>] lock_acquire+0xac/0x1c8 [<0000000028addaaa>] __flush_workqueue+0xaa/0x4f0 [<0000000028addf9a>] drain_workqueue+0xaa/0x158 [<0000000028ae303c>] destroy_workqueue+0x44/0x2d8 [<000003ff8029af26>] smc_lgr_free+0x9e/0xf8 [smc] [<0000000028adf3d4>] process_one_work+0x30c/0x730 [<0000000028adf85a>] worker_thread+0x62/0x420 [<0000000028aeac50>] kthread+0x138/0x150 [<0000000028a63914>] __ret_from_fork+0x3c/0x58 [<00000000298503da>] ret_from_fork+0xa/0x40 INFO: lockdep is turned off. =================================================================== This deadlock occurs because cancel_delayed_work_sync() waits for the work(&lgr->free_work) to finish, while the &lgr->free_work waits for the work(lgr->tx_wq), which needs the sk_lock-AF_SMC, that is already used under the mutex_lock. The solution is to use cancel_delayed_work() instead, which kills off a pending work. 
Fixes: `a52bcc919b` ("net/smc: improve termination processing") Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:21 +00:00
Tobias Huschle	56673cb1f5	net/smc: fix application data exception Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit 475f9ff63ee8c296aa46c6e9e9ad9bdd301c6bdf Author: D. Wythe <alibuda@linux.alibaba.com> Date: Thu Feb 16 14:39:05 2023 +0800 net/smc: fix application data exception There is a certain probability that the following exception will occur in the wrk benchmark test: Running 10s test @ http://11.213.45.6:80 8 threads and 64 connections Thread Stats Avg Stdev Max +/- Stdev Latency 3.72ms 13.94ms 245.33ms 94.17% Req/Sec 1.96k 713.67 5.41k 75.16% 155262 requests in 10.10s, 23.10MB read Non-2xx or 3xx responses: 3 The error turns out to be an HTTP 400 error, which is a serious exception in our test because it means the application data was corrupted. Consider the following scenario: CPU0 CPU1 buf_desc->used = 0; cmpxchg(buf_desc->used, 0, 1) deal_with(buf_desc) memset(buf_desc->cpu_addr,0); This will cause the data received by a victim connection to be cleared, thus triggering an HTTP 400 error in the server. This patch swaps the order of clearing used and the memset, and adds a barrier to ensure memory consistency. Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:20 +00:00
Tobias Huschle	34f0cea532	net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit aff7bfed9097435ea38de919befbe2d7771a3e87 Author: D. Wythe <alibuda@linux.alibaba.com> Date: Thu Feb 2 16:26:42 2023 +0800 net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore It's clear that rmbs_lock and sndbufs_lock aim to protect the rmbs list and the sndbufs list. During connection establishment, smc_buf_get_slot() is always invoked, and it only performs read semantics on the rmbs list and the sndbufs list. Based on these considerations, we replace the mutex with an rw_semaphore. Only smc_buf_get_slot() uses down_read(), which allows it to run concurrently; all other paths use down_write() to keep exclusive semantics. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:19 +00:00
Tobias Huschle	f20361505b	net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit f6421014e88983c5bb7a25c71c01ae6278a01df9 Author: D. Wythe <alibuda@linux.alibaba.com> Date: Thu Feb 2 16:26:40 2023 +0800 net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() Following is part of Off-CPU graph during frequent SMC-R short-lived processing: process_one_work (51.19%) smc_close_passive_work (28.36%) smcr_buf_unuse (28.34%) rwsem_down_write_slowpath (28.22%) smc_listen_work (22.83%) smc_clc_wait_msg (1.84%) smc_buf_create (20.45%) smcr_buf_map_usable_links rwsem_down_write_slowpath (20.43%) smcr_lgr_reg_rmbs (0.53%) rwsem_down_write_slowpath (0.43%) smc_llc_do_confirm_rkey (0.08%) We can clearly see that during connection establishment, connections spend their waiting time not on IO but on llc_conf_mutex. More importantly, the core critical areas (smcr_buf_unuse() & smc_buf_create()) only perform read semantics on links, so we can easily replace it with a read semaphore. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:18 +00:00
Tobias Huschle	d6259c44a9	net/smc: llc_conf_mutex refactor, replace it with rw_semaphore Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit b5dd4d6981717f7e2682c0419fe832328c7441cf Author: D. Wythe <alibuda@linux.alibaba.com> Date: Thu Feb 2 16:26:39 2023 +0800 net/smc: llc_conf_mutex refactor, replace it with rw_semaphore llc_conf_mutex was used to protect links and link-related configurations in the same link group, for example, adding or deleting links. However, in most cases, the protected critical area has only read semantics and no write semantics at all, such as obtaining a usable link or an available rmb_desc. This patch is a simple code refactoring: it replaces the mutex with an rw_semaphore, mutex_lock with down_write, and mutex_unlock with up_write. Theoretically, this replacement is equivalent, but after this patch, we can distinguish lock granularity according to the different semantics of critical areas. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:18 +00:00
Tobias Huschle	f0afc0ee2a	net/smc: De-tangle ism and smc device initialization Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit 8c81ba20349daf9f7e58bb05a0c12f4b71813a30 Author: Stefan Raspl <raspl@linux.ibm.com> Date: Mon Jan 23 19:17:52 2023 +0100 net/smc: De-tangle ism and smc device initialization The struct device for ISM devices was part of struct smcd_dev. Move to struct ism_dev, provide a new API call in struct smcd_ops, and convert existing SMCD code accordingly. Furthermore, remove struct smcd_dev from struct ism_dev. This is the final part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:18 +00:00
Tobias Huschle	a45c725bf0	net/smc: Separate SMC-D and ISM APIs Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit 9de4df7b6be1cfca500f8ba21137d53eec45418a Author: Stefan Raspl <raspl@linux.ibm.com> Date: Mon Jan 23 19:17:50 2023 +0100 net/smc: Separate SMC-D and ISM APIs We separate the code implementing the struct smcd_ops API in the ISM device driver from the functions that may be used by other exploiters of ISM devices. Note: We start out small, and don't offer the whole breadth of the ISM device for public use, as many functions are specific to or likely only ever used in the context of SMC-D. This is the third part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:17 +00:00
Tobias Huschle	c2eb1d5eaa	net/smc: Register SMC-D as ISM client Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit 8747716f3942a610efdd12e3655df47269c268ac Author: Stefan Raspl <raspl@linux.ibm.com> Date: Mon Jan 23 19:17:49 2023 +0100 net/smc: Register SMC-D as ISM client Register the smc module with the new ism device driver API. This is the second part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:16 +00:00
Tobias Huschle	31759541c7	net/smc: Fix an error code in smc_lgr_create() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit bdee15e8c58b450ad736a2b62ef8c7a12548b704 Author: Dan Carpenter <error27@gmail.com> Date: Fri Oct 14 12:34:36 2022 +0300 net/smc: Fix an error code in smc_lgr_create() If smc_wr_alloc_lgr_mem() fails then return an error code. Don't return success. Fixes: 8799e310fb3f ("net/smc: add v2 support to the work request layer") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:14 +00:00
Tobias Huschle	e00604b9a9	net/smc: Stop the CLC flow if no link to map buffers on Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit e738455b2c6dcdab03e45d97de36476f93f557d2 Author: Wen Gu <guwen@linux.alibaba.com> Date: Tue Sep 20 14:43:09 2022 +0800 net/smc: Stop the CLC flow if no link to map buffers on There might be a potential race between SMC-R buffer map and link group termination. smc_smcr_terminate_all() \| smc_connect_rdma() -------------------------------------------------------------- \| smc_conn_create() for links in smcibdev \| schedule links down \| \| smc_buf_create() \| \- smcr_buf_map_usable_links() \| \- no usable links found, \| (rmb->mr = NULL) \| \| smc_clc_send_confirm() \| \- access conn->rmb_desc->mr[]->rkey \| (panic) During reboot and IB device module remove, all links will be set down and no usable links remain in link groups. In such situation smcr_buf_map_usable_links() should return an error and stop the CLC flow accessing to uninitialized mr. Fixes: `b9247544c1` ("net/smc: convert static link ID instances to support multiple links") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:13 +00:00
Tobias Huschle	91bd156268	net/smc: Fix possible access to freed memory in link clear Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit e9b1a4f867ae9c1dbd1d71cd09cbdb3239fb4968 Author: Yacan Liu <liuyacan@corp.netease.com> Date: Tue Sep 6 21:01:39 2022 +0800 net/smc: Fix possible access to freed memory in link clear After modifying the QP to the Error state, all RX WR would be completed with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not wait for it is done, but destroy the QP and free the link group directly. So there is a risk that accessing the freed memory in tasklet context. Here is a crash example: BUG: unable to handle page fault for address: ffffffff8f220860 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23 Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018 RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0 Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32 RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086 RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000 RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00 RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010 R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> _raw_spin_lock_irqsave+0x30/0x40 mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib] smc_wr_rx_tasklet_fn+0x56/0xa0 [smc] tasklet_action_common.isra.21+0x66/0x100 __do_softirq+0xd5/0x29c asm_call_irq_on_stack+0x12/0x20 </IRQ> do_softirq_own_stack+0x37/0x40 irq_exit_rcu+0x9d/0xa0 sysvec_call_function_single+0x34/0x80 asm_sysvec_call_function_single+0x12/0x20 Fixes: `bd4ad57718` ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:12 +00:00
Tobias Huschle	2797114541	net/smc: Extend SMC-R link group netlink attribute Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit ddefb2d205539418f3c3851a3e06fac9624f257d Author: Wen Gu <guwen@linux.alibaba.com> Date: Thu Jul 14 17:44:05 2022 +0800 net/smc: Extend SMC-R link group netlink attribute Extend SMC-R link group netlink attribute SMC_GEN_LGR_SMCR. Introduce SMC_NLA_LGR_R_BUF_TYPE to show the buffer type of SMC-R link group. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:10 +00:00
Tobias Huschle	bddfce67c1	net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit b8d199451c99b3796b840c350eb74b830c5c869b Author: Wen Gu <guwen@linux.alibaba.com> Date: Thu Jul 14 17:44:04 2022 +0800 net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R On long-running enterprise production servers, high-order contiguous memory pages are usually very rare and in most cases we can only get fragmented pages. When replacing TCP with SMC-R in such production scenarios, attempting to allocate high-order physically contiguous sndbufs and RMBs may result in frequent memory compaction, which will cause unexpected hung issue and further stability risks. So this patch is aimed to allow SMC-R link group to use virtually contiguous sndbufs and RMBs to avoid potential issues mentioned above. Whether to use physically or virtually contiguous buffers can be set by sysctl smcr_buf_type. Note that using virtually contiguous buffers will bring an acceptable performance regression, which can be mainly divided into two parts: 1) regression in data path, which is brought by additional address translation of sndbuf by RNIC in Tx. But in general, translating address through MTT is fast. 
Taking 256KB sndbuf and RMB as an example, the comparisons in qperf latency and bandwidth test with physically and virtually contiguous buffers are as follows: - client: smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2\ -t 5 -vu tcp_{bw\|lat} - server: smc_run taskset -c <cpu> qperf [latency] msgsize tcp smcr smcr-use-virt-buf 1 11.17 us 7.56 us 7.51 us (-0.67%) 2 10.65 us 7.74 us 7.56 us (-2.31%) 4 11.11 us 7.52 us 7.59 us ( 0.84%) 8 10.83 us 7.55 us 7.51 us (-0.48%) 16 11.21 us 7.46 us 7.51 us ( 0.71%) 32 10.65 us 7.53 us 7.58 us ( 0.61%) 64 10.95 us 7.74 us 7.80 us ( 0.76%) 128 11.14 us 7.83 us 7.87 us ( 0.47%) 256 10.97 us 7.94 us 7.92 us (-0.28%) 512 11.23 us 7.94 us 8.20 us ( 3.25%) 1024 11.60 us 8.12 us 8.20 us ( 0.96%) 2048 14.04 us 8.30 us 8.51 us ( 2.49%) 4096 16.88 us 9.13 us 9.07 us (-0.64%) 8192 22.50 us 10.56 us 11.22 us ( 6.26%) 16384 28.99 us 12.88 us 13.83 us ( 7.37%) 32768 40.13 us 16.76 us 16.95 us ( 1.16%) 65536 68.70 us 24.68 us 24.85 us ( 0.68%) [bandwidth] msgsize tcp smcr smcr-use-virt-buf 1 1.65 MB/s 1.59 MB/s 1.53 MB/s (-3.88%) 2 3.32 MB/s 3.17 MB/s 3.08 MB/s (-2.67%) 4 6.66 MB/s 6.33 MB/s 6.09 MB/s (-3.85%) 8 13.67 MB/s 13.45 MB/s 11.97 MB/s (-10.99%) 16 25.36 MB/s 27.15 MB/s 24.16 MB/s (-11.01%) 32 48.22 MB/s 54.24 MB/s 49.41 MB/s (-8.89%) 64 106.79 MB/s 107.32 MB/s 99.05 MB/s (-7.71%) 128 210.21 MB/s 202.46 MB/s 201.02 MB/s (-0.71%) 256 400.81 MB/s 416.81 MB/s 393.52 MB/s (-5.59%) 512 746.49 MB/s 834.12 MB/s 809.99 MB/s (-2.89%) 1024 1292.33 MB/s 1641.96 MB/s 1571.82 MB/s (-4.27%) 2048 2007.64 MB/s 2760.44 MB/s 2717.68 MB/s (-1.55%) 4096 2665.17 MB/s 4157.44 MB/s 4070.76 MB/s (-2.09%) 8192 3159.72 MB/s 4361.57 MB/s 4270.65 MB/s (-2.08%) 16384 4186.70 MB/s 4574.13 MB/s 4501.17 MB/s (-1.60%) 32768 4093.21 MB/s 4487.42 MB/s 4322.43 MB/s (-3.68%) 65536 4057.14 MB/s 4735.61 MB/s 4555.17 MB/s (-3.81%) 2) regression in buffer initialization and destruction path, which is brought by additional MR operations of sndbufs. 
But thanks to link group buffer reuse mechanism, the impact of this kind of regression decreases as times of buffer reuse increases. Taking 256KB sndbuf and RMB as an example, latency of some key SMC-R buffer-related function obtained by bpftrace are as follows: Function Phys-bufs Virt-bufs smcr_new_buf_create() 67154 ns 79164 ns smc_ib_buf_map_sg() 525 ns 928 ns smc_ib_get_memory_region() 162294 ns 161191 ns smc_wr_reg_send() 9957 ns 9635 ns smc_ib_put_memory_region() 203548 ns 198374 ns smc_ib_buf_unmap_sg() 508 ns 1158 ns ------------ Test environment notes: 1. Above tests run on 2 VMs within the same Host. 2. The NIC is ConnectX-4Lx, using SRIOV and passing through 2 VFs to the each VM respectively. 3. VMs' vCPUs are binded to different physical CPUs, and the binded physical CPUs are isolated by `isolcpus=xxx` cmdline. 4. NICs' queue number are set to 1. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:09 +00:00
Tobias Huschle	78e54713f6	net/smc: Use sysctl-specified types of buffers in new link group Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit b984f370ed5182d180f92dbf14bdf847ff6ccc04 Author: Wen Gu <guwen@linux.alibaba.com> Date: Thu Jul 14 17:44:03 2022 +0800 net/smc: Use sysctl-specified types of buffers in new link group This patch introduces a new SMC-R specific element buf_type in struct smc_link_group, for recording the value of sysctl smcr_buf_type when link group is created. New created link group will create and reuse buffers of the type specified by buf_type. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:09 +00:00
Tobias Huschle	fdcc3bb7fc	net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit 0ef69e788411cba2af017db731a9fc62d255e9ac Author: Guangguan Wang <guangguan.wang@linux.alibaba.com> Date: Thu Jul 14 17:44:01 2022 +0800 net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu Some CPUs, such as Xeon, can guarantee DMA cache coherency, so there is no need to use dma sync APIs to flush the cache on such CPUs. In order to avoid calling dma sync APIs on the IO path, use dma_need_sync to check whether an smc_buf_desc needs dma sync when creating it. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:08 +00:00
Tobias Huschle	3e71ec8595	net/smc: remove redundant dma sync ops Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160099 Upstream status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Tested: by IBM Build-Info: ihttps://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52893145 Conflicts: None commit 6d52e2de6415b7a035b3e8dc4ccffd0da25bbfb9 Author: Guangguan Wang <guangguan.wang@linux.alibaba.com> Date: Thu Jul 14 17:44:00 2022 +0800 net/smc: remove redundant dma sync ops smc_ib_sync_sg_for_cpu/device are the ops used for dma memory cache consistency. Smc sndbufs are dma buffers, where CPU writes data to it and PCIE device reads data from it. So for sndbufs, smc_ib_sync_sg_for_device is needed and smc_ib_sync_sg_for_cpu is redundant as PCIE device will not write the buffers. Smc rmbs are dma buffers, where PCIE device write data to it and CPU read data from it. So for rmbs, smc_ib_sync_sg_for_cpu is needed and smc_ib_sync_sg_for_device is redundant as CPU will not write the buffers. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2023-05-26 09:39:08 +00:00
Tobias Huschle	4f48bb24b1	[s390] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 4940a1fdf31c39f0806ac831cde333134862030b Author: D. Wythe <alibuda@linux.alibaba.com> Date: Wed Mar 2 21:25:12 2022 +0800 net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server The problem of SMC_CLC_DECL_ERR_REGRMB on the server is very clear: whether a new SMC connection can be accepted depends not only on the limit of conn nums, but also on the available entries of rtoken. Since the rtoken release is triggered by the peer, while conn nums is decreased locally, many things can happen in this time difference. The only thing that needs to be mentioned is that, now that all connection creations are completely protected by the smc_server_lgr_pending lock, it's enough to check only the available entries in rtokens_used_mask. Fixes: `cd6851f303` ("smc: remote memory buffers (RMBs)") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:46 +02:00
Tobias Huschle	222f9445ef	[s390] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 0537f0a2151375dcf90c1bbfda6a0aaf57164e89 Author: D. Wythe <alibuda@linux.alibaba.com> Date: Wed Mar 2 21:25:11 2022 +0800 net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB in the client is the following execution sequence: Server Conn A: Server Conn B: Client Conn B: smc_lgr_unregister_conn smc_lgr_register_conn smc_clc_send_accept -> smc_rtoken_add smcr_buf_unuse -> Client Conn A: smc_rtoken_delete smc_lgr_unregister_conn() makes the current link available to be assigned to a new incoming connection while smcr_buf_unuse() has not executed yet, which means that smc_rtoken_add may fail because of insufficient rtoken_entry. Reversing their execution order avoids this problem. Fixes: `3e034725c0` ("net/smc: common functions for RMBs and send buffers") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:46 +02:00
Tobias Huschle	c53e70df91	[s390] net/smc: correct settings of RMB window update limit Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 6bf536eb5c8ca011d1ff57b5c5f7c57ceac06a37 Author: Dust Li <dust.li@linux.alibaba.com> Date: Tue Mar 1 17:44:00 2022 +0800 net/smc: correct settings of RMB window update limit rmbe_update_limit is used to keep receive window updates from being announced too frequently. RFC7609 requests a minimal increase in the window size of 10% of the receive buffer space. But the current implementation used: min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2) and SOCK_MIN_SNDBUF / 2 == 2304 Bytes, which is almost always less than 10% of the receive buffer space. This causes the receiver to always send a CDC message to update its consumer cursor when it consumes more than 2K of data. As a result, we may encounter something like "TCP silly window syndrome" when sending 2.5~8K messages. This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2). With this patch and SMC autocorking enabled, qperf 2K/4K/8K tcp_bw test shows 45%/75%/40% increase in throughput respectively. Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:44 +02:00
Tobias Huschle	ea839fb15e	[s390] net/smc: Fix hung_task when removing SMC-R devices Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 56d99e81ecbc997a5f984684d0eeb583992b2072 Author: Wen Gu <guwen@linux.alibaba.com> Date: Sun Jan 16 15:43:42 2022 +0800 net/smc: Fix hung_task when removing SMC-R devices A hung_task is observed when removing SMC-R devices. Suppose that a link group has two active links(lnk_A, lnk_B) associated with two different SMC-R devices(dev_A, dev_B). When dev_A is removed, the link group will be removed from smc_lgr_list and added into lgr_linkdown_list. lnk_A will be cleared and smcibdev(A)->lnk_cnt will reach to zero. However, when dev_B is removed then, the link group can't be found in smc_lgr_list and lnk_B won't be cleared, making smcibdev->lnk_cnt never reaches zero, which causes a hung_task. This patch fixes this issue by restoring the implementation of smc_smcr_terminate_all() to what it was before commit 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock"). The original implementation also satisfies the intention that make sure QP destroy earlier than CQ destroy because we will always wait for smcibdev->lnk_cnt reaches zero, which guarantees QP has been destroyed. Fixes: 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:36 +02:00
Tobias Huschle	e4a60a1b3c	[s390] net/smc: Resolve the race between SMC-R link access and clear Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 20c9398d3309d170300d67643b851fd26783af24 Author: Wen Gu <guwen@linux.alibaba.com> Date: Thu Jan 13 16:36:42 2022 +0800 net/smc: Resolve the race between SMC-R link access and clear We encountered some crashes caused by the race between SMC-R link access and link clear that triggered by abnormal link group termination, such as port error. Here is an example of this kind of crashes: BUG: kernel NULL pointer dereference, address: 0000000000000000 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc] Call Trace: <TASK> ? __smc_buf_create+0x75a/0x950 [smc] smcr_lgr_reg_rmbs+0x2a/0xbf [smc] smc_listen_work+0xf72/0x1230 [smc] ? process_one_work+0x25c/0x600 process_one_work+0x25c/0x600 worker_thread+0x4f/0x3a0 ? process_one_work+0x600/0x600 kthread+0x15d/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> smc_listen_work() __smc_lgr_terminate() --------------------------------------------------------------- \| smc_lgr_free() \| \|- smcr_link_clear() \| \|- memset(lnk, 0) smc_listen_rdma_reg() \| \|- smcr_lgr_reg_rmbs() \| \|- smc_llc_flow_initiate() \| \|- access lnk->lgr (panic) \| These crashes are similarly caused by clearing SMC-R link resources when some functions is still accessing to them. This patch tries to fix the issue by introducing reference count of SMC-R links and ensuring that the sensitive resources of links won't be cleared until reference count reaches zero. 
The operation to the SMC-R link reference count can be concluded as follows: object [hold or initialized as 1] [put] -------------------------------------------------------------------- links smcr_link_init() smcr_link_clear() connections smc_conn_create() smc_conn_free() Through this way, the clear of SMC-R links is later than the free of all the smc connections above it, thus avoiding the unsafe reference to SMC-R links. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:35 +02:00
Tobias Huschle	8b399872fb	[s390] net/smc: Introduce a new conn->lgr validity check helper Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit ea89c6c0983c39702a4a52ccaa4702e0cb71179b Author: Wen Gu <guwen@linux.alibaba.com> Date: Thu Jan 13 16:36:41 2022 +0800 net/smc: Introduce a new conn->lgr validity check helper It is no longer suitable to identify whether a smc connection is registered in a link group through checking if conn->lgr is NULL, because conn->lgr won't be reset even the connection is unregistered from a link group. So this patch introduces a new helper smc_conn_lgr_valid() and replaces all the check of conn->lgr in original implementation with the new helper to judge if conn->lgr is valid to use. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:35 +02:00
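
The idea behind the validity helper in the commit above can be sketched in user space as follows. All types, field names and functions here are hypothetical stand-ins, not the kernel's definitions; in particular, the `registered` flag is an assumption used only to show why a plain NULL check on the pointer is no longer enough.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct link_group { int id; };

struct connection {
    struct link_group *lgr; /* stays set even after unregistration */
    bool registered;        /* assumed marker for "currently registered" */
};

/* Sketch of a validity helper: the pointer alone is not sufficient,
 * so the registration marker is checked as well. */
bool conn_lgr_valid(const struct connection *conn)
{
    return conn->lgr != NULL && conn->registered;
}

void conn_register(struct connection *conn, struct link_group *lgr)
{
    conn->lgr = lgr;
    conn->registered = true;
}

/* Unregistering deliberately leaves conn->lgr untouched. */
void conn_unregister(struct connection *conn)
{
    conn->registered = false;
}
```

After `conn_unregister()`, `conn->lgr` is still non-NULL in this sketch, yet `conn_lgr_valid()` returns false, which is the distinction the commit message describes.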
Tobias Huschle	b141c34e82	[s390] net/smc: Resolve the race between link group access and termination Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 61f434b0280ed65495831f1b6e1a5c21a90f47c6 Author: Wen Gu <guwen@linux.alibaba.com> Date: Thu Jan 13 16:36:40 2022 +0800 net/smc: Resolve the race between link group access and termination We encountered some crashes caused by the race between the access and the termination of link groups. Here are some of panic stacks we met: 1) Race between smc_clc_wait_msg() and __smc_lgr_terminate() BUG: kernel NULL pointer dereference, address: 00000000000002f0 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc] Call Trace: <TASK> ? smc_clc_send_accept+0x45/0xa0 [smc] ? smc_clc_send_accept+0x45/0xa0 [smc] smc_listen_work+0x783/0x1220 [smc] ? finish_task_switch+0xc4/0x2e0 ? process_one_work+0x1ad/0x3c0 process_one_work+0x1ad/0x3c0 worker_thread+0x4c/0x390 ? rescuer_thread+0x320/0x320 kthread+0x149/0x190 ? 
set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> smc_listen_work() abnormal case like port error --------------------------------------------------------------- \| __smc_lgr_terminate() \| \|- smc_conn_kill() \| \|- smc_lgr_unregister_conn() \| \|- set conn->lgr = NULL smc_clc_wait_msg() \| \|- access conn->lgr (panic) \| 2) Race between smc_setsockopt() and __smc_lgr_terminate() BUG: kernel NULL pointer dereference, address: 00000000000002e8 RIP: 0010:smc_setsockopt+0x17a/0x280 [smc] Call Trace: <TASK> __sys_setsockopt+0xfc/0x190 __x64_sys_setsockopt+0x20/0x30 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> smc_setsockopt() abnormal case like port error -------------------------------------------------------------- \| __smc_lgr_terminate() \| \|- smc_conn_kill() \| \|- smc_lgr_unregister_conn() \| \|- set conn->lgr = NULL mod_delayed_work() \| \|- access conn->lgr (panic) \| There are some other panic places, and they have a similar cause to the one described above: accessing the link group after termination, thus getting a NULL pointer or invalid resource. Currently, there seems to be no synchronization between link group access and a sudden termination of it. This patch tries to fix this by introducing a reference count for the link group and not freeing the link group until the reference count is zero. A link group might be referred to by links or smc connections. So the operations on the link group reference count can be summarized as follows: object [hold or initialized as 1] [put] ------------------------------------------------------------------- link group smc_lgr_create() smc_lgr_free() connections smc_conn_create() smc_conn_free() links smcr_link_init() smcr_link_clear() In this way, we extend the life cycle of the link group and ensure it is longer than the life cycle of the connections and links above it, so as to avoid invalid access to the link group after its termination. 
Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:34 +02:00
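
The hold/put table in the commit above can be sketched as a minimal user-space reference count. The struct and function names are hypothetical, and a plain `int` is used for brevity; kernel code would need an atomic counter, which this sketch does not attempt to model.

```c
#include <stdbool.h>

/* Hypothetical link group with a plain (non-atomic) reference count. */
struct lgr {
    int refcnt;
    bool freed; /* set when the last reference is dropped */
};

/* Corresponds to the "initialized as 1" entry for the link group. */
void lgr_create(struct lgr *g)
{
    g->refcnt = 1;
    g->freed = false;
}

/* Taken by each connection and each link that refers to the group. */
void lgr_hold(struct lgr *g)
{
    g->refcnt++;
}

/* Dropped when the group is released or when a connection or link
 * goes away; the group is only freed once the count reaches zero. */
void lgr_put(struct lgr *g)
{
    if (--g->refcnt == 0)
        g->freed = true;
}
```

With one connection and one link holding references, dropping the group's own initial reference leaves the group alive; it is only marked freed after the connection and the link have also called `lgr_put()`.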
Tobias Huschle	393f03c09b	[s390] net/smc: Reset conn->lgr when link group registration fails Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 36595d8ad46d9e4c41cc7c48c4405b7c3322deac Author: Wen Gu <guwen@linux.alibaba.com> Date: Thu Jan 6 20:42:08 2022 +0800 net/smc: Reset conn->lgr when link group registration fails SMC connections might fail to be registered in a link group due to unable to find a usable link during its creation. As a result, smc_conn_create() will return a failure and most resources related to the connection won't be applied or initialized, such as conn->abort_work or conn->lnk. If smc_conn_free() is invoked later, it will try to access the uninitialized resources related to the connection, thus causing a warning or crash. This patch tries to fix this by resetting conn->lgr to NULL if an abnormal exit occurs in smc_lgr_register_conn(), thus avoiding the access to uninitialized resources in smc_conn_free(). Meanwhile, the new created link group should be terminated if smc connections can't be registered in it. So smc_lgr_cleanup_early() is modified to take care of link group only and invoked to terminate unusable link group by smc_conn_create(). The call to smc_conn_free() is moved out from smc_lgr_cleanup_early() to smc_conn_abort(). Fixes: `56bc3b2094` ("net/smc: assign link to a new connection") Suggested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:34 +02:00
Tobias Huschle	ec18bd7be6	[s390] net/smc: Print net namespace in log Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit de2fea7b39bfa1ee9db8726f7b71d54fec385d80 Author: Tony Lu <tonylu@linux.alibaba.com> Date: Tue Dec 28 21:06:11 2021 +0800 net/smc: Print net namespace in log This adds net namespace ID to the kernel log, net_cookie is unique in the whole system. It is useful in container environment. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:32 +02:00
Tobias Huschle	b1dbd093ff	[s390] net/smc: Add netlink net namespace support Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 79d39fc503b43b566feae5bc9a57dfcffdf41bd1 Author: Tony Lu <tonylu@linux.alibaba.com> Date: Tue Dec 28 21:06:10 2021 +0800 net/smc: Add netlink net namespace support This adds net namespace ID to diag of linkgroup, helps us to distinguish different namespaces, and net_cookie is unique in the whole system. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:32 +02:00
Tobias Huschle	42012cf975	[s390] net/smc: Introduce net namespace support for linkgroup Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 0237a3a683e4844ddc52782d83d439d6192e11f9 Author: Tony Lu <tonylu@linux.alibaba.com> Date: Tue Dec 28 21:06:09 2021 +0800 net/smc: Introduce net namespace support for linkgroup Currently, rdma device supports exclusive net namespace isolation, however linkgroup doesn't know and support ibdev net namespace. Applications in the containers don't want to share the nics if we enabled rdma exclusive mode. Every net namespaces should have their own linkgroups. This patch introduce a new field net for linkgroup, which is standing for the ibdev net namespace in the linkgroup. The net in linkgroup is initialized with the net namespace of link's ibdev. It compares the net of linkgroup and sock or ibdev before choose it, if no matched, create new one in current net namespace. If rdma net namespace exclusive mode is not enabled, it behaves as before. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:31 +02:00
Tobias Huschle	5da4f5e2ca	[s390] net/smc: fix kernel panic caused by race of smc_sock Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 349d43127dac00c15231e8ffbcaabd70f7b0e544 Author: Dust Li <dust.li@linux.alibaba.com> Date: Tue Dec 28 17:03:25 2021 +0800 net/smc: fix kernel panic caused by race of smc_sock A crash occurs when smc_cdc_tx_handler() tries to access smc_sock but smc_release() has already freed it. [ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88 [ 4570.696048] #PF: supervisor write access in kernel mode [ 4570.696728] #PF: error_code(0x0002) - not-present page [ 4570.697401] PGD 0 P4D 0 [ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111 [ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0 [ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30 <...> [ 4570.711446] Call Trace: [ 4570.711746] <IRQ> [ 4570.711992] smc_cdc_tx_handler+0x41/0xc0 [ 4570.712470] smc_wr_tx_tasklet_fn+0x213/0x560 [ 4570.712981] ? smc_cdc_tx_dismisser+0x10/0x10 [ 4570.713489] tasklet_action_common.isra.17+0x66/0x140 [ 4570.714083] __do_softirq+0x123/0x2f4 [ 4570.714521] irq_exit_rcu+0xc4/0xf0 [ 4570.714934] common_interrupt+0xba/0xe0 Though smc_cdc_tx_handler() checked the existence of smc connection, smc_release() may have already dismissed and released the smc socket before smc_cdc_tx_handler() further visits it. 
smc_cdc_tx_handler() \|smc_release() if (!conn) \| \| \|smc_cdc_tx_dismiss_slots() \| smc_cdc_tx_dismisser() \| \|sock_put(&smc->sk) <- last sock_put, \| smc_sock freed bh_lock_sock(&smc->sk) (panic) \| To make sure we won't receive any CDC messages after we free the smc_sock, add a refcount on the smc_connection for inflight CDC message(posted to the QP but haven't received related CQE), and don't release the smc_connection until all the inflight CDC messages haven been done, for both success or failed ones. Using refcount on CDC messages brings another problem: when the link is going to be destroyed, smcr_link_clear() will reset the QP, which then remove all the pending CQEs related to the QP in the CQ. To make sure all the CQEs will always come back so the refcount on the smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced by smc_ib_modify_qp_error(). And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we need to wait for all pending WQEs done, or we may encounter use-after- free when handling CQEs. For IB device removal routine, we need to wait for all the QPs on that device been destroyed before we can destroy CQs on the device, or the refcount on smc_connection won't reach 0 and smc_sock cannot be released. Fixes: `5f08318f61` ("smc: connection data control (CDC)") Reported-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:31 +02:00
Tobias Huschle	8d04415218	[s390] net/smc: don't send CDC/LLC message if link not ready Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 90cee52f2e780345d3629e278291aea5ac74f40f Author: Dust Li <dust.li@linux.alibaba.com> Date: Tue Dec 28 17:03:24 2021 +0800 net/smc: don't send CDC/LLC message if link not ready We found smc_llc_send_link_delete_all() sometimes wait for 2s timeout when testing with RDMA link up/down. It is possible when a smc_link is in ACTIVATING state, the underlaying QP is still in RESET or RTR state, which cannot send any messages out. smc_llc_send_link_delete_all() use smc_link_usable() to checks whether the link is usable, if the QP is still in RESET or RTR state, but the smc_link is in ACTIVATING, this LLC message will always fail without any CQE entering the CQ, and we will always wait 2s before timeout. Since we cannot send any messages through the QP before the QP enter RTS. I add a wrapper smc_link_sendable() which checks the state of QP along with the link state. And replace smc_link_usable() with smc_link_sendable() in all LLC & CDC message sending routine. Fixes: `5f08318f61` ("smc: connection data control (CDC)") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:30 +02:00
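
The wrapper described in the commit above combines the link state with the QP state. A minimal user-space sketch, assuming hypothetical enums and struct fields (the real kernel and InfiniBand definitions differ), might look like this:

```c
#include <stdbool.h>

/* Hypothetical state enums, simplified for this sketch. */
enum link_state { LNK_UNUSED, LNK_INACTIVE, LNK_ACTIVATING, LNK_ACTIVE };
enum qp_state { QP_RESET, QP_INIT, QP_RTR, QP_RTS };

struct link {
    enum link_state state;
    enum qp_state qp;
};

/* Assumed meaning of "usable": the link is being activated or is active. */
bool link_usable(const struct link *l)
{
    return l->state == LNK_ACTIVATING || l->state == LNK_ACTIVE;
}

/* "Sendable" additionally requires the QP to have reached RTS,
 * because a QP in RESET or RTR cannot send any messages. */
bool link_sendable(const struct link *l)
{
    return link_usable(l) && l->qp == QP_RTS;
}
```

A link in `LNK_ACTIVATING` whose QP is still in `QP_RTR` is usable but not sendable here, which is the case the commit message identifies as the source of the 2s timeout.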
Tobias Huschle	e3ba7dd79c	[s390] net/smc: Clear memory when release and reuse buffer Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 1c5526968e270e4efccfa1da21d211a4915cdeda Author: Tony Lu <tonylu@linux.alibaba.com> Date: Fri Dec 3 12:33:31 2021 +0100 net/smc: Clear memory when release and reuse buffer Currently, buffers are cleared when smc connections are created and buffers are reused. This slows down the speed of establishing new connections. In most cases, the applications want to establish connections as quickly as possible. This patch moves memset() from connection creation path to release and buffer unuse path, this trades off between speed of establishing and release. Test environments: - CPU Intel Xeon Platinum 8 core, mem 32 GiB, nic Mellanox CX4 - socket sndbuf / rcvbuf: 16384 / 131072 bytes - w/o first round, 5 rounds, avg, 100 conns batch per round - smc_buf_create() use bpftrace kprobe, introduces extra latency Latency benchmarks for smc_buf_create(): w/o patch : 19040.0 ns w/ patch : 1932.6 ns ratio : 10.2% (-89.8%) Latency benchmarks for socket create and connect: w/o patch : 143.3 us w/ patch : 102.2 us ratio : 71.3% (-28.7%) The latency of establishing connections is reduced by 28.7%. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20211203113331.2818873-1-kgraul@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:29 +02:00
Tobias Huschle	675b2d7d0b	[s390] net/smc: fix wrong list_del in smc_lgr_cleanup_early Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 789b6cc2a5f9123b9c549b886fdc47c865cfe0ba Author: Dust Li <dust.li@linux.alibaba.com> Date: Wed Dec 1 11:02:30 2021 +0800 net/smc: fix wrong list_del in smc_lgr_cleanup_early smc_lgr_cleanup_early() meant to delete the link group from the link group list, but it deleted the list head by mistake. This may cause memory corruption since we didn't remove the real link group from the list and later memseted the link group structure. We got a list corruption panic when testing: [ 231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000 [ 231.278222] ------------[ cut here ]------------ [ 231.278726] kernel BUG at lib/list_debug.c:53! [ 231.279326] invalid opcode: 0000 [#1] SMP NOPTI [ 231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435 [ 231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014 [ 231.281248] Workqueue: events smc_link_down_work [ 231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90 [ 231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c 60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f> 0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc [ 231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292 [ 231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000 [ 231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040 [ 231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001 [ 231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001 [ 231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003 [ 231.288337] FS: 
0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 [ 231.289160] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0 [ 231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 231.291940] Call Trace: [ 231.292211] smc_lgr_terminate_sched+0x53/0xa0 [ 231.292677] smc_switch_conns+0x75/0x6b0 [ 231.293085] ? update_load_avg+0x1a6/0x590 [ 231.293517] ? ttwu_do_wakeup+0x17/0x150 [ 231.293907] ? update_load_avg+0x1a6/0x590 [ 231.294317] ? newidle_balance+0xca/0x3d0 [ 231.294716] smcr_link_down+0x50/0x1a0 [ 231.295090] ? __wake_up_common_lock+0x77/0x90 [ 231.295534] smc_link_down_work+0x46/0x60 [ 231.295933] process_one_work+0x18b/0x350 Fixes: `a0a62ee15a` ("net/smc: separate locks for SMCD and SMCR link group lists") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:28 +02:00
Tobias Huschle	6052fdc4d7	[s390] net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 587acad41f1bc48e16f42bb2aca63bf323380be8 Author: Karsten Graul <kgraul@linux.ibm.com> Date: Wed Nov 24 13:32:37 2021 +0100 net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk() Coverity reports a possible NULL dereferencing problem: in smc_vlan_by_tcpsk(): 6. returned_null: netdev_lower_get_next returns NULL (checked 29 out of 30 times). 7. var_assigned: Assigning: ndev = NULL return value from netdev_lower_get_next. 1623 ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower); CID 1468509 (#1 of 1): Dereference null return value (NULL_RETURNS) 8. dereference: Dereferencing a pointer that might be NULL ndev when calling is_vlan_dev. 1624 if (is_vlan_dev(ndev)) { Remove the manual implementation and use netdev_walk_all_lower_dev() to iterate over the lower devices. While on it remove an obsolete function parameter comment. Fixes: `cb9d43f677` ("net/smc: determine vlan_id of stacked net_device") Suggested-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:27 +02:00
Tobias Huschle	5f0eac90e8	[s390] net/smc: Make sure the link_id is unique Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit cf4f5530bb55ef7d5a91036b26676643b80b1616 Author: Wen Gu <guwen@linux.alibaba.com> Date: Mon Nov 15 17:45:07 2021 +0800 net/smc: Make sure the link_id is unique The link_id is supposed to be unique, but smcr_next_link_id() doesn't skip the used link_id as expected. So the patch fixes this. Fixes: `026c381fb4` ("net/smc: introduce link_idx for link group array") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:26 +02:00
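
Skipping link IDs that are already in use, as the commit above describes, can be sketched as follows. The types, the `MAX_LINKS` value and the function names are assumptions for illustration; treating zero as an invalid ID is also an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LINKS 3 /* assumed number of link slots per group */

struct link_slot {
    bool in_use;
    uint8_t link_id;
};

struct link_group {
    uint8_t next_link_id; /* last ID handed out */
    struct link_slot lnk[MAX_LINKS];
};

/* Returns true if any slot that is in use already carries this ID. */
static bool id_in_use(const struct link_group *lgr, uint8_t id)
{
    for (int i = 0; i < MAX_LINKS; i++)
        if (lgr->lnk[i].in_use && lgr->lnk[i].link_id == id)
            return true;
    return false;
}

/* Advance the 8-bit counter (it wraps from 255 to 0) until an ID is
 * found that is neither zero nor used by an existing link. The loop
 * terminates because there are far fewer slots than possible IDs. */
uint8_t next_link_id(struct link_group *lgr)
{
    uint8_t id;

    do {
        id = ++lgr->next_link_id;
    } while (id == 0 || id_in_use(lgr, id));
    return id;
}
```

The important part is the `id_in_use()` check inside the loop: without it, a wrapped counter could hand out an ID that a still-active link is using.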
Tobias Huschle	06e251007a	[s390] net/smc: Introduce tracepoint for smcr link down Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit a3a0e81b6fd55745e100735c7667cd99a0650811 Author: Tony Lu <tonylu@linux.alibaba.com> Date: Mon Nov 1 15:39:16 2021 +0800 net/smc: Introduce tracepoint for smcr link down The SMC-R link down event is important for diagnosing link issues, so we should track it, especially in single-NIC mode, where a link down means the upper-layer connection is shut down. The tracepoint lets us find the direct link-down reason in time: not only the increased counter, but also the location of the code that triggered the event. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:24 +02:00
Tobias Huschle	d97f87ca20	[s390] net/smc: add netlink support for SMC-Rv2 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit b0539f5eddc2eefd24378bda3ee9cbbca916f58d Author: Karsten Graul <kgraul@linux.ibm.com> Date: Sat Oct 16 11:37:51 2021 +0200 net/smc: add netlink support for SMC-Rv2 Implement the netlink support for SMC-Rv2 related attributes that are provided to user space. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:22 +02:00
Tobias Huschle	ddd334253b	[s390] net/smc: add v2 support to the work request layer Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 8799e310fb3f15759824a78b6b93d7e6d5def067 Author: Karsten Graul <kgraul@linux.ibm.com> Date: Sat Oct 16 11:37:49 2021 +0200 net/smc: add v2 support to the work request layer In the work request layer define one large v2 buffer for each link group that is used to transmit and receive large LLC control messages. Add the completion queue handling for this buffer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:21 +02:00
Tobias Huschle	fea676e354	[s390] net/smc: retrieve v2 gid from IB device Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 24fb68111d4509524b483b2577f1b20a24f5fdfd Author: Karsten Graul <kgraul@linux.ibm.com> Date: Sat Oct 16 11:37:48 2021 +0200 net/smc: retrieve v2 gid from IB device In smc_ib.c, scan for RoCE devices that support UDP encapsulation. Find an eligible device and check that there is a route to the remote peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:21 +02:00
Tobias Huschle	4c2e548f6e	[s390] net/smc: add listen processing for SMC-Rv2 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit e49300a6bf6218c835403545e9356141a6340181 Author: Karsten Graul <kgraul@linux.ibm.com> Date: Sat Oct 16 11:37:46 2021 +0200 net/smc: add listen processing for SMC-Rv2 Implement the server side of the SMC-Rv2 processing. Process incoming CLC messages, find eligible devices and check for a valid route to the remote peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:20 +02:00
Tobias Huschle	64bd390c97	[s390] net/smc: keep static copy of system EID Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 11a26c59fc510091facd0d80236ac848da844830 Author: Karsten Graul <kgraul@linux.ibm.com> Date: Tue Sep 14 10:35:06 2021 +0200 net/smc: keep static copy of system EID The system EID is retrieved using a registered ISM device each time it is needed. This adds some unnecessary complexity at all places where the system EID is needed, but no ISM device is at hand. Simplify the code and save the system EID in a static variable in smc_ism.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:18 +02:00
Tobias Huschle	5902e79404	[s390] net/smc: Allow SMC-D 1MB DMB allocations Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044294 Upstream Status: https://github.com/torvalds/linux.git Tested: by IBM Build-info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45951016 Conflicts: None commit 67161779a9ea926fccee8de047ae66cbd3482b91 Author: Stefan Raspl <raspl@linux.ibm.com> Date: Mon Aug 9 10:10:14 2021 +0200 net/smc: Allow SMC-D 1MB DMB allocations Commit `a3fe3d01bd` ("net/smc: introduce sg-logic for RMBs") introduced a restriction for RMB allocations as used by SMC-R. However, SMC-D does not use scatter-gather lists to back its DMBs, yet it was limited by this restriction, still. This patch exempts SMC, but limits allocations to the maximum RMB/DMB size respectively. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tobias Huschle <thuschle@redhat.com>	2022-06-15 06:47:18 +02:00
Mete Durlu	fa4d5ce7c8	[s390] net/smc: improved fix wait on already cleared link Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1869652 Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Build Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=40812265 Tested: by IBM Conflicts: None commit 95f7f3e7dc6bd2e735cb5de11734ea2222b1e05a Author: Karsten Graul <kgraul@linux.ibm.com> Date: Thu Oct 7 16:14:40 2021 +0200 net/smc: improved fix wait on already cleared link Commit `8f3d65c166` ("net/smc: fix wait on already cleared link") introduced link refcounting to avoid waits on already cleared links. This patch extends and improves the refcounting to cover all remaining possible cases for this kind of error situation. Fixes: `15e1b99aad` ("net/smc: no WR buffer wait for terminating link group") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Mete Durlu <mdurlu@redhat.com>	2021-11-03 06:03:22 -04:00
Mete Durlu	0b4a66178f	[s390] net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1869652 Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Build Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=40812265 Tested: by IBM Conflicts: None commit a18cee4791b1123d0a6579a7c89f4b87e48abe03 Author: Karsten Graul <kgraul@linux.ibm.com> Date: Mon Sep 20 21:18:15 2021 +0200 net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work The abort_work is scheduled when a connection was detected to be out-of-sync after a link failure. The work calls smc_conn_kill(), which calls smc_close_active_abort() and that might end up calling smc_close_cancel_work(). smc_close_cancel_work() cancels any pending close_work and tx_work but needs to release the sock_lock before and acquires the sock_lock again afterwards. So when the sock_lock was NOT acquired before then it may be held after the abort_work completes. That's why the sock_lock is acquired before the call to smc_conn_kill() in __smc_lgr_terminate(), but this is missing in smc_conn_abort_work(). Fix that by acquiring the sock_lock first and releasing it after the call to smc_conn_kill(). Fixes: `b286a0651e` ("net/smc: handle incoming CDC validation message") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Mete Durlu <mdurlu@redhat.com>	2021-11-03 06:02:59 -04:00
Guvenc Gulce	64513d269e	net/smc: Correct smc link connection counter in case of smc client SMC clients may be assigned to a different link after the initial connection between two peers was established. In such a case, the connection counter was not correctly set. Update the connection counter correctly when an SMC client connection is assigned to a different SMC link. Fixes: `07d51580ff` ("net/smc: Add connection counters for links") Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Tested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-09 10:46:59 +01:00
Guvenc Gulce	194730a9be	net/smc: Make SMC statistics network namespace aware Make the gathered SMC statistics network namespace aware: a separate set of statistics is collected for each namespace. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-16 12:54:02 -07:00
Guvenc Gulce	e0e4b8fa53	net/smc: Add SMC statistics support Add the ability to collect SMC statistics information. Per-cpu variables are used to collect the statistic information for better performance and for reducing concurrency pitfalls. The code that is collecting statistic data is implemented in macros to increase code reuse and readability. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-16 12:54:02 -07:00
Karsten Graul	f8e0a68bab	net/smc: avoid possible duplicate dmb unregistration smc_lgr_cleanup() calls smcd_unregister_all_dmbs() as part of the link group termination process. This is a leftover from the times when smc_lgr_cleanup() scheduled a worker to actually free the link group. Nowadays smc_lgr_cleanup() directly calls smc_lgr_free() without any delay so an earlier dmb unregistration is no longer needed. So remove smcd_unregister_all_dmbs() and the call to it. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-03 13:54:49 -07:00
Guvenc Gulce	8a44653689	net/smc: use memcpy instead of snprintf to avoid out of bounds read Using snprintf() to convert not null-terminated strings to null terminated strings may cause out of bounds read in the source string. Therefore use memcpy() and terminate the target string with a null afterwards. Fixes: `a3db10efcc` ("net/smc: Add support for obtaining SMCR device list") Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-12 20:22:01 -08:00
Jakub Kicinski	25fe2c9c4c	smc: fix out of bound access in smc_nl_get_sys_info() smc_clc_get_hostname() sets the host pointer to a buffer which is not NULL-terminated (see smc_clc_init()). Reported-by: syzbot+f4708c391121cfc58396@syzkaller.appspotmail.com Fixes: `099b990bd1` ("net/smc: Add support for obtaining system information") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-12 20:22:01 -08:00
Guvenc Gulce	a3db10efcc	net/smc: Add support for obtaining SMCR device list Deliver SMCR device information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
Guvenc Gulce	8f9dde4bf2	net/smc: Add SMC-D Linkgroup diagnostic support Deliver SMCD Linkgroup information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
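
Commit 8a44653689 above replaces snprintf() with memcpy() because snprintf() with a "%s" format keeps scanning the source for a NUL byte (it has to compute its return value), so it can read past the end of a field that is not NUL-terminated. The following is a minimal user-space sketch of that pattern, not the kernel patch itself; the helper name copy_fixed_field and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the kernel source): copy a fixed-width
 * field that is NOT NUL-terminated into a destination C string.
 *
 * memcpy() reads exactly src_len bytes from src, so it never looks
 * beyond the end of the source field. The terminator is written
 * explicitly afterwards.
 */
static void copy_fixed_field(char *dst, size_t dst_size,
                             const char *src, size_t src_len)
{
    /* The destination must have room for the data plus the terminator. */
    assert(dst_size > src_len);

    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
}
```

A caller would pass the exact width of the source field, for example copy_fixed_field(out, sizeof(out), field, sizeof(field)) with out at least one byte larger than field.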

230 Commits