Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Nigel Croxon	bf2723f450	md: fix mddev uaf while iterating all_mddevs list JIRA: https://issues.redhat.com/browse/RHEL-83988 commit 8542870237c3a48ff049b6c5df5f50c8728284fa Author: Yu Kuai <yukuai3@huawei.com> Date: Thu Feb 20 20:43:48 2025 +0800 md: fix mddev uaf while iterating all_mddevs list While iterating all_mddevs list from md_notify_reboot() and md_exit(), list_for_each_entry_safe is used, and this can race with deleting the next mddev, causing UAF: t1: spin_lock //list_for_each_entry_safe(mddev, n, ...) mddev_get(mddev1) // assume mddev2 is the next entry spin_unlock t2: //remove mddev2 ... mddev_free spin_lock list_del spin_unlock kfree(mddev2) then t1: mddev_put(mddev1) spin_lock //continue dereference mddev2->all_mddevs The old helper for_each_mddev() actually grabs the reference of mddev2 while holding the lock, to prevent it from being freed. This problem can be fixed the same way, however, the code will be complex. Hence switch to list_for_each_entry; in this case mddev_put() can free mddev1, which is not safe either. Refer to md_seq_show(), also factor out a helper mddev_put_locked() to fix this problem. Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-raid/20250220124348.845222-1-yukuai1@huaweicloud.com Fixes: f26514342255 ("md: stop using for_each_mddev in md_notify_reboot") Fixes: 16648bac862f ("md: stop using for_each_mddev in md_exit") Reported-and-tested-by: Guillaume Morin <guillaume@morinfr.org> Closes: https://lore.kernel.org/all/Z7Y0SURoA8xwg7vn@bender.morinfr.org/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	b637f08a77	md: switch md-cluster to use md_submodle_head JIRA: https://issues.redhat.com/browse/RHEL-83988 commit 87a86277c9f54953e184318bf71630388aeaf000 Author: Yu Kuai <yukuai3@huawei.com> Date: Sat Feb 15 17:22:25 2025 +0800 md: switch md-cluster to use md_submodle_head To make code cleaner, and prepare to add kconfig for bitmap. Also remove the unused global variables pers_lock, md_cluster_ops and md_cluster_mod, and exported symbols register_md_cluster_operations(), unregister_md_cluster_operations() and md_cluster_ops. Link: https://lore.kernel.org/linux-raid/20250215092225.2427977-8-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Su Yue <glass.su@suse.com> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	f1ba1477bf	md: switch personalities to use md_submodule_head JIRA: https://issues.redhat.com/browse/RHEL-83988 commit 3d44e1d1575a877cf75a7776802506ce7ab8ecc4 Author: Yu Kuai <yukuai3@huawei.com> Date: Sat Feb 15 17:22:22 2025 +0800 md: switch personalities to use md_submodule_head Remove the global list 'pers_list', and switch to use md_submodule_head, which is managed by xarray. Prepare to unify registration and unregistration for all sub modules. Link: https://lore.kernel.org/linux-raid/20250215092225.2427977-5-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> (cherry picked from commit 3d44e1d1575a877cf75a7776802506ce7ab8ecc4) Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	50bfca5808	md: don't export md_cluster_ops JIRA: https://issues.redhat.com/browse/RHEL-83988 commit c594de0455b3d65525bad2020f7f7e41af233045 Author: Yu Kuai <yukuai3@huawei.com> Date: Sat Feb 15 17:22:24 2025 +0800 md: don't export md_cluster_ops Add a new field 'cluster_ops' and initialize it in md_setup_cluster(), so that the global variable 'md_cluster_ops' doesn't need to be exported. Also prepare to switch md-cluster to use md_submodule_head. Link: https://lore.kernel.org/linux-raid/20250215092225.2427977-7-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Su Yue <glass.su@suse.com> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	9545adda24	md: introduce struct md_submodule_head and APIs JIRA: https://issues.redhat.com/browse/RHEL-83988 commit d3beb7c9c61d239e73cb93481b27c7b94130dd03 Author: Yu Kuai <yukuai3@huawei.com> Date: Sat Feb 15 17:22:21 2025 +0800 md: introduce struct md_submodule_head and APIs Prepare to unify registration and unregistration of md personalities and md-cluster, also prepare to add kconfig for md-bitmap. Link: https://lore.kernel.org/linux-raid/20250215092225.2427977-4-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	7029fe28ea	md: merge common code into find_pers() JIRA: https://issues.redhat.com/browse/RHEL-83988 commit 9faab548974e3eb858250fea1ab7e823a689b44b Author: Yu Kuai <yukuai3@huawei.com> Date: Sat Feb 15 17:22:19 2025 +0800 md: merge common code into find_pers() - pers_lock() are held and released from caller - try_module_get() is called from caller - error message from caller Merge above code into find_pers(), and rename it to get_pers(), also add a wrapper to module_put() as put_pers(). Link: https://lore.kernel.org/linux-raid/20250215092225.2427977-2-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Su Yue <glass.su@suse.com> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	86315723dd	md: ensure resync is prioritized over recovery JIRA: https://issues.redhat.com/browse/RHEL-83988 commit 4b10a3bc67c1232f76aa1e04778ca26d6c0ddf7f Author: Li Nan <linan122@huawei.com> Date: Thu Feb 13 21:15:30 2025 +0800 md: ensure resync is prioritized over recovery If a new disk is added during resync, the resync process is interrupted, and recovery is triggered, causing the previous resync to be lost. In reality, disk addition should not terminate resync, fix it. Steps to reproduce the issue: mdadm -CR /dev/md0 -l1 -n3 -x1 /dev/sd[abcd] mdadm --fail /dev/md0 /dev/sdc Fixes: 24dd469d72 ("[PATCH] md: allow a manual resync with md") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/linux-raid/20250213131530.3698600-1-linan666@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	bbf142254a	md: reintroduce md-linear JIRA: https://issues.redhat.com/browse/RHEL-83988 commit 127186cfb184eaccdfe948e6da66940cfa03efc5 Author: Yu Kuai <yukuai3@huawei.com> Date: Thu Jan 2 19:28:41 2025 +0800 md: reintroduce md-linear md-linear was removed by commit 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") because it had been marked as deprecated for a long time. However, md-linear is used widely for underlying disks with different sizes, sadly we didn't know this until now, and it's truly useful to create partitions and assemble multiple raid and then append one to the other. People have to use dm-linear in this case now, however, they will prefer to minimize the number of involved modules. Fixes: 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") Cc: stable@vger.kernel.org Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20250102112841.1227111-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> (cherry picked from commit 127186cfb184eaccdfe948e6da66940cfa03efc5) Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	897ed4ed38	md: Remove deprecated CONFIG_MD_LINEAR JIRA: https://issues.redhat.com/browse/RHEL-83988 commit 849d18e27be9a1253f2318cb4549cc857219d991 Author: Song Liu <song@kernel.org> Date: Thu Dec 14 14:21:05 2023 -0800 md: Remove deprecated CONFIG_MD_LINEAR md-linear has been marked as deprecated for 2.5 years. Remove it. Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Neil Brown <neilb@suse.de> Cc: Guoqing Jiang <guoqing.jiang@linux.dev> Cc: Mateusz Grzonka <mateusz.grzonka@intel.com> Cc: Jes Sorensen <jes@trained-monkey.org> Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20231214222107.2016042-2-song@kernel.org (cherry picked from commit 849d18e27be9a1253f2318cb4549cc857219d991) Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-21 14:55:02 -04:00
Nigel Croxon	392de5dc0b	md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime JIRA: https://issues.redhat.com/browse/RHEL-73514 commit 8d28d0ddb986f56920ac97ae704cc3340a699a30 Author: Yu Kuai <yukuai3@huawei.com> Date: Fri Jan 24 17:20:55 2025 +0800 md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime After commit ec6bb299c7c3 ("md/md-bitmap: add 'sync_size' into struct md_bitmap_stats"), following panic is reported: Oops: general protection fault, probably for non-canonical address RIP: 0010:bitmap_get_stats+0x2b/0xa0 Call Trace: <TASK> md_seq_show+0x2d2/0x5b0 seq_read_iter+0x2b9/0x470 seq_read+0x12f/0x180 proc_reg_read+0x57/0xb0 vfs_read+0xf6/0x380 ksys_read+0x6c/0xf0 do_syscall_64+0x82/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Root cause is that bitmap_get_stats() can be called at any time if mddev is still there, even if bitmap is destroyed, or not fully initialized. Dereferencing bitmap in this case can crash the kernel. Meanwhile, the above commit starts dereferencing bitmap->storage, making the problem easier to trigger. Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex. Cc: stable@vger.kernel.org # v6.12+ Fixes: 32a7627cf3 ("[PATCH] md: optimised resync using Bitmap based intent logging") Reported-and-tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Closes: https://lore.kernel.org/linux-raid/ca3a91a2-50ae-4f68-b317-abd9889f3907@oracle.com/T/#m6e5086c95201135e4941fe38f9efa76daf9666c5 Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250124092055.4050195-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-10 10:28:20 -04:00
Nigel Croxon	ce51da6c92	md/md-bitmap: move bitmap_{start, end}write to md upper layer JIRA: https://issues.redhat.com/browse/RHEL-73514 commit cd5fc653381811f1e0ba65f5d169918cab61476f Author: Yu Kuai <yukuai3@huawei.com> Date: Thu Jan 9 09:51:45 2025 +0800 md/md-bitmap: move bitmap_{start, end}write to md upper layer There are two BUG reports that raid5 will hang at bitmap_startwrite ([1],[2]). The root cause is that bitmap start write and end write are unbalanced; it's not quite clear where. While reviewing the raid5 code, it was found that bitmap operations can be optimized. For example, for a 4-disk raid5 with chunksize=8k, if the user issues an IO (0 + 48k) to the array:
┌────────────────────────────────────────────────────────────┐
│chunk 0                                                     │
│      ┌────────────┬─────────────┬─────────────┬────────────┼
│  sh0 │A0: 0 + 4k  │A1: 8k + 4k  │A2: 16k + 4k │A3: P       │
│      ┼────────────┼─────────────┼─────────────┼────────────┼
│  sh1 │B0: 4k + 4k │B1: 12k + 4k │B2: 20k + 4k │B3: P       │
┼──────┴────────────┴─────────────┴─────────────┴────────────┼
│chunk 1                                                     │
│      ┌────────────┬─────────────┬─────────────┬────────────┤
│  sh2 │C0: 24k + 4k│C1: 32k + 4k │C2: P        │C3: 40k + 4k│
│      ┼────────────┼─────────────┼─────────────┼────────────┼
│  sh3 │D0: 28k + 4k│D1: 36k + 4k │D2: P        │D3: 44k + 4k│
└──────┴────────────┴─────────────┴─────────────┴────────────┘
Before this patch, 4 stripe heads will be used, each sh will attach bios for 3 disks, and each attached bio will trigger bitmap_startwrite() once, for a total of 12 times. - 3 times (0 + 4k), for (A0, A1 and A2) - 3 times (4 + 4k), for (B0, B1 and B2) - 3 times (8 + 4k), for (C0, C1 and C3) - 3 times (12 + 4k), for (D0, D1 and D3) After this patch, the md upper layer will calculate that IO range (0 + 48k) corresponds to the bitmap range (0 + 16k), and call bitmap_startwrite() just once.
Note that this patch will align bitmap ranges to the chunks, for example, if the user issues an IO (0 + 4k) to the array: - Before this patch, 1 time (0 + 4k), for A0; - After this patch, 1 time (0 + 8k) for chunk 0; Usually, one bitmap bit will represent more than one disk chunk, and this doesn't make any difference. And even if the user really created an array in which one chunk contains multiple bits, the overhead is that more data will be recovered after power failure. Also remove STRIPE_BITMAP_PENDING since it's not used anymore. [1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts986Q@mail.gmail.com/ [2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircus.io/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250109015145.158868-6-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-10 10:28:20 -04:00
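The call counts quoted in the entry above (12 calls before the patch, one 16k range after it) follow from simple arithmetic. The sketch below reproduces them; it is illustrative only. Both function names and the `sh_bytes` parameter (the 4k covered per disk by one stripe head in the example) are assumptions made for this sketch, not kernel APIs, and the "before" helper assumes an IO that starts at offset 0 and covers whole rows of data chunks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API: number of bitmap_startwrite()
 * calls before the patch, one per data-disk bio per stripe head.
 * Assumes the IO starts at 0 and covers whole rows of data chunks. */
static uint64_t startwrite_calls_before(uint64_t io_bytes, unsigned int disks,
                                        uint64_t chunk_bytes, uint64_t sh_bytes)
{
    assert(disks >= 2 && chunk_bytes > 0 && sh_bytes > 0);
    unsigned int data_disks = disks - 1;           /* raid5: one parity chunk per row */
    uint64_t rows = io_bytes / (chunk_bytes * data_disks);
    uint64_t sh_per_row = chunk_bytes / sh_bytes;  /* stripe heads per chunk row */
    return rows * sh_per_row * data_disks;
}

/* Hypothetical helper: length of the single chunk-aligned bitmap range
 * that the md upper layer covers after the patch, for an IO starting at 0. */
static uint64_t bitmap_range_after(uint64_t io_bytes, unsigned int disks,
                                   uint64_t chunk_bytes)
{
    assert(disks >= 2 && chunk_bytes > 0);
    unsigned int data_disks = disks - 1;
    uint64_t row_bytes = chunk_bytes * data_disks;
    uint64_t rows = (io_bytes + row_bytes - 1) / row_bytes;  /* round up to whole chunks */
    return rows * chunk_bytes;
}
```

With the values from the commit message (4 disks, 8k chunks, 4k per stripe head), a 48k IO gives 12 calls before the patch and a 16k bitmap range after it, and a 4k IO gives an 8k range, matching the alignment note.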
Nigel Croxon	89963dc328	md: don't record new badblocks for faulty rdev JIRA: https://issues.redhat.com/browse/RHEL-73514 commit 29967332ced51a15a22f11381eeebbc500ba1858 Author: Yu Kuai <yukuai3@huawei.com> Date: Thu Oct 31 11:31:10 2024 +0800 md: don't record new badblocks for faulty rdev Faulty will be checked before issuing IO to the rdev, however, rdev can be faulty at any time, hence it's possible that rdev_set_badblocks() will be called for faulty rdev. In this case, mddev->sb_flags will be set and some other path can be blocked by updating super block. Since faulty rdev will not be accessed anymore, there is no need to record new badblocks for faulty rdev and force updating super block. Note this is not a bugfix, it just prevents updating superblock in some corner cases, and will help to slice a bug related to external metadata[1], testing also shows that devices are removed faster in the case of IO error. [1] https://lore.kernel.org/all/f34452df-810b-48b2-a9b4-7f925699a9e7@linux.intel.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Link: https://lore.kernel.org/r/20241031033114.3845582-4-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-10 10:28:17 -04:00
Nigel Croxon	3276d99483	md: don't wait faulty rdev in md_wait_for_blocked_rdev() JIRA: https://issues.redhat.com/browse/RHEL-73514 commit 50e8274855e7ab5499ff8296e09802874a3f03b1 Author: Yu Kuai <yukuai3@huawei.com> Date: Thu Oct 31 11:31:09 2024 +0800 md: don't wait faulty rdev in md_wait_for_blocked_rdev() md_wait_for_blocked_rdev() is called for write IO while rdev is blocked, however, rdev can be faulty after choosing this rdev to write, and faulty rdev should never be accessed anymore, hence there is no point in waiting for faulty rdev to be unblocked. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Link: https://lore.kernel.org/r/20241031033114.3845582-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-10 10:28:17 -04:00
Nigel Croxon	66243da722	md: ensure child flush IO does not affect origin bio->bi_status JIRA: https://issues.redhat.com/browse/RHEL-73514 commit 62ce0782bbacd32ec10292b9bdd127330e9b6968 Author: Li Nan <linan122@huawei.com> Date: Thu Sep 19 14:30:48 2024 +0800 md: ensure child flush IO does not affect origin bio->bi_status When a flush is issued to an RAID array, a child flush IO is created and issued for each member disk in the RAID array. Since commit b75197e86e6d ("md: Remove flush handling"), each child flush IO has been chained with the original bio. As a result, the failure of any child IO could modify the bi_status of the original bio, potentially impacting the upper-layer filesystem. Fix the issue by preventing child flush IO from altering the original bio->bi_status as before. However, this design introduces a known issue: in the event of a power failure, if a flush IO on a member disk fails, the upper layers may not be informed. This issue is not easy to fix and will not be addressed for the time being in this issue. Fixes: b75197e86e6d ("md: Remove flush handling") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240919063048.2887579-1-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2025-03-10 10:28:17 -04:00
Rado Vrbovsky	00f182a2ef	Merge branch 'centos-stream-9-rhel9.6-update-v6.11'	2024-10-18 15:08:53 +00:00
Nigel Croxon	aa3aa30c35	md: Add new_level sysfs interface JIRA: https://issues.redhat.com/browse/RHEL-61196 commit d981ed8419303ed12351eea8541ad6cb76455fe3 Author: Xiao Ni <xni@redhat.com> Date: Thu Sep 5 07:54:53 2024 +0800 md: Add new_level sysfs interface Now reshape supports two ways: with backup file or without backup file. For the situation without backup file, it needs to change data offset. It doesn't need systemd service mdadm-grow-continue. So it can finish the reshape job in one process environment. It can know the new level from mdadm --grow command and can change to new level after reshape finishes. For the situation with backup file, it needs systemd service mdadm-grow-continue to monitor reshape progress. So there are two processes involved. One is the mdadm --grow command which kicks off reshape and wakes up mdadm-grow-continue service. The second process is the service, which doesn't know the new level from the first process. In kernel space mddev->new_level is used to record the new level when doing reshape. This patch adds a new interface to help mdadm update new_level and sync it to metadata. Then mdadm-grow-continue can read the right new_level. Commit log revised by Song Liu. Please refer to the link for more details. Signed-off-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/r/20240904235453.99120-1-xni@redhat.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 16:10:09 -04:00
Nigel Croxon	3fea7e753a	md: Report failed arrays as broken in mdstat JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 2d2b3bc145b9d5b5c6f07d22291723ddb024ca76 Author: Mateusz Kusiak <mateusz.kusiak@intel.com> Date: Tue Sep 3 16:29:49 2024 +0200 md: Report failed arrays as broken in mdstat Depending on if array has personality, it is either reported as active or inactive. This patch adds third status "broken" for arrays with personality that became inoperative. The reason is end users tend to assume that "active" indicates array is operational. Add "broken" state for inoperative arrays with personality and refactor the code. Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com> Link: https://lore.kernel.org/r/20240903142949.53628-1-mateusz.kusiak@intel.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 16:10:02 -04:00
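The three-way status described in the entry above can be summarized as a small decision function. This is only a sketch of the reporting rule as the commit message states it; the function name and its two boolean inputs are hypothetical and do not correspond to actual kernel fields or helpers.

```c
#include <stdbool.h>

/* Hypothetical helper illustrating the rule from the commit message:
 * no personality -> "inactive"; personality but inoperative -> "broken";
 * otherwise -> "active". The real kernel code may be structured differently. */
static const char *mdstat_state(bool has_personality, bool inoperative)
{
    if (!has_personality)
        return "inactive";
    return inoperative ? "broken" : "active";
}
```

The point of the third state is that "active" is then reported only for arrays that have a personality and are still operational.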
Nigel Croxon	c644f50a7e	md: Remove flush handling JIRA: https://issues.redhat.com/browse/RHEL-61196 commit b75197e86e6d3de4e611869ef30a27cf414a5f77 Author: Yu Kuai <yukuai3@huawei.com> Date: Tue Aug 27 19:06:16 2024 +0800 md: Remove flush handling For flush requests, md has special flush handling to merge concurrent flush requests into a single one; however, the whole mechanism is based on a disk level spin_lock 'mddev->lock'. fsync can be called quite often in some use cases, and as a consequence, taking the spin lock in the IO fast path can cause performance degradation. Fortunately, the block layer already has flush handling to merge concurrent flush requests, and it only acquires hctx level spin lock. (see details in blk-flush.c) This patch removes the flush handling in md, and converts to use general block layer flush handling in underlying disks. Flush test for 4 nvme raid10: start 128 threads to do fsync 100000 times, on arm64, see how long it takes. Test script: void* thread_func(void* arg) { int fd = (int)arg; for (int i = 0; i < FSYNC_COUNT; i++) { fsync(fd); } return NULL; } int main() { int fd = open("/dev/md0", O_RDWR); if (fd < 0) { perror("open"); exit(1); } pthread_t threads[THREADS]; struct timeval start, end; gettimeofday(&start, NULL); for (int i = 0; i < THREADS; i++) { pthread_create(&threads[i], NULL, thread_func, &fd); } for (int i = 0; i < THREADS; i++) { pthread_join(threads[i], NULL); } gettimeofday(&end, NULL); close(fd); long long elapsed = (end.tv_sec - start.tv_sec) * 1000000LL + (end.tv_usec - start.tv_usec); printf("Elapsed time: %lld microseconds\n", elapsed); return 0; } Test result: about 10 times faster: Before this patch: 50943374 microseconds After this patch: 5096347 microseconds Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240827110616.3860190-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> (cherry picked from commit b75197e86e6d3de4e611869ef30a27cf414a5f77) Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:25:26 -04:00
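The test script quoted in the "md: Remove flush handling" entry omits its headers and the THREADS and FSYNC_COUNT definitions, and it passes `&fd` to each thread while the thread casts its argument directly to `int`. A self-contained variant might look like the sketch below. It is not the author's script: the `fsync_job` struct, the `run_fsync_test()` wrapper, its parameters and the `MAX_THREADS` cap are all assumptions introduced here so that the path, thread count and fsync count can be chosen by the caller.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_THREADS 1024  /* arbitrary cap chosen for this sketch */

/* Per-thread arguments; this struct is an assumption of the sketch. */
struct fsync_job {
    int fd;
    int count;
};

/* Each worker issues job->count fsync() calls on the shared descriptor. */
static void *fsync_worker(void *arg)
{
    const struct fsync_job *job = arg;
    for (int i = 0; i < job->count; i++)
        fsync(job->fd);
    return NULL;
}

/* Returns the elapsed time in microseconds, or -1 if the arguments are
 * out of range, the file cannot be opened, or a thread fails to start. */
static long long run_fsync_test(const char *path, int nthreads, int count)
{
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    pthread_t threads[MAX_THREADS];
    struct fsync_job job = { .fd = fd, .count = count };
    struct timeval start, end;
    int started = 0;

    gettimeofday(&start, NULL);
    for (; started < nthreads; started++)
        if (pthread_create(&threads[started], NULL, fsync_worker, &job) != 0)
            break;
    for (int i = 0; i < started; i++)  /* join only the threads that started */
        pthread_join(threads[i], NULL);
    gettimeofday(&end, NULL);
    close(fd);

    if (started != nthreads)
        return -1;
    return (end.tv_sec - start.tv_sec) * 1000000LL + (end.tv_usec - start.tv_usec);
}
```

Calling `run_fsync_test("/dev/md0", 128, 100000)` would correspond to the run described in the commit message (128 threads, 100000 fsync calls each); depending on the toolchain, building may require `-pthread`.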
Nigel Croxon	5dc63f5dff	md/md-bitmap: merge md_bitmap_wait_behind_writes() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 49f5f5e309e6127957babed7834f5a0e1022f936 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:50 2024 +0800 md/md-bitmap: merge md_bitmap_wait_behind_writes() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-41-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:14:53 -04:00
Nigel Croxon	d8e67d6acc	md/md-bitmap: merge md_bitmap_daemon_work() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 18db2a9c60aefc61e796f6a384a952999d3b8885 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:43 2024 +0800 md/md-bitmap: merge md_bitmap_daemon_work() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-34-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:13:37 -04:00
Nigel Croxon	b8644c047c	md/md-bitmap: merge bitmap_unplug() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 3c9883e77a36ca76b8d92afa99599263ca587ae7 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:42 2024 +0800 md/md-bitmap: merge bitmap_unplug() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-33-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:13:37 -04:00
Nigel Croxon	014f009476	md/md-bitmap: merge md_bitmap_unplug_async() into md_bitmap_unplug() JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 48eb95810a9241afd871de917d70712e2ddfda31 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:41 2024 +0800 md/md-bitmap: merge md_bitmap_unplug_async() into md_bitmap_unplug() Add a parameter 'bool sync' to distinguish them, and md_bitmap_unplug_async() won't be exported anymore, hence bitmap_operations only need one op to cover them. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-32-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:13:37 -04:00
Nigel Croxon	fde03e7355	md/md-bitmap: merge md_bitmap_dirty_bits() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 2d3b130e177f14b461c47880b6e0b338fd6872f5 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:32 2024 +0800 md/md-bitmap: merge md_bitmap_dirty_bits() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Also change the parameter from bitmap to mddev, to avoid access bitmap outside md-bitmap.c as much as possible. And while we're here, also fix coding style for bitmap_store(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-23-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:12:54 -04:00
Nigel Croxon	22d76f5f94	md/md-bitmap: merge bitmap_write_all() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit b26313cb96f1b3fd6f07d3243f6cd426c5cbaf39 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:31 2024 +0800 md/md-bitmap: merge bitmap_write_all() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Also change the parameter from bitmap to mddev, to avoid access bitmap outside md-bitmap.c as much as possible. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-22-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:12:54 -04:00
Nigel Croxon	f9231aa350	md/md-bitmap: merge md_bitmap_status() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 696936838bc18a761ed778910975d51cf2c35e3a Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:29 2024 +0800 md/md-bitmap: merge md_bitmap_status() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-20-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:12:53 -04:00
Nigel Croxon	dc11d78127	md/md-bitmap: merge md_bitmap_update_sb() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit fe59b34676b4ec6b48a7b436d3422fc9317e047a Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:28 2024 +0800 md/md-bitmap: merge md_bitmap_update_sb() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-19-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:12:53 -04:00
Nigel Croxon	92b172d83b	md/md-bitmap: merge md_bitmap_flush() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit ca925302e841ff0a0598b283f87c472d92b389f3 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:26 2024 +0800 md/md-bitmap: merge md_bitmap_flush() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-17-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:12:53 -04:00
Nigel Croxon	0c8e74bde2	md/md-bitmap: merge md_bitmap_destroy() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit a2bd70319290d80127dc4257b8c17df3f027c15d Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:25 2024 +0800 md/md-bitmap: merge md_bitmap_destroy() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-16-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:11:53 -04:00
Nigel Croxon	6dcbf10abf	md/md-bitmap: merge md_bitmap_load() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit e1e490805958617327be14eaf0ed31d71adc2c54 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:24 2024 +0800 md/md-bitmap: merge md_bitmap_load() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-15-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:11:49 -04:00
Nigel Croxon	e61d53e425	md/md-bitmap: merge md_bitmap_create() into bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 04c80e649512f2c24f99052440cc808163eff40c Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:23 2024 +0800 md/md-bitmap: merge md_bitmap_create() into bitmap_operations So that the implementation won't be exposed, and it'll be possible to invent a new bitmap by replacing bitmap_operations. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-14-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:11:44 -04:00
Nigel Croxon	5247f3b335	md/md-bitmap: simplify md_bitmap_create() + md_bitmap_load() JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 7545d385ec7e4c0d5e86e7cde4fe3fb8f4555fb9 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:22 2024 +0800 md/md-bitmap: simplify md_bitmap_create() + md_bitmap_load() Other than internal api get_bitmap_from_slot(), all other places will set returned bitmap to mddev->bitmap. So move the setting of mddev->bitmap into md_bitmap_create() to simplify code. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-13-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:11:39 -04:00
Nigel Croxon	bc4b9bf09d	md/md-bitmap: introduce struct bitmap_operations JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 7add9db6ba3e9bd12d2be97abbc13f3881a515db Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:21 2024 +0800 md/md-bitmap: introduce struct bitmap_operations The structure is empty for now, and will be used in later patches to merge in bitmap operations, so that bitmap implementation won't be exposed. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-12-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> (cherry picked from commit 7add9db6ba3e9bd12d2be97abbc13f3881a515db) Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 15:10:53 -04:00
Nigel Croxon	6d4a54dff3	md/md-bitmap: add 'file_pages' into struct md_bitmap_stats JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 10bc2ac10597ebc0b25afbc72fa4284565548e36 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:17 2024 +0800 md/md-bitmap: add 'file_pages' into struct md_bitmap_stats There are no functional changes, avoid dereferencing bitmap directly to prepare inventing a new bitmap. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-8-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 14:50:00 -04:00
Nigel Croxon	a8260f8cfd	md/md-bitmap: add 'events_cleared' into struct md_bitmap_stats JIRA: https://issues.redhat.com/browse/RHEL-61196 commit d004442f46ccae9ea90fdda7a2b0516f1d42b88e Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:14 2024 +0800 md/md-bitmap: add 'events_cleared' into struct md_bitmap_stats Also add a new helper to get events_cleared to avoid dereferencing bitmap directly to prepare inventing a new bitmap. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-5-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 14:49:44 -04:00
Nigel Croxon	3c1d78320a	md: use new helper md_bitmap_get_stats() in update_array_info() JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 968153812215d68c27c0c9d90da6ec2f6d17a606 Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:13 2024 +0800 md: use new helper md_bitmap_get_stats() in update_array_info() There are no functional changes, avoid dereferencing bitmap directly to prepare inventing a new bitmap. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-4-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 14:49:40 -04:00
Nigel Croxon	b4fa7c04b4	md/md-bitmap: replace md_bitmap_status() with a new helper md_bitmap_get_stats() JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 38f287d7e495ae00d4481702f44ff7ca79f5c9bc Author: Yu Kuai <yukuai3@huawei.com> Date: Mon Aug 26 15:44:12 2024 +0800 md/md-bitmap: replace md_bitmap_status() with a new helper md_bitmap_get_stats() There are no functional changes, and the new helper will be used in multiple places in following patches to avoid dereferencing bitmap directly. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240826074452.1490072-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 14:49:37 -04:00
Nigel Croxon	ce5d123fe1	md: Don't flush sync_work in md_write_start() JIRA: https://issues.redhat.com/browse/RHEL-61196 commit 86ad4cda79e0dade87d4bb0d32e1fe541d4a63e8 Author: Yu Kuai <yukuai3@huawei.com> Date: Thu Aug 1 20:47:46 2024 +0800 md: Don't flush sync_work in md_write_start() Because flush sync_work may trigger mddev_suspend() if there are spares, and this should never be done in IO path because mddev_suspend() is used to wait for IO. This problem is found by code review. Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration") Cc: stable@vger.kernel.org Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240801124746.242558-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-10-01 14:49:27 -04:00
Ming Lei	4c4a2238d3	md: set md-specific flags for all queue limits JIRA: https://issues.redhat.com/browse/RHEL-56837 commit 573d5abf3df00c879fbd25774e4cf3e22c9cabd0 Author: Christoph Hellwig <hch@lst.de> Date: Wed Jun 26 16:26:22 2024 +0200 md: set md-specific flags for all queue limits The md driver wants to enforce a number of flags for all devices, even when not inheriting them from the underlying devices. To make sure these flags survive the queue_limits_set calls that md uses to update the queue limits without deriving them form the previous limits add a new md_init_stacking_limits helper that calls blk_set_stacking_limits and sets these flags. Fixes: 1122c0c1cc71 ("block: move cache control settings out of queue->flags") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240626142637.300624-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:19:10 +08:00
Ming Lei	72cc0ad565	block: move the nowait flag to queue_limits JIRA: https://issues.redhat.com/browse/RHEL-56837 commit f76af42f8bf13d2620084f305f01691de9238fc7 Author: Christoph Hellwig <hch@lst.de> Date: Mon Jun 17 08:04:46 2024 +0200 block: move the nowait flag to queue_limits Move the nowait flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Stacking drivers are simplified in that they now can simply set the flag, and blk_stack_limits will clear it when the features is not supported by any of the underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:19:08 +08:00
Ming Lei	a8090e43fe	block: move the io_stat flag setting to queue_limits JIRA: https://issues.redhat.com/browse/RHEL-56837 commit cdb2497918cc2929691408bac87b58433b45b6d3 Author: Christoph Hellwig <hch@lst.de> Date: Mon Jun 17 08:04:43 2024 +0200 block: move the io_stat flag setting to queue_limits Move the io_stat flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Simplify md and dm to set the flag unconditionally instead of avoiding setting a simple flag for cases where it already is set by other means, which is a bit pointless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:19:08 +08:00
Ming Lei	38f0dd51b5	block: move the nonrot flag to queue_limits JIRA: https://issues.redhat.com/browse/RHEL-56837 Conflicts: drop change on ublk & bcache commit bd4a633b6f7c3c6b6ebc1a07317643270e751a94 Author: Christoph Hellwig <hch@lst.de> Date: Mon Jun 17 08:04:41 2024 +0200 block: move the nonrot flag to queue_limits Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:19:08 +08:00
Ming Lei	c2430f6692	block: move cache control settings out of queue->flags JIRA: https://issues.redhat.com/browse/RHEL-56837 Conflicts: drop change on bcache & ublk commit 1122c0c1cc71f740fa4d5f14f239194e06a1d5e7 Author: Christoph Hellwig <hch@lst.de> Date: Mon Jun 17 08:04:40 2024 +0200 block: move cache control settings out of queue->flags Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:19:08 +08:00
Ming Lei	9fbe332e6d	block: move integrity information into queue_limits JIRA: https://issues.redhat.com/browse/RHEL-56837 commit c6e56cf6b2e79a463af21286ba951714ed20828c Author: Christoph Hellwig <hch@lst.de> Date: Thu Jun 13 10:48:22 2024 +0200 block: move integrity information into queue_limits Move the integrity information into the queue limits so that it can be set atomically with other queue limits, and that the sysfs changes to the read_verify and write_generate flags are properly synchronized. This also allows to provide a more useful helper to stack the integrity fields, although it still is separate from the main stacking function as not all stackable devices want to inherit the integrity settings. Even with that it greatly simplifies the code in md and dm. Note that the integrity field is moved as-is into the queue limits. While there are good arguments for removing the separate blk_integrity structure, this would cause a lot of churn and might better be done at a later time if desired. However the integrity field in the queue_limits structure is now unconditional so that various ifdefs can be avoided or replaced with IS_ENABLED(). Given that tiny size of it that seems like a worthwhile trade off. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240613084839.1044015-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:19:06 +08:00
Ming Lei	7122d1fe5c	md: remove mddev->queue JIRA: https://issues.redhat.com/browse/RHEL-56837 commit 396799eb5b6f87ec2d759e1a90e179f7058ab9e6 Author: Christoph Hellwig <hch@lst.de> Date: Sun Mar 3 07:01:49 2024 -0700 md: remove mddev->queue Just use the request_queue from the gendisk pointer in the relatively few places that sill need it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed--by: Song Liu <song@kernel.org> Tested-by: Song Liu <song@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240303140150.5435-11-hch@lst.de Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:18:41 +08:00
Ming Lei	f4f7a58f95	md: don't initialize queue limits JIRA: https://issues.redhat.com/browse/RHEL-56837 commit 81a16e19d545fd244ad176f7222d92b67215a33b Author: Christoph Hellwig <hch@lst.de> Date: Sun Mar 3 07:01:48 2024 -0700 md: don't initialize queue limits Initial queue limits are now set from ->run. Remove the superfluous initialization in md_alloc and level_store. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed--by: Song Liu <song@kernel.org> Tested-by: Song Liu <song@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240303140150.5435-10-hch@lst.de Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:18:41 +08:00
Ming Lei	13a8d4641b	md: add queue limit helpers JIRA: https://issues.redhat.com/browse/RHEL-56837 commit e305fce1883128a9468efe1876a057df48a261d6 Author: Christoph Hellwig <hch@lst.de> Date: Sun Mar 3 07:01:43 2024 -0700 md: add queue limit helpers Add a few helpers that wrap the block queue limits API for use in MD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed--by: Song Liu <song@kernel.org> Tested-by: Song Liu <song@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240303140150.5435-5-hch@lst.de Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:18:40 +08:00
Ming Lei	ba42c16212	block: pass a queue_limits argument to blk_alloc_disk JIRA: https://issues.redhat.com/browse/RHEL-56837 commit 74fa8f9c553f7b5ccab7d103acae63cc2e080465 Author: Christoph Hellwig <hch@lst.de> Date: Thu Feb 15 08:10:47 2024 +0100 block: pass a queue_limits argument to blk_alloc_disk Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Also change blk_alloc_disk to return an ERR_PTR instead of just NULL which can't distinguish errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-09-27 11:18:35 +08:00
Rado Vrbovsky	78fa0da45f	Merge: md: fix deadlock between mddev_suspend and flush bio MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5067 JIRA: https://issues.redhat.com/browse/RHEL-54757 CVE: CVE-2024-43855 Upstream Status: commit found in Linus's git tree Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=63498077 Signed-off-by: Nigel Croxon <ncroxon@redhat.com> Approved-by: Jay Shin <jaeshin@redhat.com> Approved-by: Heinz Mauelshagen <heinzm@redhat.com> Approved-by: Xiao Ni <xni@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-09-23 08:11:01 +00:00
Nigel Croxon	37a7a4be7a	md-cluster: fix no recovery job when adding/re-adding a disk JIRA: https://issues.redhat.com/browse/RHEL-46615 commit 35a0a409fa269c287c4378f1aefe84ae8b5211a1 Author: Heming Zhao <heming.zhao@suse.com> Date: Tue Jul 9 18:41:20 2024 +0800 md-cluster: fix no recovery job when adding/re-adding a disk The commit db5e653d7c9f ("md: delay choosing sync action to md_start_sync()") delays the start of the sync action. In a clustered environment, this will cause another node to first activate the spare disk and skip recovery. As a result, no nodes will perform recovery when a disk is added or re-added. Before db5e653d7c9f: ``` node1 node2 ---------------------------------------------------------------- md_check_recovery + md_update_sb \| sendmsg: METADATA_UPDATED + md_choose_sync_action process_metadata_update \| remove_and_add_spares //node1 has not finished adding + call mddev->sync_work //the spare disk:do nothing md_start_sync starts md_do_sync md_do_sync + grabbed resync_lockres:DLM_LOCK_EX + do syncing job md_check_recovery sendmsg: METADATA_UPDATED process_metadata_update //activate spare disk ... ... md_do_sync waiting to grab resync_lockres:EX ``` After db5e653d7c9f: (note: if 'cmd:idle' sets MD_RECOVERY_INTR after md_check_recovery starts md_start_sync, setting the INTR action will exacerbate the delay in node1 calling the md_do_sync function.) ``` node1 node2 ---------------------------------------------------------------- md_check_recovery + md_update_sb \| sendmsg: METADATA_UPDATED + calls mddev->sync_work process_metadata_update //node1 has not finished adding //the spare disk:do nothing md_start_sync + md_choose_sync_action \| remove_and_add_spares + calls md_do_sync md_check_recovery md_update_sb sendmsg: METADATA_UPDATED process_metadata_update //activate spare disk ... ... ... ... md_do_sync + grabbed resync_lockres:EX + raid1_sync_request skip sync under conf->fullsync:0 md_do_sync 1. waiting to grab resync_lockres:EX 2. 
when node1 could grab EX lock, node1 will skip resync under recovery_offset:MaxSector ``` How to trigger: ```(commands @node1) # to easily watch the recovery status echo 2000 > /proc/sys/dev/raid/speed_limit_max ssh root@node2 "echo 2000 > /proc/sys/dev/raid/speed_limit_max" mdadm -CR /dev/md0 -l1 -b clustered -n 2 /dev/sda /dev/sdb --assume-clean ssh root@node2 mdadm -A /dev/md0 /dev/sda /dev/sdb mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda mdadm --manage /dev/md0 --add /dev/sdc === "cat /proc/mdstat" on both node, there are no recovery action. === ``` How to fix: because md layer code logic is hard to restore for speeding up sync job on local node, we add new cluster msg to pending the another node to active disk. Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Su Yue <glass.su@suse.com> Acked-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240709104120.22243-2-heming.zhao@suse.com (cherry picked from commit 35a0a409fa269c287c4378f1aefe84ae8b5211a1) Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-08-28 13:32:32 -04:00
Nigel Croxon	b059da8c12	md: Don't wait for MD_RECOVERY_NEEDED for HOT_REMOVE_DISK ioctl JIRA: https://issues.redhat.com/browse/RHEL-46615 commit a1fd37f97808db4fa1bf55da0275790c42521e45 Author: Yu Kuai <yukuai3@huawei.com> Date: Thu Jun 27 19:23:21 2024 +0800 md: Don't wait for MD_RECOVERY_NEEDED for HOT_REMOVE_DISK ioctl Commit `90f5f7ad4f` ("md: Wait for md_check_recovery before attempting device removal.") explained in the commit message that failed device must be reomoved from the personality first by md_check_recovery(), before it can be removed from the array. That's the reason the commit add the code to wait for MD_RECOVERY_NEEDED. However, this is not the case now, because remove_and_add_spares() is called directly from hot_remove_disk() from ioctl path, hence failed device(marked faulty) can be removed from the personality by ioctl. On the other hand, the commit introduced a performance problem that if MD_RECOVERY_NEEDED is set and the array is not running, ioctl will wait for 5s before it can return failure to user. Since the waiting is not needed now, fix the problem by removing the waiting. Fixes: `90f5f7ad4f` ("md: Wait for md_check_recovery before attempting device removal.") Reported-by: Mateusz Kusiak <mateusz.kusiak@linux.intel.com> Closes: https://lore.kernel.org/all/814ff6ee-47a2-4ba0-963e-cf256ee4ecfa@linux.intel.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240627112321.3044744-1-yukuai1@huaweicloud.com (cherry picked from commit a1fd37f97808db4fa1bf55da0275790c42521e45) Signed-off-by: Nigel Croxon <ncroxon@redhat.com>	2024-08-28 13:32:26 -04:00

1449 Commits