MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5037
JIRA: https://issues.redhat.com/browse/RHEL-54875
CVE: CVE-2024-43820
```
dm-raid: Fix WARN_ON_ONCE check for sync_thread in raid_resume
rm-raid devices will occasionally trigger the following warning when
being resumed after a table load because DM_RECOVERY_RUNNING is set:
WARNING: CPU: 7 PID: 5660 at drivers/md/dm-raid.c:4105 raid_resume+0xee/0x100 [dm_raid]
The failing check is:
WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
This check is designed to make sure that the sync thread isn't
registered, but md_check_recovery can set MD_RECOVERY_RUNNING without
the sync_thread ever getting registered. Instead of checking if
MD_RECOVERY_RUNNING is set, check if sync_thread is non-NULL.
Fixes: 16c4770c75b1 ("dm-raid: really frozen sync_thread during suspend")
Suggested-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
(cherry picked from commit 3199a34bfaf7561410e0be1e33a61eba870768fc)
```
Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
Approved-by: Heinz Mauelshagen <heinzm@redhat.com>
Approved-by: Xiao Ni <xni@redhat.com>
Approved-by: Nigel Croxon <ncroxon@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34750
Upstream Status: kernel/git/torvalds/linux.git
commit d176fadb9e783c152d0820a50f84882b6c5ae314
Author: Heinz Mauelshagen <heinzm@redhat.com>
Date: Tue Jul 9 13:56:38 2024 +0200
dm raid: fix stripes adding reshape size issues
Adding stripes to an existing raid4/5/6/10 mapped device grows its
capacity though it'll be only made available _after_ the respective
reshape finished as of MD kernel reshape semantics. Such reshaping
involves moving a window forward starting at BOD reading content
from previous lesser stripes and writing them back in the new
layout with more stripes. Once that process finishes at end of
previous data, the grown size may be announced and used. In order
to avoid writing over any existing data in place, out-of-place space
is added to the beginning of each data device by lvm2 before starting
the reshape process. That reshape space wasn't taken into acount for
data device size calculation.
Fixes resulting from above:
- correct event handling conditions in do_table_event() to set
the device's capacity after the stripe adding reshape ended
- subtract mentioned out-of-place space doing data device and
array size calculations
- conditionally set capacity as of superblock in preresume
Testing:
- passes all LVM2 RAID tests including new lvconvert-raid-reshape-size.sh one
Tested-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34750
Upstream Status: kernel/git/torvalds/linux.git
commit 453496b899b5f62ff193bca46097f0f7211cec46
Author: Heinz Mauelshagen <heinzm@redhat.com>
Date: Tue Jul 9 13:56:12 2024 +0200
dm raid: move _get_reshape_sectors() as prerequisite to fixing reshape size issues
rs_set_dev_and_array_sectors() needs this function to
calculate device and array size properly in case leg data
devices have out-of-place reshape space allocated.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59523
Upstream Status: kernel/git/torvalds/linux.git
Conflicts: Hunk patching drivers/md/dm-vdo/dm-vdo-target.c removed due
to missing upstream commit 03d1e20fa16e0 ("dm vdo: add the
top-level DM target"). dm-vdo is not part of the kernel
package in rhel-9.
commit 0a94a469a4f02bdcc223517fd578810ffc21c548
Author: Christoph Hellwig <hch@lst.de>
Date: Wed Jul 3 15:12:08 2024 +0200
dm: stop using blk_limits_io_{min,opt}
Remove use of the blk_limits_io_{min,opt} and assign the values directly
to the queue_limits structure. For the io_opt this is a completely
mechanical change, for io_min it removes flooring the limit to the
physical and logical block size in the particular caller. But as
blk_validate_limits will do the same later when actually applying the
limits, there still is no change in overall behavior.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61196
commit 77c09640eea56dbfed069ac67b1cd79397d41be8
Author: Yu Kuai <yukuai3@huawei.com>
Date: Mon Aug 26 15:44:45 2024 +0800
md/md-bitmap: merge md_bitmap_resize() into bitmap_operations
So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-36-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61196
commit e1791dae6cbd65e5102dca40b8adadef1d89c1b9
Author: Yu Kuai <yukuai3@huawei.com>
Date: Mon Aug 26 15:44:44 2024 +0800
md/md-bitmap: pass in mddev directly for md_bitmap_resize()
And move the condition "if (mddev->bitmap)" into md_bitmap_resize() as
well, on the one hand make code cleaner, on the other hand try not to
access bitmap directly.
Since we are here, also change the parameter 'init' from int to bool.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-35-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61196
commit e1e490805958617327be14eaf0ed31d71adc2c54
Author: Yu Kuai <yukuai3@huawei.com>
Date: Mon Aug 26 15:44:24 2024 +0800
md/md-bitmap: merge md_bitmap_load() into bitmap_operations
So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-15-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-46615
commit d249e541887a966df37544f7c4d301cdee0f0e27
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Jun 11 21:22:48 2024 +0800
md: replace last_sync_action with new enum type
The only difference is that "none" is removed and initial
last_sync_action will be idle.
On the one hand, this value is introduced by commit c4a3955145
("MD: Remember the last sync operation that was performed"), and the
usage described in commit message is not affected. On the other hand,
last_sync_action is not used in mdadm or mdmon, and none of the tests
that I can find.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240611132251.1967786-10-yukuai1@huaweicloud.com
(cherry picked from commit d249e541887a966df37544f7c4d301cdee0f0e27)
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-54875
CVE: CVE-2024-43820
commit 3199a34bfaf7561410e0be1e33a61eba870768fc
Author: Benjamin Marzinski <bmarzins@redhat.com>
Date: Tue Jul 2 17:02:48 2024 +0200
dm-raid: Fix WARN_ON_ONCE check for sync_thread in raid_resume
rm-raid devices will occasionally trigger the following warning when
being resumed after a table load because DM_RECOVERY_RUNNING is set:
WARNING: CPU: 7 PID: 5660 at drivers/md/dm-raid.c:4105 raid_resume+0xee/0x100 [dm_raid]
The failing check is:
WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
This check is designed to make sure that the sync thread isn't
registered, but md_check_recovery can set MD_RECOVERY_RUNNING without
the sync_thread ever getting registered. Instead of checking if
MD_RECOVERY_RUNNING is set, check if sync_thread is non-NULL.
Fixes: 16c4770c75b1 ("dm-raid: really frozen sync_thread during suspend")
Suggested-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34599
Upstream Status: kernel/git/torvalds/linux.git
Conflict: Context change due to usptream patch 41425f96d7aa
("dm-raid456, md/raid456: fix a deadlock for dm-raid456 while
io concurrent with reshape") having already been applied as
5266d3b3fb.
commit b25b8f4b8ecef0f48c05f0c3572daeabefe16526
Author: Ming Lei <ming.lei@redhat.com>
Date: Mon Mar 11 13:42:55 2024 -0400
dm raid: fix false positive for requeue needed during reshape
An empty flush doesn't have a payload, so it should never be looked at
when considering to possibly requeue a bio for the case when a reshape
is in progress.
Fixes: 9dbd1aa3a8 ("dm raid: add reshaping support to the target")
Reported-by: Patrick Plenefisch <simonpatp@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34599
Upstream Status: kernel/git/torvalds/linux.git
commit e3260d90c8f35c03ce182bfd2eeea75805586c25
Author: Kees Cook <keescook@chromium.org>
Date: Fri Sep 15 13:03:36 2023 -0700
dm raid: Annotate struct raid_set with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct raid_set.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: dm-devel@redhat.com
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230915200335.never.098-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-30951
Upstream Status: kernel/git/torvalds/linux.git
commit db29d79b34d9593179de5f868be45c650923e7b4
Author: Yu Kuai <yukuai3@huawei.com>
Date: Fri Nov 24 15:59:53 2023 +0800
dm-raid: delay flushing event_work() after reconfig_mutex is released
After commit db5e653d7c9f ("md: delay choosing sync action to
md_start_sync()"), md_start_sync() will hold 'reconfig_mutex', however,
in order to make sure event_work is done, __md_stop() will flush
workqueue with reconfig_mutex grabbed, hence if sync_work is still
pending, deadlock will be triggered.
Fortunately, former pacthes to fix stopping sync_thread already make sure
all sync_work is done already, hence such deadlock is not possible
anymore. However, in order not to cause confusions for people by this
implicit dependency, delay flushing event_work to dm-raid where
'reconfig_mutex' is not held, and add some comments to emphasize that
the workqueue can't be flushed with 'reconfig_mutex'.
Fixes: db5e653d7c9f ("md: delay choosing sync action to md_start_sync()")
Depends-on: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26279
commit 95009ae904b1e9dca8db6f649f2d7c18a6e42c75
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Mar 5 15:23:06 2024 +0800
dm-raid: fix lockdep waring in "pers->hot_add_disk"
The lockdep assert is added by commit a448af25becf ("md/raid10: remove
rcu protection to access rdev from conf") in print_conf(). And I didn't
notice that dm-raid is calling "pers->hot_add_disk" without holding
'reconfig_mutex'.
"pers->hot_add_disk" read and write many fields that is protected by
'reconfig_mutex', and raid_resume() already grab the lock in other
contex. Hence fix this problem by protecting "pers->host_add_disk"
with the lock.
Fixes: 9092c02d94 ("DM RAID: Add ability to restore transiently failed devices on resume")
Fixes: a448af25becf ("md/raid10: remove rcu protection to access rdev from conf")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-10-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26279
commit cd32b27a66db8776d8b8e82ec7d7dde97a8693b0
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Mar 5 15:23:03 2024 +0800
md/dm-raid: don't call md_reap_sync_thread() directly
Currently md_reap_sync_thread() is called from raid_message() directly
without holding 'reconfig_mutex', this is definitely unsafe because
md_reap_sync_thread() can change many fields that is protected by
'reconfig_mutex'.
However, hold 'reconfig_mutex' here is still problematic because this
will cause deadlock, for example, commit 130443d60b1b ("md: refactor
idle/frozen_sync_thread() to fix deadlock").
Fix this problem by using stop_sync_thread() to unregister sync_thread,
like md/raid did.
Fixes: be83651f00 ("DM RAID: Add message/status support for changing sync action")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-7-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26279
commit 41425f96d7aa59bc865f60f5dda3d7697b555677
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Mar 5 15:23:05 2024 +0800
dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape
For raid456, if reshape is still in progress, then IO across reshape
position will wait for reshape to make progress. However, for dm-raid,
in following cases reshape will never make progress hence IO will hang:
1) the array is read-only;
2) MD_RECOVERY_WAIT is set;
3) MD_RECOVERY_FROZEN is set;
After commit c467e97f079f ("md/raid6: use valid sector values to determine
if an I/O should wait on the reshape") fix the problem that IO across
reshape position doesn't wait for reshape, the dm-raid test
shell/lvconvert-raid-reshape.sh start to hang:
[root@fedora ~]# cat /proc/979/stack
[<0>] wait_woken+0x7d/0x90
[<0>] raid5_make_request+0x929/0x1d70 [raid456]
[<0>] md_handle_request+0xc2/0x3b0 [md_mod]
[<0>] raid_map+0x2c/0x50 [dm_raid]
[<0>] __map_bio+0x251/0x380 [dm_mod]
[<0>] dm_submit_bio+0x1f0/0x760 [dm_mod]
[<0>] __submit_bio+0xc2/0x1c0
[<0>] submit_bio_noacct_nocheck+0x17f/0x450
[<0>] submit_bio_noacct+0x2bc/0x780
[<0>] submit_bio+0x70/0xc0
[<0>] mpage_readahead+0x169/0x1f0
[<0>] blkdev_readahead+0x18/0x30
[<0>] read_pages+0x7c/0x3b0
[<0>] page_cache_ra_unbounded+0x1ab/0x280
[<0>] force_page_cache_ra+0x9e/0x130
[<0>] page_cache_sync_ra+0x3b/0x110
[<0>] filemap_get_pages+0x143/0xa30
[<0>] filemap_read+0xdc/0x4b0
[<0>] blkdev_read_iter+0x75/0x200
[<0>] vfs_read+0x272/0x460
[<0>] ksys_read+0x7a/0x170
[<0>] __x64_sys_read+0x1c/0x30
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74
This is because reshape can't make progress.
For md/raid, the problem doesn't exist because register new sync_thread
doesn't rely on the IO to be done any more:
1) If array is read-only, it can switch to read-write by ioctl/sysfs;
2) md/raid never set MD_RECOVERY_WAIT;
3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold
'reconfig_mutex', hence it can be cleared and reshape can continue by
sysfs api 'sync_action'.
However, I'm not sure yet how to avoid the problem in dm-raid yet. This
patch on the one hand make sure raid_message() can't change
sync_thread() through raid_message() after presuspend(), on the other
hand detect the above 3 cases before wait for IO do be done in
dm_suspend(), and let dm-raid requeue those IO.
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-9-yukuai1@huaweicloud.com
(cherry picked from commit 41425f96d7aa59bc865f60f5dda3d7697b555677)
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26279
commit 5625ff8b72b0e5c13b0fc1fc1f198155af45f729
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Mar 5 15:23:04 2024 +0800
dm-raid: add a new helper prepare_suspend() in md_personality
There are no functional changes for now, prepare to fix a deadlock for
dm-raid456.
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-8-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26279
commit 16c4770c75b1223998adbeb7286f9a15c65fba73
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Mar 5 15:23:02 2024 +0800
dm-raid: really frozen sync_thread during suspend
1) commit f52f5c71f3d4 ("md: fix stopping sync thread") remove
MD_RECOVERY_FROZEN from __md_stop_writes() and doesn't realize that
dm-raid relies on __md_stop_writes() to frozen sync_thread
indirectly. Fix this problem by adding MD_RECOVERY_FROZEN in
md_stop_writes(), and since stop_sync_thread() is only used for
dm-raid in this case, also move stop_sync_thread() to
md_stop_writes().
2) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
it only prevent new sync_thread to start, and it can't stop the
running sync thread; In order to frozen sync_thread, after seting the
flag, stop_sync_thread() should be used.
3) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
it as condition for md_stop_writes() in raid_postsuspend() doesn't
look correct. Consider that reentrant stop_sync_thread() do nothing,
always call md_stop_writes() in raid_postsuspend().
4) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
new sync_thread can start unexpected. Fix this by disallow
raid_message() to change sync_thread status during suspend.
Note that after commit f52f5c71f3d4 ("md: fix stopping sync thread"), the
test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(),
and with previous fixes, the test won't hang there anymore, however, the
test will still fail and complain that ext4 is corrupted. And with this
patch, the test won't hang due to stop_sync_thread() or fail due to ext4
is corrupted anymore. However, there is still a deadlock related to
dm-raid456 that will be fixed in following patches.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
Fixes: 1af2048a3e ("dm raid: fix deadlock caused by premature md_stop_writes()")
Fixes: 9dbd1aa3a8 ("dm raid: add reshaping support to the target")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-6-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26279
commit 2b16a52549d51937a98d82b07b4d83dce6c43683
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Oct 10 23:19:58 2023 +0800
md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
Now that the old apis are removed, __mddev_suspend/resume() can be
renamed to their original names.
This is done by:
sed -i "s/__mddev_suspend/mddev_suspend/g" *.[ch]
sed -i "s/__mddev_resume/mddev_resume/g" *.[ch]
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-20-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26279
commit 4eb3327aa28f3a737c2d3f7e35e83575f1d52283
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue Oct 10 23:19:45 2023 +0800
md/dm-raid: use new apis to suspend array
Convert to use new apis, the old apis will be removed eventually.
These are not hot path, so performance is not concerned.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-7-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12342
JIRA: https://issues.redhat.com/browse/RHEL-12435
Upstream Status: kernel/git/torvalds/linux.git
commit 7d5fff8982a2199d49ec067818af7d84d4f95ca0
Author: Yu Kuai <yukuai3@huawei.com>
Date: Sat Jul 8 17:21:53 2023 +0800
dm raid: protect md_stop() with 'reconfig_mutex'
__md_stop_writes() and __md_stop() will modify many fields that are
protected by 'reconfig_mutex', and all the callers will grab
'reconfig_mutex' except for md_stop().
Also, update md_stop() to make certain 'reconfig_mutex' is held using
lockdep_assert_held().
Fixes: 9d09e663d5 ("dm: raid456 basic support")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12342
JIRA: https://issues.redhat.com/browse/RHEL-12435
Upstream Status: kernel/git/torvalds/linux.git
commit e74c874eabe2e9173a8fbdad616cd89c70eb8ffd
Author: Yu Kuai <yukuai3@huawei.com>
Date: Sat Jul 8 17:21:52 2023 +0800
dm raid: clean up four equivalent goto tags in raid_ctr()
There are four equivalent goto tags in raid_ctr(), clean them up to
use just one.
There is no functional change and this is preparation to fix
raid_ctr()'s unprotected md_stop().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12342
Upstream Status: kernel/git/torvalds/linux.git
commit bae3028799dc4f1109acc4df37c8ff06f2d8f1a0
Author: Yu Kuai <yukuai3@huawei.com>
Date: Sat Jul 8 17:21:51 2023 +0800
dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths
In the error paths 'bad_stripe_cache' and 'bad_check_reshape',
'reconfig_mutex' is still held after raid_ctr() returns.
Fixes: 9dbd1aa3a8 ("dm raid: add reshaping support to the target")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12455
commit d58eff83bd3c6166944f6b159544438385d48549
Author: Yu Kuai <yukuai3@huawei.com>
Date: Fri Aug 25 11:09:50 2023 +0800
md: initialize 'active_io' while allocating mddev
'active_io' is used for mddev_suspend() and it's initialized in
md_run(), this restrict that 'reconfig_mutex' must be held and
"mddev->pers" must be set before calling mddev_suspend().
Initialize 'active_io' early so that mddev_suspend() is safe to call
once mddev is allocated, this will be helpful to refactor
mddev_suspend() in following patches.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-2-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-3359
commit a865b96c513bcaeec49669010d67c40aa8e58619
Author: Yu Kuai <yukuai3@huawei.com>
Date: Mon May 29 21:20:32 2023 +0800
Revert "md: unlock mddev before reap sync_thread in action_store"
This reverts commit 9dfbdafda3b34e262e43e786077bab8e476a89d1.
Because it will introduce a defect that sync_thread can be running while
MD_RECOVERY_RUNNING is cleared, which will cause some unexpected problems,
for example:
list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
Call trace:
__list_add_valid+0xfc/0x140
insert_work+0x78/0x1a0
__queue_work+0x500/0xcf4
queue_work_on+0xe8/0x12c
md_check_recovery+0xa34/0xf30
raid10d+0xb8/0x900 [raid10]
md_thread+0x16c/0x2cc
kthread+0x1a4/0x1ec
ret_from_fork+0x10/0x18
This is because work is requeued while it's still inside workqueue:
t1: t2:
action_store
mddev_lock
if (mddev->sync_thread)
mddev_unlock
md_unregister_thread
// first sync_thread is done
md_check_recovery
mddev_try_lock
/*
* once MD_RECOVERY_DONE is set, new sync_thread
* can start.
*/
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
INIT_WORK(&mddev->del_work, md_start_sync)
queue_work(md_misc_wq, &mddev->del_work)
test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
// set pending bit
insert_work
list_add_tail
mddev_unlock
mddev_lock_nointr
md_reap_sync_thread
// MD_RECOVERY_RUNNING is cleared
mddev_unlock
t3:
// before queued work started from t2
md_check_recovery
// MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
INIT_WORK(&mddev->del_work, md_start_sync)
work->data = 0
// work pending bit is cleared
queue_work(md_misc_wq, &mddev->del_work)
insert_work
list_add_tail
// list is corrupted
The above commit is reverted to fix the problem, the deadlock this
commit tries to fix will be fixed in following patches.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230529132037.2124527-2-yukuai1@huaweicloud.com
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-1516
commit 955a257d69e44cea09b0375b8f2f3d4d9fcf7b4e
Author: Yu Kuai <yukuai3@huawei.com>
Date: Tue May 23 10:10:14 2023 +0800
dm-raid: remove useless checking in raid_message()
md_wakeup_thread() handle the case that pass in md_thread is NULL, there
is no need to check this.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230523021017.3048783-3-yukuai1@huaweicloud.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2189971
Upstream Status: kernel/git/torvalds/linux.git
commit 3664ff82dae1ef9f14f7763d3dd30565e7ef9e14
Author: Yangtao Li <frank.li@vivo.com>
Date: Mon Apr 10 00:43:37 2023 +0800
dm: add helper macro for simple DM target module init and exit
Eliminate duplicate boilerplate code for simple modules that contain
a single DM target driver without any additional setup code.
Add a new module_dm() macro, which replaces the module_init() and
module_exit() with template functions that call dm_register_target()
and dm_unregister_target() respectively.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2189971
Upstream Status: kernel/git/torvalds/linux.git
commit 306fbc2e041c227be7c934efe8a49ddb87bd31f1
Author: Tom Rix <trix@redhat.com>
Date: Thu Mar 30 17:27:53 2023 -0400
dm raid: remove unused d variable
clang with W=1 reports
drivers/md/dm-raid.c:2212:15: error: variable
'd' set but not used [-Werror,-Wunused-but-set-variable]
unsigned int d;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2179168
Upstream Status: kernel/git/torvalds/linux.git
commit 3bd940030752a33ff665eefdd74a1cdb74a4f9b0
Author: Heinz Mauelshagen <heinzm@redhat.com>
Date: Wed Jan 25 21:00:44 2023 +0100
dm: add missing SPDX-License-Indentifiers
'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.
Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2138462
Upstream Status: kernel/git/torvalds/linux.git
commit 9dd1cd3220eca534f2d47afad7ce85f4c40118d8
Author: Mike Snitzer <snitzer@kernel.org>
Date: Wed Jul 20 13:58:04 2022 -0400
dm: fix dm-raid crash if md_handle_request() splits bio
Commit ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone")
introduced the optimization to _not_ perform bio_associate_blkg()'s
relatively costly work when DM core clones its bio. But in doing so it
exposed the possibility for DM's cloned bio to alter DM target
behavior (e.g. crash) if a target were to issue IO without first
calling bio_set_dev().
The DM raid target can trigger an MD crash due to its need to split
the DM bio that is passed to md_handle_request(). The split will
recurse to submit_bio_noacct() using a bio with an uninitialized
->bi_blkg. This NULL bio->bi_blkg causes blk_throtl_bio() to
dereference a NULL blkg_to_tg(bio->bi_blkg).
Fix this in DM core by adding a new 'needs_bio_set_dev' target flag that
will make alloc_tio() call bio_set_dev() on behalf of the target.
dm-raid is the only target that requires this flag. bio_set_dev()
initializes the DM cloned bio's ->bi_blkg, using bio_associate_blkg,
before passing the bio to md_handle_request().
Long-term fix would be to audit and refactor MD code to rely on DM to
split its bio, using dm_accept_partial_bio(), but there are MD raid
personalities (e.g. raid1 and raid10) whose implementation are tightly
coupled to handling the bio splitting inline.
Fixes: ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2113822
commit 9dfbdafda3b34e262e43e786077bab8e476a89d1
Author: Guoqing Jiang <guoqing.jiang@linux.dev>
Date: Tue Jun 21 11:11:29 2022 +0800
md: unlock mddev before reap sync_thread in action_store
Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread
with reconfig_mutex held") fixed is related with action_store path, other
callers which reap sync_thread didn't need to be changed.
Let's pull md_unregister_thread from md_reap_sync_thread, then fix previous
bug with belows.
1. unlock mddev before md_reap_sync_thread in action_store.
2. save reshape_position before unlock, then restore it to ensure position
not changed accidentally by others.
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2113822
commit d0a180341fe00cd0bd1cc259d196dc255c13f229
Author: Guoqing Jiang <guoqing.jiang@linux.dev>
Date: Tue Jun 7 10:03:56 2022 +0800
Revert "md: don't unregister sync_thread with reconfig_mutex held"
The 07reshape5intr test is broke because of below path.
md_reap_sync_thread
-> mddev_unlock
-> md_unregister_thread(&mddev->sync_thread)
And md_check_recovery is triggered by,
mddev_unlock -> md_wakeup_thread(mddev->thread)
then mddev->reshape_position is set to MaxSector in raid5_finish_reshape
since MD_RECOVERY_INTR is cleared in md_check_recovery, which means
feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's
reshape_position can't be updated accordingly.
Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held")
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511
commit 4ce4c73f662bdb0ae5bfb058bc7ec6f6829ca078
Author: Bart Van Assche <bvanassche@acm.org>
Date: Thu Jul 14 11:06:57 2022 -0700
md/core: Combine two sync_page_io() arguments
Improve uniformity in the kernel of handling of request operation and
flags by passing these as a single argument.
Cc: Song Liu <song@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-32-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2115117
Upstream Status: kernel/git/torvalds/linux.git
commit 7dad24db59d2d2803576f2e3645728866a056dab
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Sun Jul 24 14:33:52 2022 -0400
dm raid: fix address sanitizer warning in raid_resume
There is a KASAN warning in raid_resume when running the lvm test
lvconvert-raid.sh. The reason for the warning is that mddev->raid_disks
is greater than rs->raid_disks, so the loop touches one entry beyond
the allocated length.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2115117
Upstream Status: kernel/git/torvalds/linux.git
commit 1fbeea217d8f297fe0e0956a1516d14ba97d0396
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Sun Jul 24 14:31:35 2022 -0400
dm raid: fix address sanitizer warning in raid_status
There is this warning when using a kernel with the address sanitizer
and running this testsuite:
https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/scsi_raid
==================================================================
BUG: KASAN: slab-out-of-bounds in raid_status+0x1747/0x2820 [dm_raid]
Read of size 4 at addr ffff888079d2c7e8 by task lvcreate/13319
CPU: 0 PID: 13319 Comm: lvcreate Not tainted 5.18.0-0.rc3.<snip> #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x6a/0x9c
print_address_description.constprop.0+0x1f/0x1e0
print_report.cold+0x55/0x244
kasan_report+0xc9/0x100
raid_status+0x1747/0x2820 [dm_raid]
dm_ima_measure_on_table_load+0x4b8/0xca0 [dm_mod]
table_load+0x35c/0x630 [dm_mod]
ctl_ioctl+0x411/0x630 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
__x64_sys_ioctl+0x12a/0x1a0
do_syscall_64+0x5b/0x80
The warning is caused by reading conf->max_nr_stripes in raid_status. The
code in raid_status reads mddev->private, casts it to struct r5conf and
reads the entry max_nr_stripes.
However, if we have different raid type than 4/5/6, mddev->private
doesn't point to struct r5conf; it may point to struct r0conf, struct
r1conf, struct r10conf or struct mpconf. If we cast a pointer to one
of these structs to struct r5conf, we will be reading invalid memory
and KASAN warns about it.
Fix this bug by reading struct r5conf only if raid type is 4, 5 or 6.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2105293
Upstream Status: kernel.org Linus tree
Tested: ran QE MD Test suite and has passed the sanity test.
Unregister sync_thread doesn't need to hold reconfig_mutex since it
doesn't reconfigure array.
And it could cause deadlock problem for raid5 as follows:
1. process A tried to reap sync thread with reconfig_mutex held after echo
idle to sync_action.
2. raid5 sync thread was blocked if there were too many active stripes.
3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
which causes the number of active stripes can't be decreased.
4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
to hold reconfig_mutex.
More details in the link:
https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
And add one parameter to md_reap_sync_thread since it could be called by
dm-raid which doesn't hold reconfig_mutex.
Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <song@kernel.org>
(cherry picked from commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2090507
Upstream Status: kernel/git/device-mapper/linux-dm.git
commit 332bd0778775d0cf105c4b9e03e460b590749916
Author: Heinz Mauelshagen <heinzm@redhat.com>
Date: Tue Jun 28 00:37:22 2022 +0200
dm raid: fix accesses beyond end of raid member array
On dm-raid table load (using raid_ctr), dm-raid allocates an array
rs->devs[rs->raid_disks] for the raid device members. rs->raid_disks
is defined by the number of raid metadata and image tupples passed
into the target's constructor.
In the case of RAID layout changes being requested, that number can be
different from the current number of members for existing raid sets as
defined in their superblocks. Example RAID layout changes include:
- raid1 legs being added/removed
- raid4/5/6/10 number of stripes changed (stripe reshaping)
- takeover to higher raid level (e.g. raid5 -> raid6)
When accessing array members, rs->raid_disks must be used in control
loops instead of the potentially larger value in rs->md.raid_disks.
Otherwise it will cause memory access beyond the end of the rs->devs
array.
Fix this by changing code that is prone to out-of-bounds access.
Also fix validate_raid_redundancy() to validate all devices that are
added. Also, use braces to help clean up raid_iterate_devices().
The out-of-bounds memory accesses was discovered using KASAN.
This commit was verified to pass all LVM2 RAID tests (with KASAN
enabled).
Cc: stable@vger.kernel.org
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917
Conflicts: deal with xfs conflict because we don't backport commit ("
0560f31a09e5 xfs: convert mount flags to features"), and what we need
is just to replace blk_queue_discard() with bdev_max_discard_sectors().
commit 70200574cc229f6ba038259e8142af2aa09e6976
Author: Christoph Hellwig <hch@lst.de>
Date: Fri Apr 15 06:52:55 2022 +0200
block: remove QUEUE_FLAG_DISCARD
Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.
The only places where needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>