Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Bill O'Donnell	7872b37dd0	xfs: read only mounts with fsopen mount API are busted JIRA: https://issues.redhat.com/browse/RHEL-65728 commit d8d222e09dab84a17bb65dda4b94d01c565f5327 Author: Dave Chinner <dchinner@redhat.com> Date: Tue Jan 16 15:33:07 2024 +1100 xfs: read only mounts with fsopen mount API are busted Recently xfs/513 started failing on my test machines testing "-o ro,norecovery" mount options. This was being emitted in dmesg: [ 9906.932724] XFS (pmem0): no-recovery mounts must be read-only. Turns out, readonly mounts with the fsopen()/fsconfig() mount API have been busted since day zero. It's only taken 5 years for debian unstable to start using this "new" mount API, and shortly after this I noticed xfs/513 had started to fail as per above. The syscall trace is: fsopen("xfs", FSOPEN_CLOEXEC) = 3 mount_setattr(-1, NULL, 0, NULL, 0) = -1 EINVAL (Invalid argument) ..... fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/pmem0", 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "norecovery", NULL, 0) = 0 fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = -1 EINVAL (Invalid argument) close(3) = 0 Showing that the actual mount instantiation (FSCONFIG_CMD_CREATE) is what threw out the error. During mount instantiation, we call xfs_fs_validate_params() which does: /* No recovery flag requires a read-only mount */ if (xfs_has_norecovery(mp) && !xfs_is_readonly(mp)) { xfs_warn(mp, "no-recovery mounts must be read-only."); return -EINVAL; } and xfs_is_readonly() checks internal mount flags for read only state. This state is set in xfs_init_fs_context() from the context superblock flag state: /* Copy binary VFS mount flags we are interested in. */ if (fc->sb_flags & SB_RDONLY) set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate); With the old mount API, all of the VFS specific superblock flags had already been parsed and set before xfs_init_fs_context() is called, so this all works fine.
However, in the brave new fsopen/fsconfig world, xfs_init_fs_context() is called from fsopen() context, before any VFS superblock flags have been set or parsed. Hence if we use fsopen(), the internal XFS readonly state is never set. Hence anything that depends on xfs_is_readonly() actually returning true for read only mounts is broken if fsopen() has been used to mount the filesystem. Fix this by moving this internal state initialisation to xfs_fs_fill_super() before we attempt to validate the parameters that have been set prior to the FSCONFIG_CMD_CREATE call being made. Signed-off-by: Dave Chinner <dchinner@redhat.com> Fixes: 73e5fff98b ("xfs: switch to use the new mount-api") cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:26:21 -06:00
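The ordering problem can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: struct model_fs_context, struct model_xfs_mount, MODEL_SB_RDONLY and the model_* functions are hypothetical stand-ins for fc->sb_flags, XFS_OPSTATE_READONLY, xfs_init_fs_context() and xfs_fs_fill_super(). It replays the syscall order from the trace above and shows that copying the read-only flag at fsopen() time misses the later "ro" fsconfig() call, while copying it at fill_super time does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real kernel definitions. */
#define MODEL_SB_RDONLY 0x1u

struct model_fs_context {
	unsigned int sb_flags;  /* VFS flags; fsconfig() fills these in */
};

struct model_xfs_mount {
	bool readonly;          /* plays the role of XFS_OPSTATE_READONLY */
	bool norecovery;        /* plays the role of the "norecovery" option */
};

/* Runs at fsopen() time, before any fsconfig() call has been made. */
static void model_init_fs_context(const struct model_fs_context *fc,
				  struct model_xfs_mount *mp, bool fixed)
{
	/* Pre-fix behaviour: copy the VFS read-only flag here (too early). */
	if (!fixed && (fc->sb_flags & MODEL_SB_RDONLY))
		mp->readonly = true;
}

/* Runs at FSCONFIG_CMD_CREATE time. */
static int model_fill_super(const struct model_fs_context *fc,
			    struct model_xfs_mount *mp, bool fixed)
{
	/* Post-fix behaviour: copy the flag just before validating. */
	if (fixed && (fc->sb_flags & MODEL_SB_RDONLY))
		mp->readonly = true;

	/* Same rule as the xfs_fs_validate_params() check quoted above. */
	if (mp->norecovery && !mp->readonly)
		return -EINVAL;
	return 0;
}

/* Replays the trace: fsopen, set "ro", set "norecovery", then create. */
int model_fsopen_mount(bool fixed)
{
	struct model_fs_context fc = { 0 };
	struct model_xfs_mount mp = { 0 };

	model_init_fs_context(&fc, &mp, fixed);   /* fsopen() */
	fc.sb_flags |= MODEL_SB_RDONLY;           /* fsconfig(..., "ro", ...) */
	mp.norecovery = true;                     /* fsconfig(..., "norecovery", ...) */
	return model_fill_super(&fc, &mp, fixed); /* FSCONFIG_CMD_CREATE */
}
```

In this model, model_fsopen_mount(false) returns -EINVAL, matching the dmesg failure, and model_fsopen_mount(true) returns 0.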
Bill O'Donnell	0ab240b7f7	xfs: clean up the xfs_reserve_blocks interface JIRA: https://issues.redhat.com/browse/RHEL-65728 commit 646ddf0c4df5181a7057ecccd29e535baaf034b2 Author: Christoph Hellwig <hch@lst.de> Date: Mon Dec 4 18:40:56 2023 +0100 xfs: clean up the xfs_reserve_blocks interface xfs_reserve_blocks has a very odd interface that can only be explained by it directly deriving from the IRIX fcntl handler back in the day. Split reporting of the reserved blocks out of xfs_reserve_blocks into the only caller that cares. This means that the value reported from XFS_IOC_SET_RESBLKS is no longer sampled atomically in the same critical section in which it was set, but as the values could change right after setting them anyway that does not matter. It does provide atomic sampling of both values for XFS_IOC_GET_RESBLKS now, though. Also pass a normal scalar integer value for the requested value instead of the pointless pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:25:56 -06:00
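A rough userspace sketch of the reworked interface, assuming a pthread mutex in place of the kernel's superblock counter lock. The setter takes a plain uint64_t request instead of a pointer, and the getter copies both values inside one critical section, which is the atomic sampling the message describes for XFS_IOC_GET_RESBLKS. struct resblks_model and the model_* functions are hypothetical. The real code also limits the available count by free space, which this sketch omits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model of the two reserve-pool counters; not the XFS structs. */
struct resblks_model {
	pthread_mutex_t lock;   /* stands in for the superblock counter lock */
	uint64_t resblks;       /* requested size of the reserve pool */
	uint64_t resblks_avail; /* blocks currently available in the pool */
};

/* Setter: takes a scalar request and reports nothing back. */
void model_reserve_blocks(struct resblks_model *m, uint64_t request)
{
	pthread_mutex_lock(&m->lock);
	m->resblks = request;
	/* Simplification: assume the whole request can be satisfied. */
	m->resblks_avail = request;
	pthread_mutex_unlock(&m->lock);
}

/* Getter: samples both values in a single critical section. */
void model_get_resblks(struct resblks_model *m, uint64_t *resblks,
		       uint64_t *avail)
{
	pthread_mutex_lock(&m->lock);
	*resblks = m->resblks;
	*avail = m->resblks_avail;
	pthread_mutex_unlock(&m->lock);
}
```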
Bill O'Donnell	3ec04f5ca0	xfs: create a helper to convert rtextents to rtblocks JIRA: https://issues.redhat.com/browse/RHEL-62760 commit fa5a387230861116c2434c20d29fc4b3fd077d24 Author: Darrick J. Wong <djwong@kernel.org> Date: Mon Oct 16 09:32:54 2023 -0700 xfs: create a helper to convert rtextents to rtblocks Create a helper to convert a realtime extent to a realtime block. Later on we'll change the helper to use bit shifts when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-09 10:06:36 -06:00
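A sketch of what such a conversion helper might look like, including the bit-shift fast path that the message says will come later. struct rt_geom_model, its fields, and model_rtx_to_rtb() are hypothetical illustrations, not the XFS definitions. The commit itself only introduces the multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical realtime geometry; field names are illustrative only. */
struct rt_geom_model {
	uint32_t rextsize;  /* realtime extent size, in filesystem blocks */
	int rextslog;       /* log2(rextsize) if a power of two, else -1 */
};

static int model_log2_if_pow2(uint32_t v)
{
	int shift = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	while ((v >>= 1) != 0)
		shift++;
	return shift;
}

void model_rt_geom_init(struct rt_geom_model *g, uint32_t rextsize)
{
	assert(rextsize > 0);
	g->rextsize = rextsize;
	g->rextslog = model_log2_if_pow2(rextsize);
}

/* Convert a realtime extent number to its first realtime block number. */
uint64_t model_rtx_to_rtb(const struct rt_geom_model *g, uint64_t rtx)
{
	if (g->rextslog >= 0)
		return rtx << g->rextslog;  /* power-of-two extent size */
	return rtx * g->rextsize;           /* general case */
}
```

For example, with a 16-block extent size, extent 3 starts at realtime block 48 via a 4-bit shift; with a 12-block extent size the helper falls back to multiplication and returns 36.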
Bill O'Donnell	16a68f2016	xfs: track usage statistics of online fsck JIRA: https://issues.redhat.com/browse/RHEL-57114 Conflicts: diff due to previous out of order application of scrub patches Add redhat/configs/common/generic/XFS_ONLINE_SCRUB_STATS. commit d7a74cad8f45133935c59ed0adf949f85238624b Author: Darrick J. Wong <djwong@kernel.org> Date: Thu Aug 10 07:48:07 2023 -0700 xfs: track usage statistics of online fsck Track the usage, outcomes, and run times of the online fsck code, and report these values via debugfs. The columns in the file are: * scrubber name * number of scrub invocations * clean objects found * corruptions found * optimizations found * cross referencing failures * inconsistencies found during cross referencing * incomplete scrubs * warnings * number of times scrub had to retry * cumulative amount of time spent scrubbing (microseconds) * number of repair invocations * successfully repaired objects * cumulative amount of time spent repairing (microseconds) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-10-15 10:46:26 -05:00
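A sketch of how one line per scrubber might be laid out, following the column order listed above. struct scrub_stats_model, its field names and the space-separated format are assumptions for illustration. The actual debugfs file layout may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-scrubber counters, one field per listed column. */
struct scrub_stats_model {
	const char *name;                 /* scrubber name */
	unsigned int invocations;         /* number of scrub invocations */
	unsigned int clean;               /* clean objects found */
	unsigned int corrupt;             /* corruptions found */
	unsigned int preen;               /* optimizations found */
	unsigned int xfail;               /* cross-referencing failures */
	unsigned int xcorrupt;            /* inconsistencies while cross-referencing */
	unsigned int incomplete;          /* incomplete scrubs */
	unsigned int warning;             /* warnings */
	unsigned int retries;             /* times scrub had to retry */
	unsigned long long checktime_us;  /* cumulative scrub time, microseconds */
	unsigned int repair_invocations;  /* number of repair invocations */
	unsigned int repair_success;      /* successfully repaired objects */
	unsigned long long repairtime_us; /* cumulative repair time, microseconds */
};

/* Format one line in column order; returns the snprintf() result. */
int model_format_scrub_stats(const struct scrub_stats_model *s,
			     char *buf, size_t len)
{
	return snprintf(buf, len,
			"%s %u %u %u %u %u %u %u %u %u %llu %u %u %llu\n",
			s->name, s->invocations, s->clean, s->corrupt,
			s->preen, s->xfail, s->xcorrupt, s->incomplete,
			s->warning, s->retries, s->checktime_us,
			s->repair_invocations, s->repair_success,
			s->repairtime_us);
}
```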
Bill O'Donnell	964f38ed28	xfs: create scaffolding for creating debugfs entries JIRA: https://issues.redhat.com/browse/RHEL-57114 Conflicts: diff due to previous out of order application of 35a93b148b0 (rhel f9ca79532a xfs: close the external block devices in xfs_mount_free). commit a76dba3b248cb0c2b93d66f463d5ca3cf7037d28 Author: Darrick J. Wong <djwong@kernel.org> Date: Thu Aug 10 07:48:07 2023 -0700 xfs: create scaffolding for creating debugfs entries Set up debugfs directories for xfs as a whole, and a subdirectory for each mounted filesystem. This will enable the creation of debugfs files in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-10-15 10:46:25 -05:00
Lucas Zampieri	a37318513e	Merge: xfs: warn deprecation of V4 format beginning with RHEL10 instead of 2030. MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4448 JIRA: https://issues.redhat.com/browse/RHEL-40421 Replace 2030 with RHEL10 in deprecation warning for V4 format. Signed-off-by: Bill O'Donnell <bodonnel@redhat.com> Approved-by: Andrey Albershteyn <aalbersh@redhat.com> Approved-by: Brian Foster <bfoster@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-07-01 12:47:55 +00:00
Bill O'Donnell	b82c88656e	xfs: warn deprecation of V4 format beginning with RHEL10 instead of 2030. JIRA: https://issues.redhat.com/browse/RHEL-40421 Upstream Status: RHEL-only Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-06-07 13:10:03 -05:00
Bill O'Donnell	c2563745c8	xfs: drop EXPERIMENTAL tag for large extent counts JIRA: https://issues.redhat.com/browse/RHEL-25419 commit 61d7e8274cd84f574e686b24048ebf29bac861cc Author: Darrick J. Wong <djwong@kernel.org> Date: Mon Jun 12 18:09:04 2023 -0700 xfs: drop EXPERIMENTAL tag for large extent counts This feature has been baking in upstream for ~10mo with no bug reports. It seems to work fine here, let's get rid of the scary warnings? Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-06-06 10:32:51 -05:00
Bill O'Donnell	a2aea1128d	xfs: deprecate the ascii-ci feature Conflicts: added redhat/configs/common/generic/CONFIG_XFS_SUPPORT_ASCII_CI. The ASCII case-insensitivity feature (ascii-ci=1) is deprecated now and is removed completely in RHEL10, where it will not be supported. JIRA: https://issues.redhat.com/browse/RHEL-25419 commit 7ba83850ca2691865713b307ed001bde5fddb084 Author: Darrick J. Wong <djwong@kernel.org> Date: Tue Apr 11 19:05:19 2023 -0700 xfs: deprecate the ascii-ci feature This feature is a mess -- the hash function has been broken for the entire 15 years of its existence if you create names with extended ascii bytes; metadump name obfuscation has silently failed for just as long; and the feature clashes horribly with the UTF8 encodings that most systems use today. There is exactly one fstest for this feature. In other words, this feature is crap. Let's deprecate it now so we can remove it from the codebase in 2030. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-06-06 10:32:47 -05:00
Bill O'Donnell	c7160f6553	xfs: dax - remove tech preview tag JIRA: https://issues.redhat.com/browse/RHEL-35289 Upstream Status: RHEL only Since we've backported the dax patches that remove the experimental tag, remove the tech-preview designation for RHEL. Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-05-02 17:42:50 -05:00
Ming Lei	ca8eaf1249	fs,block: yield devices early JIRA: https://issues.redhat.com/browse/RHEL-29564 Conflicts: drop change on f2fs, bcachefs and reiserfs; context difference on ext4 & fs/super.c change. commit 22650a99821dda3d05f1c334ea90330b4982de56 Author: Christian Brauner <brauner@kernel.org> Date: Tue Mar 26 13:47:22 2024 +0100 fs,block: yield devices early Currently a device is only really released once the umount returns to userspace due to how file closing works. That ultimately could cause an old umount assumption to be violated that concurrent umount and mount don't fail. So an exclusively held device with a temporary holder should be yielded before the filesystem is gone. Add a helper that allows callers to do that. This also allows us to remove the two holder ops that Linus wasn't excited about. Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 10:39:09 +08:00
Ming Lei	0c712a8085	xfs: port block device access to files JIRA: https://issues.redhat.com/browse/RHEL-29564 commit 1b9e2d90141c5e25faefbb7891f0ed8606aa02cf Author: Christian Brauner <brauner@kernel.org> Date: Tue Jan 23 14:26:24 2024 +0100 xfs: port block device access to files Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-7-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 10:18:37 +08:00
Ming Lei	04d1386d16	bdev: open block device as files JIRA: https://issues.redhat.com/browse/RHEL-29564 Conflicts: context difference since we don't carry 68279f9c9f59 ("treewide: mark stuff as __ro_after_init"); drop f2fs change commit f3a608827d1f8de0dd12813e8d9c6803fe64e119 Author: Christian Brauner <brauner@kernel.org> Date: Thu Feb 8 18:47:35 2024 +0100 bdev: open block device as files Add two new helpers to allow opening block devices as files. This is not the final infrastructure. This still opens the block device before opening a struct file. Until we have removed all references to struct bdev_handle we can't switch the order: * Introduce blk_to_file_flags() to translate from block specific to flags usable to open a new file. * Introduce bdev_file_open_by_{dev,path}(). * Introduce temporary sb_bdev_handle() helper to retrieve a struct bdev_handle from a block device file and update places that directly reference struct bdev_handle to rely on it. * Don't count block device opens against the number of open files. A bdev_file_open_by_{dev,path}() file is never installed into any file descriptor table. One idea that came to mind was to use kernel_tmpfile_open() which would require us to pass a path and it would then call do_dentry_open() going through the regular fops->open::blkdev_open() path. But then we're back to the problem of routing block specific flags such as BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste FMODE_* flags every time we add a new one. With this we can avoid using a flag bit and we have more leeway in how we open block devices from bdev_open_by_{dev,path}(). Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 10:13:07 +08:00
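A userspace sketch of the kind of translation blk_to_file_flags() performs. The MODEL_BLK_OPEN_* bit values are assumptions modeled on blk_mode_t, not copied from kernel headers, and model_blk_to_file_flags() is a hypothetical name. O_NONBLOCK is used in place of the kernel's O_NDELAY to keep the sketch within POSIX headers. Block-only bits, modeled here by MODEL_BLK_OPEN_EXCL, are deliberately left untranslated in this sketch.

```c
#include <assert.h>
#include <fcntl.h>

/* Illustrative mode bits modeled on blk_mode_t; the values are assumed. */
#define MODEL_BLK_OPEN_READ   (1u << 0)
#define MODEL_BLK_OPEN_WRITE  (1u << 1)
#define MODEL_BLK_OPEN_EXCL   (1u << 2)
#define MODEL_BLK_OPEN_NDELAY (1u << 3)

/* Translate block-layer open mode bits into open(2)-style file flags. */
int model_blk_to_file_flags(unsigned int mode)
{
	int flags = 0;
	unsigned int rw = mode & (MODEL_BLK_OPEN_READ | MODEL_BLK_OPEN_WRITE);

	if (rw == (MODEL_BLK_OPEN_READ | MODEL_BLK_OPEN_WRITE))
		flags |= O_RDWR;
	else if (rw == MODEL_BLK_OPEN_WRITE)
		flags |= O_WRONLY;
	else
		assert(rw == MODEL_BLK_OPEN_READ); /* O_RDONLY is 0: nothing to OR in */

	if (mode & MODEL_BLK_OPEN_NDELAY)
		flags |= O_NONBLOCK;

	/* Block-only bits such as EXCL have no file-flag equivalent here. */
	return flags;
}
```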
Ming Lei	1966211b12	xfs: Block writes to log device JIRA: https://issues.redhat.com/browse/RHEL-29564 commit 3584c8f48a70c1f74c7b7bab59cf22bb66224649 Author: Jan Kara <jack@suse.cz> Date: Wed Nov 1 18:43:11 2023 +0100 xfs: Block writes to log device Ask block layer to not allow other writers to open block devices used for xfs log and realtime devices. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231101174325.10596-6-jack@suse.cz Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 10:04:36 +08:00
Bill O'Donnell	13b704d0f9	xfs: drop experimental warning for FSDAX JIRA: https://issues.redhat.com/browse/RHEL-15319 commit 27c86d43bcdb97d00359702713bfff6c006f0d90 Author: Shiyang Ruan <ruansy.fnst@fujitsu.com> Date: Fri Sep 15 14:38:54 2023 +0800 xfs: drop experimental warning for FSDAX FSDAX and reflink can work together now, let's drop this warning. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-04-05 11:58:49 -05:00
Ming Lei	079721ca6f	xfs: Convert to bdev_open_by_path() JIRA: https://issues.redhat.com/browse/RHEL-29262 commit e340dd63f6a11402424b3d77e51149bce8fcba7d Author: Jan Kara <jack@suse.cz> Date: Wed Sep 27 11:34:34 2023 +0200 xfs: Convert to bdev_open_by_path() Convert xfs to use bdev_open_by_path() and pass the handle around. CC: "Darrick J. Wong" <djwong@kernel.org> CC: linux-xfs@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-28-jack@suse.cz Acked-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:46 +08:00
Ming Lei	857ddf58bd	xfs use fs_holder_ops for the log and RT devices JIRA: https://issues.redhat.com/browse/RHEL-29262 commit 8ffa54e3370c5a8b9538dbe4077fc9c4b5a08f45 Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 2 17:41:31 2023 +0200 xfs use fs_holder_ops for the log and RT devices Use the generic fs_holder_ops to shut down the file system when the log or RT device goes away instead of duplicating the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230802154131.2221419-13-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:44 +08:00
Ming Lei	3507ce1209	xfs: drop s_umount over opening the log and RT devices JIRA: https://issues.redhat.com/browse/RHEL-29262 commit 8d945b595ed07db13fef1f3311ad456c97941930 Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 2 17:41:30 2023 +0200 xfs: drop s_umount over opening the log and RT devices Just like get_tree_bdev needs to drop s_umount when opening the main device, we need to do the same for the xfs log and RT devices to avoid a potential lock order reversal with s_umount for the mark_dead path. It might be preferable to just drop s_umount over ->fill_super entirely, but that will require a fairly massive audit first, so we'll do the easy version here first. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230802154131.2221419-12-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:44 +08:00
Ming Lei	e73ab373cf	xfs: document the invalidate_bdev call in invalidate_bdev JIRA: https://issues.redhat.com/browse/RHEL-29262 commit 1a0a5dad67b60250dce151c9533ccbecdfd822d4 Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 9 15:05:39 2023 -0700 xfs: document the invalidate_bdev call in invalidate_bdev Copy and paste the commit message from Darrick into a comment to explain the seemingly odd invalidate_bdev in xfs_shutdown_devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-8-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:43 +08:00
Ming Lei	f9ca79532a	xfs: close the external block devices in xfs_mount_free JIRA: https://issues.redhat.com/browse/RHEL-29262 commit 35a93b148b0363dca23c3db1cc9d48100eb8b276 Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 9 15:05:38 2023 -0700 xfs: close the external block devices in xfs_mount_free blkdev_put must not be called under sb->s_umount to avoid a lock order reversal with disk->open_mutex. Move closing the buftargs into ->kill_sb to achieve that. Note that the flushing of the disk caches and block device mapping invalidation needs to stay in ->put_super as the main block device is closed in kill_block_super already. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-7-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:43 +08:00
Ming Lei	c732d9002a	xfs: remove xfs_blkdev_put JIRA: https://issues.redhat.com/browse/RHEL-29262 commit d3ef7e94ee36adc8f0006d253a9ad45793b874cd Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 9 15:05:36 2023 -0700 xfs: remove xfs_blkdev_put There isn't much use for this trivial wrapper, especially as the NULL check is only needed in a single call site. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-5-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:43 +08:00
Ming Lei	9f146111a0	xfs: free the xfs_mount in ->kill_sb JIRA: https://issues.redhat.com/browse/RHEL-29262 commit 2a9311adb87c98599989b80405fe2c60cd4075dd Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 9 15:05:35 2023 -0700 xfs: free the xfs_mount in ->kill_sb As a rule of thumb everything allocated to the fs_context and moved into the super_block should be freed by ->kill_sb so that the teardown handling doesn't need to be duplicated between the fill_super error path and put_super. Implement an XFS-specific kill_sb method to do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-4-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:43 +08:00
Ming Lei	2101e2bdfc	xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super JIRA: https://issues.redhat.com/browse/RHEL-29262 commit 1aa2d074d4c777e2150382878d0a5611d829b380 Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 9 15:05:34 2023 -0700 xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super ->put_super is only called when sb->s_root is set, and thus when fill_super succeeds. Thus drop the NULL check that can't happen in xfs_fs_put_super. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-3-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:43 +08:00
Ming Lei	5e678b3586	xfs: reformat the xfs_fs_free prototype JIRA: https://issues.redhat.com/browse/RHEL-29262 commit dbbff489064d89391c4f0c7a73e77e61ce29fe96 Author: Christoph Hellwig <hch@lst.de> Date: Wed Aug 9 15:05:33 2023 -0700 xfs: reformat the xfs_fs_free prototype The xfs_fs_free prototype formatting is a weird mix of the classic XFS style and the Linux style. Fix it up to be consistent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-2-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:43 +08:00
Bill O'Donnell	05ef68b0f5	xfs: remove CPU hotplug infrastructure JIRA: https://issues.redhat.com/browse/RHEL-15844 commit ef7d9593390a050c50eba5fc02d2cb65a1104434 Author: Darrick J. Wong <djwong@kernel.org> Date: Mon Sep 11 08:39:04 2023 -0700 xfs: remove CPU hotplug infrastructure There are no users of the cpu hotplug hooks in xfs now, so remove it. This reverts f1653c2e2831e ("xfs: introduce CPU hotplug infrastructure"). Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-19 16:21:09 -06:00
Bill O'Donnell	4832ccaf00	xfs: remove the all-mounts list JIRA: https://issues.redhat.com/browse/RHEL-15844 commit f5bfa695f02e02415e4bfb36bd83a8bc933a6d4f Author: Darrick J. Wong <djwong@kernel.org> Date: Mon Sep 11 08:39:04 2023 -0700 xfs: remove the all-mounts list Revert commit 0ed17f01c8540 ("xfs: introduce all-mounts list for cpu hotplug notifications") because the cpu hotplug hooks are now pointless, so we don't need this list anymore. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-19 16:21:09 -06:00
Bill O'Donnell	28e7f5db46	xfs: use per-mount cpumask to track nonempty percpu inodegc lists JIRA: https://issues.redhat.com/browse/RHEL-15844 commit 62334fab47621dd91ab30dd5bb6c43d78a8ec279 Author: Darrick J. Wong <djwong@kernel.org> Date: Mon Sep 11 08:39:03 2023 -0700 xfs: use per-mount cpumask to track nonempty percpu inodegc lists Directly track which CPUs have contributed to the inodegc percpu lists instead of trusting the cpu online mask. This eliminates a theoretical problem where the inodegc flush functions might fail to flush a CPU's inodes if that CPU happened to be dying at exactly the same time. Most likely nobody's noticed this because the CPU dead hook moves the percpu inodegc list to another CPU and schedules that worker immediately. But it's quite possible that this is a subtle race leading to UAF if the inodegc flush were part of an unmount. Further benefits: This reduces the overhead of the inodegc flush code slightly by allowing us to ignore CPUs that have empty lists. Better yet, it reduces our dependence on the cpu online masks, which have been the cause of confusion and drama lately. Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-19 16:21:09 -06:00
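A single-threaded sketch of the idea: record a bit for each CPU that queues work, and let the flush visit only those bits instead of walking the online mask. struct inodegc_model, MODEL_NR_CPUS and the model_* functions are hypothetical. A 64-bit word stands in for a cpumask, plain stores stand in for the kernel's atomic cpumask operations, and __builtin_ctzll assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64 /* assumption: CPU ids fit in one 64-bit word */

/* Hypothetical model: per-CPU queue lengths plus a mask of non-empty lists. */
struct inodegc_model {
	unsigned int queued[MODEL_NR_CPUS]; /* inodes waiting on each CPU's list */
	uint64_t nonempty_mask;             /* bit n set => CPU n queued work */
};

/* Queue one inode on @cpu and record that this CPU's list is non-empty. */
void model_inodegc_queue(struct inodegc_model *gc, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	gc->queued[cpu]++;
	gc->nonempty_mask |= UINT64_C(1) << cpu;
}

/*
 * Flush every CPU whose bit is set, whether or not that CPU is still
 * online, and skip CPUs with empty lists. Returns inodes processed.
 */
unsigned int model_inodegc_flush(struct inodegc_model *gc)
{
	uint64_t mask = gc->nonempty_mask;
	unsigned int done = 0;

	gc->nonempty_mask = 0;
	while (mask) {
		unsigned int cpu = (unsigned int)__builtin_ctzll(mask);

		done += gc->queued[cpu];
		gc->queued[cpu] = 0;
		mask &= mask - 1; /* clear the lowest set bit */
	}
	return done;
}
```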
Bill O'Donnell	1ce0a32435	xfs: fix per-cpu CIL structure aggregation racing with dying cpus JIRA: https://issues.redhat.com/browse/RHEL-15844 commit ecd49f7a36fbccc884471f86fc43de6ca8d1f786 Author: Darrick J. Wong <djwong@kernel.org> Date: Mon Sep 11 08:39:02 2023 -0700 xfs: fix per-cpu CIL structure aggregation racing with dying cpus In commit 7c8ade2121200 ("xfs: implement percpu cil space used calculation"), the XFS committed (log) item list code was converted to use per-cpu lists and space tracking to reduce cpu contention when multiple threads are modifying different parts of the filesystem and hence end up contending on the log structures during transaction commit. Each CPU tracks its own commit items and space usage, and these do not have to be merged into the main CIL until either someone wants to push the CIL items, or we run over a soft threshold and switch to slower (but more accurate) accounting with atomics. Unfortunately, the for_each_cpu iteration suffers from the same race with cpu dying problem that was identified in commit 8b57b11cca88f ("pcpcntrs: fix dying cpu summation race") -- CPUs are removed from cpu_online_mask before the CPUHP_XFS_DEAD callback gets called. As a result, both CIL percpu structure aggregation functions fail to collect the items and accounted space usage at the correct point in time. If we're lucky, the items that are collected from the online cpus exceed the space given to those cpus, and the log immediately shuts down in xlog_cil_insert_items due to the (apparent) log reservation overrun. This happens periodically with generic/650, which exercises cpu hotplug vs. the filesystem code: smpboot: CPU 3 is now offline XFS (sda3): ctx ticket reservation ran out. Need to up reservation XFS (sda3): ticket reservation summary: XFS (sda3): unit res = 9268 bytes XFS (sda3): current res = -40 bytes XFS (sda3): original count = 1 XFS (sda3): remaining count = 1 XFS (sda3): Filesystem has been shut down due to log error (0x2). 
Applying the same sort of fix from 8b57b11cca88f to the CIL code seems to make the generic/650 problem go away, but I've been told that tglx was not happy when he saw: "...the only thing we actually need to care about is that percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when there are no CPUs dying, it has no addition overhead except for a cpumask_or() operation." The CPU hotplug code is rather complex and difficult to understand and I don't want to try to understand the cpu hotplug locking well enough to use cpu_dying mask. Furthermore, there's a performance improvement that could be had here. Attach a private cpu mask to the CIL structure so that we can track exactly which cpus have accessed the percpu data at all. It doesn't matter if the cpu has since gone offline; log item aggregation will still find the items. Better yet, we skip cpus that have not recently logged anything. Worse yet, Ritesh Harjani and Eric Sandeen both reported today that CPU hot remove racing with an xfs mount can crash if the cpu_dead notifier tries to access the log but the mount hasn't yet set up the log. Link: https://lore.kernel.org/linux-xfs/ZOLzgBOuyWHapOyZ@dread.disaster.area/T/ Link: https://lore.kernel.org/lkml/877cuj1mt1.ffs@tglx/ Link: https://lore.kernel.org/lkml/20230414162755.281993820@linutronix.de/ Link: https://lore.kernel.org/linux-xfs/ZOVkjxWZq0YmjrJu@dread.disaster.area/T/ Cc: tglx@linutronix.de Cc: peterz@infradead.org Reported-by: ritesh.list@gmail.com Reported-by: sandeen@sandeen.net Fixes: af1c2146a50b ("xfs: introduce per-cpu CIL tracking structure") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-19 16:21:09 -06:00
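The race can be shown with a small model of per-CPU space accounting. This is a sketch, not the CIL code: struct cil_model, pcp_mask and the model_* functions are hypothetical, and a 64-bit word stands in for a cpumask. Summing over a mask from which a dying CPU has already been cleared loses that CPU's contribution, while summing over the mask of CPUs that have logged anything does not.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64 /* assumption: CPU ids fit in one 64-bit word */

/* Hypothetical model of per-CPU CIL space accounting. */
struct cil_model {
	int64_t space_used[MODEL_NR_CPUS]; /* per-CPU bytes of log space used */
	uint64_t pcp_mask;                 /* CPUs that have logged anything */
};

/* Account @bytes to @cpu and remember that this CPU has per-CPU data. */
void model_cil_insert(struct cil_model *cil, unsigned int cpu, int64_t bytes)
{
	assert(cpu < MODEL_NR_CPUS);
	cil->space_used[cpu] += bytes;
	cil->pcp_mask |= UINT64_C(1) << cpu;
}

/* Aggregate per-CPU space over whichever CPU mask the caller supplies. */
int64_t model_cil_sum(const struct cil_model *cil, uint64_t mask)
{
	int64_t total = 0;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			total += cil->space_used[cpu];
	return total;
}
```

With CPU 1 holding 100 bytes and CPU 3 holding 40, summing over an "online" mask that no longer contains CPU 3 yields 100, whereas summing over pcp_mask yields 140.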
Bill O'Donnell	14fa9c73d2	xfs: check that per-cpu inodegc workers actually run on that cpu JIRA: https://issues.redhat.com/browse/RHEL-15844 Conflicts: diff in xfs_super.c due to previous out of order patch commit b37c4c8339cd394ea6b8b415026603320a185651 Author: Darrick J. Wong <djwong@kernel.org> Date: Tue May 2 09:16:12 2023 +1000 xfs: check that per-cpu inodegc workers actually run on that cpu Now that we've allegedly worked out the problem of the per-cpu inodegc workers being scheduled on the wrong cpu, let's put in a debugging knob to let us know if a worker ever gets mis-scheduled again. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-19 16:21:08 -06:00
Bill O'Donnell	19fab0b814	xfs: collect errors from inodegc for unlinked inode recovery JIRA: https://issues.redhat.com/browse/RHEL-2002 Conflicts: context differences due to out of order patch application commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f Author: Dave Chinner <dchinner@redhat.com> Date: Mon Jun 5 14:48:15 2023 +1000 xfs: collect errors from inodegc for unlinked inode recovery Unlinked list recovery requires that errors removing the inode from the unlinked list be fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list. This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down. Fix this by collecting inodegc worker errors and feeding them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error. In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain. In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-10 07:22:27 -06:00
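A minimal sketch of the "first error wins, the next flush gathers and clears" scheme described above, assuming C11 atomics in place of whatever primitives the kernel patch actually uses. struct gc_worker_model and the model_* functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of one per-CPU inodegc worker's error slot. */
struct gc_worker_model {
	atomic_int error; /* 0, or the first negative errno since the last flush */
};

/* Worker side: keep only the first error; later ones are dropped. */
void model_worker_record_error(struct gc_worker_model *w, int error)
{
	int expected = 0;

	if (error)
		atomic_compare_exchange_strong(&w->error, &expected, error);
}

/*
 * Flush side: gather and clear every worker's error slot, and report
 * only the first non-zero error found, in worker order.
 */
int model_flush_collect_errors(struct gc_worker_model *workers, unsigned int n)
{
	int first = 0;

	for (unsigned int i = 0; i < n; i++) {
		int err = atomic_exchange(&workers[i].error, 0);

		if (err && !first)
			first = err;
	}
	return first;
}
```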
Bill O'Donnell	e9e80ba818	xfs: test dir/attr hash when loading module JIRA: https://issues.redhat.com/browse/RHEL-2002 commit 3cfb9290da3d87a5877b03bda96c3d5d3ed9fcb0 Author: Darrick J. Wong <djwong@kernel.org> Date: Thu Mar 16 09:31:20 2023 -0700 xfs: test dir/attr hash when loading module Back in the 6.2-rc1 days, Eric Whitney reported an fstests regression in ext4 against generic/454. The cause of this test failure was the unfortunate combination of setting an xattr name containing UTF8 encoded emoji, an xattr hash function that accepted a char pointer with no explicit signedness, signed type extension of those chars to an int, and the 6.2 build tools maintainers deciding to mandate -funsigned-char across the board. As a result, the on-disk extended attribute structures written out by 6.1 and 6.2 were not the same. This discrepancy, in fact, would have been noticeable if a filesystem with such an xattr were moved between any two architectures that don't employ the same signedness of a raw "char" declaration. The only reason anyone noticed is that x86 gcc defaults to signed, and no such -funsigned-char update was made to e2fsprogs, so e2fsck immediately started reporting data corruption. After a day and a half of discussing how to handle this use case (xattrs with bit 7 set anywhere in the name) without breaking existing users, Linus merged his own patch and didn't tell the maintainer. None of the ext4 developers realized this until AUTOSEL announced that the commit had been backported to stable. In the end, this problem could have been detected much earlier if there had been any useful tests of hash function(s) in use inside ext4 to make sure that they always produce the same outputs given the same inputs. The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's vulnerable to this problem. However, let's avoid all this drama by adding our own self test to check that the da hash produces the same outputs for a static pile of inputs on various platforms.
This enables us to fix any breakage that may result in a controlled fashion. The buffer and test data are identical to the patches submitted to xfsprogs. Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/ Link: https://lore.kernel.org/linux-xfs/ZBUKCRR7xvIqPrpX@destitution/T/#md38272cc684e2c0d61494435ccbb91f022e8dee4 Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-10 07:22:24 -06:00
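The signedness hazard and the self-test idea above can be illustrated with a toy hash. This is a hypothetical sketch, not `xfs_da_hashname()`: `toy_hash_u8()`, `toy_hash_s8()` and the single test vector are invented for illustration. It shows that widening a byte with bit 7 set through `signed char` changes the result, while a `uint8_t` input (as the XFS hash uses) does not, and that a table of known outputs catches any drift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit rotate left; n must be in 1..31 here. */
static uint32_t rol32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Toy name hash (hypothetical): bytes are zero-extended, 0xF0 -> 0x000000F0. */
static uint32_t toy_hash_u8(const uint8_t *name, size_t len)
{
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++)
		hash = rol32(hash, 7) ^ (uint32_t)name[i];
	return hash;
}

/* Same hash fed through signed char: 0xF0 sign-extends to 0xFFFFFFF0. */
static uint32_t toy_hash_s8(const signed char *name, size_t len)
{
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++)
		hash = rol32(hash, 7) ^ (uint32_t)(int)name[i];
	return hash;
}

/* Load-time style self test: compare against a static table of outputs. */
struct hash_vector {
	const char *name;
	uint32_t expected;
};

static const struct hash_vector toy_vectors[] = {
	{ "ab", 12514 },	/* rol32('a', 7) ^ 'b' = 0x3080 ^ 0x62 = 0x30E2 */
};

static int toy_hash_selftest(void)
{
	for (size_t i = 0; i < sizeof(toy_vectors) / sizeof(toy_vectors[0]); i++) {
		const char *s = toy_vectors[i].name;

		if (toy_hash_u8((const uint8_t *)s, strlen(s)) !=
		    toy_vectors[i].expected)
			return -1;	/* hash output changed: refuse to load */
	}
	return 0;
}
```

For pure ASCII names the two variants agree, which is why such a bug can hide until a name with bit 7 set appears.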
Bill O'Donnell	6ee6b421b0	xfs: perags need atomic operational state JIRA: https://issues.redhat.com/browse/RHEL-2002 commit 7ac2ff8bb3713c7cb43564c04384af2ee7cc1f8d Author: Dave Chinner <dchinner@redhat.com> Date: Mon Feb 13 09:14:52 2023 +1100 xfs: perags need atomic operational state We currently don't have any flags or operational state in the xfs_perag except for the pagf_init and pagi_init flags. And the agflreset flag. Oh, there's also the pagf_metadata and pagi_inodeok flags, too. For controlling per-ag operations, we are going to need some atomic state flags. Hence add an opstate field similar to what we already have in the mount and log, and convert all these state flags across to atomic bit operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-10 07:22:21 -06:00
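Converting scattered boolean fields into one atomically updated `opstate` word can be sketched as below. The kernel uses `set_bit()`, `clear_bit()`, `test_bit()` and `test_and_set_bit()` on an `unsigned long`; this userspace model substitutes C11 `atomic_fetch_or`/`atomic_fetch_and`, and the `PAG_*` bit names are hypothetical labels for the flags the commit message lists.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-AG state bits, one per flag named in the commit. */
enum {
	PAG_AGF_INIT,		/* was pagf_init */
	PAG_AGI_INIT,		/* was pagi_init */
	PAG_PREFERS_METADATA,	/* was pagf_metadata */
	PAG_ALLOWS_INODES,	/* was pagi_inodeok */
	PAG_AGFL_NEEDS_RESET,	/* was agflreset */
};

struct perag {
	atomic_ulong opstate;	/* one bit per state flag above */
};

static void pag_set(struct perag *pag, int bit)
{
	atomic_fetch_or(&pag->opstate, 1UL << bit);
}

static void pag_clear(struct perag *pag, int bit)
{
	atomic_fetch_and(&pag->opstate, ~(1UL << bit));
}

static bool pag_test(struct perag *pag, int bit)
{
	return atomic_load(&pag->opstate) & (1UL << bit);
}

/* Returns the old bit value, so exactly one racing caller sees "false". */
static bool pag_test_and_set(struct perag *pag, int bit)
{
	return atomic_fetch_or(&pag->opstate, 1UL << bit) & (1UL << bit);
}
```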
Bill O'Donnell	fcd881af53	xfs: convert xfs_ialloc_next_ag() to an atomic JIRA: https://issues.redhat.com/browse/RHEL-2002 commit 20a5eab49d354a2837e0af3f07f92a104de52804 Author: Dave Chinner <dchinner@redhat.com> Date: Mon Feb 13 09:14:52 2023 +1100 xfs: convert xfs_ialloc_next_ag() to an atomic This is currently a spinlock-protected rotor which can be implemented with a single atomic operation. Change it to be more efficient and get rid of the m_agirotor_lock. Noticed while converting the inode allocation AG selection loop to active perag references. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-10 07:22:21 -06:00
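A lock-free rotor of the kind described above can be sketched with one atomic fetch-and-add. This is an illustrative C11 model, not the kernel function: `next_ag()` and its parameters are hypothetical, and the real code operates on the mount's rotor and AG count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Lock-free AG rotor: each caller atomically claims the next ticket and
 * maps it onto an AG number, replacing "lock; agno = rotor++; wrap; unlock".
 */
static uint32_t next_ag(atomic_uint *rotor, uint32_t agcount)
{
	assert(agcount > 0);
	return atomic_fetch_add(rotor, 1) % agcount;
}
```

One design consequence: when the unsigned counter wraps at `UINT_MAX`, the sequence jumps unless `agcount` divides 2^32. For a rotor whose only job is spreading allocations around, that one-off skew is harmless, which is what makes the single atomic cheaper than a lock.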
Bill O'Donnell	91cbfcc6bf	xfs: Print XFS UUID on mount and umount events. JIRA: https://issues.redhat.com/browse/RHEL-2002 commit 64c80dfd04d1dd2ecf550542c8f3f41b54b20207 Author: Lukas Herbolt <lukas@herbolt.com> Date: Wed Nov 16 19:20:21 2022 -0800 xfs: Print XFS UUID on mount and umount events. As of now, only device names are printed by __xfs_printk(). Device names are not persistent across reboots, so when searching for the origin of a corruption, properly identifying the devices becomes an extra task. This patch adds the XFS UUID to every mount/umount message, which makes the identification much easier. Signed-off-by: Lukas Herbolt <lukas@herbolt.com> [sandeen: rebase onto current upstream kernel] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-10 07:22:17 -06:00
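In the kernel, a UUID can be printed with printk's `%pU` format specifier. The userspace sketch below formats the 16 bytes by hand into the usual 8-4-4-4-12 form and builds a mount message carrying both the device name and the UUID. The message text and the helper names `format_uuid()` and `format_mount_msg()` are hypothetical, not the exact strings XFS emits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 16-byte UUID as 8-4-4-4-12 hex digits (36 chars + NUL). */
static int format_uuid(char out[37], const uint8_t u[16])
{
	return snprintf(out, 37,
			"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
			"%02x%02x%02x%02x%02x%02x",
			u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
			u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}

/* Hypothetical mount message: the device name can change, the UUID cannot. */
static int format_mount_msg(char *buf, size_t len, const char *dev,
			    const uint8_t uuid[16])
{
	char ustr[37];

	format_uuid(ustr, uuid);
	return snprintf(buf, len, "XFS (%s): Mounting Filesystem %s", dev, ustr);
}
```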
Bill O'Donnell	8f3ac96b1b	xfs: refactor all the EFI/EFD log item sizeof logic JIRA: https://issues.redhat.com/browse/RHEL-2002 commit 3c5aaaced99912c9fb3352fc5af5b104df67d4aa Author: Darrick J. Wong <djwong@kernel.org> Date: Fri Oct 21 09:10:05 2022 -0700 xfs: refactor all the EFI/EFD log item sizeof logic Refactor all the open-coded sizeof logic for EFI/EFD log item and log format structures into common helper functions whose names reflect the struct names. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-06 19:42:18 -06:00
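A "sizeof helper named after the struct" can be sketched as below. The structures are simplified, hypothetical stand-ins for the EFI log format (the real ones carry more fields); the point is that the header-plus-flexible-array arithmetic lives in one function instead of being open-coded at every call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the EFI log format structures. */
struct extent {
	uint64_t start;
	uint32_t len;
};

struct efi_log_format {
	uint16_t type;
	uint16_t size;
	uint32_t nextents;
	uint64_t id;
	struct extent extents[];	/* flexible array member */
};

/* One helper per struct, named after it, instead of open-coded arithmetic. */
static size_t efi_log_format_sizeof(unsigned int nr)
{
	return sizeof(struct efi_log_format) + (size_t)nr * sizeof(struct extent);
}
```

Inside the kernel, the `struct_size()` macro performs the same computation and saturates instead of overflowing for absurd counts, which is a reasonable choice for helpers like this one.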
Bill O'Donnell	e2550967f8	xfs: fix memcpy fortify errors in EFI log format copying JIRA: https://issues.redhat.com/browse/RHEL-2002 commit 03a7485cd701e1c08baadcf39d9592d83715e224 Author: Darrick J. Wong <djwong@kernel.org> Date: Thu Oct 20 16:39:59 2022 -0700 xfs: fix memcpy fortify errors in EFI log format copying Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of memcpy. Since we're already fixing problems with BUI item copying, we should fix everything else. An extra difficulty here is that the ef[id]_extents arrays are declared as single-element arrays. This is not the convention for flex arrays in the modern kernel, and it causes all manner of problems with static checking tools, since they often cannot tell the difference between a single element array and a flex array. So for starters, change those array[1] declarations to array[] declarations to signal that they are proper flex arrays and adjust all the "size-1" expressions to fit the new declaration style. Next, refactor the xfs_efi_copy_format function to handle the copying of the head and the flex array members separately. While we're at it, fix a minor validation deficiency in the recovery function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-06 19:42:18 -06:00
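The "copy the head and the flex array separately" approach can be sketched as below. Everything here is a hypothetical simplification (`efi_copy_format()`, `efi_alloc()` and the struct layouts are not the kernel's), but it shows two ideas. With `extents[]` instead of `extents[1]`, the size is simply header plus `nr` elements, with no "size-1" correction. In addition, each `memcpy()` length matches exactly the object it writes, which is what a fortified `memcpy` checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct extent {
	uint64_t start;
	uint32_t len;
};

/* Was "struct extent extents[1];" with "nr - 1" arithmetic everywhere. */
struct efi_log_format {
	uint32_t nextents;
	uint64_t id;
	struct extent extents[];
};

static size_t efi_sizeof(uint32_t nr)
{
	return sizeof(struct efi_log_format) + (size_t)nr * sizeof(struct extent);
}

/* Allocate a zeroed format with room for nr extents. */
static struct efi_log_format *efi_alloc(uint32_t nr)
{
	return calloc(1, efi_sizeof(nr));
}

/*
 * Copy a recovered log buffer into dst, which has room for dst_nr extents.
 * Validate the length first, then copy header and array separately.
 */
static int efi_copy_format(struct efi_log_format *dst, uint32_t dst_nr,
			   const void *buf, size_t buf_len)
{
	const struct efi_log_format *src = buf;

	if (buf_len < sizeof(*src))
		return -EINVAL;
	if (src->nextents > dst_nr || buf_len != efi_sizeof(src->nextents))
		return -EINVAL;

	memcpy(dst, src, sizeof(*src));				/* header only */
	memcpy(dst->extents, src->extents,
	       (size_t)src->nextents * sizeof(struct extent));	/* array only */
	return 0;
}
```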
Ming Lei	74aa7f1345	block: replace fmode_t with a block-specific type for block open flags JIRA: https://issues.redhat.com/browse/RHEL-1516 Conflicts: drop change on btrfs, f2fs, erofs, ublk, all are not enabled in rhel9; drop change on dm's open_table_device() because of code base difference, 'mode' isn't used in this function. commit 05bdb9965305bbfdae79b31d22df03d1e2cfcb22 Author: Christoph Hellwig <hch@lst.de> Date: Thu Jun 8 13:02:55 2023 +0200 block: replace fmode_t with a block-specific type for block open flags The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2023-09-18 17:59:18 +08:00
Ming Lei	e405cd2d35	block: use the holder as indication for exclusive opens JIRA: https://issues.redhat.com/browse/RHEL-1516 Conflicts: drop change in btrfs which isn't enabled in rhel, and drop change in erofs which needn't such change since the affected interface isn't used in erofs. commit 2736e8eeb0ccdc71d1f4256c9c9a28f58cc43307 Author: Christoph Hellwig <hch@lst.de> Date: Thu Jun 8 13:02:43 2023 +0200 block: use the holder as indication for exclusive opens The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2023-09-18 17:57:43 +08:00
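Keying exclusivity off a non-NULL holder, and requiring the same holder on release, can be modelled as below. This is a hypothetical userspace model (`struct bdev`, `bdev_open()`, `bdev_release()`), not the block layer API. In particular, it lets non-exclusive opens always succeed, which simplifies the real rules.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical block device model: a non-NULL holder means "exclusive". */
struct bdev {
	void *holder;	/* current exclusive holder, NULL if none */
	int holders;	/* nested exclusive claims by that holder */
	int opens;	/* total opens, exclusive or not */
};

static int bdev_open(struct bdev *b, void *holder)
{
	if (holder) {
		if (b->holder && b->holder != holder)
			return -EBUSY;	/* someone else holds it exclusively */
		b->holder = holder;
		b->holders++;
	}
	b->opens++;
	return 0;
}

/* Release takes the holder too, so only the holder can drop its claim. */
static void bdev_release(struct bdev *b, void *holder)
{
	if (holder) {
		assert(b->holder == holder);	/* the extra debug check */
		if (--b->holders == 0)
			b->holder = NULL;
	}
	b->opens--;
}
```

Passing the holder to the release path is what removes the need for a separate mode argument: the holder alone says whether an exclusive claim is being dropped.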
Ming Lei	c3dbc9f426	xfs: wire up the ->mark_dead holder operation for log and RT devices JIRA: https://issues.redhat.com/browse/RHEL-1516 commit 8067ca1dcdfcc2a5e0a51bff3730ad3eef0623d6 Author: Christoph Hellwig <hch@lst.de> Date: Thu Jun 1 11:44:56 2023 +0200 xfs: wire up the ->mark_dead holder operation for log and RT devices Implement a set of holder_ops that shut down the file system when the block device used as log or RT device is removed underneath the file system. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2023-09-18 15:59:32 +08:00
Ming Lei	e05e754c4d	xfs: wire up sops->shutdown JIRA: https://issues.redhat.com/browse/RHEL-1516 commit e7caa877e5ddac63886f4a8376cb3ffbd4dfe569 Author: Christoph Hellwig <hch@lst.de> Date: Thu Jun 1 11:44:55 2023 +0200 xfs: wire up sops->shutdown Wire up the shutdown method to shut down the file system when the underlying block device is marked dead. Add a new message to clearly distinguish this shutdown reason from other shutdowns. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2023-09-18 15:59:32 +08:00
Ming Lei	2ef574f2fa	block: introduce holder ops JIRA: https://issues.redhat.com/browse/RHEL-1516 Conflicts: drop change on fs/erofs, which isn't enabled on rhel, and the affected symbol doesn't exist in erofs code either commit 0718afd47f70cf46877c39c25d06b786e1a3f36c Author: Christoph Hellwig <hch@lst.de> Date: Thu Jun 1 11:44:52 2023 +0200 block: introduce holder ops Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for things like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2023-09-18 15:59:32 +08:00
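The holder-ops callback pattern introduced here (and wired up by the XFS commits above for log and RT devices) can be modelled in userspace as follows. The types are hypothetical simplifications of `blk_holder_ops` and `block_device`: the ops table is installed at exclusive-claim time, and on device removal the block layer calls `mark_dead`, which the filesystem implements as a shutdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bdev;

/* Simplified stand-in for blk_holder_ops. */
struct holder_ops {
	void (*mark_dead)(struct bdev *bdev);	/* device went away */
};

struct bdev {
	void *holder;			/* exclusive holder, e.g. the mount */
	const struct holder_ops *hops;	/* installed at claim time */
};

static void bdev_claim(struct bdev *b, void *holder,
		       const struct holder_ops *ops)
{
	b->holder = holder;
	b->hops = ops;
}

/* Block layer side: notify the holder, if it registered a callback. */
static void bdev_mark_dead(struct bdev *b)
{
	if (b->hops && b->hops->mark_dead)
		b->hops->mark_dead(b);
}

/* Filesystem side: a hypothetical mount that shuts down on device loss. */
struct fs_mount {
	bool shut_down;
};

static void fs_bdev_mark_dead(struct bdev *b)
{
	struct fs_mount *mp = b->holder;

	mp->shut_down = true;
}

static const struct holder_ops fs_holder_ops = {
	.mark_dead = fs_bdev_mark_dead,
};
```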
Carlos Maiolino	0e8baee5cd	fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2228888 Tested: xfstests Currently the I_DIRTY_TIME will never get set if the inode already has I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's true, however ext4 will only update the on-disk inode in ->dirty_inode(), not on actual writeback. As a result if the inode already has I_DIRTY_INODE state by the time we get to __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled into on-disk inode and will not get updated until the next I_DIRTY_INODE update, which might never come if we crash or get a power failure. The problem can be reproduced on ext4 by running xfstest generic/622 with -o iversion mount option. Fix it by allowing I_DIRTY_TIME to be set even if the inode already has I_DIRTY_INODE. Also make sure that the case is properly handled in writeback_single_inode() as well. Additionally, xfs_fs_dirty_inode() was changed to accommodate I_DIRTY_TIME in flags. Thanks Jan Kara for suggestions on how to make this work properly. Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: stable@kernel.org Signed-off-by: Lukas Czerner <lczerner@redhat.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220825100657.44217-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> (cherry picked from commit cbfecb927f429a6fa613d74b998496bd71e4438a) Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>	2023-08-04 14:19:30 +02:00
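The behavioural change can be shown with a simplified flag model. This is not the real `__mark_inode_dirty()` logic: the flag values are hypothetical and only the "inode is already I_DIRTY_INODE, caller passes only I_DIRTY_TIME" case from the commit message is modelled. The old rule drops the timestamp bit, while the new rule keeps it so writeback still knows the times need persisting.

```c
#include <assert.h>

/* Hypothetical flag values; only the relationships matter here. */
#define I_DIRTY_SYNC		(1u << 0)
#define I_DIRTY_DATASYNC	(1u << 1)
#define I_DIRTY_INODE		(I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY_TIME		(1u << 2)

/* Old rule: a time-only dirtying is dropped if the inode is already dirty. */
static unsigned int mark_dirty_old(unsigned int state, unsigned int flags)
{
	if ((flags & I_DIRTY_TIME) && (state & I_DIRTY_INODE))
		flags &= ~I_DIRTY_TIME;
	return state | flags;
}

/* New rule: I_DIRTY_TIME is recorded even alongside I_DIRTY_INODE. */
static unsigned int mark_dirty_new(unsigned int state, unsigned int flags)
{
	return state | flags;
}
```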
Bill O'Donnell	f525103e49	xfs: fail dax mount if reflink is enabled on a partition Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 35fcd75af3edf035638e632bb49607cc8fc3cdf4 Author: Shiyang Ruan <ruansy.fnst@fujitsu.com> Date: Thu Jun 9 22:34:35 2022 +0800 xfs: fail dax mount if reflink is enabled on a partition Failure notification is not supported on partitions. So, when we mount a reflink enabled xfs on a partition with dax option, let it fail with -EINVAL code. Link: https://lkml.kernel.org/r/20220609143435.393724-1-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:49 -05:00
Bill O'Donnell	426777a415	xfs: xfs_buf cache destroy isn't RCU safe Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 231f91ab504ecebcb88e942341b3d7dd91de45f1 Author: Dave Chinner <dchinner@redhat.com> Date: Mon Jul 18 18:20:37 2022 -0700 xfs: xfs_buf cache destroy isn't RCU safe Darrick and Sachin Sant reported that xfs/435 and xfs/436 would report a non-empty xfs_buf slab on module remove. This isn't easy to reproduce, but is clearly a side effect of converting the buffer cache to RCU freeing and lockless lookups. Sachin bisected and Darrick hit it when testing the patchset directly. Turns out that the xfs_buf slab is not destroyed when all the other XFS slab caches are destroyed. Instead, it's got its own little wrapper function that gets called separately, and so it doesn't have an rcu_barrier() call in it that is needed to drain all the rcu callbacks before the slab is destroyed. Fix it by removing the xfs_buf_init/terminate wrappers that just allocate and destroy the xfs_buf slab, and move them to the same place that all the other slab caches are set up and destroyed. Reported-and-tested-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: 298f34224506 ("xfs: lockless buffer lookup") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:48 -05:00
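Why the missing `rcu_barrier()` leaves a non-empty slab can be shown with a counting model. This is a hypothetical single-threaded sketch: `cache_free_rcu()` stands in for `call_rcu()`, `rcu_barrier_model()` for `rcu_barrier()`, and `cache_destroy()` for `kmem_cache_destroy()`. Objects freed via RCU are only returned to the cache when their callbacks run, so destroying the cache before draining the callbacks finds objects still outstanding.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: a slab cache whose objects are freed via RCU. */
struct cache {
	int objects;	/* objects still allocated from the cache */
	int queued;	/* RCU frees queued but whose callbacks haven't run */
};

static void cache_alloc(struct cache *c)
{
	c->objects++;
}

/* Like call_rcu(): the object goes back to the cache only later. */
static void cache_free_rcu(struct cache *c)
{
	c->queued++;
}

/* Like rcu_barrier(): wait for every queued callback to run. */
static void rcu_barrier_model(struct cache *c)
{
	c->objects -= c->queued;
	c->queued = 0;
}

/* Like kmem_cache_destroy(): complains if objects are still outstanding. */
static int cache_destroy(struct cache *c)
{
	return c->objects ? -EBUSY : 0;
}
```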
Bill O'Donnell	0507113b79	xfs: add in-memory iunlink log item Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 784eb7d8dd4163b82a19b914f76b2834a58a3e4c Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 14 11:47:42 2022 +1000 xfs: add in-memory iunlink log item Now that we have a clean operation to update the di_next_unlinked field of inode cluster buffers, we can easily defer this operation to transaction commit time so we can order the inode cluster buffer locking consistently. To do this, we introduce a new in-memory log item to track the unlinked list item modification that we are going to make. This follows the same observations as the in-memory double linked list used to track unlinked inodes in that the inodes on the list are pinned in memory and cannot go away, and hence we can simply reference them for the duration of the transaction without needing to take active references or pin them or look them up. This allows us to pass the xfs_inode to the transaction commit code along with the modification to be made, and then order the logged modifications via the ->iop_sort and ->iop_precommit operations for the new log item type. As this is an in-memory log item, it doesn't have formatting, CIL or AIL operational hooks - it exists purely to run the inode unlink modifications and is then removed from the transaction item list and freed once the precommit operation has run. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:45 -05:00
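The "defer to commit time, sort, then run precommit" flow can be sketched with `qsort()`. This is a hypothetical model: `struct iunlink_item`, `iunlink_cmp()` (standing in for `->iop_sort`) and `iunlink_precommit()` are invented names. Sorting by inode cluster buffer means every transaction locks those buffers in the same order, which is the ordering goal the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory iunlink item: which cluster buffer it modifies. */
struct iunlink_item {
	uint64_t cluster_blkno;	/* inode cluster buffer to lock and update */
	uint32_t next_unlinked;	/* new di_next_unlinked value */
};

/* ->iop_sort analogue: order by buffer so locks are taken consistently. */
static int iunlink_cmp(const void *a, const void *b)
{
	const struct iunlink_item *x = a, *y = b;

	if (x->cluster_blkno < y->cluster_blkno)
		return -1;
	return x->cluster_blkno > y->cluster_blkno;
}

/* Commit-time pass: sort, then apply each deferred update in order. */
static void iunlink_precommit(struct iunlink_item *items, size_t nr,
			      void (*apply)(const struct iunlink_item *, void *),
			      void *ctx)
{
	qsort(items, nr, sizeof(*items), iunlink_cmp);
	for (size_t i = 0; i < nr; i++)
		apply(&items[i], ctx);
}

/* Demo 'apply' that just records the order buffers would be locked in. */
struct apply_log {
	uint64_t blkno[8];
	size_t n;
};

static void record_apply(const struct iunlink_item *it, void *ctx)
{
	struct apply_log *log = ctx;

	if (log->n < 8)
		log->blkno[log->n++] = it->cluster_blkno;
}
```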
Bill O'Donnell	8da0b4c669	xfs: introduce per-cpu CIL tracking structure Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit af1c2146a50b1ffe7e10cae1f7e64ab56b7f8c1f Author: Dave Chinner <dchinner@redhat.com> Date: Sat Jul 2 02:13:52 2022 +1000 xfs: introduce per-cpu CIL tracking structure The CIL push lock is highly contended on larger machines, becoming a hard bottleneck at about 700,000 transaction commits/s on >16p machines. To address this, start moving the CIL tracking infrastructure to utilise per-CPU structures. We need to track the space used, the amount of log reservation space reserved to write the CIL, the log items in the CIL and the busy extents that need to be completed by the CIL commit. This requires a couple of per-cpu counters, an unordered per-cpu list and a globally ordered per-cpu list. Create a per-cpu structure to hold these and all the management interfaces needed, as well as the hooks to handle hotplug CPUs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:35 -05:00
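The core idea, per-CPU accumulation on the hot commit path and aggregation only at push time, can be shown with a single-threaded data-layout sketch. All names and the fixed CPU count are hypothetical, and the sketch omits what the kernel needs for correctness: pinning to the local CPU (e.g. `get_cpu_ptr()`/`put_cpu_ptr()`), synchronisation with the push, and CPU hotplug handling.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */

/* Per-CPU slice of CIL accounting; each CPU only touches its own slot. */
struct cil_pcp {
	int64_t space_used;	/* bytes of log items added on this CPU */
};

struct cil {
	struct cil_pcp pcp[NR_CPUS_SKETCH];
};

/* Commit fast path: no shared lock, just the local CPU's counter. */
static void cil_insert(struct cil *cil, int cpu, int64_t bytes)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	cil->pcp[cpu].space_used += bytes;
}

/* Push slow path: fold every CPU's slice into one total and reset it. */
static int64_t cil_aggregate_space(struct cil *cil)
{
	int64_t total = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		total += cil->pcp[cpu].space_used;
		cil->pcp[cpu].space_used = 0;
	}
	return total;
}
```

The trade-off is the usual one for per-CPU counters: commits stop contending on a shared cache line, at the cost of a walk over all CPUs whenever an exact total is needed.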
Bill O'Donnell	28daf69aec	xfs: move xfs_attr_use_log_assist out of xfs_log.c Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit d9c61ccb3b09d8f892cccbf662ce0c870f8e4ade Author: Darrick J. Wong <djwong@kernel.org> Date: Fri May 27 10:33:29 2022 +1000 xfs: move xfs_attr_use_log_assist out of xfs_log.c The LARP patchset added an awkward coupling point between libxfs and what would be libxlog, if the XFS log were actually its own library. Move the code that enables logged xattr updates out of "lib"xlog and into xfs_xattr.c so that it no longer has to know about xlog_* functions. While we're at it, give xfs_xattr.c its own header file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:32 -05:00
Bill O'Donnell	1442807720	xfs: put attr[id] log item cache init with the others Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 4136e38af728eddcab2e51aecde28e94d0782b9b Author: Darrick J. Wong <djwong@kernel.org> Date: Sun May 22 15:59:48 2022 +1000 xfs: put attr[id] log item cache init with the others Initialize and destroy the xattr log item caches in the same places that we do all the other log item caches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:27 -05:00
Bill O'Donnell	f33bdf2018	xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 973ac0eb3a7dfedecd385bd2b48b12e62a0492f2 Author: Chandan Babu R <chandan.babu@oracle.com> Date: Wed Aug 11 10:33:20 2021 +0530 xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags This commit enables XFS module to work with fs instances having 64-bit per-inode extent counters by adding XFS_SB_FEAT_INCOMPAT_NREXT64 flag to the list of supported incompat feature flags. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:00 -05:00
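Adding a flag "to the list of supported incompat feature flags" matters because a mount refuses any incompat bit outside the supported mask. The sketch below models that check. The bit values and macro names are hypothetical placeholders; the real definitions live in the XFS on-disk format header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones are in the on-disk format header. */
#define FEAT_INCOMPAT_FTYPE	(1u << 0)
#define FEAT_INCOMPAT_BIGTIME	(1u << 3)
#define FEAT_INCOMPAT_NREXT64	(1u << 5)

/* Before this change, NREXT64 would have been missing from the mask. */
#define FEAT_INCOMPAT_ALL	(FEAT_INCOMPAT_FTYPE | FEAT_INCOMPAT_BIGTIME | \
				 FEAT_INCOMPAT_NREXT64)

/* Any incompat bit the kernel does not know about must refuse the mount. */
static bool incompat_features_ok(uint32_t sb_features_incompat)
{
	return (sb_features_incompat & ~FEAT_INCOMPAT_ALL) == 0;
}
```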
Bill O'Donnell	1c2b1203c8	xfs: use a separate frextents counter for rt extent reservations Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 2229276c5283264b8c2241c1ed972bbb136cab22 Author: Darrick J. Wong <djwong@kernel.org> Date: Tue Apr 12 06:49:42 2022 +1000 xfs: use a separate frextents counter for rt extent reservations As mentioned in the previous commit, the kernel misuses sb_frextents in the incore mount to reflect both incore reservations made by running transactions as well as the actual count of free rt extents on disk. This results in the superblock being written to the log with an underestimate of the number of rt extents that are marked free in the rtbitmap. Teaching XFS to recompute frextents after log recovery avoids operational problems in the current mount, but it doesn't solve the problem of us writing undercounted frextents which are then recovered by an older kernel that doesn't have that fix. Create an incore percpu counter to mirror the ondisk frextents. This new counter will track transaction reservations and the only time we will touch the incore super counter (i.e. the one that gets logged) is when those transactions commit updates to the rt bitmap. This is in contrast to the lazysbcount counters (e.g. fdblocks), where we know that log recovery will always fix any incorrect counter that we log. As a bonus, we only take m_sb_lock at transaction commit time. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:10:59 -05:00
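The split between a reservation counter and the logged superblock counter can be modelled as below. It is a hypothetical single-threaded sketch (plain `int64_t` fields in place of the commit's incore percpu counter and `m_sb_lock`-protected superblock field). Reservations only move the incore counter, while the logged counter changes only when a commit actually updates the rt bitmap, so the logged value never includes in-flight reservations.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the two counters the commit message separates. */
struct rt_counters {
	int64_t incore_frextents;	/* reservations come out of this one */
	int64_t sb_frextents;		/* what gets logged in the superblock */
};

/* Transaction reservation: only the incore counter moves. */
static int rt_reserve(struct rt_counters *c, int64_t rtx)
{
	if (c->incore_frextents < rtx)
		return -ENOSPC;
	c->incore_frextents -= rtx;
	return 0;
}

/* Commit: the rtbitmap changed by 'delta' (negative when allocating). */
static void rt_commit(struct rt_counters *c, int64_t delta)
{
	c->sb_frextents += delta;
}

/* Unused part of a reservation returns to the incore counter only. */
static void rt_unreserve(struct rt_counters *c, int64_t rtx)
{
	c->incore_frextents += rtx;
}
```

In the usage below, a transaction reserves 30 extents but allocates only 20. The logged counter drops by exactly 20, and after the unused 10 are returned, both counters agree again.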


447 Commits