Commit Graph

447 Commits

Author SHA1 Message Date
Bill O'Donnell 7872b37dd0 xfs: read only mounts with fsopen mount API are busted
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit d8d222e09dab84a17bb65dda4b94d01c565f5327
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jan 16 15:33:07 2024 +1100

    xfs: read only mounts with fsopen mount API are busted

    Recently xfs/513 started failing on my test machines testing "-o
    ro,norecovery" mount options. This was being emitted in dmesg:

    [ 9906.932724] XFS (pmem0): no-recovery mounts must be read-only.

    Turns out, readonly mounts with the fsopen()/fsconfig() mount API
    have been busted since day zero. It's only taken 5 years for debian
    unstable to start using this "new" mount API, and shortly after this
    I noticed xfs/513 had started to fail as per above.

    The syscall trace is:

    fsopen("xfs", FSOPEN_CLOEXEC)           = 3
    mount_setattr(-1, NULL, 0, NULL, 0)     = -1 EINVAL (Invalid argument)
    .....
    fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/pmem0", 0) = 0
    fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
    fsconfig(3, FSCONFIG_SET_FLAG, "norecovery", NULL, 0) = 0
    fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = -1 EINVAL (Invalid argument)
    close(3)                                = 0

    Showing that the actual mount instantiation (FSCONFIG_CMD_CREATE) is
    what threw out the error.

    During mount instantiation, we call xfs_fs_validate_params() which
    does:

            /* No recovery flag requires a read-only mount */
            if (xfs_has_norecovery(mp) && !xfs_is_readonly(mp)) {
                    xfs_warn(mp, "no-recovery mounts must be read-only.");
                    return -EINVAL;
            }

    and xfs_is_readonly() checks internal mount flags for read only
    state. This state is set in xfs_init_fs_context() from the
    context superblock flag state:

            /*
             * Copy binary VFS mount flags we are interested in.
             */
            if (fc->sb_flags & SB_RDONLY)
                    set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate);

    With the old mount API, all of the VFS specific superblock flags
    had already been parsed and set before xfs_init_fs_context() is
    called, so this all works fine.

    However, in the brave new fsopen/fsconfig world,
    xfs_init_fs_context() is called from fsopen() context, before any
    VFS superblock flags have been set or parsed. Hence if we use fsopen(),
    the internal XFS readonly state is *never set*. Hence anything that
    depends on xfs_is_readonly() actually returning true for read only
    mounts is broken if fsopen() has been used to mount the filesystem.

    Fix this by moving this internal state initialisation to
    xfs_fs_fill_super() before we attempt to validate the parameters
    that have been set prior to the FSCONFIG_CMD_CREATE call being made.
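
The ordering problem described above can be sketched in plain userspace C. This is only a toy model, not the kernel code: every `demo_*` name and `DEMO_SB_RDONLY` is a hypothetical stand-in, and the sequence merely imitates fsopen() followed by the two fsconfig() flag calls and FSCONFIG_CMD_CREATE.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
#define DEMO_SB_RDONLY 0x1u

struct demo_fs_context { unsigned int sb_flags; };
struct demo_mount { bool readonly; bool norecovery; };

/* Old placement: snapshot the VFS flags when the context is created. */
void demo_init_fs_context_old(const struct demo_fs_context *fc,
                              struct demo_mount *mp)
{
    if (fc->sb_flags & DEMO_SB_RDONLY)
        mp->readonly = true;
}

/* Mirrors the quoted validation: norecovery requires a read-only mount. */
int demo_validate_params(const struct demo_mount *mp)
{
    if (mp->norecovery && !mp->readonly)
        return -EINVAL;
    return 0;
}

/* New placement: copy the flags at fill_super time, then validate. */
int demo_fill_super(const struct demo_fs_context *fc, struct demo_mount *mp)
{
    if (fc->sb_flags & DEMO_SB_RDONLY)
        mp->readonly = true;
    return demo_validate_params(mp);
}

/* Imitate fsopen() -> "ro" -> "norecovery" -> FSCONFIG_CMD_CREATE. */
int demo_mount_sequence(bool fixed)
{
    struct demo_fs_context fc = { 0 };
    struct demo_mount mp = { false, false };

    if (!fixed)
        demo_init_fs_context_old(&fc, &mp); /* runs before any flag is set */
    fc.sb_flags |= DEMO_SB_RDONLY;          /* fsconfig(SET_FLAG, "ro") */
    mp.norecovery = true;                   /* fsconfig(SET_FLAG, "norecovery") */

    if (fixed)
        return demo_fill_super(&fc, &mp);
    return demo_validate_params(&mp);
}
```

In this model `demo_mount_sequence(false)` fails with -EINVAL because the read-only state was sampled too early, while `demo_mount_sequence(true)` succeeds.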

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Fixes: 73e5fff98b ("xfs: switch to use the new mount-api")
    cc: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:21 -06:00
Bill O'Donnell 0ab240b7f7 xfs: clean up the xfs_reserve_blocks interface
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 646ddf0c4df5181a7057ecccd29e535baaf034b2
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Dec 4 18:40:56 2023 +0100

    xfs: clean up the xfs_reserve_blocks interface

    xfs_reserve_blocks has a very odd interface that can only be explained
    by it directly deriving from the IRIX fcntl handler back in the day.

    Split reporting of the reserved blocks out of xfs_reserve_blocks into
    the only caller that cares.  This means that the value reported from
    XFS_IOC_SET_RESBLKS isn't atomically sampled in the same critical
    section as when it was set anymore, but as the values could change
    right after setting them anyway that does not matter.  It does
    provide atomic sampling of both values for XFS_IOC_GET_RESBLKS now,
    though.

    Also pass a normal scalar integer value for the requested value instead
    of the pointless pointer.
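
As a rough illustration of the interface change, the sketch below contrasts a combined set-and-report call that takes an optional pointer with a split setter and getter. All names (`demo_reserve_blocks_old`, `demo_reserve_blocks`, `demo_get_resblks`) and struct layouts are assumptions made for the example; the real function also has to find free blocks and take locks, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; the real kernel structures differ. */
struct demo_resblks { uint64_t resblks; uint64_t resblks_avail; };
struct demo_mount { uint64_t resblks; uint64_t resblks_avail; };

/* Old shape: a NULL request means "report only"; reporting is optional. */
int demo_reserve_blocks_old(struct demo_mount *mp, const uint64_t *inval,
                            struct demo_resblks *outval)
{
    if (inval) {
        mp->resblks = *inval;
        mp->resblks_avail = *inval; /* simplification: all blocks available */
    }
    if (outval) {
        outval->resblks = mp->resblks;
        outval->resblks_avail = mp->resblks_avail;
    }
    return 0;
}

/* New shape: the setter takes a plain scalar and reports nothing. */
int demo_reserve_blocks(struct demo_mount *mp, uint64_t request)
{
    mp->resblks = request;
    mp->resblks_avail = request; /* same simplification as above */
    return 0;
}

/* Reporting moves to the one caller that needs it. */
void demo_get_resblks(const struct demo_mount *mp, struct demo_resblks *out)
{
    out->resblks = mp->resblks;
    out->resblks_avail = mp->resblks_avail;
}
```

With the split version, a caller that only wants to report never has to pass a NULL request, and a caller that only wants to set never has to supply an output buffer.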

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:56 -06:00
Bill O'Donnell 3ec04f5ca0 xfs: create a helper to convert rtextents to rtblocks
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit fa5a387230861116c2434c20d29fc4b3fd077d24
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:32:54 2023 -0700

    xfs: create a helper to convert rtextents to rtblocks

    Create a helper to convert a realtime extent to a realtime block.  Later
    on we'll change the helper to use bit shifts when possible.
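
A minimal sketch of what such a helper might look like, assuming a realtime extent spans a fixed number of filesystem blocks. The names `demo_sb`, `rextsize`, `demo_rtx_to_rtb` and `demo_rtx_to_rtb_shift` are illustrative and are not claimed to match the kernel's; the shift variant is only valid when the extent size is a power of two.

```c
#include <stdint.h>

/* Hypothetical superblock subset: realtime extent size in fs blocks. */
struct demo_sb { uint32_t rextsize; };

/* General case: extent number times blocks per extent. */
uint64_t demo_rtx_to_rtb(const struct demo_sb *sb, uint64_t rtx)
{
    return rtx * sb->rextsize;
}

/*
 * Shift variant, usable only when rextsize == 1 << rextlog.
 * The caller is assumed to have checked that precondition.
 */
uint64_t demo_rtx_to_rtb_shift(uint64_t rtx, unsigned int rextlog)
{
    return rtx << rextlog;
}
```

Wrapping the conversion in one helper means a later switch from multiplication to shifting touches a single function instead of every call site.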

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:36 -06:00
Bill O'Donnell 16a68f2016 xfs: track usage statistics of online fsck
JIRA: https://issues.redhat.com/browse/RHEL-57114

Conflicts: diff due to previous out of order application of scrub patches
	   Add redhat/configs/common/generic/XFS_ONLINE_SCRUB_STATS.

commit d7a74cad8f45133935c59ed0adf949f85238624b
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Aug 10 07:48:07 2023 -0700

    xfs: track usage statistics of online fsck

    Track the usage, outcomes, and run times of the online fsck code, and
    report these values via debugfs.  The columns in the file are:

     * scrubber name

     * number of scrub invocations
     * clean objects found
     * corruptions found
     * optimizations found
     * cross referencing failures
     * inconsistencies found during cross referencing
     * incomplete scrubs
     * warnings
     * number of times scrub had to retry
     * cumulative amount of time spent scrubbing (microseconds)

     * number of repair invocations
     * successfully repaired objects
     * cumulative amount of time spent repairing (microseconds)

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-10-15 10:46:26 -05:00
Bill O'Donnell 964f38ed28 xfs: create scaffolding for creating debugfs entries
JIRA: https://issues.redhat.com/browse/RHEL-57114

Conflicts: diff due to previous out of order application of 35a93b148b0
	   (rhel f9ca79532a xfs: close the external block devices in
	   xfs_mount_free).

commit a76dba3b248cb0c2b93d66f463d5ca3cf7037d28
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Aug 10 07:48:07 2023 -0700

    xfs: create scaffolding for creating debugfs entries

    Set up debugfs directories for xfs as a whole, and a subdirectory for
    each mounted filesystem.  This will enable the creation of debugfs files
    in the next patch.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-10-15 10:46:25 -05:00
Lucas Zampieri a37318513e Merge: xfs: warn deprecation of V4 format beginning with RHEL10 instead of 2030.
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4448

JIRA: https://issues.redhat.com/browse/RHEL-40421


Replace 2030 with RHEL10 in deprecation warning for V4 format.

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>

Approved-by: Andrey Albershteyn <aalbersh@redhat.com>
Approved-by: Brian Foster <bfoster@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-07-01 12:47:55 +00:00
Bill O'Donnell b82c88656e xfs: warn deprecation of V4 format beginning with RHEL10 instead of 2030.
JIRA: https://issues.redhat.com/browse/RHEL-40421

Upstream Status: RHEL-only

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-07 13:10:03 -05:00
Bill O'Donnell c2563745c8 xfs: drop EXPERIMENTAL tag for large extent counts
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 61d7e8274cd84f574e686b24048ebf29bac861cc
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Jun 12 18:09:04 2023 -0700

    xfs: drop EXPERIMENTAL tag for large extent counts

    This feature has been baking in upstream for ~10mo with no bug reports.
    It seems to work fine here, let's get rid of the scary warnings?

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:51 -05:00
Bill O'Donnell a2aea1128d xfs: deprecate the ascii-ci feature
Conflicts: added redhat/configs/common/generic/CONFIG_XFS_SUPPORT_ASCII_CI
deprecated now, but is completely removed in RHEL10. Deprecated ASCII
case-insensitivity feature (ascii-ci=1) will not be supported in RHEL10.

JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 7ba83850ca2691865713b307ed001bde5fddb084
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 11 19:05:19 2023 -0700

    xfs: deprecate the ascii-ci feature

    This feature is a mess -- the hash function has been broken for the
    entire 15 years of its existence if you create names with extended ascii
    bytes; metadump name obfuscation has silently failed for just as long;
    and the feature clashes horribly with the UTF8 encodings that most
    systems use today.  There is exactly one fstest for this feature.

    In other words, this feature is crap.  Let's deprecate it now so we can
    remove it from the codebase in 2030.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:47 -05:00
Bill O'Donnell c7160f6553 xfs: dax - remove tech preview tag
JIRA: https://issues.redhat.com/browse/RHEL-35289

Upstream Status: RHEL only

Since we've backported the dax patches that remove experimental, remove the
tech-preview designation for RHEL.

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-05-02 17:42:50 -05:00
Ming Lei ca8eaf1249 fs,block: yield devices early
JIRA: https://issues.redhat.com/browse/RHEL-29564
Conflicts: drop change on f2fs, bcachefs and reiserfs; context
	difference on ext4 & fs/super.c change.

commit 22650a99821dda3d05f1c334ea90330b4982de56
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Mar 26 13:47:22 2024 +0100

    fs,block: yield devices early

    Currently a device is only really released once the umount returns to
    userspace due to how file closing works. That ultimately could cause
    an old umount assumption to be violated that concurrent umount and mount
    don't fail. So an exclusively held device with a temporary holder should
    be yielded before the filesystem is gone. Add a helper that allows
    callers to do that. This also allows us to remove the two holder ops
    that Linus wasn't excited about.

    Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org
    Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-04-17 10:39:09 +08:00
Ming Lei 0c712a8085 xfs: port block device access to files
JIRA: https://issues.redhat.com/browse/RHEL-29564

commit 1b9e2d90141c5e25faefbb7891f0ed8606aa02cf
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Jan 23 14:26:24 2024 +0100

    xfs: port block device access to files

    Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-7-adbd023e19cc@kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-04-17 10:18:37 +08:00
Ming Lei 04d1386d16 bdev: open block device as files
JIRA: https://issues.redhat.com/browse/RHEL-29564
Conflicts: context difference since we don't carry 68279f9c9f59
	("treewide: mark stuff as __ro_after_init"); drop f2fs
	change

commit f3a608827d1f8de0dd12813e8d9c6803fe64e119
Author: Christian Brauner <brauner@kernel.org>
Date:   Thu Feb 8 18:47:35 2024 +0100

    bdev: open block device as files

    Add two new helpers to allow opening block devices as files.
    This is not the final infrastructure. This still opens the block device
    before opening a struct file. Until we have removed all references to
    struct bdev_handle we can't switch the order:

    * Introduce blk_to_file_flags() to translate from block specific to
      flags usable to open a new file.
    * Introduce bdev_file_open_by_{dev,path}().
    * Introduce temporary sb_bdev_handle() helper to retrieve a struct
      bdev_handle from a block device file and update places that directly
      reference struct bdev_handle to rely on it.
    * Don't count block device opens against the number of open files. A
      bdev_file_open_by_{dev,path}() file is never installed into any
      file descriptor table.

    One idea that came to mind was to use kernel_tmpfile_open() which
    would require us to pass a path and it would then call do_dentry_open()
    going through the regular fops->open::blkdev_open() path. But then we're
    back to the problem of routing block specific flags such as
    BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste
    FMODE_* flags every time we add a new one. With this we can avoid using
    a flag bit and we have more leeway in how we open block devices from
    bdev_open_by_{dev,path}().

    Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-04-17 10:13:07 +08:00
Ming Lei 1966211b12 xfs: Block writes to log device
JIRA: https://issues.redhat.com/browse/RHEL-29564

commit 3584c8f48a70c1f74c7b7bab59cf22bb66224649
Author: Jan Kara <jack@suse.cz>
Date:   Wed Nov 1 18:43:11 2023 +0100

    xfs: Block writes to log device

    Ask block layer to not allow other writers to open block devices used
    for xfs log and realtime devices.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20231101174325.10596-6-jack@suse.cz
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-04-17 10:04:36 +08:00
Bill O'Donnell 13b704d0f9 xfs: drop experimental warning for FSDAX
JIRA: https://issues.redhat.com/browse/RHEL-15319

commit 27c86d43bcdb97d00359702713bfff6c006f0d90
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Sep 15 14:38:54 2023 +0800

    xfs: drop experimental warning for FSDAX

    FSDAX and reflink can work together now, let's drop this warning.

    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Acked-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-04-05 11:58:49 -05:00
Ming Lei 079721ca6f xfs: Convert to bdev_open_by_path()
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit e340dd63f6a11402424b3d77e51149bce8fcba7d
Author: Jan Kara <jack@suse.cz>
Date:   Wed Sep 27 11:34:34 2023 +0200

    xfs: Convert to bdev_open_by_path()

    Convert xfs to use bdev_open_by_path() and pass the handle around.

    CC: "Darrick J. Wong" <djwong@kernel.org>
    CC: linux-xfs@vger.kernel.org
    Acked-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230927093442.25915-28-jack@suse.cz
    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:46 +08:00
Ming Lei 857ddf58bd xfs use fs_holder_ops for the log and RT devices
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit 8ffa54e3370c5a8b9538dbe4077fc9c4b5a08f45
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 2 17:41:31 2023 +0200

    xfs use fs_holder_ops for the log and RT devices

    Use the generic fs_holder_ops to shut down the file system when the
    log or RT device goes away instead of duplicating the logic.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230802154131.2221419-13-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:44 +08:00
Ming Lei 3507ce1209 xfs: drop s_umount over opening the log and RT devices
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit 8d945b595ed07db13fef1f3311ad456c97941930
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 2 17:41:30 2023 +0200

    xfs: drop s_umount over opening the log and RT devices

    Just like get_tree_bdev needs to drop s_umount when opening the main
    device, we need to do the same for the xfs log and RT devices to avoid a
    potential lock order reversal with s_umount for the mark_dead path.

    It might be preferable to just drop s_umount over ->fill_super entirely,
    but that will require a fairly massive audit first, so we'll do the easy
    version here first.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230802154131.2221419-12-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:44 +08:00
Ming Lei e73ab373cf xfs: document the invalidate_bdev call in invalidate_bdev
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit 1a0a5dad67b60250dce151c9533ccbecdfd822d4
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 9 15:05:39 2023 -0700

    xfs: document the invalidate_bdev call in invalidate_bdev

    Copy and paste the commit message from Darrick into a comment to explain
    the seemingly odd invalidate_bdev in xfs_shutdown_devices.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230809220545.1308228-8-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:43 +08:00
Ming Lei f9ca79532a xfs: close the external block devices in xfs_mount_free
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit 35a93b148b0363dca23c3db1cc9d48100eb8b276
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 9 15:05:38 2023 -0700

    xfs: close the external block devices in xfs_mount_free

    blkdev_put must not be called under sb->s_umount to avoid a lock order
    reversal with disk->open_mutex.  Move closing the buftargs into ->kill_sb
    to achieve that.  Note that the flushing of the disk caches and the
    invalidation of the block device mapping need to stay in ->put_super as
    the main block device is closed in kill_block_super already.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230809220545.1308228-7-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:43 +08:00
Ming Lei c732d9002a xfs: remove xfs_blkdev_put
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit d3ef7e94ee36adc8f0006d253a9ad45793b874cd
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 9 15:05:36 2023 -0700

    xfs: remove xfs_blkdev_put

    There isn't much use for this trivial wrapper, especially as the NULL
    check is only needed in a single call site.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230809220545.1308228-5-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:43 +08:00
Ming Lei 9f146111a0 xfs: free the xfs_mount in ->kill_sb
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit 2a9311adb87c98599989b80405fe2c60cd4075dd
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 9 15:05:35 2023 -0700

    xfs: free the xfs_mount in ->kill_sb

    As a rule of thumb everything allocated to the fs_context and moved into
    the super_block should be freed by ->kill_sb so that the teardown
    handling doesn't need to be duplicated between the fill_super error
    path and put_super.  Implement a XFS-specific kill_sb method to do that.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230809220545.1308228-4-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:43 +08:00
Ming Lei 2101e2bdfc xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit 1aa2d074d4c777e2150382878d0a5611d829b380
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 9 15:05:34 2023 -0700

    xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super

    ->put_super is only called when sb->s_root is set, and thus when
    fill_super succeeds.  Thus drop the NULL check that can't happen in
    xfs_fs_put_super.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230809220545.1308228-3-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:43 +08:00
Ming Lei 5e678b3586 xfs: reformat the xfs_fs_free prototype
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit dbbff489064d89391c4f0c7a73e77e61ce29fe96
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Aug 9 15:05:33 2023 -0700

    xfs: reformat the xfs_fs_free prototype

    The xfs_fs_free prototype formatting is a weird mix of the classic XFS
    style and the Linux style.  Fix it up to be consistent.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Message-Id: <20230809220545.1308228-2-hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:43 +08:00
Bill O'Donnell 05ef68b0f5 xfs: remove CPU hotplug infrastructure
JIRA: https://issues.redhat.com/browse/RHEL-15844

commit ef7d9593390a050c50eba5fc02d2cb65a1104434
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Sep 11 08:39:04 2023 -0700

    xfs: remove CPU hotplug infrastructure

    There are no users of the cpu hotplug hooks in xfs now, so remove it.
    This reverts f1653c2e2831e ("xfs: introduce CPU hotplug
    infrastructure").

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-19 16:21:09 -06:00
Bill O'Donnell 4832ccaf00 xfs: remove the all-mounts list
JIRA: https://issues.redhat.com/browse/RHEL-15844

commit f5bfa695f02e02415e4bfb36bd83a8bc933a6d4f
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Sep 11 08:39:04 2023 -0700

    xfs: remove the all-mounts list

    Revert commit 0ed17f01c8540 ("xfs: introduce all-mounts list for cpu
    hotplug notifications") because the cpu hotplug hooks are now pointless,
    so we don't need this list anymore.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-19 16:21:09 -06:00
Bill O'Donnell 28e7f5db46 xfs: use per-mount cpumask to track nonempty percpu inodegc lists
JIRA: https://issues.redhat.com/browse/RHEL-15844

commit 62334fab47621dd91ab30dd5bb6c43d78a8ec279
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Sep 11 08:39:03 2023 -0700

    xfs: use per-mount cpumask to track nonempty percpu inodegc lists

    Directly track which CPUs have contributed to the inodegc percpu lists
    instead of trusting the cpu online mask.  This eliminates a theoretical
    problem where the inodegc flush functions might fail to flush a CPU's
    inodes if that CPU happened to be dying at exactly the same time.  Most
    likely nobody's noticed this because the CPU dead hook moves the percpu
    inodegc list to another CPU and schedules that worker immediately.  But
    it's quite possible that this is a subtle race leading to UAF if the
    inodegc flush were part of an unmount.

    Further benefits: This reduces the overhead of the inodegc flush code
    slightly by allowing us to ignore CPUs that have empty lists.  Better
    yet, it reduces our dependence on the cpu online masks, which have been
    the cause of confusion and drama lately.
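
The idea of tracking contributors in a private mask, rather than walking the online mask, can be sketched as follows. This is a single-threaded toy with a 64-bit integer standing in for a cpumask; `demo_gc`, `demo_gc_queue` and `demo_gc_flush` are hypothetical names, and real code would need atomic bit operations and per-CPU data.

```c
#include <stdint.h>

#define DEMO_NR_CPUS 8

struct demo_gc {
    uint64_t nonempty_mask;            /* one bit per CPU that queued work */
    unsigned int queued[DEMO_NR_CPUS]; /* pending items per CPU */
};

/* Queue one item on a CPU and remember that this CPU has work. */
void demo_gc_queue(struct demo_gc *gc, unsigned int cpu)
{
    gc->queued[cpu]++;
    gc->nonempty_mask |= UINT64_C(1) << cpu;
}

/*
 * Flush every CPU recorded in the private mask, whether or not that CPU
 * is still online. Returns the number of items flushed and stores the
 * number of CPUs actually visited in *cpus_visited.
 */
unsigned int demo_gc_flush(struct demo_gc *gc, unsigned int *cpus_visited)
{
    unsigned int flushed = 0;

    *cpus_visited = 0;
    for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;

        if (!(gc->nonempty_mask & bit))
            continue;                  /* empty list: skipped entirely */
        flushed += gc->queued[cpu];
        gc->queued[cpu] = 0;
        gc->nonempty_mask &= ~bit;
        (*cpus_visited)++;
    }
    return flushed;
}
```

Because the mask records who queued work rather than who is online, a CPU that has since gone offline is still flushed, and CPUs with empty lists cost nothing.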

    Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-19 16:21:09 -06:00
Bill O'Donnell 1ce0a32435 xfs: fix per-cpu CIL structure aggregation racing with dying cpus
JIRA: https://issues.redhat.com/browse/RHEL-15844

commit ecd49f7a36fbccc884471f86fc43de6ca8d1f786
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Sep 11 08:39:02 2023 -0700

    xfs: fix per-cpu CIL structure aggregation racing with dying cpus

    In commit 7c8ade2121200 ("xfs: implement percpu cil space used
    calculation"), the XFS committed (log) item list code was converted to
    use per-cpu lists and space tracking to reduce cpu contention when
    multiple threads are modifying different parts of the filesystem and
    hence end up contending on the log structures during transaction commit.
    Each CPU tracks its own commit items and space usage, and these do not
    have to be merged into the main CIL until either someone wants to push
    the CIL items, or we run over a soft threshold and switch to slower (but
    more accurate) accounting with atomics.

    Unfortunately, the for_each_cpu iteration suffers from the same race
    with cpu dying problem that was identified in commit 8b57b11cca88f
    ("pcpcntrs: fix dying cpu summation race") -- CPUs are removed from
    cpu_online_mask before the CPUHP_XFS_DEAD callback gets called.  As a
    result, both CIL percpu structure aggregation functions fail to collect
    the items and accounted space usage at the correct point in time.

    If we're lucky, the items that are collected from the online cpus exceed
    the space given to those cpus, and the log immediately shuts down in
    xlog_cil_insert_items due to the (apparent) log reservation overrun.
    This happens periodically with generic/650, which exercises cpu hotplug
    vs. the filesystem code:

    smpboot: CPU 3 is now offline
    XFS (sda3): ctx ticket reservation ran out. Need to up reservation
    XFS (sda3): ticket reservation summary:
    XFS (sda3):   unit res    = 9268 bytes
    XFS (sda3):   current res = -40 bytes
    XFS (sda3):   original count  = 1
    XFS (sda3):   remaining count = 1
    XFS (sda3): Filesystem has been shut down due to log error (0x2).

    Applying the same sort of fix from 8b57b11cca88f to the CIL code seems
    to make the generic/650 problem go away, but I've been told that tglx
    was not happy when he saw:

    "...the only thing we actually need to care about is that
    percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when
    there are no CPUs dying, it has no addition overhead except for a
    cpumask_or() operation."

    The CPU hotplug code is rather complex and difficult to understand and I
    don't want to try to understand the cpu hotplug locking well enough to
    use cpu_dying mask.  Furthermore, there's a performance improvement that
    could be had here.  Attach a private cpu mask to the CIL structure so
    that we can track exactly which cpus have accessed the percpu data at
    all.  It doesn't matter if the cpu has since gone offline; log item
    aggregation will still find the items.  Better yet, we skip cpus that
    have not recently logged anything.

    Worse yet, Ritesh Harjani and Eric Sandeen both reported today that CPU
    hot remove racing with an xfs mount can crash if the cpu_dead notifier
    tries to access the log but the mount hasn't yet set up the log.

    Link: https://lore.kernel.org/linux-xfs/ZOLzgBOuyWHapOyZ@dread.disaster.area/T/
    Link: https://lore.kernel.org/lkml/877cuj1mt1.ffs@tglx/
    Link: https://lore.kernel.org/lkml/20230414162755.281993820@linutronix.de/
    Link: https://lore.kernel.org/linux-xfs/ZOVkjxWZq0YmjrJu@dread.disaster.area/T/
    Cc: tglx@linutronix.de
    Cc: peterz@infradead.org
    Reported-by: ritesh.list@gmail.com
    Reported-by: sandeen@sandeen.net
    Fixes: af1c2146a50b ("xfs: introduce per-cpu CIL tracking structure")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-19 16:21:09 -06:00
Bill O'Donnell 14fa9c73d2 xfs: check that per-cpu inodegc workers actually run on that cpu
JIRA: https://issues.redhat.com/browse/RHEL-15844

Conflicts: diff in xfs_super.c due to previous out of order patch

commit b37c4c8339cd394ea6b8b415026603320a185651
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue May 2 09:16:12 2023 +1000

    xfs: check that per-cpu inodegc workers actually run on that cpu

    Now that we've allegedly worked out the problem of the per-cpu inodegc
    workers being scheduled on the wrong cpu, let's put in a debugging knob
    to let us know if a worker ever gets mis-scheduled again.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-19 16:21:08 -06:00
Bill O'Donnell 19fab0b814 xfs: collect errors from inodegc for unlinked inode recovery
JIRA: https://issues.redhat.com/browse/RHEL-2002

Conflicts: context differences due to out of order patch application

commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Jun 5 14:48:15 2023 +1000

    xfs: collect errors from inodegc for unlinked inode recovery

    Unlinked list recovery requires that errors removing the inode from
    the unlinked list get fed back to the main recovery loop. Now that
    we offload the unlinking to the inodegc work, we don't get errors
    being fed back when we trip over a corruption that prevents the
    inode from being removed from the unlinked list.

    This means we never clear the corrupt unlinked list bucket,
    resulting in runtime operations eventually tripping over it and
    shutting down.

    Fix this by collecting inodegc worker errors and feeding them
    back to the flush caller. This is largely best effort - the only
    context that really cares is log recovery, and it only flushes a
    single inode at a time so we don't need complex synchronised
    handling. Essentially the inodegc workers will capture the first
    error that occurs and the next flush will gather them and clear
    them. The flush itself will only report the first gathered error.

    In the cases where callers can return errors, propagate the
    collected inodegc flush error up the error handling chain.

    In the case of inode unlinked list recovery, there are several
    superfluous calls to flush queued unlinked inodes -
    xlog_recover_iunlink_bucket() guarantees that it has flushed the
    inodegc and collected errors before it returns. Hence nothing in the
    calling path needs to run a flush, even when an error is returned.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:27 -06:00
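The error collection scheme described in the inodegc commit above (each worker keeps the first error it sees, and the next flush gathers and clears them, reporting only the first) can be sketched roughly as follows. This is a single-threaded userspace illustration, not the kernel code: `gc_queue_sketch`, `gc_worker_record()` and `gc_flush()` are hypothetical names, and the real implementation works on per-cpu queues with proper synchronisation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-cpu inodegc queue. */
struct gc_queue_sketch {
	int error;	/* first error seen by this worker, 0 if none */
};

/* Worker side: remember only the first error until the next flush. */
static void gc_worker_record(struct gc_queue_sketch *q, int error)
{
	assert(error <= 0);	/* kernel-style negative errno values */
	if (error && !q->error)
		q->error = error;
}

/*
 * Flush side: gather and clear every queue's error, but report only
 * the first one gathered.
 */
static int gc_flush(struct gc_queue_sketch *queues, size_t nr)
{
	int first = 0;

	for (size_t i = 0; i < nr; i++) {
		if (queues[i].error && !first)
			first = queues[i].error;
		queues[i].error = 0;
	}
	return first;
}
```

Because the flush clears what it gathers, a second flush with no new failures returns 0, which matches the "best effort" behaviour the commit message describes.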
Bill O'Donnell e9e80ba818 xfs: test dir/attr hash when loading module
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 3cfb9290da3d87a5877b03bda96c3d5d3ed9fcb0
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Mar 16 09:31:20 2023 -0700

    xfs: test dir/attr hash when loading module

    Back in the 6.2-rc1 days, Eric Whitney reported a fstests regression in
    ext4 against generic/454.  The cause of this test failure was the
    unfortunate combination of setting an xattr name containing UTF8 encoded
    emoji, an xattr hash function that accepted a char pointer with no
    explicit signedness, signed type extension of those chars to an int, and
    the 6.2 build tools maintainers deciding to mandate -funsigned-char
    across the board.  As a result, the ondisk extended attribute structures
    written out by 6.1 and 6.2 were not the same.

    This discrepancy, in fact, had been noticeable if a filesystem with such
    an xattr were moved between any two architectures that don't employ the
    same signedness of a raw "char" declaration.  The only reason anyone
    noticed is that x86 gcc defaults to signed, and no such -funsigned-char
    update was made to e2fsprogs, so e2fsck immediately started reporting
    data corruption.

    After a day and a half of discussing how to handle this use case (xattrs
    with bit 7 set anywhere in the name) without breaking existing users,
    Linus merged his own patch and didn't tell the maintainer.  None of the
    ext4 developers realized this until AUTOSEL announced that the commit
    had been backported to stable.

    In the end, this problem could have been detected much earlier if there
    had been any useful tests of hash function(s) in use inside ext4 to make
    sure that they always produce the same outputs given the same inputs.

    The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's
    vulnerable to this problem.  However, let's avoid all this drama by
    adding our own self test to check that the da hash produces the same
    outputs for a static pile of inputs on various platforms.  This enables
    us to fix any breakage that may result in a controlled fashion.  The
    buffer and test data are identical to the patches submitted to xfsprogs.

    Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/
    Link: https://lore.kernel.org/linux-xfs/ZBUKCRR7xvIqPrpX@destitution/T/#md38272cc684e2c0d61494435ccbb91f022e8dee4
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:24 -06:00
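The signedness problem described in the hash self-test commit above can be reproduced with a toy hash. The sketch below is an assumption-laden illustration: the rotate-and-xor function is hypothetical and is neither the ext4 xattr hash nor the XFS da hash. It only shows that widening a byte with bit 7 set gives different values depending on whether the byte is treated as signed or unsigned, and that a self test with fixed inputs catches the difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate left; callers only pass shifts strictly between 0 and 32. */
static uint32_t rol32(uint32_t v, unsigned int s)
{
	return (v << s) | (v >> (32 - s));
}

/* Hypothetical hash that widens each byte as a signed value. */
static uint32_t hash_signed(const signed char *name, size_t len)
{
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++)
		hash = rol32(hash, 7) ^ (uint32_t)(int)name[i];
	return hash;
}

/* Same hash, but each byte is widened as an unsigned value. */
static uint32_t hash_unsigned(const uint8_t *name, size_t len)
{
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++)
		hash = rol32(hash, 7) ^ (uint32_t)name[i];
	return hash;
}

/* Self test: fixed inputs must always produce the same outputs. */
static void hash_selftest(void)
{
	static const uint8_t ascii[] = { 'u', 's', 'e', 'r' };
	static const uint8_t high[] = { 0xF0 };	/* first byte of a UTF-8 emoji */

	/* Pure ASCII names hash identically under both variants. */
	assert(hash_signed((const signed char *)ascii, sizeof(ascii)) ==
	       hash_unsigned(ascii, sizeof(ascii)));

	/* A byte with bit 7 set is sign-extended by the signed variant. */
	assert(hash_signed((const signed char *)high, 1) == 0xFFFFFFF0u);
	assert(hash_unsigned(high, 1) == 0x000000F0u);
}
```

Taking a `uint8_t *`, as the commit says the XFS hash does, removes the dependence on the compiler's default `char` signedness.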
Bill O'Donnell 6ee6b421b0 xfs: perags need atomic operational state
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 7ac2ff8bb3713c7cb43564c04384af2ee7cc1f8d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Feb 13 09:14:52 2023 +1100

    xfs: perags need atomic operational state

    We currently don't have any flags or operational state in the
    xfs_perag except for the pagf_init and pagi_init flags. And the
    agflreset flag. Oh, there's also the pagf_metadata and pagi_inodeok
    flags, too.

    For controlling per-ag operations, we are going to need some atomic
    state flags. Hence add an opstate field similar to what we already
    have in the mount and log, and convert all these state flags across
    to atomic bit operations.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:21 -06:00
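As a rough illustration of the opstate idea in the perag commit above, the sketch below packs several state flags into one word and manipulates them with atomic bit operations. It uses C11 `<stdatomic.h>` in userspace rather than the kernel's bitops, and `perag_sketch`, the `OP_*` bit numbers and the helper names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers standing in for the per-AG state flags. */
enum {
	OP_AGF_INIT,
	OP_AGI_INIT,
	OP_AGFL_RESET,
	OP_NBITS
};

/* Hypothetical per-AG structure holding a single atomic state word. */
struct perag_sketch {
	atomic_ulong opstate;
};

static unsigned long op_mask(int bit)
{
	assert(bit >= 0 && bit < OP_NBITS);
	return 1UL << bit;
}

static void op_set(struct perag_sketch *pag, int bit)
{
	atomic_fetch_or(&pag->opstate, op_mask(bit));
}

static bool op_test(struct perag_sketch *pag, int bit)
{
	return (atomic_load(&pag->opstate) & op_mask(bit)) != 0;
}

/* Clear the bit and report whether it was set, in one atomic step. */
static bool op_test_and_clear(struct perag_sketch *pag, int bit)
{
	unsigned long old = atomic_fetch_and(&pag->opstate, ~op_mask(bit));

	return (old & op_mask(bit)) != 0;
}
```

The test-and-clear form is what makes a flag such as a pending AGFL reset safe to consume from concurrent contexts: only one caller observes the bit as set.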
Bill O'Donnell fcd881af53 xfs: convert xfs_ialloc_next_ag() to an atomic
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 20a5eab49d354a2837e0af3f07f92a104de52804
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Feb 13 09:14:52 2023 +1100

    xfs: convert xfs_ialloc_next_ag() to an atomic

    This is currently a spinlock protected rotor which can be
    implemented with a single atomic operation. Change it to be more
    efficient and get rid of the m_agirotor_lock. Noticed while
    converting the inode allocation AG selection loop to active perag
    references.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:21 -06:00
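The rotor conversion in the xfs_ialloc_next_ag() commit above amounts to replacing "lock, read, increment, wrap, unlock" with one atomic increment. A minimal userspace sketch using C11 atomics follows; `mount_sketch` and `next_ag()` are hypothetical names, and the kernel patch uses the kernel's own atomic API rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of the mount that matter here. */
struct mount_sketch {
	uint32_t agcount;	/* number of allocation groups */
	atomic_uint agirotor;	/* free-running rotor, no lock needed */
};

/*
 * Pick the next AG for inode allocation: a single atomic fetch-and-add
 * replaces the spinlock-protected read/increment/wrap sequence.
 */
static uint32_t next_ag(struct mount_sketch *mp)
{
	unsigned int ticket;

	assert(mp->agcount > 0);
	ticket = atomic_fetch_add_explicit(&mp->agirotor, 1,
					   memory_order_relaxed);
	return ticket % mp->agcount;
}
```

In this sketch the counter runs freely and is reduced modulo the AG count on use, so the sequence may skip once when the counter wraps unless the AG count divides the counter range; for a rotor that only spreads load, that is assumed to be acceptable.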
Bill O'Donnell 91cbfcc6bf xfs: Print XFS UUID on mount and umount events.
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 64c80dfd04d1dd2ecf550542c8f3f41b54b20207
Author: Lukas Herbolt <lukas@herbolt.com>
Date:   Wed Nov 16 19:20:21 2022 -0800

    xfs: Print XFS UUID on mount and umount events.

    As of now only device names are printed out over __xfs_printk().
    The device names are not persistent across reboots, so searching for
    the origin of a corruption brings the extra task of properly
    identifying the devices. This patch adds the XFS UUID to every
    mount/umount event, which will make the identification much easier.

    Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
    [sandeen: rebase onto current upstream kernel]
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:17 -06:00
Bill O'Donnell 8f3ac96b1b xfs: refactor all the EFI/EFD log item sizeof logic
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 3c5aaaced99912c9fb3352fc5af5b104df67d4aa
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Oct 21 09:10:05 2022 -0700

    xfs: refactor all the EFI/EFD log item sizeof logic

    Refactor all the open-coded sizeof logic for EFI/EFD log item and log
    format structures into common helper functions whose names reflect the
    struct names.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-06 19:42:18 -06:00
Bill O'Donnell e2550967f8 xfs: fix memcpy fortify errors in EFI log format copying
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 03a7485cd701e1c08baadcf39d9592d83715e224
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Oct 20 16:39:59 2022 -0700

    xfs: fix memcpy fortify errors in EFI log format copying

    Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
    memcpy.  Since we're already fixing problems with BUI item copying, we
    should fix everything else.

    An extra difficulty here is that the ef[id]_extents arrays are declared
    as single-element arrays.  This is not the convention for flex arrays in
    the modern kernel, and it causes all manner of problems with static
    checking tools, since they often cannot tell the difference between a
    single element array and a flex array.

    So for starters, change those array[1] declarations to array[]
    declarations to signal that they are proper flex arrays and adjust all
    the "size-1" expressions to fit the new declaration style.

    Next, refactor the xfs_efi_copy_format function to handle the copying of
    the head and the flex array members separately.  While we're at it, fix
    a minor validation deficiency in the recovery function.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-06 19:42:18 -06:00
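The two steps described in the EFI memcpy commit above (switching `array[1]` to `array[]` and copying the head and the flex array separately) can be sketched in plain C. The structures below are hypothetical simplifications, not the real EFI log format definitions; they only show why the "size-1" arithmetic disappears and how the split copy keeps each memcpy within one object's bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical extent record. */
struct extent_sketch {
	uint64_t start;
	uint32_t len;
};

/* Old style: a one-element array standing in for a variable-length tail. */
struct fmt_old {
	uint16_t type;
	uint16_t size;
	uint32_t nextents;
	uint64_t id;
	struct extent_sketch extents[1];
};

/* New style: a proper C99 flexible array member. */
struct fmt_new {
	uint16_t type;
	uint16_t size;
	uint32_t nextents;
	uint64_t id;
	struct extent_sketch extents[];
};

static_assert(offsetof(struct fmt_old, extents) ==
	      offsetof(struct fmt_new, extents), "heads must match");

/* Old sizing needs the "nr - 1" adjustment; callers must pass nr >= 1. */
static size_t old_sizeof(unsigned int nr)
{
	return sizeof(struct fmt_old) + (nr - 1) * sizeof(struct extent_sketch);
}

/* New sizing is simply head plus nr elements. */
static size_t new_sizeof(unsigned int nr)
{
	return sizeof(struct fmt_new) + nr * sizeof(struct extent_sketch);
}

/* Copy the fixed head and the flex array as two separate memcpy calls. */
static void copy_format(struct fmt_new *dst, const struct fmt_new *src)
{
	memcpy(dst, src, offsetof(struct fmt_new, extents));
	memcpy(dst->extents, src->extents,
	       src->nextents * sizeof(struct extent_sketch));
}
```

Both sizing helpers produce the same byte count for any `nr >= 1`, which is why the on-disk format is unaffected by the declaration change.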
Ming Lei 74aa7f1345 block: replace fmode_t with a block-specific type for block open flags
JIRA: https://issues.redhat.com/browse/RHEL-1516
Conflicts: drop change on btrfs, f2fs, erofs, ublk, all are not enabled
	in rhel9; drop change on dm's open_table_device() because of
	code base difference, 'mode' isn't used in this function.

commit 05bdb9965305bbfdae79b31d22df03d1e2cfcb22
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 8 13:02:55 2023 +0200

    block: replace fmode_t with a block-specific type for block open flags

    The only overlap between the block open flags mapped into the fmode_t and
    other uses of fmode_t are FMODE_READ and FMODE_WRITE.  Define a new
    blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
    ->ioctl and stop abusing fmode_t.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Jack Wang <jinpu.wang@ionos.com>              [rnbd]
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 17:59:18 +08:00
Ming Lei e405cd2d35 block: use the holder as indication for exclusive opens
JIRA: https://issues.redhat.com/browse/RHEL-1516
Conflicts: drop change in btrfs which isn't enabled in rhel,
	and drop change in erofs which needn't such change since
	the affected interface isn't used in erofs.

commit 2736e8eeb0ccdc71d1f4256c9c9a28f58cc43307
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 8 13:02:43 2023 +0200

    block: use the holder as indication for exclusive opens

    The current interface for exclusive opens is rather confusing as it
    requires both the FMODE_EXCL flag and a holder.  Remove the need to pass
    FMODE_EXCL and just key the exclusive open off a non-NULL holder.

    For blkdev_put this requires adding the holder argument, which provides
    better debug checking that only the holder actually releases the hold,
    but at the same time allows removing the now superfluous mode argument.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Acked-by: David Sterba <dsterba@suse.com>               [btrfs]
    Acked-by: Jack Wang <jinpu.wang@ionos.com>              [rnbd]
    Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 17:57:43 +08:00
Ming Lei c3dbc9f426 xfs: wire up the ->mark_dead holder operation for log and RT devices
JIRA: https://issues.redhat.com/browse/RHEL-1516

commit 8067ca1dcdfcc2a5e0a51bff3730ad3eef0623d6
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 1 11:44:56 2023 +0200

    xfs: wire up the ->mark_dead holder operation for log and RT devices

    Implement a set of holder_ops that shut down the file system when the
    block device used as log or RT device is removed underneath the file
    system.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Link: https://lore.kernel.org/r/20230601094459.1350643-14-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 15:59:32 +08:00
Ming Lei e05e754c4d xfs: wire up sops->shutdown
JIRA: https://issues.redhat.com/browse/RHEL-1516

commit e7caa877e5ddac63886f4a8376cb3ffbd4dfe569
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 1 11:44:55 2023 +0200

    xfs: wire up sops->shutdown

    Wire up the shutdown method to shut down the file system when the
    underlying block device is marked dead.  Add a new message to
    clearly distinguish this shutdown reason from other shutdowns.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Link: https://lore.kernel.org/r/20230601094459.1350643-13-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 15:59:32 +08:00
Ming Lei 2ef574f2fa block: introduce holder ops
JIRA: https://issues.redhat.com/browse/RHEL-1516
Conflicts: drop change on fs/erofs, which isn't enabled on rhel,
	and the affected symbol doesn't exist in erofs code either

commit 0718afd47f70cf46877c39c25d06b786e1a3f36c
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 1 11:44:52 2023 +0200

    block: introduce holder ops

    Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
    installed in the block_device for exclusive claims.  It will be used to
    allow the block layer to call back into the user of the block device for
    things like notification of a removed device or a device resize.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 15:59:32 +08:00
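The holder ops mechanism from the commit above, together with the XFS ->mark_dead wiring in the two commits before it in this list, follows a familiar callback pattern: the claimant registers an ops table along with its holder pointer, and the device owner calls back through it on removal. The userspace sketch below uses hypothetical names throughout (`bdev_sketch`, `holder_ops_sketch`, `bdev_claim()`, `bdev_removed()`, `fs_mark_dead()`); it is not the block layer API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct bdev_sketch;

/* Hypothetical callback table supplied by whoever claims the device. */
struct holder_ops_sketch {
	void (*mark_dead)(struct bdev_sketch *bdev);
};

/* Hypothetical block device: remembers its exclusive holder and ops. */
struct bdev_sketch {
	void *holder;
	const struct holder_ops_sketch *hops;
};

/* Hypothetical filesystem state acting as the holder. */
struct fs_sketch {
	bool shut_down;
};

/* Exclusive claim: a non-NULL holder is what marks the open exclusive. */
static int bdev_claim(struct bdev_sketch *bdev, void *holder,
		      const struct holder_ops_sketch *hops)
{
	assert(holder != NULL);
	if (bdev->holder)
		return -EBUSY;
	bdev->holder = holder;
	bdev->hops = hops;
	return 0;
}

/* Device owner side: notify the holder that the device went away. */
static void bdev_removed(struct bdev_sketch *bdev)
{
	if (bdev->holder && bdev->hops && bdev->hops->mark_dead)
		bdev->hops->mark_dead(bdev);
}

/* Filesystem side: shut down when the underlying device is removed. */
static void fs_mark_dead(struct bdev_sketch *bdev)
{
	struct fs_sketch *fs = bdev->holder;

	fs->shut_down = true;
}
```

Storing the ops next to the holder means the callback can recover its own context from the device, so the device owner never needs to know what kind of holder it is calling.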
Carlos Maiolino 0e8baee5cd fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2228888
Tested: xfstests

Currently the I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME.  That's
true, however ext4 will only update the on-disk inode in
->dirty_inode(), not on actual writeback. As a result if the inode
already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
into on-disk inode and will not get updated until the next I_DIRTY_INODE
update, which might never come if we crash or get a power failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally, changes were made in
xfs_fs_dirty_inode() to accommodate I_DIRTY_TIME in the flags.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220825100657.44217-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit cbfecb927f429a6fa613d74b998496bd71e4438a)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-08-04 14:19:30 +02:00
Bill O'Donnell f525103e49 xfs: fail dax mount if reflink is enabled on a partition
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 35fcd75af3edf035638e632bb49607cc8fc3cdf4
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Jun 9 22:34:35 2022 +0800

    xfs: fail dax mount if reflink is enabled on a partition

    Failure notification is not supported on partitions.  So, when we mount a
    reflink enabled xfs on a partition with dax option, let it fail with
    -EINVAL code.

    Link: https://lkml.kernel.org/r/20220609143435.393724-1-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:49 -05:00
Bill O'Donnell 426777a415 xfs: xfs_buf cache destroy isn't RCU safe
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 231f91ab504ecebcb88e942341b3d7dd91de45f1
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Jul 18 18:20:37 2022 -0700

    xfs: xfs_buf cache destroy isn't RCU safe

    Darrick and Sachin Sant reported that xfs/435 and xfs/436 would
    report a non-empty xfs_buf slab on module remove. This isn't easy
    to reproduce, but is clearly a side effect of converting the buffer
    cache to RCU freeing and lockless lookups. Sachin bisected and
    Darrick hit it when testing the patchset directly.

    Turns out that the xfs_buf slab is not destroyed when all the other
    XFS slab caches are destroyed. Instead, it's got its own little
    wrapper function that gets called separately, and so it doesn't have
    an rcu_barrier() call in it that is needed to drain all the rcu
    callbacks before the slab is destroyed.

    Fix it by removing the xfs_buf_init/terminate wrappers that just
    allocate and destroy the xfs_buf slab, and move them to the same
    place that all the other slab caches are set up and destroyed.

    Reported-and-tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Fixes: 298f34224506 ("xfs: lockless buffer lookup")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:48 -05:00
Bill O'Donnell 0507113b79 xfs: add in-memory iunlink log item
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 784eb7d8dd4163b82a19b914f76b2834a58a3e4c
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 14 11:47:42 2022 +1000

    xfs: add in-memory iunlink log item

    Now that we have a clean operation to update the di_next_unlinked
    field of inode cluster buffers, we can easily defer this operation
    to transaction commit time so we can order the inode cluster buffer
    locking consistently.

    To do this, we introduce a new in-memory log item to track the
    unlinked list item modification that we are going to make. This
    follows the same observations as the in-memory double linked list
    used to track unlinked inodes in that the inodes on the list are
    pinned in memory and cannot go away, and hence we can simply
    reference them for the duration of the transaction without needing
    to take active references or pin them or look them up.

    This allows us to pass the xfs_inode to the transaction commit code
    along with the modification to be made, and then order the logged
    modifications via the ->iop_sort and ->iop_precommit operations
    for the new log item type. As this is an in-memory log item, it
    doesn't have formatting, CIL or AIL operational hooks - it exists
    purely to run the inode unlink modifications and is then removed
    from the transaction item list and freed once the precommit
    operation has run.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:45 -05:00
Bill O'Donnell 8da0b4c669 xfs: introduce per-cpu CIL tracking structure
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit af1c2146a50b1ffe7e10cae1f7e64ab56b7f8c1f
Author: Dave Chinner <dchinner@redhat.com>
Date:   Sat Jul 2 02:13:52 2022 +1000

    xfs: introduce per-cpu CIL tracking structure

    The CIL push lock is highly contended on larger machines, becoming a
    hard bottleneck at about 700,000 transaction commits/s on >16p
    machines. To address this, start moving the CIL tracking
    infrastructure to utilise per-CPU structures.

    We need to track the space used, the amount of log reservation space
    reserved to write the CIL, the log items in the CIL and the busy
    extents that need to be completed by the CIL commit.  This requires
    a couple of per-cpu counters, an unordered per-cpu list and a
    globally ordered per-cpu list.

    Create a per-cpu structure to hold these and all the management
    interfaces needed, as well as the hooks to handle hotplug CPUs.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:35 -05:00
Bill O'Donnell 28daf69aec xfs: move xfs_attr_use_log_assist out of xfs_log.c
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit d9c61ccb3b09d8f892cccbf662ce0c870f8e4ade
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri May 27 10:33:29 2022 +1000

    xfs: move xfs_attr_use_log_assist out of xfs_log.c

    The LARP patchset added an awkward coupling point between libxfs and
    what would be libxlog, if the XFS log were actually its own library.
    Move the code that enables logged xattr updates out of "lib"xlog and into
    xfs_xattr.c so that it no longer has to know about xlog_* functions.

    While we're at it, give xfs_xattr.c its own header file.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:32 -05:00
Bill O'Donnell 1442807720 xfs: put attr[id] log item cache init with the others
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 4136e38af728eddcab2e51aecde28e94d0782b9b
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sun May 22 15:59:48 2022 +1000

    xfs: put attr[id] log item cache init with the others

    Initialize and destroy the xattr log item caches in the same places that
    we do all the other log item caches.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:27 -05:00
Bill O'Donnell f33bdf2018 xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 973ac0eb3a7dfedecd385bd2b48b12e62a0492f2
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Wed Aug 11 10:33:20 2021 +0530

    xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags

    This commit enables XFS module to work with fs instances having 64-bit
    per-inode extent counters by adding XFS_SB_FEAT_INCOMPAT_NREXT64 flag to the
    list of supported incompat feature flags.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:00 -05:00
Bill O'Donnell 1c2b1203c8 xfs: use a separate frextents counter for rt extent reservations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 2229276c5283264b8c2241c1ed972bbb136cab22
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 12 06:49:42 2022 +1000

    xfs: use a separate frextents counter for rt extent reservations

    As mentioned in the previous commit, the kernel misuses sb_frextents in
    the incore mount to reflect both incore reservations made by running
    transactions as well as the actual count of free rt extents on disk.
    This results in the superblock being written to the log with an
    underestimate of the number of rt extents that are marked free in the
    rtbitmap.

    Teaching XFS to recompute frextents after log recovery avoids
    operational problems in the current mount, but it doesn't solve the
    problem of us writing undercounted frextents which are then recovered by
    an older kernel that doesn't have that fix.

    Create an incore percpu counter to mirror the ondisk frextents.  This
    new counter will track transaction reservations and the only time we
    will touch the incore super counter (i.e. the one that gets logged) is
    when those transactions commit updates to the rt bitmap.  This is in
    contrast to the lazysbcount counters (e.g. fdblocks), where we know that
    log recovery will always fix any incorrect counter that we log.
    As a bonus, we only take m_sb_lock at transaction commit time.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:59 -05:00