199 Commits

Author SHA1 Message Date
Patrick Talbert 9db53db9ce Merge: xfs: Mark all experimental code as tech preview
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6056

xfs: Mark all experimental code as tech preview

JIRA: https://issues.redhat.com/browse/RHEL-64940

Upstream Status: RHEL-only

Tested: Run xfstests on newly booted system with kernel configured for tech-preview code (e.g. scrub).

Amend "EXPERIMENTAL" warnings with a single tech preview notice per boot/xfs-module load.

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>

Approved-by: Brian Foster <bfoster@redhat.com>
Approved-by: Eric Sandeen <esandeen@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
2025-01-27 15:24:24 +01:00
Bill O'Donnell 4736386bf2 xfs: Mark all experimental code as tech preview
JIRA: https://issues.redhat.com/browse/RHEL-64940

Upstream Status: RHEL-only

Amend "EXPERIMENTAL" warnings with tech preview notices.

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-12-18 13:20:56 -06:00
Bill O'Donnell 0ab240b7f7 xfs: clean up the xfs_reserve_blocks interface
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 646ddf0c4df5181a7057ecccd29e535baaf034b2
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Dec 4 18:40:56 2023 +0100

    xfs: clean up the xfs_reserve_blocks interface

    xfs_reserve_blocks has a very odd interface that can only be explained
    by it directly deriving from the IRIX fcntl handler back in the day.

    Split reporting of the reserved blocks out of xfs_reserve_blocks into
    the only caller that cares.  This means that the value reported from
    XFS_IOC_SET_RESBLKS isn't atomically sampled in the same critical
    section as when it was set anymore, but as the values could change
    right after setting them anyway that does not matter.  It does
    provide atomic sampling of both values for XFS_IOC_GET_RESBLKS now,
    though.

    Also pass a normal scalar integer value for the requested value instead
    of the pointless pointer.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:56 -06:00
Bill O'Donnell 072771b9db xfs: clean up the XFS_IOC_FSCOUNTS handler
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit c2c2620de7577db66a859b934715e98e4501e4f4
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Dec 4 18:40:55 2023 +0100

    xfs: clean up the XFS_IOC_FSCOUNTS handler

    Split XFS_IOC_FSCOUNTS out of the main xfs_file_ioctl function, and
    merge the xfs_fs_counts helper into the ioctl handler.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:56 -06:00
Bill O'Donnell cf65fb211c bdev: rename freeze and thaw helpers
JIRA: https://issues.redhat.com/browse/RHEL-65728

Conflicts: limit coverage to xfs and ext4.

commit 982c3b3058433f20aba9fb032599cee5dfc17328
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Oct 24 15:01:08 2023 +0200

    bdev: rename freeze and thaw helpers

    We have bdev_mark_dead() etc and we're going to move block device
    freezing to holder ops in the next patch. Make the naming consistent:

    * freeze_bdev() -> bdev_freeze()
    * thaw_bdev()   -> bdev_thaw()

    Also document the return code.

    Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-2-599c19f4faac@kernel.org
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:46 -06:00
Bill O'Donnell a1a07b0581 xfs: fix perag leak when growfs fails
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 7823921887750b39d02e6b44faafdd1cc617c651
Author: Long Li <leo.lilong@huawei.com>
Date:   Fri Dec 15 16:22:34 2023 +0800

    xfs: fix perag leak when growfs fails

    During growfs, if a new AG has been initialized in memory but
    sb_agcount has not yet been updated, an error at that point leaks the
    perag structures as shown below. These new AGs will not be freed
    during umount because they are not visible (that is, not counted in
    mp->m_sb.sb_agcount).

    unreferenced object 0xffff88810be40200 (size 512):
      comm "xfs_growfs", pid 857, jiffies 4294909093
      hex dump (first 32 bytes):
        00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00  ................
        01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace (crc 381741e2):
        [<ffffffff8191aef6>] __kmalloc+0x386/0x4f0
        [<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0
        [<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810
        [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
        [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
        [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
        [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
        [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
    unreferenced object 0xffff88810be40800 (size 512):
      comm "xfs_growfs", pid 857, jiffies 4294909093
      hex dump (first 32 bytes):
        20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00   .......W.......
        10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff  ................
      backtrace (crc bde50e2d):
        [<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540
        [<ffffffff81814489>] kvmalloc_node+0x99/0x160
        [<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400
        [<ffffffff8286bdc5>] rhashtable_init+0x405/0x760
        [<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810
        [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
        [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
        [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
        [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
        [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a

      Factor out xfs_free_unused_perag_range() from xfs_initialize_perag(),
      used for freeing unused perag within a specified range in error handling,
      included in the error path of the growfs failure.

      Fixes: 1c1c6ebcf5 ("xfs: Replace per-ag array with a radix tree")
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
      Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:57 -05:00
Bill O'Donnell 6e6fc42354 xfs: fix uninit warning in xfs_growfs_data
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit ed04a91f718e6e1ab82d47a22b26e4b50c1666f6
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Jul 6 18:00:59 2023 -0700

    xfs: fix uninit warning in xfs_growfs_data

    Quiet down this gcc warning:

    fs/xfs/xfs_fsops.c: In function ‘xfs_growfs_data’:
    fs/xfs/xfs_fsops.c:219:21: error: ‘lastag_extended’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      219 |                 if (lastag_extended) {
          |                     ^~~~~~~~~~~~~~~
    fs/xfs/xfs_fsops.c:100:33: note: ‘lastag_extended’ was declared here
      100 |         bool                    lastag_extended;
          |                                 ^~~~~~~~~~~~~~~

    By setting its value explicitly.  From code analysis I don't think this
    is a real problem, but I have better things to do than analyse this
    closely.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:55 -05:00
Bill O'Donnell 92d35bb67f xfs: fix ag count overflow during growfs
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit c3b880acadc95d6e019eae5d669e072afda24f1b
Author: Long Li <leo.lilong@huaweicloud.com>
Date:   Tue Jun 13 08:49:20 2023 -0700

    xfs: fix ag count overflow during growfs

    I found a corruption during growfs:

     XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of
       file fs/xfs/libxfs/xfs_alloc.c.  Caller __xfs_free_extent+0x28e/0x3c0
     CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
     Call Trace:
      <TASK>
      dump_stack_lvl+0x50/0x70
      xfs_corruption_error+0x134/0x150
      __xfs_free_extent+0x2c1/0x3c0
      xfs_ag_extend_space+0x291/0x3e0
      xfs_growfs_data+0xd72/0xe90
      xfs_file_ioctl+0x5f9/0x14a0
      __x64_sys_ioctl+0x13e/0x1c0
      do_syscall_64+0x39/0x80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
     XFS (loop0): Corruption detected. Unmount and run xfs_repair
     XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file
       fs/xfs/xfs_trans.c.  Caller xfs_growfs_data+0x691/0xe90
     CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
     Call Trace:
      <TASK>
      dump_stack_lvl+0x50/0x70
      xfs_error_report+0x93/0xc0
      xfs_trans_cancel+0x2c0/0x350
      xfs_growfs_data+0x691/0xe90
      xfs_file_ioctl+0x5f9/0x14a0
      __x64_sys_ioctl+0x13e/0x1c0
      do_syscall_64+0x39/0x80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
     RIP: 0033:0x7f2d86706577

    The bug can be reproduced with the following sequence:

     # truncate -s  1073741824 xfs_test.img
     # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img
     # truncate -s 2305843009213693952  xfs_test.img
     # mount -o loop xfs_test.img /mnt/test
     # xfs_growfs -D  1125899907891200  /mnt/test

    The root cause is that during growfs, user space passed a large value
    of newblocks to xfs_growfs_data_private(). Because the current
    sb_agblocks is too small, the new AG count exceeds UINT_MAX. The AG
    number type is unsigned int, so it overflows, which makes nagcount much
    smaller than the actual value. While extending AG space, the delta
    blocks in xfs_resizefs_init_new_ags() become much larger than the actual
    value due to the incorrect nagcount, even exceeding UINT_MAX. This
    causes corruption that is detected in __xfs_free_extent. Fix it by
    growing the filesystem up to the maximally allowed AGs and not
    returning EINVAL when the new AG count overflows.

    Signed-off-by: Long Li <leo.lilong@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:51 -05:00
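The overflow described in "xfs: fix ag count overflow during growfs" can be reproduced with plain integer arithmetic. The following is a minimal sketch, not kernel code: the helper names are hypothetical, and the figure of 262144 blocks per AG is an assumption derived from the reproducer (1 GiB image, 1024-byte blocks, 4 AGs); a real mkfs.xfs run may choose a slightly different AG size.

```c
#include <stdint.h>

/* Hypothetical helper: number of AGs needed to cover `dblocks`
 * filesystem blocks, computed in 64 bits so it cannot wrap here. */
static uint64_t ag_count_needed(uint64_t dblocks, uint32_t agblocks)
{
	return dblocks / agblocks + (dblocks % agblocks != 0);
}

/* The same value after being stored in a 32-bit AG number, which is
 * where the truncation described in the commit message happens. */
static uint32_t ag_count_truncated(uint64_t dblocks, uint32_t agblocks)
{
	return (uint32_t)ag_count_needed(dblocks, agblocks);
}
```

With the reproducer's target of 1125899907891200 blocks and the assumed 262144 blocks per AG, ag_count_needed() yields 4294967300, which is above UINT_MAX, while ag_count_truncated() yields 4, the same AG count the filesystem started with.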
Andrey Albershteyn 1d729fd5ab xfs: short circuit xfs_growfs_data_private() if delta is zero
JIRA: https://issues.redhat.com/browse/RHEL-21392

commit 84712492e6dab803bf595fb8494d11098b74a652
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Thu Dec 14 13:28:08 2023 -0600

    xfs: short circuit xfs_growfs_data_private() if delta is zero

    Although xfs_growfs_data() doesn't call xfs_growfs_data_private()
    if in->newblocks == mp->m_sb.sb_dblocks, xfs_growfs_data_private()
    further massages the new block count so that we don't, e.g., try
    to create a too-small new AG.

    This may lead to a delta of "0" in xfs_growfs_data_private(), so
    we end up in the shrink case and emit the EXPERIMENTAL warning
    even if we're not changing anything at all.

    Fix this by returning straightaway if the block delta is zero.

    (nb: in older kernels, the result of entering the shrink case
    with delta == 0 may actually let an -ENOSPC escape to userspace,
    which is confusing for users.)

    Fixes: fb2fc17201 ("xfs: support shrinking unused space in the last AG")
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
2024-01-12 14:32:25 +01:00
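The early return added by "xfs: short circuit xfs_growfs_data_private() if delta is zero" amounts to treating an unchanged block count as its own case instead of letting it fall through to the shrink path. A rough sketch of that three-way decision, with hypothetical names that do not appear in the kernel:

```c
#include <stdint.h>

/* Hypothetical classification of a resize request. */
enum resize_action { RESIZE_NONE, RESIZE_GROW, RESIZE_SHRINK };

/* Decide what a resize from cur_blocks to new_blocks should do.
 * The zero-delta case is checked first, so it never reaches the
 * shrink path (and its EXPERIMENTAL warning). */
static enum resize_action classify_resize(uint64_t cur_blocks,
					  uint64_t new_blocks)
{
	if (new_blocks == cur_blocks)
		return RESIZE_NONE;
	return new_blocks > cur_blocks ? RESIZE_GROW : RESIZE_SHRINK;
}
```

A plain "grow if delta > 0, otherwise shrink" test would classify an unchanged size as a shrink, which is the behavior the commit message describes.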
Bill O'Donnell da0e203942 xfs: don't deplete the reserve pool when trying to shrink the fs
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 06f3ef6e1705612b88aa0b6991e2ac3b8ed3f8ec
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Jun 12 18:09:04 2023 -0700

    xfs: don't deplete the reserve pool when trying to shrink the fs

    Every now and then, xfs/168 fails with this logged in dmesg:

    Reserve blocks depleted! Consider increasing reserve pool size.
    EXPERIMENTAL online shrink feature in use. Use at your own risk!
    Per-AG reservation for AG 1 failed.  Filesystem may run out of space.
    Per-AG reservation for AG 1 failed.  Filesystem may run out of space.
    Error -28 reserving per-AG metadata reserve pool.
    Corruption of in-memory data (0x8) detected at xfs_ag_shrink_space+0x23c/0x3b0 [xfs] (fs/xfs/libxfs/xfs_ag.c:1007).  Shutting down filesystem.

    It's silly to deplete the reserved blocks pool just to shrink the
    filesystem, particularly since the fs goes down after that.

    Fixes: fb2fc17201 ("xfs: support shrinking unused space in the last AG")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:27 -06:00
Ming Lei e05e754c4d xfs: wire up sops->shutdown
JIRA: https://issues.redhat.com/browse/RHEL-1516

commit e7caa877e5ddac63886f4a8376cb3ffbd4dfe569
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 1 11:44:55 2023 +0200

    xfs: wire up sops->shutdown

    Wire up the shutdown method to shut down the file system when the
    underlying block device is marked dead.  Add a new message to
    clearly distinguish this shutdown reason from other shutdowns.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Link: https://lore.kernel.org/r/20230601094459.1350643-13-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 15:59:32 +08:00
Bill O'Donnell c6108fc126 xfs: implement ->notify_failure() for XFS
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

Conflicts: change to xfs_perag_get() and xfs_perag_put() api from previous out
	   of order patch from upstream fa044ae70 xfs: pass perag to xfs_read_agf
	   required changes to xfs_notify_failure.c

commit 6f643c57d57c56d4677bc05f1fca2ef3f249797c
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:30 2022 +0800

    xfs: implement ->notify_failure() for XFS

    Introduce xfs_notify_failure.c to handle failure-related work, such as
    implementing ->notify_failure(), registering/unregistering the dax
    holder in xfs, and so on.

    If the rmap feature of XFS is enabled, we can query it to find files and
    metadata which are associated with the corrupt data.  For now all we do is
    kill processes with that file mapped into their address spaces, but future
    patches could actually do something about corrupt metadata.

    After that, the memory failure needs to notify the processes who are using
    those files.

    Link: https://lkml.kernel.org/r/20220603053738.1218681-7-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:34:27 -05:00
Bill O'Donnell a27ea962ac xfs: Pre-calculate per-AG agbno geometry
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 0800169e3e2c97a033e8b7f3d1e6c689e0d71a19
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:13:02 2022 +1000

    xfs: Pre-calculate per-AG agbno geometry

    There is a lot of overhead in functions like xfs_verify_agbno() that
    repeatedly calculate the geometry limits of an AG. These can be
    pre-calculated as they are static and the verification context has
    a per-ag context it can quickly reference.

    In the case of xfs_verify_agbno(), we now always have a perag
    context handy, so we can store the AG length and the minimum valid
    block in the AG in the perag. This means we don't have to calculate
    it on every call and it can be inlined in callers if we move it
    to xfs_ag.h.

    Move xfs_ag_block_count() to xfs_ag.c because it's really a
    per-ag function and not an XFS type function. We need a little
    bit of rework that is specific to xfs_initialize_perag() to allow
    growfs to calculate the new perag sizes before we've updated the
    primary superblock during the grow (chicken/egg situation).

    Note that we leave the original xfs_verify_agbno in place in
    xfs_types.c as a static function as other callers in that file do
    not have per-ag contexts so still need to go the long way. It's been
    renamed to xfs_verify_agno_agbno() to indicate it takes both an agno
    and an agbno to differentiate it from new function.

    Future commits will make similar changes for other per-ag geometry
    validation functions.

    Further:

    $ size --totals fs/xfs/built-in.a
               text    data     bss     dec     hex filename
    before  1483006  329588     572 1813166  1baaae (TOTALS)
    after   1482185  329588     572 1812345  1ba779 (TOTALS)

    This rework reduces the binary size by ~820 bytes, indicating
    that much less work is being done to bounds check the agbno values
    against per-ag geometry information.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:40 -05:00
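The idea in "xfs: Pre-calculate per-AG agbno geometry" is to compute the AG limits once and keep them in the per-AG structure, so the bounds check becomes two comparisons that can be inlined. A simplified sketch follows; the struct and function names are hypothetical and the fields only approximate what the commit message describes (the AG length and the minimum valid block).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached per-AG geometry. */
struct perag_sketch {
	uint32_t block_count;	/* length of this AG in blocks */
	uint32_t min_block;	/* last block used by AG headers; valid
				 * blocks lie strictly above it */
};

/* Compute the limits once, e.g. at mount or growfs time. */
static void perag_init_geometry(struct perag_sketch *pag,
				uint32_t block_count, uint32_t min_block)
{
	pag->block_count = block_count;
	pag->min_block = min_block;
}

/* Bounds check against the cached values; no geometry is recomputed. */
static inline bool verify_agbno(struct perag_sketch *pag, uint32_t agbno)
{
	if (agbno >= pag->block_count)
		return false;
	if (agbno <= pag->min_block)
		return false;
	return true;
}
```

Callers without a per-AG context would still need a slower variant that derives the limits from the AG number, as the commit message notes.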
Bill O'Donnell 7283a93a37 xfs: make last AG grow/shrink perag centric
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit c6aee2481419b638a5257adbd3ffd33b11c59fa8
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:07:09 2022 +1000

    xfs: make last AG grow/shrink perag centric

    Because the perag must exist for these operations, look it up as
    part of the common shrink operations and pass it instead of the
    mount/agno pair.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:38 -05:00
Bill O'Donnell f22acad41e xfs: implement per-mount warnings for scrub and shrink usage
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit df5660cf63bbafb5a1250954b91d9ec26558536f
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri May 27 10:31:34 2022 +1000

    xfs: implement per-mount warnings for scrub and shrink usage

    Currently, we don't have a consistent story around logging when an
    EXPERIMENTAL feature gets turned on at runtime -- online fsck and shrink
    log a message once per day across all mounts, and the recently merged
    LARP mode only ever does it once per insmod cycle or reboot.

    Because EXPERIMENTAL tags are supposed to go away eventually, convert
    the existing daily warnings into state flags that travel with the mount,
    and warn once per mount.  Making this an opstate flag means that we'll
    be able to capture the experimental usage in the ftrace output too.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:31 -05:00
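The "warn once per mount" behavior from "xfs: implement per-mount warnings for scrub and shrink usage" can be approximated in user space with an atomic test-and-set on a per-mount flags word. This is only a sketch under that assumption: the flag names and struct are invented for illustration, and the kernel uses its own bitops rather than C11 atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mount state bits. */
#define OPSTATE_WARNED_SCRUB	(1u << 0)
#define OPSTATE_WARNED_SHRINK	(1u << 1)

/* Hypothetical stand-in for the mount structure. */
struct mount_sketch {
	atomic_uint opstate;
};

/* Returns true only for the first caller that sets `bit` on this
 * mount; that caller is the one that should print the warning. */
static bool should_warn_once(struct mount_sketch *mp, unsigned int bit)
{
	unsigned int old = atomic_fetch_or(&mp->opstate, bit);

	return (old & bit) == 0;
}
```

Because the flag lives in the mount rather than in a global, a second mounted filesystem gets its own warning, and the recorded state can be inspected later.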
Bill O'Donnell 1c2b1203c8 xfs: use a separate frextents counter for rt extent reservations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 2229276c5283264b8c2241c1ed972bbb136cab22
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 12 06:49:42 2022 +1000

    xfs: use a separate frextents counter for rt extent reservations

    As mentioned in the previous commit, the kernel misuses sb_frextents in
    the incore mount to reflect both incore reservations made by running
    transactions as well as the actual count of free rt extents on disk.
    This results in the superblock being written to the log with an
    underestimate of the number of rt extents that are marked free in the
    rtbitmap.

    Teaching XFS to recompute frextents after log recovery avoids
    operational problems in the current mount, but it doesn't solve the
    problem of us writing undercounted frextents which are then recovered by
    an older kernel that doesn't have that fix.

    Create an incore percpu counter to mirror the ondisk frextents.  This
    new counter will track transaction reservations and the only time we
    will touch the incore super counter (i.e the one that gets logged) is
    when those transactions commit updates to the rt bitmap.  This is in
    contrast to the lazysbcount counters (e.g. fdblocks), where we know that
    log recovery will always fix any incorrect counter that we log.
    As a bonus, we only take m_sb_lock at transaction commit time.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:59 -05:00
Bill O'Donnell 40168700cf xfs: xfs_do_force_shutdown needs to block racing shutdowns
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 41e6362183589afd2cd51d653e277d256daab11f
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Mar 29 18:22:01 2022 -0700

    xfs: xfs_do_force_shutdown needs to block racing shutdowns

    When we call xfs_forced_shutdown(), the caller often expects the
    filesystem to be completely shut down when it returns. However,
    if we have racing xfs_forced_shutdown() calls, the first caller sets
    the mount shutdown flag then goes to shutdown the log. The second
    caller sees the mount shutdown flag and returns immediately - it
    does not wait for the log to be shut down.

    Unfortunately, xfs_forced_shutdown() is used in some places that
    expect it to completely shut down the filesystem before it returns
    (e.g. xfs_trans_log_inode()). As such, returning before the log has
    been shut down leaves us in a place where the transaction failed to
    complete correctly but we still call xfs_trans_commit(). This
    situation arises because xfs_trans_log_inode() does not return an
    error and instead calls xfs_force_shutdown() to ensure that the
    transaction being committed is aborted.

    Unfortunately, we have a race condition where xfs_trans_commit()
    needs to check xlog_is_shutdown() because it can't abort log items
    before the log is shut down, but it needs to use xfs_is_shutdown()
    because xfs_forced_shutdown() does not block waiting for the log to
    shut down.

    To fix this conundrum, first we make all calls to
    xfs_forced_shutdown() block until the log is also shut down. This
    means we can then safely use xfs_forced_shutdown() as a mechanism
    that ensures the currently running transaction will be aborted by
    xfs_trans_commit() regardless of the shutdown check it uses.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:53 -05:00
Bill O'Donnell 1f95e96c76 xfs: don't report reserved bnobt space as available
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 85bcfa26f9a3782be37d4feafd49668b98b8bdbe
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Wed Mar 16 13:38:43 2022 -0700

    xfs: don't report reserved bnobt space as available

    On a modern filesystem, we don't allow userspace to allocate blocks for
    data storage from the per-AG space reservations, the user-controlled
    reservation pool that prevents ENOSPC in the middle of internal
    operations, or the internal per-AG set-aside that prevents unwanted
    filesystem shutdowns due to ENOSPC during a bmap btree split.

    Since we now consider freespace btree blocks as unavailable for
    allocation for data storage, we shouldn't report those blocks via statfs
    either.  This makes the numbers that we return via the statfs f_bavail
    and f_bfree fields a more conservative estimate of actual free space.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:52 -05:00
Bill O'Donnell ee364c017f xfs: fix overfilling of reserve pool
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 82be38bcf8a2e056b4c99ce79a3827fa743df6ec
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Mar 24 10:57:07 2022 -0700

    xfs: fix overfilling of reserve pool

    Due to cycling of m_sb_lock, it's possible for multiple callers of
    xfs_reserve_blocks to race at changing the pool size, subtracting blocks
    from fdblocks, and actually putting it in the pool.  The result of all
    this is that we can overfill the reserve pool to hilarious levels.

    xfs_mod_fdblocks, when called with a positive value, already knows how
    to take freed blocks and either fill the reserve until it's full, or put
    them in fdblocks.  Use that instead of setting m_resblks_avail directly.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:52 -05:00
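The policy that "xfs: fix overfilling of reserve pool" relies on (freed blocks top up the reserve pool until it is full, and only the remainder goes to fdblocks) can be sketched as follows. The struct and function are hypothetical simplifications: the real xfs_mod_fdblocks works with percpu counters and locking that are omitted here.

```c
#include <stdint.h>

/* Hypothetical, lock-free stand-in for the relevant mount counters. */
struct pool_sketch {
	uint64_t resblks;	/* target size of the reserve pool */
	uint64_t resblks_avail;	/* blocks currently in the pool */
	uint64_t fdblocks;	/* free data blocks outside the pool */
};

/* Hand `freed` blocks back: refill the pool up to its target first,
 * then add whatever is left to fdblocks. The pool never exceeds
 * its target through this path. */
static void return_freed_blocks(struct pool_sketch *p, uint64_t freed)
{
	uint64_t deficit = p->resblks > p->resblks_avail ?
			   p->resblks - p->resblks_avail : 0;
	uint64_t to_pool = freed < deficit ? freed : deficit;

	p->resblks_avail += to_pool;
	p->fdblocks += freed - to_pool;
}
```

Routing every refill through one function like this is what prevents racing callers from each adding blocks to the pool directly.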
Bill O'Donnell b0612e5697 xfs: always succeed at setting the reserve pool size
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 0baa2657dc4d79202148be79a3dc36c35f425060
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Mar 24 12:43:32 2022 -0700

    xfs: always succeed at setting the reserve pool size

    Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool
    with freed blocks before adding to fdblocks.  Therefore, we can change
    the behavior of xfs_reserve_blocks slightly -- setting the target size
    of the pool should always succeed, since a deficiency will eventually
    be made up as blocks get freed.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:52 -05:00
Bill O'Donnell 014d0f5b57 xfs: remove infinite loop when reserving free block pool
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 15f04fdc75aaaa1cccb0b8b3af1be290e118a7bc
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Mar 11 10:56:01 2022 -0800

    xfs: remove infinite loop when reserving free block pool

    Infinite loops in kernel code are scary.  Calls to xfs_reserve_blocks
    should be rare (people should just use the defaults!) so we really don't
    need to try so hard.  Simplify the logic here by removing the infinite
    loop.

    Cc: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:52 -05:00
Bill O'Donnell 057e1a6479 xfs: don't include bnobt blocks when reserving free block pool
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit c8c568259772751a14e969b7230990508de73d9d
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Wed Mar 16 11:54:18 2022 -0700

    xfs: don't include bnobt blocks when reserving free block pool

    xfs_reserve_blocks controls the size of the user-visible free space
    reserve pool.  Given the difference between the current and requested
    pool sizes, it will try to reserve free space from fdblocks.  However,
    the amount requested from fdblocks is also constrained by the amount of
    space that we think xfs_mod_fdblocks will give us.  If we forget to
    subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will
    return ENOSPC and we'll hang the kernel at mount due to the infinite
    loop.

    In commit fd43cf600c, we decided that xfs_mod_fdblocks should not hand
    out the "free space" used by the free space btrees, because some portion
    of the free space btrees hold in reserve space for future btree
    expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
    of blocks that it could request from xfs_mod_fdblocks was not updated to
    include m_allocbt_blks, so if space is extremely low, the caller hangs.

    Fix this by creating a function to estimate the number of blocks that
    can be reserved from fdblocks, which needs to exclude the set-aside and
    m_allocbt_blks.

    Found by running xfs/306 (which formats a single-AG 20MB filesystem)
    with an fstests configuration that specifies a 1k blocksize and a
    specially crafted log size that will consume 7/8 of the space (17920
    blocks, specifically) in that AG.

    Cc: Brian Foster <bfoster@redhat.com>
    Fixes: fd43cf600c ("xfs: set aside allocation btree blocks from block reservation")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:51 -05:00
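The estimate introduced by "xfs: don't include bnobt blocks when reserving free block pool" boils down to subtracting both the set-aside and the free space btree blocks from fdblocks before deciding how much can be reserved. A minimal sketch with a hypothetical function name and plain integer parameters in place of the kernel's counters:

```c
#include <stdint.h>

/* Hypothetical estimate of how many blocks can be taken from
 * fdblocks: everything except the set-aside and the blocks held by
 * the free space btrees. Clamped at zero when space is very low. */
static uint64_t reservable_blocks(uint64_t fdblocks, uint64_t set_aside,
				  uint64_t allocbt_blks)
{
	uint64_t unavailable = set_aside + allocbt_blks;

	return fdblocks > unavailable ? fdblocks - unavailable : 0;
}
```

Leaving allocbt_blks out of the subtraction overestimates what can be granted, which is the mismatch that led to the ENOSPC and hang described in the commit message.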
Chris von Recklinghausen 41c8d02b66 xfs: convert shutdown reasons to unsigned.
Bugzilla: https://bugzilla.redhat.com/2160210

commit 2eb7550d2c0dd7c383839018991dfa602790dc77
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Apr 21 10:47:38 2022 +1000

    xfs: convert shutdown reasons to unsigned.

    5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
    fields to be unsigned.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:51 -04:00
Brian Foster a672539203 xfs: convert remaining mount flags to state flags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 2e973b2cd4cdb993be94cca4c33f532f1ed05316
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:52 2021 -0700

    xfs: convert remaining mount flags to state flags

    The remaining mount flags kept in m_flags are actually runtime state
    flags. These change dynamically, so they really should be updated
    atomically so we don't potentially lose an update due to racing
    modifications.

    Convert these remaining flags to be stored in m_opstate and use
    atomic bitops to set and clear the flags. This also adds a couple of
    simple wrappers for common state checks - read only and shutdown.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
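The motivation in "xfs: convert remaining mount flags to state flags" is that a plain read-modify-write on a shared flags word can lose an update when two threads race. A user-space sketch of the atomic alternative, using C11 atomics as a stand-in for the kernel's bitops; all names here are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical runtime state bits. */
#define STATE_READONLY	(1ul << 0)
#define STATE_SHUTDOWN	(1ul << 1)

/* Hypothetical stand-in for the mount's operational state word. */
struct mount_state {
	atomic_ulong opstate;
};

/* Each update is a single atomic read-modify-write, so concurrent
 * setters and clearers of different bits cannot overwrite each other. */
static void state_set(struct mount_state *mp, unsigned long bit)
{
	atomic_fetch_or(&mp->opstate, bit);
}

static void state_clear(struct mount_state *mp, unsigned long bit)
{
	atomic_fetch_and(&mp->opstate, ~bit);
}

static bool state_test(struct mount_state *mp, unsigned long bit)
{
	return (atomic_load(&mp->opstate) & bit) != 0;
}

/* Simple wrappers for the two common checks named in the commit. */
static bool is_readonly(struct mount_state *mp)
{
	return state_test(mp, STATE_READONLY);
}

static bool is_shutdown(struct mount_state *mp)
{
	return state_test(mp, STATE_SHUTDOWN);
}
```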
Brian Foster d54a790d1d xfs: replace xfs_sb_version checks with feature flag checks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 38c26bfd90e1999650d5ef40f90d721f05916643
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:37 2021 -0700

    xfs: replace xfs_sb_version checks with feature flag checks

    Convert the xfs_sb_version_hasfoo() to checks against
    mp->m_features. Checks of the superblock itself during disk
    operations (e.g. in the read/write verifiers and the to/from disk
    formatters) are not converted - they operate purely on the
    superblock state. Everything else should use the mount features.
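
As a rough illustration of a feature check against a mount-level mask, the sketch below generates one predicate per feature. The structure, the bit values and the `mount_has_*` names are assumptions made for this example only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_REFLINK	(1ULL << 0)
#define FEAT_FINOBT	(1ULL << 1)

struct mount {
	uint64_t features;	/* filled in once at mount time */
};

/* Generate a small predicate per feature, e.g. mount_has_reflink(mp). */
#define DEFINE_HAS_FEATURE(name, bit)					\
static inline bool mount_has_##name(const struct mount *mp)		\
{									\
	return (mp->features & (bit)) != 0;				\
}

DEFINE_HAS_FEATURE(reflink, FEAT_REFLINK)
DEFINE_HAS_FEATURE(finobt, FEAT_FINOBT)
```

Callers then test a single mask on the mount structure instead of passing the superblock to a version-checking helper.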

    Large parts of this conversion were done with sed with commands like
    this:

    for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
            sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
    done

    With manual cleanups for things like "xfs_has_extflgbit" and other
    little inconsistencies in naming.

    The result is a lot less typing to check features and an XFS binary
    size reduced by a bit over 3kB:

    $ size -t fs/xfs/built-in.a
               text    data  bss      dec     hex  filename
    before  1130866  311352  484  1442702  16038e  (TOTALS)
    after   1127727  311352  484  1439563  15f74b  (TOTALS)

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
Brian Foster c59e813d73 xfs: add trace point for fs shutdown
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 7f89c838396e2e5b484dd59cdd59eb990a79fd9a
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Aug 10 17:00:54 2021 -0700

    xfs: add trace point for fs shutdown

    Add a tracepoint for fs shutdowns so we can capture that in ftrace
    output.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:29 -04:00
Brian Foster efbf2a740b xfs: make forced shutdown processing atomic
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit b36d4651e1650082d27fa477318183c4a7210e30
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Aug 10 18:00:39 2021 -0700

    xfs: make forced shutdown processing atomic

    The running of a forced shutdown is a bit of a mess. It does racy
    checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
    does more racy checks in xfs_log_force_unmount() before finally
    setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
    log->icloglock.

    Move the checking and setting of XFS_MOUNT_SHUTDOWN into
    xfs_do_force_shutdown() so we only process a shutdown once and once
    only. Serialise this with the mp->m_sb_lock spinlock so that the
    state change is atomic and won't race. Move all the mount specific
    shutdown state changes from xfs_log_force_unmount() to
    xfs_do_force_shutdown() so they are done atomically with setting
    XFS_MOUNT_SHUTDOWN.

    Then get rid of the racy xlog_is_shutdown() check from
    xlog_force_shutdown(), and gate the log shutdown on the
    test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This
    means that the log is shutdown once and once only, and code that
    needs to prevent races with shutdown can do so by holding the
    icloglock and checking the return value of xlog_is_shutdown().
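
The "once and once only" gate can be sketched in userspace with a C11 `atomic_flag`. This is only an illustration of the test-and-set pattern: the locks named above are omitted, and `fs_state` and `force_shutdown` are hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct fs_state {
	atomic_flag shutdown;		/* set once, never cleared */
	int shutdowns_processed;	/* how many times the work actually ran */
};

/*
 * Returns true only for the one caller that wins the race and performs
 * the shutdown; every later or concurrent caller sees the flag already
 * set and backs off.
 */
static bool force_shutdown(struct fs_state *fs)
{
	if (atomic_flag_test_and_set(&fs->shutdown))
		return false;	/* somebody else already shut us down */

	fs->shutdowns_processed++;
	return true;
}
```

Because checking and setting happen in one atomic step, there is no window in which two callers can both observe "not shut down" and both process the shutdown.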

    This results in a predictable shutdown execution process - we set the
    shutdown flags once and process the shutdown once rather than the
    current "as many concurrent shutdowns as can race to the flag
    setting" situation we have now.

    Also, now that shutdown is atomic, always emit a stack trace when the
    error level for the filesystem is high enough. This means that we
    always get a stack trace when trying to diagnose the cause of
    shutdowns in the field, rather than just for SHUTDOWN_CORRUPT_INCORE
    cases.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:26 -04:00
Darrick J. Wong c06ad17cfa xfs: shorten the shutdown messages to a single line
Consolidate the shutdown messages to a single line containing the
reason, the passed-in flags, the source of the shutdown, and the end
result.  This means we now only have one line to look for when
debugging, which is useful when the fs goes down while something else is
flooding dmesg.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-06-21 10:14:13 -07:00
Darrick J. Wong 3a1c3abe89 xfs: print name of function causing fs shutdown instead of hex pointer
In xfs_do_force_shutdown, print the symbolic name of the function that
called us to shut down the filesystem instead of a raw hex pointer.
This makes debugging a lot easier:

XFS (sda): xfs_do_force_shutdown(0x2) called from line 2440 of file
	fs/xfs/xfs_log.c. Return address = ffffffffa038bc38

becomes:

XFS (sda): xfs_do_force_shutdown(0x2) called from line 2440 of file
	fs/xfs/xfs_log.c. Return address = xfs_trans_mod_sb+0x25

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-06-21 10:13:57 -07:00
Dave Chinner f250eedcf7 xfs: make for_each_perag... a first class citizen
for_each_perag_tag() is defined in xfs_icache.c for local use.
Promote this to xfs_ag.h and define equivalent iteration functions
so that we can use them to iterate AGs instead of open coded
perag walks and perag lookups.
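
A simplified sketch of what such an iterator macro can look like. The real helpers also manage perag references, which this example leaves out; `perag_lookup`, `for_each_ag` and the structure layouts are assumptions for illustration:

```c
#include <stddef.h>

struct perag {
	unsigned int agno;
	unsigned int freeblks;
};

struct mount {
	struct perag *perags;
	unsigned int agcount;
};

/* Hypothetical lookup: returns NULL once agno runs past the last AG. */
static inline struct perag *perag_lookup(struct mount *mp, unsigned int agno)
{
	return agno < mp->agcount ? &mp->perags[agno] : NULL;
}

/* Walk every AG in order; the loop ends when the lookup returns NULL. */
#define for_each_ag(mp, agno, pag)				\
	for ((agno) = 0, (pag) = perag_lookup((mp), 0);		\
	     (pag) != NULL;					\
	     (pag) = perag_lookup((mp), ++(agno)))

/* Example caller: an open coded walk replaced by the iterator. */
static unsigned int total_free(struct mount *mp)
{
	unsigned int agno, sum = 0;
	struct perag *pag;

	for_each_ag(mp, agno, pag)
		sum += pag->freeblks;
	return sum;
}
```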

We also convert as many of the straight forward open coded AG walks
to use these iterators as possible. Anything that is not a direct
conversion to an iterator is ignored and will be updated in future
commits.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Darrick J. Wong 1aec7c3d05 xfs: remove obsolete AGF counter debugging
In commit f8f2835a9c we changed the behavior of XFS to use EFIs to
remove blocks from an overfilled AGFL because there were complaints
about transaction overruns that stemmed from trying to free multiple
blocks in a single transaction.

Unfortunately, that commit missed a subtlety in the debug-mode
transaction accounting when a realtime volume is attached.  If a
realtime file undergoes a data fork mapping change such that realtime
extents are allocated (or freed) in the same transaction that a data
device block is also allocated (or freed), we can trip a debugging
assertion.  This can happen (for example) if a realtime extent is
allocated and it is necessary to reshape the bmbt to hold the new
mapping.

When we go to allocate a bmbt block from an AG, the first thing the data
device block allocator does is ensure that the freelist is the proper
length.  If the freelist is too long, it will trim the freelist to the
proper length.

In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to
record the decrement in the AG free list count.  Prior to f8f28 we would
put the free block back in the free space btrees in the same
transaction, which calls xfs_trans_agblocks_delta() to record the
increment in the AG free block count.  Since AGFL blocks are included in
the global free block count (fdblocks), there is no corresponding
fdblocks update, so the AGFL free satisfies the following condition in
xfs_trans_apply_sb_deltas:

	/*
	 * Check that superblock mods match the mods made to AGF counters.
	 */
	ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) ==
	       (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta +
		tp->t_ag_btree_delta));

The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is
the number of blocks that were allocated.

After commit f8f28 we defer the block freeing to the next chained
transaction, which means that the calls to xfs_trans_agflist_delta and
xfs_trans_agblocks_delta occur in separate transactions.  The (first)
transaction that shortens the free list trips on the comparison, which
has now become:

(X + 0) == ((X) + -1 + 0)

because we haven't freed the AGFL block yet; we've only logged an
intention to free it.  When the second transaction (the deferred free)
commits, it will evaluate the expression as:

(0 + 0) == (1 + 0 + 0)

and trip over that in turn.
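
The three cases above can be replayed with a small standalone check that mirrors the ASSERT condition. The `tx_deltas` structure and its field names are simplified stand-ins for the transaction fields, not the kernel's types:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the per-transaction delta counters. */
struct tx_deltas {
	int64_t fdblocks;	/* t_fdblocks_delta */
	int64_t res_fdblocks;	/* t_res_fdblocks_delta */
	int64_t ag_freeblks;	/* t_ag_freeblks_delta */
	int64_t ag_flist;	/* t_ag_flist_delta */
	int64_t ag_btree;	/* t_ag_btree_delta */
};

/* Same comparison as the ASSERT in xfs_trans_apply_sb_deltas. */
static bool sb_mods_match_agf(const struct tx_deltas *d)
{
	return d->fdblocks + d->res_fdblocks ==
	       d->ag_freeblks + d->ag_flist + d->ag_btree;
}
```

With X = 4, the old single-transaction case is `{4, 0, 5, -1, 0}` and balances. The first chained transaction is `{4, 0, 4, -1, 0}` and the deferred free is `{0, 0, 1, 0, 0}`, and both fail the check.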

At this point, the astute reader may note that the two commits tagged by
this patch have been in the kernel for a long time but haven't generated
any bug reports.  How is it that the author became aware of this bug?

This originally surfaced as an intermittent failure when I was testing
realtime rmap, but a different bug report by Zorro Lang reveals the same
assertion occurring on !lazysbcount filesystems.

The common factor to both reports (and why this problem wasn't
previously reported) becomes apparent if we consider when
xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():

	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
		xfs_trans_apply_sb_deltas(tp);

With a modern lazysbcount filesystem, transactions update only the
percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence
xfs_trans_apply_sb_deltas is rarely called.

However, updates to the count of free realtime extents are not part of
lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or
removing data fork mappings to realtime files; similarly,
XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.

Dave mentioned in response to an earlier version of this patch:

"IIUC, what you are saying is that this debug code is simply not
exercised in normal testing and hasn't been for the past decade?  And it
still won't be exercised on anything other than realtime device testing?

"...it was debugging code from 1994 that was largely turned into dead
code when lazysbcounters were introduced in 2007. Hence I'm not sure it
holds any value anymore."

This debugging code isn't especially helpful - you can modify the
flcount on one AG and the freeblks of another AG, and it won't trigger.
Add the fact that nobody noticed for a decade, and let's just get rid of
it (and start testing realtime :P).

This bug was found by running generic/051 on either a V4 filesystem
lacking lazysbcount; or a V5 filesystem with a realtime volume.

Cc: bfoster@redhat.com, zlang@redhat.com
Fixes: f8f2835a9c ("xfs: defer agfl block frees when dfops is available")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-04-29 07:44:18 -07:00
Gao Xiang fb2fc17201 xfs: support shrinking unused space in the last AG
As the first step of shrinking, this attempts to enable shrinking
unused space in the last allocation group by fixing up the freespace
btree, agi and agf, and adjusting the super block, using a helper
xfs_ag_shrink_space() to fix up the last AG.

This can be all done in one transaction for now, so I think no
additional protection is needed.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25 16:47:52 -07:00
Gao Xiang c789c83c7e xfs: hoist out xfs_resizefs_init_new_ags()
Move out related logic for initializing newly added AGs to a new helper
in preparation for shrinking. No logic changes.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25 16:47:52 -07:00
Gao Xiang 014695c0a7 xfs: update lazy sb counters immediately for resizefs
sb_fdblocks will be updated lazily if lazysbcount is enabled,
therefore when shrinking the filesystem sb_fdblocks could be
larger than sb_dblocks and xfs_validate_sb_write() would fail.

Even for growfs case, it'd be better to update lazy sb counters
immediately to reflect the real sb counters.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25 16:47:52 -07:00
Gao Xiang 07aabd9c4a xfs: get rid of xfs_growfs_{data,log}_t
Such usage isn't encouraged by the kernel coding style. Leave the
definitions alone in case of userspace users.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-03 09:18:50 -08:00
Gao Xiang ce5e1062e2 xfs: rename `new' to `delta' in xfs_growfs_data_private()
It actually means the delta block count of growfs. Rename it in order
to make it clear. Also introduce nb_div to avoid reusing `delta`.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-03 09:18:50 -08:00
Linus Torvalds a0b9631487 New code for 5.11:
- Introduce a "needsrepair" "feature" to flag a filesystem as needing a
   pass through xfs_repair.  This is key to enabling filesystem upgrades
   (in xfs_db) that require xfs_repair to make minor adjustments to metadata.
 - Refactor parameter checking of recovered log intent items so that we
   actually use the same validation code as them that generate the intent
   items.
 - Various fixes to online scrub not reacting correctly to directory
   entries pointing to inodes that cannot be igetted.
 - Refactor validation helpers for data and rt volume extents.
 - Refactor XFS_TRANS_DQ_DIRTY out of existence.
 - Fix a longstanding bug where mounting with "uqnoenforce" would start
   user quotas in non-enforcing mode but /proc/mounts would display
   "usrquota", implying that they are being enforced.
 - Don't flag dax+reflink inodes as corruption since that is a valid (but
   not fully functional) combination right now.
 - Clean up raid stripe validation functions.
 - Refactor the inode allocation code to be more straightforward.
 - Small prep cleanup for idmapping support.
 - Get rid of the xfs_buf_t typedef.

Merge tag 'xfs-5.11-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "In this release we add the ability to set a 'needsrepair' flag
  indicating that we /know/ the filesystem requires xfs_repair, but
  other than that, it's the usual strengthening of metadata validation
  and miscellaneous cleanups.

  Summary:

   - Introduce a "needsrepair" "feature" to flag a filesystem as needing
     a pass through xfs_repair. This is key to enabling filesystem
     upgrades (in xfs_db) that require xfs_repair to make minor
     adjustments to metadata.

   - Refactor parameter checking of recovered log intent items so that
     we actually use the same validation code as them that generate the
     intent items.

   - Various fixes to online scrub not reacting correctly to directory
     entries pointing to inodes that cannot be igetted.

   - Refactor validation helpers for data and rt volume extents.

   - Refactor XFS_TRANS_DQ_DIRTY out of existence.

   - Fix a longstanding bug where mounting with "uqnoenforce" would
     start user quotas in non-enforcing mode but /proc/mounts would
     display "usrquota", implying that they are being enforced.

   - Don't flag dax+reflink inodes as corruption since that is a valid
     (but not fully functional) combination right now.

   - Clean up raid stripe validation functions.

   - Refactor the inode allocation code to be more straightforward.

   - Small prep cleanup for idmapping support.

   - Get rid of the xfs_buf_t typedef"

* tag 'xfs-5.11-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (40 commits)
  xfs: remove xfs_buf_t typedef
  fs/xfs: convert comma to semicolon
  xfs: open code updating i_mode in xfs_set_acl
  xfs: remove xfs_vn_setattr_nonsize
  xfs: kill ialloced in xfs_dialloc()
  xfs: spilt xfs_dialloc() into 2 functions
  xfs: move xfs_dialloc_roll() into xfs_dialloc()
  xfs: move on-disk inode allocation out of xfs_ialloc()
  xfs: introduce xfs_dialloc_roll()
  xfs: convert noroom, okalloc in xfs_dialloc() to bool
  xfs: don't catch dax+reflink inodes as corruption in verifier
  xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
  xfs: remove unneeded return value check for *init_cursor()
  xfs: introduce xfs_validate_stripe_geometry()
  xfs: show the proper user quota options
  xfs: remove the unused XFS_B_FSB_OFFSET macro
  xfs: remove unnecessary null check in xfs_generic_create
  xfs: directly return if the delta equal to zero
  xfs: check tp->t_dqinfo value instead of the XFS_TRANS_DQ_DIRTY flag
  xfs: delete duplicated tp->t_dqinfo null check and allocation
  ...
2020-12-18 12:50:18 -08:00
Dave Chinner e82226138b xfs: remove xfs_buf_t typedef
Prepare for kernel xfs_buf alignment by getting rid of the
xfs_buf_t typedef from userspace.

[darrick: This patch is a port of a userspace patch removing the
xfs_buf_t typedef in preparation to make the userspace xfs_buf code
behave more like its kernel counterpart.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-16 16:07:34 -08:00
Christoph Hellwig 040f04bd2e fs: simplify freeze_bdev/thaw_bdev
Store the frozen superblock in struct block_device to avoid the awkward
interface that can return a sb only used as a cookie, an ERR_PTR or NULL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com>		[f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:38 -07:00
Brian Foster 28d8462079 xfs: remove unused shutdown types
Both types control shutdown messaging and neither is used in the
current codebase.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:48 -07:00
Eric Sandeen 250d4b4c40 xfs: remove unused header files
There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this.  I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:43 -07:00
Darrick J. Wong ef32595999 xfs: separate inode geometry
Separate the inode geometry information into a distinct structure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-06-12 08:37:40 -07:00
Eric Sandeen 910832697c xfs: change some error-less functions to void types
There are several functions which have no opportunity to return
an error, and don't contain any ASSERTs which could be argued
to be better constructed as error cases.  So, make them voids
to simplify the callers.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-05-01 20:26:30 -07:00
Darrick J. Wong 15a268d9f2 xfs: reserve blocks for ifree transaction during log recovery
Log recovery frees all the inodes stored in the unlinked list, which can
cause expansion of the free inode btree.  The ifree code skips block
reservations if it thinks there's a per-AG space reservation, but we
don't set up the reservation until after log recovery, which means that
a finobt expansion blows up in xfs_trans_mod_sb when we exceed the
transaction's block reservation.

To fix this, we set the "no finobt reservation" flag to true when we
create the xfs_mount and only set it to false if we confirm that every
AG had enough free space to put aside for the finobt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-02-14 22:42:57 -08:00
Julia Lawall 90be9b86da xfs: xfs_fsops: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares is never used.

Commit 0410c3bb2b ("xfs: factor ag btree root block
initialisation") stopped using buffer_list and started using a
buffer list in an aghdr_init_data structure, but the declaration
of buffer_list was not removed.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 0410c3bb2b ("xfs: factor ag btree root block initialisation")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-29 10:47:58 -08:00
Darrick J. Wong 43004b2a8d xfs: add a block to inode count converter
Add new helpers to convert units of fs blocks into inodes, and AG blocks
into AG inodes, respectively.  Convert all the open-coded conversions
and XFS_OFFBNO_TO_AGINO(, , 0) calls to use them, as appropriate.  The
OFFBNO_TO_AGINO macro is retained for xfs_repair.
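
A sketch of what a block-to-inode-count helper boils down to, assuming the geometry records the log2 of inodes per block. The `geometry` structure and the function name are hypothetical:

```c
#include <stdint.h>

struct geometry {
	uint8_t inopblog;	/* log2 of the number of inodes per block */
};

/* Convert a count of blocks into the number of inodes they can hold. */
static inline uint64_t blocks_to_inodes(const struct geometry *g,
					uint64_t blocks)
{
	return blocks << g->inopblog;
}
```

For example, with 8 inodes per block (`inopblog` of 3), 4 blocks hold 32 inodes.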

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-12-12 08:47:16 -08:00
Dave Chinner 56668a5cc4 xfs: issue log message on user force shutdown
The kernel only issues a log message that it's been shut down when
the filesystem triggers a shutdown itself. Hence there is no trace
in the log when a shutdown is triggered manually from userspace.
This can make it hard to see the sequence of events in the log when
things go wrong, so make sure we always log a message when a
shutdown is run.

While there, clean up the logic flow so we don't have to continually
check if the shutdown trigger was user initiated before logging
shutdown messages.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:39 +11:00
Darrick J. Wong ebcbef3a61 xfs: pass transaction lock while setting up agresv on cyclic metadata
Pass a transaction pointer through to all helpers that calculate the
per-AG block reservation.  Online repair will use this to reinitialize
per-ag reservations while it still holds all the AG headers locked to
the repair transaction.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-07-29 22:37:08 -07:00
Darrick J. Wong aafe12cee0 xfs: don't trip over negative free space in xfs_reserve_blocks
If we somehow end up with a filesystem that has fewer free blocks than
the blocks set aside to avoid ENOSPC deadlocks, it's possible that the
free space calculation in xfs_reserve_blocks will spit out a negative
number (because percpu_counter_sum returns s64).  We fail to notice
this negative number and set fdblks_delta to it.  Now we increment
fdblocks(!) and the unsigned type of m_resblks means that we end up
setting a ridiculously huge m_resblks reservation.

Avoid this comedy of errors by detecting the negative free space and
returning -ENOSPC.
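
The failure mode and the guard can be sketched as follows. The function name and parameters are hypothetical; only the signed-versus-unsigned reasoning is taken from the description above:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute the blocks available for reservation. The counter sum is
 * signed, so it can end up below the set-aside amount; storing that
 * negative result in an unsigned field would wrap to a huge value.
 */
static int available_blocks(int64_t fdblocks, uint64_t set_aside,
			    uint64_t *avail)
{
	int64_t free = fdblocks - (int64_t)set_aside;

	if (free < 0)
		return -ENOSPC;	/* detect negative free space up front */

	*avail = (uint64_t)free;
	return 0;
}
```

Without the check, a result of -30 converted to `uint64_t` becomes a value just under 2^64, which is the "ridiculously huge" reservation the message refers to.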

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Dave Chinner 0b61f8a407 xfs: convert to SPDX license tags
Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/

This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:

for f in `git grep -l "GNU General" fs/xfs/` ; do
	echo $f
	cat $f | awk -f hdr.awk > $f.new
	mv -f $f.new $f
done

And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:

$ cat hdr.awk
BEGIN {
	hdr = 1.0
	tag = "GPL-2.0"
	str = ""
}

/^ \* This program is free software/ {
	hdr = 2.0;
	next
}

/any later version./ {
	tag = "GPL-2.0+"
	next
}

/^ \*\// {
	if (hdr > 0.0) {
		print "// SPDX-License-Identifier: " tag
		print str
		print $0
		str=""
		hdr = 0.0
		next
	}
	print $0
	next
}

/^ \* / {
	if (hdr > 1.0)
		next
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
	next
}

/^ \*/ {
	if (hdr > 0.0)
		next
	print $0
	next
}

// {
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
}

END { }
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 14:17:53 -07:00