Commit Graph

446 Commits

Author SHA1 Message Date
Bill O'Donnell 08529f7680 xfs: convert XFS_IFORK_PTR to a static inline helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 732436ef916b4f338d672ea56accfdb11e8d0732
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:05 2022 -0700

    xfs: convert XFS_IFORK_PTR to a static inline helper

    We're about to make this logic do a bit more, so convert the macro to a
    static inline function for better typechecking and fewer shouty macros.
    No functional changes here.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:42 -05:00
Bill O'Donnell d1a077edc7 xfs: pass perag to xfs_alloc_read_agf()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 08d3e84feeb8cb8e20d54f659446b98fe17913aa
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:07:40 2022 +1000

    xfs: pass perag to xfs_alloc_read_agf()

    xfs_alloc_read_agf() initialises the perag if it hasn't been done
    yet, so it makes sense to pass it the perag rather than pull a
    reference from the buffer. This allows callers to be per-ag centric
    rather than passing mount/agno pairs everywhere.

    Whilst modifying the xfs_reflink_find_shared() function definition,
    declare it static and remove the extern declaration as it is an
    internal function only these days.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:39 -05:00
Bill O'Donnell 51f3b2d709 xfs: kill xfs_alloc_pagf_init()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 76b47e528e3a27a3bf3b3f9153aad9435e03be8c
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:07:32 2022 +1000

    xfs: kill xfs_alloc_pagf_init()

    Trivial wrapper around xfs_alloc_read_agf(), can be easily replaced
    by passing a NULL agfbp to xfs_alloc_read_agf().

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:39 -05:00
Bill O'Donnell 37bb5cd300 xfs: stop artificially limiting the length of bunmap calls
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 4ed6435cc369cce722966983f6e07b872562276f
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Apr 25 18:37:15 2022 -0700

    xfs: stop artificially limiting the length of bunmap calls

    In commit e1a4e37cc7, we clamped the length of bunmapi calls on the
    data forks of shared files to avoid two failure scenarios: one where the
    extent being unmapped is so sparsely shared that we exceed the
    transaction reservation with the sheer number of refcount btree updates
    and EFI intent items; and the other where we attach so many deferred
    updates to the transaction that we pin the log tail and later the log
    head meets the tail, causing the log to livelock.

    We avoid triggering the first problem by tracking the number of ops in
    the refcount btree cursor and forcing a requeue of the refcount intent
    item any time we think that we might be close to overflowing.  This has
    been baked into XFS since before the original e1a4 patch.

    A recent patchset fixed the second problem by changing the deferred ops
    code to finish all the work items created by each round of trying to
    complete a refcount intent item, which eliminates the long chains of
    deferred items (27dad); and causing long-running transactions to relog
    their intent log items when space in the log gets low (74f4d).

    Because this clamp affects /any/ unmapping request regardless of the
    sharing factors of the component blocks, it degrades the performance of
    all large unmapping requests -- whereas with an unshared file we can
    unmap millions of blocks in one go, shared files are limited to
    unmapping a few thousand blocks at a time, which causes the upper level
    code to spin in a bunmapi loop even if it wasn't needed.

    This also eliminates one more place where log recovery behavior can
    differ from online behavior, because bunmapi operations no longer need
    to requeue.  The fstest generic/447 was created to test the old fix, and
    it still passes with this applied.

    Partial-revert-of: e1a4e37cc7 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent")
    Depends: 27dada070d ("xfs: change the order in which child and parent defer ops ar finished")
    Depends: 74f4d6a1e0 ("xfs: only relog deferred intent items if free space in the log gets low")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:10 -05:00
Bill O'Donnell acc1d821aa xfs: convert bmapi flags to unsigned.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit e7d410ac336856cdae934e14b9c2c749ca5a32ea
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Apr 21 10:46:09 2022 +1000

    xfs: convert bmapi flags to unsigned.

    5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
    fields to be unsigned.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:06 -05:00
Bill O'Donnell bc25846db5 xfs: convert bmap extent type flags to unsigned.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 0e5b8e45229bc2680f4b10505da338f1ca15a6d2
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Apr 21 10:46:01 2022 +1000

    xfs: convert bmap extent type flags to unsigned.

    5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
    fields to be unsigned.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:05 -05:00
Bill O'Donnell cf7ff3302c xfs: Conditionally upgrade existing inodes to use large extent counters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 4f86bb4b66c999ad9ddcfd49fec93992eeba2715
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Wed Mar 9 07:49:36 2022 +0000

    xfs: Conditionally upgrade existing inodes to use large extent counters

    This commit enables upgrading existing inodes to use large extent counters
    provided that underlying filesystem's superblock has large extent counter
    feature enabled.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:59 -05:00
Bill O'Donnell b67c061975 xfs: Directory's data fork extent counter can never overflow
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 83a21c18441f75aec64548692b52d34582b98a6a
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Tue Mar 29 06:14:00 2022 +0000

    xfs: Directory's data fork extent counter can never overflow

    The maximum file size that can be represented by the data fork extent counter
    in the worst case occurs when all extents are 1 block in length and each block
    is 1KB in size.

    With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
    1KB sized blocks, a file can reach upto,
    (2^31) * 1KB = 2TB

    This is much larger than the theoretical maximum size of a directory
    i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.

    Since a directory's inode can never overflow its data fork extent counter,
    this commit removes all the overflow checks associated with
    it. xfs_dinode_verify() now performs a rough check to verify if a diretory's
    data fork is larger than 96GB.

    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:59 -05:00
Bill O'Donnell 014ed3670e xfs: Introduce macros to represent new maximum extent counts for data/attr forks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit df9ad5cc7a524048ea7ff983d6feeb6d8c47a761
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Tue Nov 16 09:54:37 2021 +0000

    xfs: Introduce macros to represent new maximum extent counts for data/attr forks

    This commit defines new macros to represent maximum extent counts allowed by
    filesystems which have support for large per-inode extent counters.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:58 -05:00
Bill O'Donnell 7677a2eed9 xfs: Use uint64_t to count maximum blocks that can be used by BMBT
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 0c35e7ba18508e9344a1f27b412924bc8b34eba8
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Tue Nov 16 09:20:01 2021 +0000

    xfs: Use uint64_t to count maximum blocks that can be used by BMBT

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:57 -05:00
Bill O'Donnell 0dc526ef61 xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 755c38ffe1a5937d8fa03419018f49f3a23fa9a7
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Tue Nov 16 07:28:40 2021 +0000

    xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively

    A future commit will introduce a 64-bit on-disk data extent counter and a
    32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
    xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
    of these quantities.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:56 -05:00
Bill O'Donnell 3ec2354520 xfs: Use xfs_extnum_t instead of basic data types
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

Conflicts: minor line adjustment in xfs_inode_buf.c (previous out-of-order patch application)

commit bb1d50494cbdd9c5991ddc7feeeb14982872b2a8
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Fri Feb 26 11:24:31 2021 +0530

    xfs: Use xfs_extnum_t instead of basic data types

    xfs_extnum_t is the type to use to declare variables which have values
    obtained from xfs_dinode->di_[a]nextents. This commit replaces basic
    types (e.g. uint32_t) with xfs_extnum_t for such variables.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:56 -05:00
Bill O'Donnell 118a1c9d62 xfs: Introduce xfs_iext_max_nextents() helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

Conflicts: minor line adjustment in xfs_inode_buf.c (previous out-of-order patch application)

commit 9feb8f19665c8ba051c6a81aa7897149e7748e1e
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Thu Aug 27 15:09:10 2020 +0530

    xfs: Introduce xfs_iext_max_nextents() helper

    xfs_iext_max_nextents() returns the maximum number of extents possible for one
    of data, cow or attribute fork. This helper will be extended further in a
    future commit when maximum extent counts associated with data/attribute forks
    are increased.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:55 -05:00
Bill O'Donnell 79e16d14ed xfs: Define max extent length based on on-disk format definition
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 95f0b95e2b686ceaa3f465e9fa079f22e0fe7665
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Mon Aug 9 12:05:22 2021 +0530

    xfs: Define max extent length based on on-disk format definition

    The maximum extent length depends on maximum block count that can be stored in
    a BMBT record. Hence this commit defines MAXEXTLEN based on
    BMBT_BLOCKCOUNT_BITLEN.

    While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.

    Suggested-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:55 -05:00
Jeff Moyer 2329cfdda0 xfs: pass the mapping flags to xfs_bmbt_to_iomap
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: Upstream commit 304a68b9c63b ("xfs: use iomap_valid method
  to detect stale cached iomaps") was already backported, and that
  commit also changes the signature of xfs_bmbt_to_iomap.  There are a
  lot of conflicts, but resolution is simple.

commit 740fd671e04f8a977018eb9cfe440b4817850f0d
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:57 2021 +0100

    xfs: pass the mapping flags to xfs_bmbt_to_iomap
    
    To prepare for looking at the IOMAP_DAX flag in xfs_bmbt_to_iomap pass in
    the input mapping flags to xfs_bmbt_to_iomap.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-24-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:18 -04:00
Carlos Maiolino 5625ea06a6 xfs: use iomap_valid method to detect stale cached iomaps
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2155605
Tested: With xfstests and bz reproducer

Conflicts:
	- Context

Now that iomap supports a mechanism to validate cached iomaps for
buffered write operations, hook it up to the XFS buffered write ops
so that we can avoid data corruptions that result from stale cached
iomaps. See:

https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/

or the ->iomap_valid() introduction commit for exact details of the
corruption vector.

The validity cookie we store in the iomap is based on the type of
iomap we return. It is expected that the iomap->flags we set in
xfs_bmbt_to_iomap() is not perturbed by the iomap core and are
returned to us in the iomap passed via the .iomap_valid() callback.
This ensures that the validity cookie is always checking the correct
inode fork sequence numbers to detect potential changes that affect
the extent cached by the iomap.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 304a68b9c63bbfc1f6e159d68e8892fc54a06067)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-02-06 11:03:28 +01:00
Carlos Maiolino 93fff0d397 xfs: rename xfs_bmap_add_free to xfs_free_extent_later
Bugzilla: https://bugzilla.redhat.com/2125724

xfs_bmap_add_free isn't a block mapping function; it schedules deferred
freeing operations for a later point in a compound transaction chain.
While it's primarily used by bunmapi, its use has expanded beyond that.
Move it to xfs_alloc.c and rename the function since it's now general
freeing functionality.  Bring the slab cache bits in line with the
way we handle the other intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit c201d9ca5392b20f04882848a071025b0e194c17)
2022-10-21 12:50:46 +02:00
Carlos Maiolino c63a9018d9 xfs: create slab caches for frequently-used deferred items
Bugzilla: https://bugzilla.redhat.com/2125724

Create slab caches for the high-level structures that coordinate
deferred intent items, since they're used fairly heavily.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit f3c799c22c661e181c71a0d9914fc923023f65fb)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 25a40d32f8 xfs: rename _zone variables to _cache
Bugzilla: https://bugzilla.redhat.com/2125724

Conflicts:
	Small conflict at xfs_inode_alloc() due to out of order
	backport. Inode alloc using kmem_cache_alloc() has been
	converted to use alloc_inode_sb() before this patch.

Now that we've gotten rid of the kmem_zone_t typedef, rename the
variables to _cache since that's what they are.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 182696fb021fc196e5cbe641565ca40fcf0f885a)
2022-10-21 12:50:46 +02:00
Carlos Maiolino d912d565bb xfs: remove kmem_zone typedef
Bugzilla: https://bugzilla.redhat.com/2125724

Remove these typedefs by referencing kmem_cache directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit e7720afad068a6729d9cd3aaa08212f2f5a7ceff)
2022-10-21 12:50:46 +02:00
Carlos Maiolino f164f431d6 xfs: compute absolute maximum nlevels for each btree type
Bugzilla: https://bugzilla.redhat.com/2125724

Add code for all five btree types so that we can compute the absolute
maximum possible btree height for each btree type.  This is a setup for
the next patch, which makes every btree type have its own cursor cache.

The functions are exported so that we can have xfs_db report the
absolute maximum btree heights for each btree type, rather than making
everyone run their own ad-hoc computations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 0ed5f7356daee74244b02e100b3cc043e886e686)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 95a47ce884 xfs: encode the max btree height in the cursor
Bugzilla: https://bugzilla.redhat.com/2125724

Encode the maximum btree height in the cursor, since we're soon going to
allow smaller cursors for AG btrees and larger cursors for file btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit c0643f6fdd6d3c448142ed1492a9a6b6505f9afb)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 636638031a xfs: prepare xfs_btree_cur for dynamic cursor heights
Bugzilla: https://bugzilla.redhat.com/2125724

Split out the btree level information into a separate struct and put it
at the end of the cursor structure as a VLA.  Files with huge data forks
(and in the future, the realtime rmap btree) will require the ability to
support many more levels than a per-AG btree cursor, which means that
we're going to create per-btree type cursor caches to conserve memory
for the more common case.

Note that a subsequent patch actually introduces dynamic cursor heights.
This one merely rearranges the structure to prepare for that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 6ca444cfd663545e9e1c19ad2695836ffafad0a6)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 6081ac446f xfs: remove xfs_btree_cur_t typedef
Bugzilla: https://bugzilla.redhat.com/2125724

Get rid of this old typedef before we start changing other things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit ae127f087dc22b6e37edc870079abf0721a6aed0)
2022-10-21 12:50:46 +02:00
Brian Foster 5872597dac xfs: convert bp->b_bn references to xfs_buf_daddr()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 9343ee76909e3f6466d85c9ebb0e343cdf54de71
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:47:05 2021 -0700

    xfs: convert bp->b_bn references to xfs_buf_daddr()

    Stop directly referencing b_bn in code outside the buffer cache, as
    b_bn is supposed to be used only as an internal cache index.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:36 -04:00
Brian Foster 74f147b83b xfs: introduce xfs_buf_daddr()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 04fcad80cd068731a779fb442f78234732683755
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:57 2021 -0700

    xfs: introduce xfs_buf_daddr()

    Introduce a helper function xfs_buf_daddr() to extract the disk
    address of the buffer from the struct xfs_buf. This will replace
    direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
    the XFS_BUF_ADDR() macro.

    This patch introduces the helper function and replaces all uses of
    XFS_BUF_ADDR() as this is just a simple sed replacement.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:36 -04:00
Brian Foster d179379de4 xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 75c8c50fa16a23f8ac89ea74834ae8ddd1558d75
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:53 2021 -0700

    xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown

    Remove the shouty macro and instead use the inline function that
    matches other state/feature check wrapper naming. This conversion
    was done with sed.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
Brian Foster 6def1029c3 xfs: convert mount flags to features
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git
Conflicts: Work around out of order backport in xfs_fs_fill_super().

commit 0560f31a09e523090d1ab2bfe21c69d028c2bdf2
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:52 2021 -0700

    xfs: convert mount flags to features

    Replace m_flags feature checks with xfs_has_<feature>() calls and
    rework the setup code to set flags in m_features.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
Brian Foster d54a790d1d xfs: replace xfs_sb_version checks with feature flag checks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 38c26bfd90e1999650d5ef40f90d721f05916643
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:37 2021 -0700

    xfs: replace xfs_sb_version checks with feature flag checks

    Convert the xfs_sb_version_hasfoo() to checks against
    mp->m_features. Checks of the superblock itself during disk
    operations (e.g. in the read/write verifiers and the to/from disk
    formatters) are not converted - they operate purely on the
    superblock state. Everything else should use the mount features.

    Large parts of this conversion were done with sed with commands like
    this:

    for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
            sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
    done

    With manual cleanups for things like "xfs_has_extflgbit" and other
    little inconsistencies in naming.

    The result is ia lot less typing to check features and an XFS binary
    size reduced by a bit over 3kB:

    $ size -t fs/xfs/built-in.a
            text       data     bss     dec     hex filenam
    before  1130866  311352     484 1442702  16038e (TOTALS)
    after   1127727  311352     484 1439563  15f74b (TOTALS)

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
Darrick J. Wong 8b943d21d4 xfs: assorted fixes for 5.14, part 1
This branch contains the first round of various small fixes for 5.14.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmC5YvwACgkQ+H93GTRK
 tOuDTxAAmW+Mc4oPx/Fsa2xqvkDYCGaM1QtkIY9McrGsMdzF0JbB7bR5rrUET2+N
 /FsFTIQ31vZfMEJ69HM6NjQ4xXIKhdEj+PlGsjvg62BqhpPBRuFH7dLBWChiRcf2
 M5SN1P9kcAjIoMR2tKzh8FMDmJTcJlJxxBi9kxb3YK1+UesUHsPS8/jSW3OuIov3
 AJzAZ08SrA8ful5uCMy9Mf5uBgfRuAUjDzA5CM5kA1xqlHREQi+oHl62E81N33mu
 RR8tdHQzvPO5rHyX84GzV5cu2CsmDuOPF2nA5SUxRhIZFfMo5mEezA/nxqqACYti
 rnRGxZVwIG9YYBESVxXFQXIjn5lHoQ2jomk/CraszeVEJCteNRnpbVGzd1xczM3u
 0iuDuHy+aVB/3QhfA6/0vjfttCzkMEld9U9c3WEjIaw5iUCxe531yfrvVEyF9blx
 NBjnQyHGbt+y26BzBjD33NJEdDoZqS3UIQ/rmb4f2mitGN5d9faAcJ754uRJt3o4
 K9HXGjuR+iOH/tCZKDL1hBc4M/pFgNdeBWyFYdQhh8eSj9HSCTCG58zAyQ2WPOSr
 6D/f4BMqivKzzz0HicZPJAoazrtcKGrWTHTxidHIkI4le367NOwv6YJquJ0pFMBs
 8L7OdRqYT3yw6+qErCjEn03WkP9O7V8lHf8hFxt7+dNxDukIj40=
 =6eZB
 -----END PGP SIGNATURE-----

Merge tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2

xfs: assorted fixes for 5.14, part 1

This branch contains the first round of various small fixes for 5.14.

* tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: don't take a spinlock unconditionally in the DIO fastpath
  xfs: mark xfs_bmap_set_attrforkoff static
  xfs: Remove redundant assignment to busy
  xfs: sort variable alphabetically to avoid repeated declaration
2021-06-08 09:22:34 -07:00
Darrick J. Wong c3eabd3650 xfs: initial agnumber -> perag conversions for shrink
If we want to use active references to the perag to be able to gate
 shrink removing AGs and hence perags safely, we've got a fair bit of
 work to do actually use perags in all the places we need to.
 
 There's a lot of code that iterates ag numbers and then
 looks up perags from that, often multiple times for the same perag
 in the one operation. If we want to use reference counted perags for
 access control, then we need to convert all these uses to perag
 iterators, not agno iterators.
 
 [Patches 1-4]
 
 The first step of this is consolidating all the perag management -
 init, free, get, put, etc into a common location. THis is spread all
 over the place right now, so move it all into libxfs/xfs_ag.[ch].
 This does expose kernel only bits of the perag to libxfs and hence
 userspace, so the structures and code is rearranged to minimise the
 number of ifdefs that need to be added to the userspace codebase.
 The perag iterator in xfs_icache.c is promoted to a first class API
 and expanded to the needs of the code as required.
 
 [Patches 5-10]
 
 These are the first basic perag iterator conversions and changes to
 pass the perag down the stack from those iterators where
 appropriate. A lot of this is obvious, simple changes, though in
 some places we stop passing the perag down the stack because the
 code enters into an as yet unconverted subsystem that still uses raw
 AGs.
 
 [Patches 11-16]
 
 These replace the agno passed in the btree cursor for per-ag btree
 operations with a perag that is passed to the cursor init function.
 The cursor takes it's own reference to the perag, and the reference
 is dropped when the cursor is deleted. Hence we get reference
 coverage for the entire time the cursor is active, even if the code
 that initialised the cursor drops it's reference before the cursor
 or any of it's children (duplicates) have been deleted.
 
 The first patch adds the perag infrastructure for the cursor, the
 next four patches convert a btree cursor at a time, and the last
 removes the agno from the cursor once it is unused.
 
 [Patches 17-21]
 
 These patches are a demonstration of the simplifications and
 cleanups that come from plumbing the perag through interfaces that
 select and then operate on a specific AG. In this case the inode
 allocation algorithm does up to three walks across all AGs before it
 either allocates an inode or fails. Two of these walks are purely
 just to select the AG, and even then it doesn't guarantee inode
 allocation success so there's a third walk if the selected AG
 allocation fails.
 
 These patches collapse the selection and allocation into a single
 loop, simplifies the error handling because xfs_dir_ialloc() always
 returns ENOSPC if no AG was selected for inode allocation or we fail
 to allocate an inode in any AG, gets rid of xfs_dir_ialloc()
 wrapper, converts inode allocation to run entirely from a single
 perag instance, and then factors xfs_dialloc() into a much, much
 simpler loop which is easy to understand.
 
 Hence we end up with the same inode allocation logic, but it only
 needs two complete iterations at worst, makes AG selection and
 allocation atomic w.r.t. shrink and chops out out over 100 lines of
 code from this hot code path.
 
 [Patch 22]
 
 Converts the unlink path to pass perags through it.
 
 There's more conversion work to be done, but this patchset gets
 through a large chunk of it in one hit. Most of the iterators are
 converted, so once this is solidified we can move on to converting
 these to active references for being able to free perags while the
 fs is still active.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmC3HUgUHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h2yaw/+P0JzpI+6n06Ei00mjgE/Du/WhMLi
 0JQ93Grlj+miuGGT9DgGCiRpoZnefhEk+BH6JqoEw1DQ3T5ilmAzrHLUUHSQC3+S
 dv85sJduheQ6yHuoO+4MzkaSq6JWKe7E9gZwAsVyBul5aSjdmaJaQdPwYMTXSXo0
 5Uqq8ECFkMcaHVNjcBfasgR/fdyWy2Qe4PFTHTHdQpd+DNZ9UXgFKHW2og+1iry/
 zDIvdIppJULA09TvVcZuFjd/1NzHQ/fLj5PAzz8GwagB4nz2x3s78Zevmo5yW/jK
 3/+50vXa8ldhiHDYGTS3QXvS0xJRyqUyD47eyWOOiojZw735jEvAlCgjX6+0X1HC
 k3gCkQLv8l96fRkvUpgnLf/fjrUnlCuNBkm9d1Eq2Tied8dvLDtiEzoC6L05Nqob
 yd/nIUb1zwJFa9tsoheHhn0bblTGX1+zP0lbRJBje0LotpNO9DjGX5JoIK4GR7F8
 y1VojcdgRI14HlxUnbF3p8wmQByN+M2tnp6GSdv9BA65bjqi05Rj/steFdZHBV6x
 wiRs8Yh6BTvMwKgufHhRQHfRahjNHQ/T/vOE+zNbWqemS9wtEUDop+KvPhC36R/k
 o/cmr23cF8ESX2eChk7XM4On3VEYpcvp2zSFgrFqZYl6RWOwEis3Htvce3KuSTPp
 8Xq70te0gr2DVUU=
 =YNzW
 -----END PGP SIGNATURE-----

Merge tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2

xfs: initial agnumber -> perag conversions for shrink

If we want to use active references to the perag to be able to gate
shrink removing AGs and hence perags safely, we've got a fair bit of
work to do actually use perags in all the places we need to.

There's a lot of code that iterates ag numbers and then
looks up perags from that, often multiple times for the same perag
in the one operation. If we want to use reference counted perags for
access control, then we need to convert all these uses to perag
iterators, not agno iterators.

[Patches 1-4]

The first step of this is consolidating all the perag management -
init, free, get, put, etc into a common location. THis is spread all
over the place right now, so move it all into libxfs/xfs_ag.[ch].
This does expose kernel only bits of the perag to libxfs and hence
userspace, so the structures and code is rearranged to minimise the
number of ifdefs that need to be added to the userspace codebase.
The perag iterator in xfs_icache.c is promoted to a first class API
and expanded to the needs of the code as required.

[Patches 5-10]

These are the first basic perag iterator conversions and changes to
pass the perag down the stack from those iterators where
appropriate. A lot of this is obvious, simple changes, though in
some places we stop passing the perag down the stack because the
code enters into an as yet unconverted subsystem that still uses raw
AGs.

[Patches 11-16]

These replace the agno passed in the btree cursor for per-ag btree
operations with a perag that is passed to the cursor init function.
The cursor takes it's own reference to the perag, and the reference
is dropped when the cursor is deleted. Hence we get reference
coverage for the entire time the cursor is active, even if the code
that initialised the cursor drops it's reference before the cursor
or any of it's children (duplicates) have been deleted.

The first patch adds the perag infrastructure for the cursor, the
next four patches convert a btree cursor at a time, and the last
removes the agno from the cursor once it is unused.

[Patches 17-21]

These patches are a demonstration of the simplifications and
cleanups that come from plumbing the perag through interfaces that
select and then operate on a specific AG. In this case the inode
allocation algorithm does up to three walks across all AGs before it
either allocates an inode or fails. Two of these walks are purely
just to select the AG, and even then it doesn't guarantee inode
allocation success so there's a third walk if the selected AG
allocation fails.

These patches collapse the selection and allocation into a single
loop, simplifies the error handling because xfs_dir_ialloc() always
returns ENOSPC if no AG was selected for inode allocation or we fail
to allocate an inode in any AG, gets rid of xfs_dir_ialloc()
wrapper, converts inode allocation to run entirely from a single
perag instance, and then factors xfs_dialloc() into a much, much
simpler loop which is easy to understand.

Hence we end up with the same inode allocation logic, but it only
needs two complete iterations at worst, makes AG selection and
allocation atomic w.r.t. shrink and chops out out over 100 lines of
code from this hot code path.

[Patch 22]

Converts the unlink path to pass perags through it.

There's more conversion work to be done, but this patchset gets
through a large chunk of it in one hit. Most of the iterators are
converted, so once this is solidified we can move on to converting
these to active references for being able to free perags while the
fs is still active.

* tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (23 commits)
  xfs: remove xfs_perag_t
  xfs: use perag through unlink processing
  xfs: clean up and simplify xfs_dialloc()
  xfs: inode allocation can use a single perag instance
  xfs: get rid of xfs_dir_ialloc()
  xfs: collapse AG selection for inode allocation
  xfs: simplify xfs_dialloc_select_ag() return values
  xfs: remove agno from btree cursor
  xfs: use perag for ialloc btree cursors
  xfs: convert allocbt cursors to use perags
  xfs: convert refcount btree cursor to use perags
  xfs: convert rmap btree cursor to using a perag
  xfs: add a perag to the btree cursor
  xfs: pass perags around in fsmap data dev functions
  xfs: push perags through the ag reservation callouts
  xfs: pass perags through to the busy extent code
  xfs: convert secondary superblock walk to use perags
  xfs: convert xfs_iwalk to use perag references
  xfs: convert raw ag walks to use for_each_perag
  xfs: make for_each_perag... a first class citizen
  ...
2021-06-08 09:13:13 -07:00
Christoph Hellwig 5a981e4ea8 xfs: mark xfs_bmap_set_attrforkoff static
xfs_bmap_set_attrforkoff is only used inside of xfs_bmap.c, so mark it
static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 14:58:59 -07:00
Dave Chinner 9bbafc7191 xfs: move xfs_perag_get/put to xfs_ag.[ch]
They are AG functions, not superblock functions, so move them to the
appropriate location.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner 0fe0bbe00a xfs: bunmapi has unnecessary AG lock ordering issues
large directory block size operations are assert failing because
xfs_bunmapi() is not completely removing fragmented directory blocks
like so:

XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
....
Call Trace:
 xfs_dir2_shrink_inode+0x1a8/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_rename+0xb79/0xc50
 ? avc_has_perm+0x8d/0x1a0
 ? avc_has_perm_noaudit+0x9a/0x120
 xfs_vn_rename+0xdb/0x150
 vfs_rename+0x719/0xb50
 ? __lookup_hash+0x6a/0xa0
 do_renameat2+0x413/0x5e0
 __x64_sys_rename+0x45/0x50
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

We are aborting the bunmapi() pass because of this specific chunk of
code:

                /*
                 * Make sure we don't touch multiple AGF headers out of order
                 * in a single transaction, as that could cause AB-BA deadlocks.
                 */
                if (!wasdel && !isrt) {
                        agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
                        if (prev_agno != NULLAGNUMBER && prev_agno > agno)
                                break;
                        prev_agno = agno;
                }

This is designed to prevent deadlocks in AGF locking when freeing
multiple extents by ensuring that we only ever lock in increasing
AG number order. Unfortunately, this also violates the "bunmapi will
always succeed" semantic that some high level callers depend on,
such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
xfs_inactive_symlink_rmt().

This AG lock ordering was introduced back in 2017 to fix deadlocks
triggered by generic/299 as reported here:

https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/

This codebase is old enough that it was before we were defering all
AG based extent freeing from within xfs_bunmapi(). THat is, we never
actually lock AGs in xfs_bunmapi() any more - every non-rt based
extent free is added to the defer ops list, as is all BMBT block
freeing. And RT extents are not RT based, so there's no lock
ordering issues associated with them.

Hence this AGF lock ordering code is both broken and dead. Let's
just remove it so that the large directory block code works reliably
again.

Tested against xfs/538 and generic/299 which is the original test
that exposed the deadlocks that this code fixed.

Fixes: 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-05-27 08:11:24 -07:00
Dave Chinner 991c2c5980 xfs: btree format inode forks can have zero extents
xfs/538 is assert failing with this trace when testing with
directory block sizes of 64kB:

XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
....
Call Trace:
 xfs_bmap_btree_to_extents+0x2a9/0x470
 ? kmem_cache_alloc+0xe7/0x220
 __xfs_bunmapi+0x4ca/0xdf0
 xfs_bunmapi+0x1a/0x30
 xfs_dir2_shrink_inode+0x71/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_remove+0x244/0x460
 xfs_vn_unlink+0x53/0xa0
 ? selinux_inode_unlink+0x13/0x20
 vfs_unlink+0x117/0x220
 do_unlinkat+0x1a2/0x2d0
 __x64_sys_unlink+0x42/0x60
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

This is a check to ensure that the extents have been read into
memory before we are doing a ifork btree manipulation. This assert
is bogus in the above case.

We have a fragmented directory block that has more extents in it
than can fit in extent format, so the inode data fork is in btree
format. xfs_dir2_shrink_inode() asks to remove all remaining 16
filesystem blocks from the inode so it can convert to short form,
and __xfs_bunmapi() removes all the extents. We now have a data fork
in btree format but have zero extents in the fork. This incorrectly
trips the xfs_need_iread_extents() assert because it assumes that an
empty extent btree means the extent tree has not been read into
memory yet. This is clearly not the case with xfs_bunmapi(), as it
has an explicit call to xfs_iread_extents() in it to pull the
extents into memory before it starts unmapping.

Also, the assert directly after this bogus one is:

	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);

Which covers the context in which it is legal to call
xfs_bmap_btree_to_extents just fine. Hence we should just remove the
bogus assert as it is clearly wrong and causes a regression.

The returns the test behaviour to the pre-existing assert failure in
xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
remove all the extents in the range it was asked to unmap.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-05-27 08:11:24 -07:00
Christoph Hellwig b2197a36c0 xfs: remove XFS_IFEXTENTS
The in-memory XFS_IFEXTENTS is now only used to check if an inode with
extents still needs the extents to be read into memory before doing
operations that need the extent map.  Add a new xfs_need_iread_extents
helper that returns true for btree format forks that do not have any
entries in the in-memory extent btree, and use that instead of checking
the XFS_IFEXTENTS flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig 0779f4a68d xfs: remove XFS_IFINLINE
Just check for an inline format fork instead of the using the equivalent
in-memory XFS_IFINLINE flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig ac1e067211 xfs: remove XFS_IFBROOT
Just check for a btree format fork instead of the using the equivalent
in-memory XFS_IFBROOT flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig 2ac131df03 xfs: rename and simplify xfs_bmap_one_block
xfs_bmap_one_block is only called for the attribute fork.  Move it to
xfs_attr.c, drop the unused whichfork argument and code only executed for
the data fork and rename the result to xfs_attr_is_leaf.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:50 -07:00
Christoph Hellwig 862a804aae xfs: move the XFS_IFEXTENTS check into xfs_iread_extents
Move the XFS_IFEXTENTS check from the callers into xfs_iread_extents to
simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:50 -07:00
Dave Chinner b2941046ea xfs: precalculate default inode attribute offset
Default attr fork offset is based on inode size, so is a fixed
geometry parameter of the inode. Move it to the xfs_ino_geometry
structure and stop calculating it on every call to
xfs_default_attroffset().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-07 14:37:07 -07:00
Dave Chinner 683ec9ba88 xfs: default attr fork size does not handle device inodes
Device inodes have a non-default data fork size of 8 bytes
as checked/enforced by xfs_repair. xfs_default_attroffset() doesn't
handle this, so lets do a minor refactor so it does.

Fixes: e6a688c332 ("xfs: initialise attr fork on inode create")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-07 14:37:07 -07:00
Chandan Babu R b6785e279d xfs: Use struct xfs_bmdr_block instead of struct xfs_btree_block to calculate root node size
The incore data fork of an inode stores the bmap btree root node as 'struct
xfs_btree_block'. However, the ondisk version of the inode stores the bmap
btree root node as a 'struct xfs_bmdr_block'.

xfs_bmap_add_attrfork_btree() checks if the btree root node fits inside the
data fork of the inode. However, it incorrectly uses 'struct xfs_btree_block'
to compute the size of the bmap btree root node. Since size of 'struct
xfs_btree_block' is larger than that of 'struct xfs_bmdr_block',
xfs_bmap_add_attrfork_btree() could end up unnecessarily demoting the current
root node as the child of newly allocated root node.

This commit optimizes space usage by modifying xfs_bmap_add_attrfork_btree()
to use 'struct xfs_bmdr_block' to check if the bmap btree root node fits
inside the data fork of the inode.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:06 -07:00
Christoph Hellwig 7821ea302d xfs: move the di_forkoff field to struct xfs_inode
In preparation of removing the historic icinode struct, move the
forkoff field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:05 -07:00
Christoph Hellwig 031474c28a xfs: move the di_extsize field to struct xfs_inode
In preparation of removing the historic icinode struct, move the extsize
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:04 -07:00
Christoph Hellwig 6e73a545f9 xfs: move the di_nblocks field to struct xfs_inode
In preparation of removing the historic icinode struct, move the nblocks
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Christoph Hellwig 13d2c10b05 xfs: move the di_size field to struct xfs_inode
In preparation of removing the historic icinode struct, move the on-disk
size field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Chandan Babu R 6e8bd39d72 xfs: Initialize xfs_alloc_arg->total correctly when allocating minlen extents
xfs/538 can cause the following call trace to be printed when executing on a
multi-block directory configuration,

 WARNING: CPU: 1 PID: 2578 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0x520/0x5d0
 Call Trace:
  ? xfs_buf_rele+0x4f/0x450
  xfs_bmap_add_extent_hole_real+0x747/0x960
  xfs_bmapi_allocate+0x39a/0x440
  xfs_bmapi_write+0x507/0x9e0
  xfs_da_grow_inode_int+0x1cd/0x330
  ? up+0x12/0x60
  xfs_dir2_grow_inode+0x62/0x110
  ? xfs_trans_log_inode+0x234/0x2d0
  xfs_dir2_sf_to_block+0x103/0x940
  ? xfs_dir2_sf_check+0x8c/0x210
  ? xfs_da_compname+0x19/0x30
  ? xfs_dir2_sf_lookup+0xd0/0x3d0
  xfs_dir2_sf_addname+0x10d/0x910
  xfs_dir_createname+0x1ad/0x210
  xfs_create+0x404/0x620
  xfs_generic_create+0x24c/0x320
  path_openat+0xda6/0x1030
  do_filp_open+0x88/0x130
  ? kmem_cache_alloc+0x50/0x210
  ? __cond_resched+0x16/0x40
  ? kmem_cache_alloc+0x50/0x210
  do_sys_openat2+0x97/0x150
  __x64_sys_creat+0x49/0x70
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

This occurs because xfs_bmap_exact_minlen_extent_alloc() initializes
xfs_alloc_arg->total to xfs_bmalloca->minlen. In the context of
xfs_bmap_exact_minlen_extent_alloc(), xfs_bmalloca->minlen has a value of 1
and hence the space allocator could choose an AG which has less than
xfs_bmalloca->total number of free blocks available. As the transaction
proceeds, one of the future space allocation requests could fail due to
non-availability of free blocks in the AG that was originally chosen.

This commit fixes the bug by assigning xfs_alloc_arg->total to the value of
xfs_bmalloca->total.

Fixes: 3015196746 ("xfs: Introduce error injection to allocate only minlen size extents for files")
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-07 14:36:34 -07:00
Dave Chinner e6a688c332 xfs: initialise attr fork on inode create
When we allocate a new inode, we often need to add an attribute to
the inode as part of the create. This can happen as a result of
needing to add default ACLs or security labels before the inode is
made visible to userspace.

This is highly inefficient right now. We do the create transaction
to allocate the inode, then we do an "add attr fork" transaction to
modify the just created empty inode to set the inode fork offset to
allow attributes to be stored, then we go and do the attribute
creation.

This means 3 transactions instead of 1 to allocate an inode, and
this greatly increases the load on the CIL commit code, resulting in
excessive contention on the CIL spin locks and performance
degradation:

 18.99%  [kernel]                [k] __pv_queued_spin_lock_slowpath
  3.57%  [kernel]                [k] do_raw_spin_lock
  2.51%  [kernel]                [k] __raw_callee_save___pv_queued_spin_unlock
  2.48%  [kernel]                [k] memcpy
  2.34%  [kernel]                [k] xfs_log_commit_cil

The typical profile resulting from running fsmark on a selinux enabled
filesytem is adds this overhead to the create path:

  - 15.30% xfs_init_security
     - 15.23% security_inode_init_security
	- 13.05% xfs_initxattrs
	   - 12.94% xfs_attr_set
	      - 6.75% xfs_bmap_add_attrfork
		 - 5.51% xfs_trans_commit
		    - 5.48% __xfs_trans_commit
		       - 5.35% xfs_log_commit_cil
			  - 3.86% _raw_spin_lock
			     - do_raw_spin_lock
				  __pv_queued_spin_lock_slowpath
		 - 0.70% xfs_trans_alloc
		      0.52% xfs_trans_reserve
	      - 5.41% xfs_attr_set_args
		 - 5.39% xfs_attr_set_shortform.constprop.0
		    - 4.46% xfs_trans_commit
		       - 4.46% __xfs_trans_commit
			  - 4.33% xfs_log_commit_cil
			     - 2.74% _raw_spin_lock
				- do_raw_spin_lock
				     __pv_queued_spin_lock_slowpath
			       0.60% xfs_inode_item_format
		      0.90% xfs_attr_try_sf_addname
	- 1.99% selinux_inode_init_security
	   - 1.02% security_sid_to_context_force
	      - 1.00% security_sid_to_context_core
		 - 0.92% sidtab_entry_to_string
		    - 0.90% sidtab_sid2str_get
			 0.59% sidtab_sid2str_put.part.0
	   - 0.82% selinux_determine_inode_label
	      - 0.77% security_transition_sid
		   0.70% security_compute_sid.part.0

And fsmark creation rate performance drops by ~25%. The key point to
note here is that half the additional overhead comes from adding the
attribute fork to the newly created inode. That's crazy, considering
we can do this same thing at inode create time with a couple of
lines of code and no extra overhead.

So, if we know we are going to add an attribute immediately after
creating the inode, let's just initialise the attribute fork inside
the create transaction and chop that whole chunk of code out of
the create fast path. This completely removes the performance
drop caused by enabling SELinux, and the profile looks like:

     - 8.99% xfs_init_security
         - 9.00% security_inode_init_security
            - 6.43% xfs_initxattrs
               - 6.37% xfs_attr_set
                  - 5.45% xfs_attr_set_args
                     - 5.42% xfs_attr_set_shortform.constprop.0
                        - 4.51% xfs_trans_commit
                           - 4.54% __xfs_trans_commit
                              - 4.59% xfs_log_commit_cil
                                 - 2.67% _raw_spin_lock
                                    - 3.28% do_raw_spin_lock
                                         3.08% __pv_queued_spin_lock_slowpath
                                   0.66% xfs_inode_item_format
                        - 0.90% xfs_attr_try_sf_addname
                  - 0.60% xfs_trans_alloc
            - 2.35% selinux_inode_init_security
               - 1.25% security_sid_to_context_force
                  - 1.21% security_sid_to_context_core
                     - 1.19% sidtab_entry_to_string
                        - 1.20% sidtab_sid2str_get
                           - 0.86% sidtab_sid2str_put.part.0
                              - 0.62% _raw_spin_lock_irqsave
                                 - 0.77% do_raw_spin_lock
                                      __pv_queued_spin_lock_slowpath
               - 0.84% selinux_determine_inode_label
                  - 0.83% security_transition_sid
                       0.86% security_compute_sid.part.0

Which indicates the XFS overhead of creating the selinux xattr has
been halved. This doesn't fix the CIL lock contention problem, just
means it's not a limiting factor for this workload. Lock contention
in the security subsystems is going to be an issue soon, though...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: fix compilation error when CONFIG_SECURITY=n]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
2021-03-25 16:47:51 -07:00
Darrick J. Wong 3de4eb106f xfs: allow reservation of rtblocks with xfs_trans_alloc_inode
Make it so that we can reserve rt blocks with the xfs_trans_alloc_inode
wrapper function, then convert a few more callsites.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 3a1af6c317 xfs: refactor common transaction/inode/quota allocation idiom
Create a new helper xfs_trans_alloc_inode that allocates a transaction,
locks and joins an inode to it, and then reserves the appropriate amount
of quota against that transction.  Then replace all the open-coded
idioms with a single call to this helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 02b7ee4eb6 xfs: reserve data and rt quota at the same time
Modify xfs_trans_reserve_quota_nblks so that we can reserve data and
realtime blocks from the dquot at the same time.  This change has the
theoretical side effect that for allocations to realtime files we will
reserve from the dquot both the number of rtblocks being allocated and
the number of bmbt blocks that might be needed to add the mapping.
However, since the mount code disables quota if it finds a realtime
device, this should not result in any behavior changes.

Now that we've moved the inode creation callers away from using the
_nblks function, we can repurpose the (now unused) ninos argument for
realtime blocks, so make that change.  This also replaces the flags
argument with a boolean parameter to force the reservation since we
don't need to distinguish between data and rt quota reservations any
more, and the only flag being passed in was FORCE_RES.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 8554650003 xfs: create convenience wrappers for incore quota block reservations
Create a couple of convenience wrappers for creating and deleting quota
block reservations against future changes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 4abe21ad67 xfs: clean up quota reservation callsites
Convert a few xfs_trans_*reserve* callsites that are open-coding other
convenience functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Chandan Babu R 560ab6c0d1 xfs: Fix 'set but not used' warning in xfs_bmap_compute_alignments()
With both CONFIG_XFS_DEBUG and CONFIG_XFS_WARN disabled, the only reference to
local variable "error" in xfs_bmap_compute_alignments() gets eliminated during
pre-processing stage of the compilation process. This causes the compiler to
generate a "set but not used" warning.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-01 09:44:24 -08:00
Chandan Babu R 3015196746 xfs: Introduce error injection to allocate only minlen size extents for files
This commit adds XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag which
helps userspace test programs to get xfs_bmap_btalloc() to always
allocate minlen sized extents.

This is required for test programs which need a guarantee that minlen
extents allocated for a file do not get merged with their existing
neighbours in the inode's BMBT. "Inode fork extent overflow check" for
Directories, Xattrs and extension of realtime inodes need this since the
file offset at which the extents are being allocated cannot be
explicitly controlled from userspace.

One way to use this error tag is to,
1. Consume all of the free space by sequentially writing to a file.
2. Punch alternate blocks of the file. This causes CNTBT to contain
   sufficient number of one block sized extent records.
3. Inject XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag.
After step 3, xfs_bmap_btalloc() will issue space allocation
requests for minlen sized extents only.

ENOSPC error code is returned to userspace when there aren't any "one
block sized" extents left in any of the AGs.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R 07c72e5562 xfs: Process allocated extent in a separate function
This commit moves over the code in xfs_bmap_btalloc() which is
responsible for processing an allocated extent to a new function. Apart
from xfs_bmap_btalloc(), the new function will be invoked by another
function introduced in a future commit.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R 0961fddfdd xfs: Compute bmap extent alignments in a separate function
This commit moves over the code which computes stripe alignment and
extent size hint alignment into a separate function. Apart from
xfs_bmap_btalloc(), the new function will be used by another function
introduced in a future commit.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R aff4db57d5 xfs: Remove duplicate assert statement in xfs_bmap_btalloc()
The check for verifying if the allocated extent is from an AG whose
index is greater than or equal to that of tp->t_firstblock is already
done a couple of statements earlier in the same function. Hence this
commit removes the redundant assert statement.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R 02092a2f03 xfs: Check for extent overflow when renaming dir entries
A rename operation is essentially a directory entry remove operation
from the perspective of parent directory (i.e. src_dp) of rename's
source. Hence the only place where we check for extent count overflow
for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real()
returns -ENOSPC when it detects a possible extent count overflow and in
response, the higher layers of directory handling code do the following:
1. Data/Free blocks: XFS lets these blocks linger until a future remove
   operation removes them.
2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf
   space and unmaps the last block.

For target_dp, there are two cases depending on whether the destination
directory entry exists or not.

When destination directory entry does not exist (i.e. target_ip ==
NULL), extent count overflow check is performed only when transaction
has a non-zero sized space reservation associated with it.  With a
zero-sized space reservation, XFS allows a rename operation to continue
only when the directory has sufficient free space in its data/leaf/free
space blocks to hold the new entry.

When destination directory entry exists (i.e. target_ip != NULL), all
we need to do is change the inode number associated with the already
existing entry. Hence there is no need to perform an extent count
overflow check.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R 0dbc5cb1a9 xfs: Check for extent overflow when removing dir entries
Directory entry removal must always succeed; Hence XFS does the
following during low disk space scenario:
1. Data/Free blocks linger until a future remove operation.
2. Dabtree blocks would be swapped with the last block in the leaf space
   and then the new last block will be unmapped.

This facility is reused during low inode extent count scenario i.e. this
commit causes xfs_bmap_del_extent_real() to return -ENOSPC error code so
that the above mentioned behaviour is exercised causing no change to the
directory's extent count.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R 727e1acd29 xfs: Check for extent overflow when trivally adding a new extent
When adding a new data extent (without modifying an inode's existing
extents) the extent count increases only by 1. This commit checks for
extent count overflow in such cases.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Dave Chinner e82226138b xfs: remove xfs_buf_t typedef
Prepare for kernel xfs_buf  alignment by getting rid of the
xfs_buf_t typedef from userspace.

[darrick: This patch is a port of a userspace patch removing the
xfs_buf_t typedef in preparation to make the userspace xfs_buf code
behave more like its kernel counterpart.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-16 16:07:34 -08:00
Darrick J. Wong 33005fd0a5 xfs: refactor file range validation
Refactor all the open-coded validation of file block ranges into a
single helper, and teach the bmap scrubber to check the ranges.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-09 09:49:38 -08:00
Darrick J. Wong 18695ad425 xfs: refactor realtime volume extent validation
Refactor all the open-coded validation of realtime device extents into a
single helper.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong 67457eb0d2 xfs: refactor data device extent validation
Refactor all the open-coded validation of non-static data device extents
into a single helper.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong acf104c233 xfs: detect overflows in bmbt records
Detect file block mappings with a blockcount that's either so large that
integer overflows occur or are zero, because neither are valid in the
filesystem.  Worse yet, attempting directory modifications causes the
iext code to trip over the bmbt key handling and takes the filesystem
down.  We can fix most of this by preventing the bad metadata from
entering the incore structures in the first place.

Found by setting blockcount=0 in a directory data fork mapping and
watching the fireworks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-09 09:49:38 -08:00
Darrick J. Wong 8df0fa39bd xfs: don't free rt blocks when we're doing a REMAP bunmapi call
When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent
to be unmapped from the given file fork without the extent being freed.
We do this for non-rt files, but we forgot to do this for realtime
files.  So far this isn't a big deal since nobody makes a bunmapi call
to a rt file with the REMAP flag set, but don't leave a logic bomb.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-23 08:58:51 -07:00
Darrick J. Wong d0c20d38af xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
The realtime flag only applies to the data fork, so don't use the
realtime block number checks on the attr fork of a realtime file.

Fixes: 30b0984d91 ("xfs: refactor bmap record validation")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-09-03 08:33:50 -07:00
Carlos Maiolino 32a2b11f46 xfs: Remove kmem_zone_zalloc() usage
Use kmem_cache_zalloc() directly.

With the exception of xlog_ticket_alloc() which will be dealt on the
next patch for readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-07-28 20:24:14 -07:00
Carlos Maiolino 3050bd0bfe xfs: Remove kmem_zone_alloc() usage
Use kmem_cache_alloc() directly.

All kmem_zone_alloc() users pass 0 as flags, which are translated into:
GFP_KERNEL | __GFP_NOWARN, and kmem_zone_alloc() loops forever until the
allocation succeeds.

We can use __GFP_NOFAIL to tell the allocator to loop forever rather
than doing it ourself, and because the allocation will never fail, we do
not need to use __GFP_NOWARN anymore. Hence, all callers can be
converted to use GFP_KERNEL | __GFP_NOFAIL

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: add a comment back in about nofail]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-07-28 20:24:14 -07:00
Darrick J. Wong a5949d3fae xfs: force writes to delalloc regions to unwritten
When writing to a delalloc region in the data fork, commit the new
allocations (of the da reservation) as unwritten so that the mappings
are only marked written once writeback completes successfully.  This
fixes the problem of stale data exposure if the system goes down during
targeted writeback of a specific region of a file, as tested by
generic/042.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-05-27 08:49:28 -07:00
Christoph Hellwig f7e67b20ec xfs: move the fork format fields into struct xfs_ifork
Both the data and attr fork have a format that is stored in the legacy
idinode.  Move it into the xfs_ifork structure instead, where it uses
up padding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig daf83964a3 xfs: move the per-fork nextents fields into struct xfs_ifork
There are there are three extents counters per inode, one for each of
the forks.  Two are in the legacy icdinode and one is directly in
struct xfs_inode.  Switch to a single counter in the xfs_ifork structure
where it uses up padding at the end of the structure.  This simplifies
various bits of code that just wants the number of extents counter and
can now directly dereference it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig 4b516ff4e7 xfs: remove the NULL fork handling in xfs_bmapi_read
Now that we fully verify the inode forks before they are added to the
inode cache, the crash reported in

  https://bugzilla.kernel.org/show_bug.cgi?id=204031

can't happen anymore, as we'll never let an inode that has inconsistent
nextents counts vs the presence of an in-core attr fork leak into the
inactivate code path.  So remove the work around to try to handle the
case, and just return an error and warn if the fork is not present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig 1a1c57b282 xfs: remove the special COW fork handling in xfs_bmapi_read
We don't call xfs_bmapi_read for the COW fork anymore, so remove the
special casing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig e9e2eae89d xfs: only check the superblock version for dinode size calculation
The size of the dinode structure is only dependent on the file system
version, so instead of checking the individual inode version just use
the newly added xfs_sb_version_has_large_dinode helper, and simplify
various calling conventions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:48:47 -07:00
Dave Chinner 8ef547976a xfs: rename btree cursor private btree member flags
BPRV is not longer appropriate because bc_private is going away.
Script:

$ sed -i 's/BTCUR_BPRV/BTCUR_BMBT/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

With manual cleanup to the definitions in fs/xfs/libxfs/xfs_btree.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "BC_BT" to "BTCUR_BMBT", fix subject line typo]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Dave Chinner 92219c292a xfs: convert btree cursor inode-private member names
bc_private.b -> bc_ino conversion via script:

$ sed -i 's/bc_private\.b/bc_ino/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

And then revert the change to the bc_ino #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: tweak the subject line slightly]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Brian Foster b73df17e4c xfs: open code insert range extent split helper
The insert range operation currently splits the extent at the target
offset in a separate transaction and lock cycle from the one that
shifts extents. In preparation for reworking insert range into an
atomic operation, lift the code into the caller so it can be easily
condensed to a single rolling transaction and lock cycle and
eliminate the helper. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:51 -08:00
Darrick J. Wong f48e2df8a8 xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers
Refactor xfs_read_agf and xfs_alloc_read_agf to return EAGAIN if the
caller passed TRYLOCK and we weren't able to get the lock; and change
the callers to recognize this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong ee647f85cb xfs: remove the xfs_btree_get_buf[ls] functions
Remove the xfs_btree_get_bufs and xfs_btree_get_bufl functions, since
they're pretty trivial oneliners.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong af952aeb4a libxfs: resync with the userspace libxfs
Prepare to resync the userspace libxfs with the kernel libxfs.  There
were a few things I missed -- a couple of static inline directory
functions that have to be exported for xfs_repair; a couple of directory
naming functions that make porting much easier if they're /not/ static
inline; and a u16 usage that should have been uint16_t.

None of these things are bugs in their own right; this just makes
porting xfsprogs easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2019-12-19 07:53:47 -08:00
Brian Foster d0c2204135 xfs: stabilize insert range start boundary to avoid COW writeback race
generic/522 (fsx) occasionally fails with a file corruption due to
an insert range operation. The primary characteristic of the
corruption is a misplaced insert range operation that differs from
the requested target offset. The reason for this behavior is a race
between the extent shift sequence of an insert range and a COW
writeback completion that causes a front merge with the first extent
in the shift.

The shift preparation function flushes and unmaps from the target
offset of the operation to the end of the file to ensure no
modifications can be made and page cache is invalidated before file
data is shifted. An insert range operation then splits the extent at
the target offset, if necessary, and begins to shift the start
offset of each extent starting from the end of the file to the start
offset. The shift sequence operates at extent level and so depends
on the preparation sequence to guarantee no changes can be made to
the target range during the shift. If the block immediately prior to
the target offset was dirty and shared, however, it can undergo
writeback and move from the COW fork to the data fork at any point
during the shift. If the block is contiguous with the block at the
start offset of the insert range, it can front merge and alter the
start offset of the extent. Once the shift sequence reaches the
target offset, it shifts based on the latest start offset and
silently changes the target offset of the operation and corrupts the
file.

To address this problem, update the shift preparation code to
stabilize the start boundary along with the full range of the
insert. Also update the existing corruption check to fail if any
extent is shifted with a start offset behind the target offset of
the insert range. This prevents insert from racing with COW
writeback completion and fails loudly in the event of an unexpected
extent shift.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-11 13:18:42 -08:00
Omar Sandoval 69ffe5960d xfs: don't check for AG deadlock for realtime files in bunmapi
Commit 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
in the wrong order. However, this check isn't applicable for realtime
files. In most cases, it just makes us do unnecessary commits. However,
without the fix from the previous commit ("xfs: fix realtime file data
space leak"), if the last and second-to-last extents also happen to have
different "AG numbers", then the break actually causes __xfs_bunmapi()
to return without making any progress, which sends
xfs_itruncate_extents_flags() into an infinite loop.

Fixes: 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-02 17:58:51 -08:00
Omar Sandoval 0c4da70c83 xfs: fix realtime file data space leak
Realtime files in XFS allocate extents in rextsize units. However, the
written/unwritten state of those extents is still tracked in blocksize
units. Therefore, a realtime file can be split up into written and
unwritten extents that are not necessarily aligned to the realtime
extent size. __xfs_bunmapi() has some logic to handle these various
corner cases. Consider how it handles the following case:

1. The last extent is unwritten.
2. The last extent is smaller than the realtime extent size.
3. startblock of the last extent is not aligned to the realtime extent
   size, but startblock + blockcount is.

In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
to set the second-to-last extent to unwritten. This should merge the
last and second-to-last extents, so __xfs_bunmapi() moves on to the
second-to-last extent.

However, if the size of the last and second-to-last extents combined is
greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
merge the two extents. When that happens, __xfs_bunmapi() skips past the
last extent without unmapping it, thus leaking the space.

Fix it by only unwriting the minimum amount needed to align the last
extent to the realtime extent size, which is guaranteed to merge with
the last extent.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-02 17:58:50 -08:00
Darrick J. Wong a71895c5da xfs: convert open coded corruption check to use XFS_IS_CORRUPT
Convert the last of the open coded corruption check and report idioms to
use the XFS_IS_CORRUPT macro.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-13 11:08:01 -08:00
Darrick J. Wong f9e0370648 xfs: kill the XFS_WANT_CORRUPT_* macros
The XFS_WANT_CORRUPT_* macros conceal subtle side effects such as the
creation of local variables and redirections of the code flow.  This is
pretty ugly, so replace them with explicit XFS_IS_CORRUPT tests that
remove both of those ugly points.  The change was performed with the
following coccinelle script:

@@
expression mp, test;
identifier label;
@@

- XFS_WANT_CORRUPTED_GOTO(mp, test, label);
+ if (XFS_IS_CORRUPT(mp, !test)) { error = -EFSCORRUPTED; goto label; }

@@
expression mp, test;
@@

- XFS_WANT_CORRUPTED_RETURN(mp, test);
+ if (XFS_IS_CORRUPT(mp, !test)) return -EFSCORRUPTED;

@@
expression mp, lval, rval;
@@

- XFS_IS_CORRUPT(mp, !(lval == rval))
+ XFS_IS_CORRUPT(mp, lval != rval)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 && e2))
+ XFS_IS_CORRUPT(mp, !e1 || !e2)

@@
expression e1, e2;
@@

- !(e1 == e2)
+ e1 != e2

@@
expression e1, e2, e3, e4, e5, e6;
@@

- !(e1 == e2 && e3 == e4) || e5 != e6
+ e1 != e2 || e3 != e4 || e5 != e6

@@
expression e1, e2, e3, e4, e5, e6;
@@

- !(e1 == e2 || (e3 <= e4 && e5 <= e6))
+ e1 != e2 && (e3 > e4 || e5 > e6)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 <= e2))
+ XFS_IS_CORRUPT(mp, e1 > e2)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 < e2))
+ XFS_IS_CORRUPT(mp, e1 >= e2)

@@
expression mp, e1;
@@

- XFS_IS_CORRUPT(mp, !!e1)
+ XFS_IS_CORRUPT(mp, e1)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 || e2))
+ XFS_IS_CORRUPT(mp, !e1 && !e2)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 == e4))
+ XFS_IS_CORRUPT(mp, e1 != e2 && e3 != e4)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 <= e2) || !(e3 >= e4))
+ XFS_IS_CORRUPT(mp, e1 > e2 || e3 < e4)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 <= e4))
+ XFS_IS_CORRUPT(mp, e1 != e2 && e3 > e4)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-12 17:19:02 -08:00
Darrick J. Wong 2fe4f92834 xfs: refactor "does this fork map blocks" predicate
Replace the open-coded checks for whether or not an inode fork maps
blocks with a macro that will implant the code for us.  This helps us
declutter the bmap code a bit.

Note that I had to use a macro instead of a static inline function
because of C header dependency problems between xfs_inode.h and
xfs_inode_fork.h.

Conversion was performed with the following Coccinelle script:

@@
expression ip, w;
@@

- XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE
+ xfs_ifork_has_extents(ip, w)

@@
expression ip, w;
@@

- XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE
+ !xfs_ifork_has_extents(ip, w)

@@
expression ip, w;
@@

- XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS
+ xfs_ifork_has_extents(ip, w)

@@
expression ip, w;
@@

- XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS
+ !xfs_ifork_has_extents(ip, w)

@@
expression ip, w;
@@

- (xfs_ifork_has_extents(ip, w))
+ xfs_ifork_has_extents(ip, w)

@@
expression ip, w;
@@

- (!xfs_ifork_has_extents(ip, w))
+ !xfs_ifork_has_extents(ip, w)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-10 10:22:51 -08:00
Darrick J. Wong f5be08446e xfs: null out bma->prev if no previous extent
Coverity complains that we don't check the return value of
xfs_iext_peek_prev_extent like we do nearly all of the time.  If there
is no previous extent then just null out bma->prev like we do elsewhere
in the bmap code.

Coverity-id: 1424057
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-07 13:00:54 -08:00
Darrick J. Wong a5155b870d xfs: always log corruption errors
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-04 13:55:54 -08:00
Christoph Hellwig c34d570d15 xfs: cleanup use of the XFS_ALLOC_ flags
Always set XFS_ALLOC_USERDATA for data fork allocations, and check it
in xfs_alloc_is_userdata instead of the current obsfucated check.
Also remove the xfs_alloc_is_userdata and xfs_alloc_allow_busy_reuse
helpers to make the code a little easier to understand.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03 10:22:31 -08:00
Christoph Hellwig fd638f1de1 xfs: move extent zeroing to xfs_bmapi_allocate
Move the extent zeroing case there for the XFS_BMAPI_ZERO flag outside
the low-level allocator and into xfs_bmapi_allocate, where is still
is in transaction context, but outside the very lowlevel code where
it doesn't belong.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03 10:22:30 -08:00
Christoph Hellwig be6cacbeea xfs: refactor xfs_bmapi_allocate
Avoid duplicate userdata and data fork checks by restructuring the code
so we only have a helper for userdata allocations that combines these
checks in a straight foward way.  That also helps to obsoletes the
comments explaining what the code does as it is now clearly obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03 10:22:30 -08:00
Darrick J. Wong e992ae8afd xfs: refactor xfs_iread_extents to use xfs_btree_visit_blocks
xfs_iread_extents open-codes everything in xfs_btree_visit_blocks, so
refactor the btree helper to be able to iterate only the records on
level 0, then port iread_extents to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-10-29 09:50:12 -07:00
Darrick J. Wong c2414ad6e6 xfs: replace -EIO with -EFSCORRUPTED for corrupt metadata
There are a few places where we return -EIO instead of -EFSCORRUPTED
when we find corrupt metadata.  Fix those places.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-29 09:50:11 -07:00
Brian Foster da781e64b2 xfs: don't set bmapi total block req where minleft is
xfs_bmapi_write() takes a total block requirement parameter that is
passed down to the block allocation code and is used to specify the
total block requirement of the associated transaction. This is used
to try and select an AG that can not only satisfy the requested
extent allocation, but can also accommodate subsequent allocations
that might be required to complete the transaction. For example,
additional bmbt block allocations may be required on insertion of
the resulting extent to an inode data fork.

While it's important for callers to calculate and reserve such extra
blocks in the transaction, it is not necessary to pass the total
value to xfs_bmapi_write() in all cases. The latter automatically
sets minleft to ensure that sufficient free blocks remain after the
allocation attempt to expand the format of the associated inode
(i.e., such as extent to btree conversion, btree splits, etc).
Therefore, any callers that pass a total block requirement of the
bmap mapping length plus worst case bmbt expansion essentially
specify the additional reservation requirement twice. These callers
can pass a total of zero to rely on the bmapi minleft policy.

Beyond being superfluous, the primary motivation for this change is
that the total reservation logic in the bmbt code is dubious in
scenarios where minlen < maxlen and a maxlen extent cannot be
allocated (which is more common for data extent allocations where
contiguity is not required). The total value is based on maxlen in
the xfs_bmapi_write() caller. If the bmbt code falls back to an
allocation between minlen and maxlen, that allocation will not
succeed until total is reset to minlen, which essentially throws
away any additional reservation included in total by the caller. In
addition, the total value is not reset until after alignment is
dropped, which means that such callers drop alignment far too
aggressively than necessary.

Update all callers of xfs_bmapi_write() that pass a total block
value of the mapping length plus bmbt reservation to instead pass
zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
requirement. This trades off slightly less conservative AG selection
for the ability to preserve alignment in more scenarios.
xfs_bmapi_write() callers that incorporate unrelated or additional
reservations in total beyond what is already included in minleft
must continue to use the former.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-23 17:01:08 -07:00
Dave Chinner 1c743574de xfs: cap longest free extent to maximum allocatable
Cap longest extent to the largest we can allocate based on limits
calculated at mount time. Dynamic state (such as finobt blocks)
can result in the longest free extent exceeding the size we can
allocate, and that results in failure to align full AG allocations
when the AG is empty.

Result:

xfs_io-4413  [003]   426.412459: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 262148 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff

minlen and maxlen are now separated by the alignment size, and
allocation fails because args.total > free space in the AG.

[bfoster: Added xfs_bmap_btalloc() changes.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-23 17:01:07 -07:00
Christoph Hellwig 4e087a3b31 xfs: use a struct iomap in xfs_writepage_ctx
In preparation for moving the XFS writeback code to fs/iomap.c, switch
it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Brian Foster aeea4b75f0 xfs: move local to extent inode logging into bmap helper
The callers of xfs_bmap_local_to_extents_empty() log the inode
external to the function, yet this function is where the on-disk
format value is updated. Push the inode logging down into the
function itself to help prevent future mistakes.

Note that internal bmap callers track the inode logging flags
independently and thus may log the inode core twice due to this
change. This is harmless, so leave this code around for consistency
with the other attr fork conversion functions.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-09 08:54:30 -07:00