Commit Graph

62 Commits

Author SHA1 Message Date
Bill O'Donnell 359e294324 xfs: implement masked btree key comparisons for _has_records scans
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 4a200a0978288f919aba3f015f374f6ed279e658
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 11 19:00:11 2023 -0700

    xfs: implement masked btree key comparisons for _has_records scans

    For keyspace fullness scans, we want to be able to mask off the parts of
    the key that we don't care about.  For most btree types we /do/ want the
    full keyspace, but for checking that a given space usage also has a full
    complement of rmapbt records (even if different/multiple owners) we need
    this masking so that we only track sparseness of rm_startblock, not the
    whole keyspace (which is extremely sparse).

    Augment the ->diff_two_keys and ->keys_contiguous helpers to take a
    third union xfs_btree_key argument, and wire up xfs_rmap_has_records to
    pass this through.  This third "mask" argument should contain a nonzero
    value in each structure field that should be used in the key comparisons
    done during the scan.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-05 16:56:21 -05:00
Bill O'Donnell de02d9af50 xfs: replace xfs_btree_has_record with a general keyspace scanner
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 6abc7aef85b1f42cb39a3149f4ab64ca255e41e6
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 11 19:00:10 2023 -0700

    xfs: replace xfs_btree_has_record with a general keyspace scanner

    The current implementation of xfs_btree_has_record returns true if it
    finds /any/ record within the given range.  Unfortunately, that's not
    sufficient for scrub.  We want to be able to tell if a range of keyspace
    for a btree is devoid of records, is totally mapped to records, or is
    somewhere in between.  By forcing this to be a boolean, we conflated
    sparseness and fullness, which caused scrub to return incorrect results.
    Fix the API so that we can tell the caller which of those three is the
    current state.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-05 16:56:20 -05:00
Bill O'Donnell 07297e64e0 xfs: create traced helper to get extra perag references
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 9b2e5a234c89f097ec36f922763dfa1465dc06f8
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 11 18:59:55 2023 -0700

    xfs: create traced helper to get extra perag references

    There are a few places in the XFS codebase where a caller has either an
    active or a passive reference to a perag structure and wants to give
    a passive reference to some other piece of code.  Btree cursor creation
    and inode walks are good examples of this.  Replace the open-coded logic
    with a helper to do this.

    The new function adds a few safeguards -- it checks that there's at
    least one reference to the perag structure passed in, and it records the
    refcount bump in the ftrace information.  This makes it much easier to
    debug perag refcounting problems.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-05 16:56:14 -05:00
Bill O'Donnell 3ffd9578d6 xfs: fix rm_offset flag handling in rmap keys
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 08c987deca56687c0930f308f5148ef1af38dc14
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 11 19:00:07 2023 -0700

    xfs: fix rm_offset flag handling in rmap keys

    Keys for extent interval records in the reverse mapping btree are
    supposed to be computed as follows:

    (physical block, owner, fork, is_btree, offset)

    This provides users the ability to look up a reverse mapping from a file
    block mapping record -- start with the physical block; then if there are
    multiple records for the same block, move on to the owner; then the
    inode fork type; and so on to the file offset.

    Unfortunately, the code that creates rmap lookup keys from rmap records
    forgot to mask off the record attribute flags, leading to ondisk keys
    that look like this:

    (physical block, owner, fork, is_btree, unwritten state, offset)

    Fortunately, this has all worked ok for the past six years because the
    key comparison functions incorrectly ignore the fork/bmbt/unwritten
    information that's encoded in the on-disk offset.  This means that
    lookup comparisons are only done with:

    (physical block, owner, offset)

    Queries can (theoretically) return incorrect results because of this
    omission.  On consistent filesystems this isn't an issue because xattr
    and bmbt blocks cannot be shared and hence the comparisons succeed
    purely on the contents of the rm_startblock field.  For the one case
    where we support sharing (written data fork blocks) all flag bits are
    zero, so the omission in the comparison has no ill effects.

    Unfortunately, this bug prevents scrub from detecting incorrect fork and
    bmbt flag bits in the rmap btree, so we really do need to fix the
    compare code.  Old filesystems with the unwritten bit erroneously set in
    the rmap key struct will work fine on new kernels since we still ignore
    the unwritten bit.  New filesystems on older kernels will work fine
    since the old kernels never paid attention to the unwritten bit.

    A previous version of this patch forgot to keep the (un)written state
    flag masked during the comparison and caused a major regression in
    5.9.x since unwritten extent conversion can update an rmap record
    without requiring key updates.

    Note that blocks cannot go directly from data fork to attr fork without
    being deallocated and reallocated, nor can they be added to or removed
    from a bmbt without a free/alloc cycle, so this should not cause any
    regressions.

    Found by fuzzing keys[1].attrfork = ones on xfs/371.

    Fixes: 4b8ed67794 ("xfs: add rmap btree operations")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:25 -06:00
Bill O'Donnell 6ee6b421b0 xfs: perags need atomic operational state
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 7ac2ff8bb3713c7cb43564c04384af2ee7cc1f8d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Feb 13 09:14:52 2023 +1100

    xfs: perags need atomic operational state

    We currently don't have any flags or operational state in the
    xfs_perag except for the pagf_init and pagi_init flags. And the
    agflreset flag. Oh, there's also the pagf_metadata and pagi_inodeok
    flags, too.

    For controlling per-ag operations, we are going to need some atomic
    state flags. Hence add an opstate field similar to what we already
    have in the mount and log, and convert all these state flags across
    to atomic bit operations.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:21 -06:00
Bill O'Donnell f8121a9b84 xfs: make is_log_ag() a first class helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 36029dee382a20cf515494376ce9f0d5949944eb
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:13:21 2022 +1000

    xfs: make is_log_ag() a first class helper

    We check if an ag contains the log in many places, so make this
    a first class XFS helper by lifting it to fs/xfs/libxfs/xfs_ag.h and
    renaming it xfs_ag_contains_log(). The convert all the places that
    check if the AG contains the log to use this helper.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:41 -05:00
Bill O'Donnell 575255a927 xfs: pass perag to xfs_alloc_put_freelist
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 8c392eb27f7a98e403658d066e387c7b1c604f2b
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:08:08 2022 +1000

    xfs: pass perag to xfs_alloc_put_freelist

    It's available in all callers, so pass it in so that the perag can
    be passed further down the stack.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:40 -05:00
Bill O'Donnell 27a6f3dcff xfs: pass perag to xfs_alloc_get_freelist
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 49f0d84ec1db5bd46dcf3796fc792fce74ff25a3
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:08:01 2022 +1000

    xfs: pass perag to xfs_alloc_get_freelist

    It's available in all callers, so pass it in so that the perag can
    be passed further down the stack.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:40 -05:00
Bill O'Donnell d1a077edc7 xfs: pass perag to xfs_alloc_read_agf()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 08d3e84feeb8cb8e20d54f659446b98fe17913aa
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 19:07:40 2022 +1000

    xfs: pass perag to xfs_alloc_read_agf()

    xfs_alloc_read_agf() initialises the perag if it hasn't been done
    yet, so it makes sense to pass it the perag rather than pull a
    reference from the buffer. This allows callers to be per-ag centric
    rather than passing mount/agno pairs everywhere.

    Whilst modifying the xfs_reflink_find_shared() function definition,
    declare it static and remove the extern declaration as it is an
    internal function only these days.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:39 -05:00
Carlos Maiolino d912d565bb xfs: remove kmem_zone typedef
Bugzilla: https://bugzilla.redhat.com/2125724

Remove these typedefs by referencing kmem_cache directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit e7720afad068a6729d9cd3aaa08212f2f5a7ceff)
2022-10-21 12:50:46 +02:00
Carlos Maiolino e3c2c647f7 xfs: use separate btree cursor cache for each btree type
Bugzilla: https://bugzilla.redhat.com/2125724

Now that we have the infrastructure to track the max possible height of
each btree type, we can create a separate slab cache for cursors of each
type of btree.  For smaller indices like the free space btrees, this
means that we can pack more cursors into a slab page, improving slab
utilization.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 9fa47bdcd33b117599e9ee3f2e315cb47939ac2d)
2022-10-21 12:50:46 +02:00
Carlos Maiolino f164f431d6 xfs: compute absolute maximum nlevels for each btree type
Bugzilla: https://bugzilla.redhat.com/2125724

Add code for all five btree types so that we can compute the absolute
maximum possible btree height for each btree type.  This is a setup for
the next patch, which makes every btree type have its own cursor cache.

The functions are exported so that we can have xfs_db report the
absolute maximum btree heights for each btree type, rather than making
everyone run their own ad-hoc computations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 0ed5f7356daee74244b02e100b3cc043e886e686)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 6adc25f61c xfs: compute the maximum height of the rmap btree when reflink enabled
Bugzilla: https://bugzilla.redhat.com/2125724

Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big
enough to handle the maximally tall rmap btree when all blocks are in
use and maximally shared, let's compute the maximum height assuming the
rmapbt consumes as many blocks as possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 9ec691205e7d4a11190519df6561a168ae6af3a4)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 02e10206a0 xfs: dynamically allocate cursors based on maxlevels
Bugzilla: https://bugzilla.redhat.com/2125724

To support future btree code, we need to be able to size btree cursors
dynamically for very large btrees.  Switch the maxlevels computation to
use the precomputed values in the superblock, and create cursors that
can handle a certain height.  For now, we retain the btree cursor cache
that can handle up to 9-level btrees, though a subsequent patch
introduces separate caches for each btree type, where each cache's
objects will be exactly tall enough to handle the specific btree type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit c940a0c54a2e9333478f1d87ed40006a04fcec7e)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 73ee1d48f2 xfs: refactor btree cursor allocation function
Bugzilla: https://bugzilla.redhat.com/2125724

Refactor btree allocation to a common helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 56370ea6e5fe3e3d6e1ca2da58f95fb0d5e1779f)
2022-10-21 12:50:46 +02:00
Carlos Maiolino 3db6a105c6 xfs: remove xfs_btree_cur.bc_blocklog
Bugzilla: https://bugzilla.redhat.com/2125724

This field isn't used by anyone, so get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit cc411740472d958b718b9c6a7791ba00d88f7cef)
2022-10-21 12:50:46 +02:00
Brian Foster 74f147b83b xfs: introduce xfs_buf_daddr()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 04fcad80cd068731a779fb442f78234732683755
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:57 2021 -0700

    xfs: introduce xfs_buf_daddr()

    Introduce a helper function xfs_buf_daddr() to extract the disk
    address of the buffer from the struct xfs_buf. This will replace
    direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
    the XFS_BUF_ADDR() macro.

    This patch introduces the helper function and replaces all uses of
    XFS_BUF_ADDR() as this is just a simple sed replacement.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:36 -04:00
Brian Foster d54a790d1d xfs: replace xfs_sb_version checks with feature flag checks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 38c26bfd90e1999650d5ef40f90d721f05916643
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:37 2021 -0700

    xfs: replace xfs_sb_version checks with feature flag checks

    Convert the xfs_sb_version_hasfoo() to checks against
    mp->m_features. Checks of the superblock itself during disk
    operations (e.g. in the read/write verifiers and the to/from disk
    formatters) are not converted - they operate purely on the
    superblock state. Everything else should use the mount features.

    Large parts of this conversion were done with sed with commands like
    this:

    for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
            sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
    done

    With manual cleanups for things like "xfs_has_extflgbit" and other
    little inconsistencies in naming.

    The result is ia lot less typing to check features and an XFS binary
    size reduced by a bit over 3kB:

    $ size -t fs/xfs/built-in.a
            text       data     bss     dec     hex filenam
    before  1130866  311352     484 1442702  16038e (TOTALS)
    after   1127727  311352     484 1439563  15f74b (TOTALS)

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
Brian Foster 4d9abe49b5 xfs: make the start pointer passed to btree alloc_block functions const
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit deb06b9ab6dfa167c280a68d5acb2f12e007073f
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Aug 12 09:53:27 2021 -0700

    xfs: make the start pointer passed to btree alloc_block functions const

    The @start pointer passed to each per-AG btree type's ->alloc_block
    function isn't supposed to be modified, since it's a hint about the
    location of the btree block being split that is to be fed to the
    allocator, so mark the parameter const.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:30 -04:00
Brian Foster 1439943f4a xfs: make the pointer passed to btree set_root functions const
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit b5a6e5fe0e6840bc90e51cf522d6c5a880cde567
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Aug 12 09:49:03 2021 -0700

    xfs: make the pointer passed to btree set_root functions const

    The pointer passed to each per-AG btree type's ->set_root function isn't
    supposed to be modified (that function sets an external pointer to the
    root block) so mark them const.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:30 -04:00
Brian Foster 0fa328b41a xfs: make the keys and records passed to btree inorder functions const
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 8e38dc88a67b3c7475cbe8a132d03542717c1e27
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Aug 10 17:02:17 2021 -0700

    xfs: make the keys and records passed to btree inorder functions const

    The inorder functions are simple predicates, which means that they don't
    modify the parameters.  Mark them all const.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:30 -04:00
Brian Foster 046569f054 xfs: mark the record passed into btree init_key functions as const
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 23825cd148764ce133ee92375da395140d6ccb15
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Aug 10 17:02:16 2021 -0700

    xfs: mark the record passed into btree init_key functions as const

    These functions initialize a key from a record, but they aren't supposed
    to modify the record.  Mark it const.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:29 -04:00
Brian Foster 542740cc19 xfs: make the key parameters to all btree key comparison functions const
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit d29d5577774d7d032da1343dba80be7423e307f9
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Aug 10 17:02:15 2021 -0700

    xfs: make the key parameters to all btree key comparison functions const

    The btree key comparison functions are not allowed to change the keys
    that are passed in, so mark them const.  We'll need this for the next
    patch, which adds const to the btree range query functions.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:29 -04:00
Dave Chinner 50f02fe333 xfs: remove agno from btree cursor
Now that everything passes a perag, the agno is not needed anymore.
Convert all the users to use pag->pag_agno instead and remove the
agno from the cursor. This was largely done as an automated search
and replace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner fa9c3c1973 xfs: convert rmap btree cursor to using a perag
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner be9fb17d88 xfs: add a perag to the btree cursor
Which will eventually completely replace the agno in it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-06-02 10:48:24 +10:00
Dave Chinner 30933120ad xfs: push perags through the ag reservation callouts
We currently pass an agno from the AG reservation functions to the
individual feature accounting functions, which in future may have to
do perag lookups to access per-AG state. Instead, pre-emptively
plumb the perag through from the highest AG reservation layer to the
feature callouts so they won't have to look it up again.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-06-02 10:48:24 +10:00
Dave Chinner 45d0662117 xfs: pass perags through to the busy extent code
All of the callers of the busy extent API either have perag
references available to use so we can pass a perag to the busy
extent functions rather than having them have to do unnecessary
lookups.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner 9bbafc7191 xfs: move xfs_perag_get/put to xfs_ag.[ch]
They are AG functions, not superblock functions, so move them to the
appropriate location.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Darrick J. Wong 1aec7c3d05 xfs: remove obsolete AGF counter debugging
In commit f8f2835a9c we changed the behavior of XFS to use EFIs to
remove blocks from an overfilled AGFL because there were complaints
about transaction overruns that stemmed from trying to free multiple
blocks in a single transaction.

Unfortunately, that commit missed a subtlety in the debug-mode
transaction accounting when a realtime volume is attached.  If a
realtime file undergoes a data fork mapping change such that realtime
extents are allocated (or freed) in the same transaction that a data
device block is also allocated (or freed), we can trip a debugging
assertion.  This can happen (for example) if a realtime extent is
allocated and it is necessary to reshape the bmbt to hold the new
mapping.

When we go to allocate a bmbt block from an AG, the first thing the data
device block allocator does is ensure that the freelist is the proper
length.  If the freelist is too long, it will trim the freelist to the
proper length.

In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to
record the decrement in the AG free list count.  Prior to f8f28 we would
put the free block back in the free space btrees in the same
transaction, which calls xfs_trans_agblocks_delta() to record the
increment in the AG free block count.  Since AGFL blocks are included in
the global free block count (fdblocks), there is no corresponding
fdblocks update, so the AGFL free satisfies the following condition in
xfs_trans_apply_sb_deltas:

	/*
	 * Check that superblock mods match the mods made to AGF counters.
	 */
	ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) ==
	       (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta +
		tp->t_ag_btree_delta));

The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is
the number blocks that were allocated.

After commit f8f28 we defer the block freeing to the next chained
transaction, which means that the calls to xfs_trans_agflist_delta and
xfs_trans_agblocks_delta occur in separate transactions.  The (first)
transaction that shortens the free list trips on the comparison, which
has now become:

(X + 0) == ((X) + -1 + 0)

because we haven't freed the AGFL block yet; we've only logged an
intention to free it.  When the second transaction (the deferred free)
commits, it will evaluate the expression as:

(0 + 0) == (1 + 0 + 0)

and trip over that in turn.

At this point, the astute reader may note that the two commits tagged by
this patch have been in the kernel for a long time but haven't generated
any bug reports.  How is it that the author became aware of this bug?

This originally surfaced as an intermittent failure when I was testing
realtime rmap, but a different bug report by Zorro Lang reveals the same
assertion occuring on !lazysbcount filesystems.

The common factor to both reports (and why this problem wasn't
previously reported) becomes apparent if we consider when
xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():

	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
		xfs_trans_apply_sb_deltas(tp);

With a modern lazysbcount filesystem, transactions update only the
percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence
xfs_trans_apply_sb_deltas is rarely called.

However, updates to the count of free realtime extents are not part of
lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or
removing data fork mappings to realtime files; similarly,
XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.

Dave mentioned in response to an earlier version of this patch:

"IIUC, what you are saying is that this debug code is simply not
exercised in normal testing and hasn't been for the past decade?  And it
still won't be exercised on anything other than realtime device testing?

"...it was debugging code from 1994 that was largely turned into dead
code when lazysbcounters were introduced in 2007. Hence I'm not sure it
holds any value anymore."

This debugging code isn't especially helpful - you can modify the
flcount on one AG and the freeblks of another AG, and it won't trigger.
Add the fact that nobody noticed for a decade, and let's just get rid of
it (and start testing realtime :P).

This bug was found by running generic/051 on either a V4 filesystem
lacking lazysbcount; or a V5 filesystem with a realtime volume.

Cc: bfoster@redhat.com, zlang@redhat.com
Fixes: f8f2835a9c ("xfs: defer agfl block frees when dfops is available")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-04-29 07:44:18 -07:00
Darrick J. Wong eb8409071a xfs: revert "xfs: fix rmap key and record comparison functions"
This reverts commit 6ff646b2ce.

Your maintainer committed a major braino in the rmap code by adding the
attr fork, bmbt, and unwritten extent usage bits into rmap record key
comparisons.  While XFS uses the usage bits *in the rmap records* for
cross-referencing metadata in xfs_scrub and xfs_repair, it only needs
the owner and offset information to distinguish between reverse mappings
of the same physical extent into the data fork of a file at multiple
offsets.  The other bits are not important for key comparisons for index
lookups, and never have been.

Eric Sandeen reports that this causes regressions in generic/299, so
undo this patch before it does more damage.

Reported-by: Eric Sandeen <sandeen@sandeen.net>
Fixes: 6ff646b2ce ("xfs: fix rmap key and record comparison functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-11-19 15:17:50 -08:00
Darrick J. Wong 6ff646b2ce xfs: fix rmap key and record comparison functions
Keys for extent interval records in the reverse mapping btree are
supposed to be computed as follows:

(physical block, owner, fork, is_btree, is_unwritten, offset)

This provides users the ability to look up a reverse mapping from a bmbt
record -- start with the physical block; then if there are multiple
records for the same block, move on to the owner; then the inode fork
type; and so on to the file offset.

However, the key comparison functions incorrectly remove the
fork/btree/unwritten information that's encoded in the on-disk offset.
This means that lookup comparisons are only done with:

(physical block, owner, offset)

This means that queries can return incorrect results.  On consistent
filesystems this hasn't been an issue because blocks are never shared
between forks or with bmbt blocks; and are never unwritten.  However,
this bug means that online repair cannot always detect corruption in the
key information in internal rmapbt nodes.

Found by fuzzing keys[1].attrfork = ones on xfs/371.

Fixes: 4b8ed67794 ("xfs: add rmap btree operations")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-11-10 16:47:56 -08:00
Carlos Maiolino 32a2b11f46 xfs: Remove kmem_zone_zalloc() usage
Use kmem_cache_zalloc() directly.

With the exception of xlog_ticket_alloc() which will be dealt on the
next patch for readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-07-28 20:24:14 -07:00
Gao Xiang 92a005448f xfs: get rid of unnecessary xfs_perag_{get,put} pairs
In the course of some operations, we look up the perag from
the mount multiple times to get or change perag information.
These are often very short pieces of code, so while the
lookup cost is generally low, the cost of the lookup is far
higher than the cost of the operation we are doing on the
perag.

Since we changed buffers to hold references to the perag
they are cached in, many modification contexts already hold
active references to the perag that are held across these
operations. This is especially true for any operation that
is serialised by an allocation group header buffer.

In these cases, we can just use the buffer's reference to
the perag to avoid needing to do lookups to access the
perag. This means that many operations don't need to do
perag lookups at all to access the perag because they've
already looked up objects that own persistent references
and hence can use that reference instead.

Cc: Dave Chinner <dchinner@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-14 08:47:33 -07:00
Darrick J. Wong 59d677127c xfs: add support for rmap btree staging cursors
Add support for btree staging cursors for the rmap btrees.  This is
needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Dave Chinner 576af73228 xfs: convert btree cursor ag-private member name
bc_private.a -> bc_ag conversion via script:

`sed -i 's/bc_private\.a/bc_ag/g' fs/xfs/*[ch] fs/xfs/*/*[ch]`

And then revert the change to the bc_ag #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Christoph Hellwig 9798f615ad xfs: remove XFS_BUF_TO_AGF
Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:39 -07:00
Eric Sandeen 250d4b4c40 xfs: remove unused header files
There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this.  I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:43 -07:00
Christoph Hellwig dbd329f1e4 xfs: add struct xfs_mount pointer to struct xfs_buf
We need to derive the mount pointer from a buffer in a lot of place.
Add a direct pointer to short cut the pointer chasing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:29 -07:00
Darrick J. Wong 5cd213b0fe xfs: don't reserve per-AG space for an internal log
It turns out that the log can consume nearly all the space in an AG, and
when this happens this it's possible that there will be less free space
in the AG than the reservation would try to hide.  On a debug kernel
this can trigger an ASSERT in xfs/250:

XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319

The log is permanently allocated, so we know we're never going to have
to expand the btrees to hold any records associated with the log space.
We therefore can treat the space as if it doesn't exist.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2019-05-20 11:25:39 -07:00
Brian Foster 39708c20ab xfs: miscellaneous verifier magic value fixups
Most buffer verifiers have hardcoded magic value checks
conditionalized on the version of the filesystem. The magic value
field of the verifier structure facilitates abstraction of some of
this code. Populate the ->magic field of various verifiers to take
advantage of this abstraction. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:01 -08:00
Darrick J. Wong ebcbef3a61 xfs: pass transaction lock while setting up agresv on cyclic metadata
Pass a tranaction pointer through to all helpers that calculate the
per-AG block reservation.  Online repair will use this to reinitialize
per-ag reservations while it still holds all the AG headers locked to
the repair transaction.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-07-29 22:37:08 -07:00
Dave Chinner 0b61f8a407 xfs: convert to SPDX license tags
Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/

This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:

for f in `git grep -l "GNU General" fs/xfs/` ; do
	echo $f
	cat $f | awk -f hdr.awk > $f.new
	mv -f $f.new $f
done

And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:

$ cat hdr.awk
BEGIN {
	hdr = 1.0
	tag = "GPL-2.0"
	str = ""
}

/^ \* This program is free software/ {
	hdr = 2.0;
	next
}

/any later version./ {
	tag = "GPL-2.0+"
	next
}

/^ \*\// {
	if (hdr > 0.0) {
		print "// SPDX-License-Identifier: " tag
		print str
		print $0
		str=""
		hdr = 0.0
		next
	}
	print $0
	next
}

/^ \* / {
	if (hdr > 1.0)
		next
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
	next
}

/^ \*/ {
	if (hdr > 0.0)
		next
	print $0
	next
}

// {
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
}

END { }
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 14:17:53 -07:00
Darrick J. Wong 1f5c071d19 xfs: don't ASSERT on short form btree root pointer of zero
Don't ASSERT if the short form btree root pointer is zero.  Now that we
use xfs_verify_agbno to check all short form btree pointers, we'll let
that log the error and pass it to the upper layers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04 14:45:30 -07:00
Eric Sandeen a1f69417c6 xfs: non-scrub - remove unused function parameters
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-09 10:23:42 -07:00
Brian Foster 0ab32086d0 xfs: account only rmapbt-used blocks against rmapbt perag res
The rmapbt perag metadata reservation reserves blocks for the
reverse mapping btree (rmapbt). Since the rmapbt uses blocks from
the agfl and perag accounting is updated as blocks are allocated
from the allocation btrees, the reservation actually accounts blocks
as they are allocated to (or freed from) the agfl rather than the
rmapbt itself.

While this works for blocks that are eventually used for the rmapbt,
not all agfl blocks are destined for the rmapbt. Blocks that are
allocated to the agfl (and thus "reserved" for the rmapbt) but then
used by another structure leads to a growing inconsistency over time
between the runtime tracking of rmapbt usage vs. actual rmapbt
usage. Since the runtime tracking thinks all agfl blocks are rmapbt
blocks, it essentially believes that less future reservation is
required to satisfy the rmapbt than what is actually necessary.

The inconsistency is rectified across mount cycles because the perag
reservation is initialized based on the actual rmapbt usage at mount
time. The problem, however, is that the excessive drain of the
reservation at runtime opens a window to allocate blocks for other
purposes that might be required for the rmapbt on a subsequent
mount. This problem can be demonstrated by a simple test that runs
an allocation workload to consume agfl blocks over time and then
observe the difference in the agfl reservation requirement across an
unmount/mount cycle:

  mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194
  ...
  ...      : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1
  umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0
  mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194

As the above tracepoints show, the reservation requirement reduces
from 3194 blocks to 2956 blocks as the workload runs.  Without any
other changes in the filesystem, the same reservation requirement
jumps from 2956 to 3052 blocks over a umount/mount cycle.

To address this divergence, update the RMAPBT reservation to account
blocks used for the rmapbt only rather than all blocks filled into
the agfl. This patch makes several high-level changes toward that
end:

1.) Reintroduce an AGFL reservation type to serve as an accounting
    no-op for blocks allocated to (or freed from) the AGFL.
2.) Invoke RMAPBT usage accounting from the actual rmapbt block
    allocation path rather than the AGFL allocation path.

The first change is required because agfl blocks are considered free
blocks throughout their lifetime. The perag reservation subsystem is
invoked unconditionally by the allocation subsystem, so we need a
way to tell the perag subsystem (via the allocation subsystem) to
not make any accounting changes for blocks filled into the AGFL.

The second change causes the in-core RMAPBT reservation usage
accounting to remain consistent with the on-disk state at all times
and eliminates the risk of leaving the rmapbt reservation
underfilled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-11 20:27:57 -07:00
Carlos Maiolino e157ebdcb3 Cleanup old XFS_BTREE_* traces
Remove unused legacy btree traces from IRIX era.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-11 20:27:55 -07:00
Darrick J. Wong b55725974c xfs: create a new buf_ops pointer to verify structure metadata
Expose all metadata structure buffer verifier functions via buf_ops.
These will be used by the online scrub mechanism to look for problems
with buffers that are already sitting around in memory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong bc1a09b8e3 xfs: refactor verifier callers to print address of failing check
Refactor the callers of verifiers to print the instruction address of a
failing check.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong a6a781a58b xfs: have buffer verifier functions report failing address
Modify each function that checks the contents of a metadata buffer to
return the instruction address of the failing test so that we can report
more precise failure errors to the log.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00