135 Commits

Author SHA1 Message Date
Bill O'Donnell fb54bc42ca xfs: force all buffers to be written during btree bulk load
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 13ae04d8d45227c2ba51e188daf9fc13d08a1b12
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Dec 15 10:03:27 2023 -0800

    xfs: force all buffers to be written during btree bulk load

    While stress-testing online repair of btrees, I noticed periodic
    assertion failures from the buffer cache about buffers with incorrect
    DELWRI_Q state.  Looking further, I observed this race between the AIL
    trying to write out a btree block and repair zapping a btree block after
    the fact:

    AIL:    Repair0:

    pin buffer X
    delwri_queue:
    set DELWRI_Q
    add to delwri list

            stale buf X:
            clear DELWRI_Q
            does not clear b_list
            free space X
            commit

    delwri_submit   # oops

    Worse yet, I discovered that running the same repair over and over in a
    tight loop can result in a second race that causes data integrity
    problems with the repair:

    AIL:    Repair0:        Repair1:

    pin buffer X
    delwri_queue:
    set DELWRI_Q
    add to delwri list

            stale buf X:
            clear DELWRI_Q
            does not clear b_list
            free space X
            commit

                            find free space X
                            get buffer
                            rewrite buffer
                            delwri_queue:
                            set DELWRI_Q
                            already on a list, do not add
                            commit

                            BAD: committed tree root before all blocks written

    delwri_submit   # too late now

    I traced this to my own misunderstanding of how the delwri lists work,
    particularly with regard to the AIL's buffer list.  If a buffer is
    logged and committed, the buffer can end up on that AIL buffer list.  If
    btree repairs are run twice in rapid succession, it's possible that the
    first repair will invalidate the buffer and free it before the next time
    the AIL wakes up.  Marking the buffer stale clears DELWRI_Q from the
    buffer state without removing the buffer from its delwri list.  The
    buffer doesn't know which list it's on, so it cannot know which lock to
    take to protect the list for a removal.

    If the second repair allocates the same block, it will then recycle the
    buffer to start writing the new btree block.  Meanwhile, if the AIL
    wakes up and walks the buffer list, it will ignore the buffer because it
    can't lock it, and go back to sleep.

    When the second repair calls delwri_queue to put the buffer on the
    list of buffers to write before committing the new btree, it will set
    DELWRI_Q again, but since the buffer hasn't been removed from the AIL's
    buffer list, it won't add it to the bulkload buffer's list.

    This is incorrect, because the bulkload caller relies on delwri_submit
    to ensure that all the buffers have been sent to disk /before/
    committing the new btree root pointer.  This ordering is required for
    data consistency.

    Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally
    drop it, so the next thread to walk through the btree will trip over a
    debug assertion on that flag.

    To fix this, create a new function that waits for the buffer to be
    removed from any other delwri lists before adding the buffer to the
    caller's delwri list.  By waiting for the buffer to clear both the
    delwri list and any potential delwri wait list, we can be sure that
    repair will initiate writes of all buffers and report all write errors
    back to userspace instead of committing the new structure.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:00 -06:00
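The second race in the commit above can be modelled in a few lines of userspace C. This is only a toy sketch, not the kernel code: `toy_buf`, `toy_list`, `TOY_DELWRI_Q` and the helper names are hypothetical stand-ins, and the list is reduced to a counter. It shows how staling a buffer clears the flag but leaves list membership behind, so a later queue attempt sets the flag again without adding the buffer to the caller's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DELWRI_Q 0x1u            /* hypothetical stand-in for DELWRI_Q */

struct toy_list { int nr; };         /* toy delwri list: only counts members */

struct toy_buf {
    unsigned int     flags;
    struct toy_list *on_list;        /* NULL when not on any delwri list */
};

/* Queue the buffer: set the flag, but only add it if it is on no list. */
static bool toy_delwri_queue(struct toy_buf *bp, struct toy_list *list)
{
    if (bp->flags & TOY_DELWRI_Q)
        return false;                /* already queued somewhere */
    bp->flags |= TOY_DELWRI_Q;
    if (bp->on_list == NULL) {
        bp->on_list = list;
        list->nr++;
    }
    return true;
}

/* Staling clears the flag but does not remove the buffer from its list. */
static void toy_stale(struct toy_buf *bp)
{
    bp->flags &= ~TOY_DELWRI_Q;
}

/* Is the buffer really on the list the caller intends to submit? */
static bool toy_queued_on(const struct toy_buf *bp, const struct toy_list *list)
{
    return bp->on_list == list;
}
```

In this model, queueing on an "AIL" list, staling, and then queueing on a "bulkload" list leaves the bulkload list empty even though the queue call reports success. That is the situation the new wait-then-queue function in the commit is described as preventing.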
Bill O'Donnell d0e7df7358 xfs: allow scanning ranges of the buffer cache for live buffers
JIRA: https://issues.redhat.com/browse/RHEL-57114

commit 9ed851f695c71d325758f8c18e265da9316afd26
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Aug 10 07:48:03 2023 -0700

    xfs: allow scanning ranges of the buffer cache for live buffers

    After an online repair, we need to invalidate buffers representing the
    blocks from the old metadata that we're replacing.  It's possible that
    parts of a tree that were previously cached in memory are no longer
    accessible due to media failure or other corruption on interior nodes,
    so repair figures out the old blocks from the reverse mapping data and
    scans the buffer cache directly.

    In other words, online fsck needs to find all the live (i.e. non-stale)
    buffers for a range of fsblocks so that it can invalidate them.

    Unfortunately, the current buffer cache code triggers asserts if the
    rhashtable lookup finds a non-stale buffer of a different length than
    the key we searched for.  For regular operation this is desirable, but
    for this repair procedure, we don't care since we're going to forcibly
    stale the buffer anyway.  Add an internal lookup flag to avoid the
    assert.  Skip buffers that are already XBF_STALE.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-10-15 10:46:22 -05:00
Ming Lei 0c712a8085 xfs: port block device access to files
JIRA: https://issues.redhat.com/browse/RHEL-29564

commit 1b9e2d90141c5e25faefbb7891f0ed8606aa02cf
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Jan 23 14:26:24 2024 +0100

    xfs: port block device access to files

    Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-7-adbd023e19cc@kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-04-17 10:18:37 +08:00
Ming Lei 079721ca6f xfs: Convert to bdev_open_by_path()
JIRA: https://issues.redhat.com/browse/RHEL-29262

commit e340dd63f6a11402424b3d77e51149bce8fcba7d
Author: Jan Kara <jack@suse.cz>
Date:   Wed Sep 27 11:34:34 2023 +0200

    xfs: Convert to bdev_open_by_path()

    Convert xfs to use bdev_open_by_path() and pass the handle around.

    CC: "Darrick J. Wong" <djwong@kernel.org>
    CC: linux-xfs@vger.kernel.org
    Acked-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230927093442.25915-28-jack@suse.cz
    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-19 10:07:46 +08:00
Bill O'Donnell 426777a415 xfs: xfs_buf cache destroy isn't RCU safe
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 231f91ab504ecebcb88e942341b3d7dd91de45f1
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Jul 18 18:20:37 2022 -0700

    xfs: xfs_buf cache destroy isn't RCU safe

    Darrick and Sachin Sant reported that xfs/435 and xfs/436 would
    report a non-empty xfs_buf slab on module remove. This isn't easy
    to reproduce, but is clearly a side effect of converting the buffer
    cache to RCU freeing and lockless lookups. Sachin bisected and
    Darrick hit it when testing the patchset directly.

    Turns out that the xfs_buf slab is not destroyed when all the other
    XFS slab caches are destroyed. Instead, it's got its own little
    wrapper function that gets called separately, and so it doesn't have
    an rcu_barrier() call in it that is needed to drain all the rcu
    callbacks before the slab is destroyed.

    Fix it by removing the xfs_buf_init/terminate wrappers that just
    allocate and destroy the xfs_buf slab, and move them to the same
    place that all the other slab caches are set up and destroyed.

    Reported-and-tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Fixes: 298f34224506 ("xfs: lockless buffer lookup")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:48 -05:00
Bill O'Donnell 2481f637d7 xfs: lockless buffer lookup
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 298f342245066309189d8637ca7339d56840c3e1
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 14 12:05:07 2022 +1000

    xfs: lockless buffer lookup

    Now that we have a standalone fast path for buffer lookup, we can
    easily convert it to use rcu lookups. When we continually hammer the
    buffer cache with trylock lookups, we end up with a huge amount of
    lock contention on the per-ag buffer hash locks:

    -   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
       - 92.67% xfs_inodegc_worker
          - 92.13% xfs_inode_unlink
             - 91.52% xfs_inactive_ifree
                - 85.63% xfs_read_agi
                   - 85.61% xfs_trans_read_buf_map
                      - 85.59% xfs_buf_read_map
                         - xfs_buf_get_map
                            - 85.55% xfs_buf_find
                               - 72.87% _raw_spin_lock
                                  - do_raw_spin_lock
                                       71.86% __pv_queued_spin_lock_slowpath
                               - 8.74% xfs_buf_rele
                                  - 7.88% _raw_spin_lock
                                     - 7.88% do_raw_spin_lock
                                          7.63% __pv_queued_spin_lock_slowpath
                               - 1.70% xfs_buf_trylock
                                  - 1.68% down_trylock
                                     - 1.41% _raw_spin_lock_irqsave
                                        - 1.39% do_raw_spin_lock
                                             __pv_queued_spin_lock_slowpath
                               - 0.76% _raw_spin_unlock
                                    0.75% do_raw_spin_unlock

    This is basically hammering the pag->pag_buf_lock from lots of CPUs
    doing trylocks at the same time. Most of the buffer trylock
    operations ultimately fail after we've done the lookup, so we're
    really hammering the buf hash lock whilst making no progress.

    We can also see significant spinlock traffic on the same lock just
    under normal operation when lots of tasks are accessing metadata
    from the same AG, so let's avoid all this by converting the lookup
    fast path to leverage the rhashtable's ability to do rcu protected
    lookups.

    We avoid races with the buffer release path by using
    atomic_inc_not_zero() on the buffer hold count. Any buffer that is
    in the LRU will have a non-zero count, thereby allowing the lockless
    fast path to be taken in most cache hit situations. If the buffer
    hold count is zero, then it is likely going through the release path
    so in that case we fall back to the existing lookup miss slow path.

    The slow path will then do an atomic lookup and insert under the
    buffer hash lock and hence serialise correctly against buffer
    release freeing the buffer.

    The use of rcu protected lookups means that buffer handles now need
    to be freed by RCU callbacks (same as inodes). We still free the
    buffer pages before the RCU callback - we won't be trying to access
    them at all on a buffer that has zero references - but we need the
    buffer handle itself to be present for the entire rcu protected read
    side to detect a zero hold count correctly.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:47 -05:00
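The hold-count check that the lockless lookup commit describes can be illustrated with C11 atomics. This is a userspace sketch of the "increment only if non-zero" idea, not the kernel's `atomic_inc_not_zero()` implementation; the function name `hold_inc_not_zero` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the hold count is currently non-zero.
 * A zero count means the object is probably being released, so the
 * caller should fall back to a slower, locked lookup path.
 */
static bool hold_inc_not_zero(atomic_int *hold)
{
    int old = atomic_load(hold);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(hold, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller in this model would treat a `false` return as a cache miss and retry under the hash lock, which mirrors the fast path / slow path split described in the commit message.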
Bill O'Donnell ac5ff39607 xfs: rework xfs_buf_incore() API
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 85c73bf726e41be276bcad3325d9a8aef10be289
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 22:05:18 2022 +1000

    xfs: rework xfs_buf_incore() API

    Make it consistent with the other buffer APIs: return an error and
    place the buffer in a parameter.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:41 -05:00
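The calling convention that the xfs_buf_incore() rework describes (an error code as the return value, the buffer handed back through a parameter) looks roughly like the sketch below. Everything here is hypothetical: `toy_buf`, `toy_cache`, `toy_buf_incore` and the choice of `-ENOENT` for a miss are assumptions made for illustration, not the kernel's actual signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_buf { int daddr; };

/* A tiny fixed "cache" standing in for the real buffer cache. */
static struct toy_buf toy_cache[] = { { 8 }, { 16 } };

/* Return 0 and set *bpp on a hit; return a negative errno on a miss. */
static int toy_buf_incore(int daddr, struct toy_buf **bpp)
{
    *bpp = NULL;
    for (size_t i = 0; i < sizeof(toy_cache) / sizeof(toy_cache[0]); i++) {
        if (toy_cache[i].daddr == daddr) {
            *bpp = &toy_cache[i];
            return 0;
        }
    }
    return -ENOENT;
}
```

With this shape a caller can distinguish "not cached" from other failures by the error value, instead of overloading a NULL pointer return.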
Bill O'Donnell ca371f43d0 xfs: convert buffer flags to unsigned.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit b9b3fe152e4966cf8562630de67aa49e2f9c9222
Author: Dave Chinner <david@fromorbit.com>
Date:   Thu Apr 21 08:44:59 2022 +1000

    xfs: convert buffer flags to unsigned.

    5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
    fields to be unsigned. This manifests as a compiler error such as:

    /kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk'
      TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d "
      ^
    /kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags'
         __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
         ^
    /kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED'
      { XBF_UNMAPPED,  "UNMAPPED" }
        ^
    /kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS'
         __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
                                            ^
    /kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class':
    /kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant
     #define XBF_UNMAPPED  (1 << 31)/* do not map the buffer */

    as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an
    unsigned long for the flag. Since this results in the value of
    XBF_UNMAPPED causing a signed integer overflow, the result is
    technically undefined behavior, which gcc-5 does not accept as an
    integer constant.

    This is based on a patch from Arnd Bergmann <arnd@arndb.de>.

    Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:00 -05:00
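The signedness problem in the commit above can be shown with a small example. With a 32-bit `int`, `1 << 31` overflows a signed integer, whereas `1u << 31` is a well-defined unsigned constant that can initialize an `unsigned long` field. The names `TOY_UNMAPPED` and `toy_flag_name` are hypothetical and only mimic the shape of a flag-name table.

```c
#include <assert.h>

/* Unsigned constant: well defined, usable as a static initializer. */
#define TOY_UNMAPPED (1u << 31)

struct toy_flag_name {
    unsigned long mask;      /* flag value stored in an unsigned field */
    const char   *name;
};

static const struct toy_flag_name toy_flag_names[] = {
    { TOY_UNMAPPED, "UNMAPPED" },
};
```

Because the constant is unsigned, it converts to `unsigned long` without sign extension, so the stored mask has only bit 31 set.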
Jeff Moyer 7f25e45b02 dax: return the partition offset from fs_dax_get_by_bdev
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: dropped ext2 and erofs hunks, as they're not supported in RHEL.
  Fixed up ext4 conflict due to RHEL differences.

commit cd913c76f489def1a388e3a5b10df94948ede3f5
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:59 2021 +0100

    dax: return the partition offset from fs_dax_get_by_bdev
    
    Prepare for the removal of the block_device from the DAX I/O path by
    returning the partition offset from fs_dax_get_by_bdev so that the file
    systems have it at hand for use during I/O.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:18 -04:00
Jeff Moyer cf1518f312 xfs: move dax device handling into xfs_{alloc,free}_buftarg
Bugzilla: https://bugzilla.redhat.com/2162211

commit 5b5abbefec1bea98abba8f1cffcf72c11c32a92d
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:55 2021 +0100

    xfs: move dax device handling into xfs_{alloc,free}_buftarg
    
    Hide the DAX device lookup from the xfs_super.c code.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Link: https://lore.kernel.org/r/20211129102203.2243509-22-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:17 -04:00
Brian Foster 21a25a1300 xfs: rename buffer cache index variable b_bn
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 4c7f65aea7b7fe66c08f8f7304c1ea3f7a871d5a
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:48:54 2021 -0700

    xfs: rename buffer cache index variable b_bn

    To stop external users from using b_bn as the disk address of the
    buffer, rename it to b_rhash_key to indicate that it is the buffer
    cache index, not the block number of the buffer. Code that needs the
    disk address should use xfs_buf_daddr() to obtain it.

    Do the rename and clean up any of the remaining internal b_bn users.
    Also clean up any remaining b_bn cruft that is now unused.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:36 -04:00
Brian Foster 74f147b83b xfs: introduce xfs_buf_daddr()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 04fcad80cd068731a779fb442f78234732683755
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:57 2021 -0700

    xfs: introduce xfs_buf_daddr()

    Introduce a helper function xfs_buf_daddr() to extract the disk
    address of the buffer from the struct xfs_buf. This will replace
    direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
    the XFS_BUF_ADDR() macro.

    This patch introduces the helper function and replaces all uses of
    XFS_BUF_ADDR() as this is just a simple sed replacement.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:36 -04:00
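An accessor in the spirit of xfs_buf_daddr() might look like the sketch below. It assumes, for illustration only, that the disk address is read from the first entry of the buffer's map array and that the cache index is a separate field; the `toy_` types and names are hypothetical and much simpler than struct xfs_buf.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t toy_daddr_t;

struct toy_buf_map {
    toy_daddr_t bm_bn;       /* disk address of this mapping */
    int         bm_len;      /* length of this mapping */
};

struct toy_buf {
    toy_daddr_t         b_rhash_key;  /* cache index, not a disk address */
    struct toy_buf_map *b_maps;       /* at least one mapping */
};

/* Callers ask the helper for the disk address instead of reading fields. */
static inline toy_daddr_t toy_buf_daddr(const struct toy_buf *bp)
{
    return bp->b_maps[0].bm_bn;
}
```

Routing every caller through one helper is what makes the later rename of the cache index field safe: an uncached buffer can carry a placeholder index while the helper still returns the real disk address.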
Brian Foster f0f7eee07e xfs: sb verifier doesn't handle uncached sb buffer
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 8cf07f3dd56195316be97758cb8b4e1d7183ea84
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:24 2021 -0700

    xfs: sb verifier doesn't handle uncached sb buffer

    The verifier checks explicitly for bp->b_bn == XFS_SB_DADDR to match
    the primary superblock buffer, but the primary superblock is an
    uncached buffer and so bp->b_bn is always -1ULL. Hence this never
    matches and the CRC error reporting is wholly dependent on the
    mount superblock already being populated so CRC feature checks pass
    and allow CRC errors to be reported.

    Fix this so that the primary superblock CRC error reporting is not
    dependent on already having read the superblock into memory.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:33 -04:00
Brian Foster 7fe76aa101 xfs: remove kmem_alloc_io()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 98fe2c3cef21b784e2efd1d9d891430d95b4f073
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Aug 9 10:10:01 2021 -0700

    xfs: remove kmem_alloc_io()

    Since commit 59bb47985c ("mm, sl[aou]b: guarantee natural alignment
    for kmalloc(power-of-two)"), the core slab code now guarantees slab
    alignment in all situations sufficient for IO purposes (i.e. minimum
    of 512 byte alignment of >= 512 byte sized heap allocations), so we no
    longer need the workaround in the XFS code to provide this
    guarantee.

    Replace the use of kmem_alloc_io() with kmem_alloc() or
    kmem_alloc_large() appropriately, and remove the kmem_alloc_io()
    interface altogether.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:24 -04:00
Christoph Hellwig 54cd3aa6f8 xfs: remove ->b_offset handling for page backed buffers
->b_offset can only be non-zero for _XBF_KMEM backed buffers, so
remove all code dealing with it for page backed buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[dgc: modified to fit this patchset]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-07 11:49:50 +10:00
Brian Foster 8321ddb2fa xfs: don't drain buffer lru on freeze and read-only remount
xfs_buftarg_drain() is called from xfs_log_quiesce() to ensure the
buffer cache is reclaimed during unmount. xfs_log_quiesce() is also
called from xfs_quiesce_attr(), however, which means that cache
state is completely drained for filesystem freeze and read-only
remount. While technically harmless, this is unnecessarily
heavyweight. Both freeze and read-only mounts allow reads and thus
allow population of the buffer cache. Therefore, the transitional
sequence in either case really only needs to quiesce outstanding
writes to return the filesystem in a generally read-only state.

Additionally, some users have reported that attempts to freeze a
filesystem concurrent with a read-heavy workload cause the freeze
process to stall for a significant amount of time. This occurs
because, as mentioned above, the read workload repopulates the
buffer LRU while the freeze task attempts to drain it.

To improve this situation, replace the drain in xfs_log_quiesce()
with a buffer I/O quiesce and lift the drain into the unmount path.
This removes buffer LRU reclaim from freeze and read-only [re]mount,
but ensures the LRU is still drained before the filesystem unmounts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:50 -08:00
Brian Foster 10fb9ac125 xfs: rename xfs_wait_buftarg() to xfs_buftarg_drain()
xfs_wait_buftarg() is vaguely named and somewhat overloaded. Its
primary purpose is to reclaim all buffers from the provided buffer
target LRU. In preparation to refactor xfs_wait_buftarg() into
serialization and LRU draining components, rename the function and
associated helpers to something more descriptive. This patch has no
functional changes with the minor exception of renaming a
tracepoint.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:50 -08:00
Dave Chinner e82226138b xfs: remove xfs_buf_t typedef
Prepare for kernel xfs_buf alignment by getting rid of the
xfs_buf_t typedef from userspace.

[darrick: This patch is a port of a userspace patch removing the
xfs_buf_t typedef in preparation to make the userspace xfs_buf code
behave more like its kernel counterpart.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-16 16:07:34 -08:00
Christoph Hellwig 26e328759b xfs: reuse _xfs_buf_read for re-reading the superblock
Instead of poking deeply into buffer cache internals when re-reading the
superblock during log recovery just generalize _xfs_buf_read and use it
there.  Note that we don't have to explicitly set up the ops as they
must be set from the initial read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15 20:52:39 -07:00
Christoph Hellwig 6a7584b1d8 xfs: fold xfs_buf_ioend_finish into xfs_ioend
No need to keep a separate helper for this logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15 20:52:38 -07:00
Christoph Hellwig 76b2d32346 xfs: mark xfs_buf_ioend static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15 20:52:38 -07:00
Dave Chinner b01d1461ae xfs: call xfs_buf_iodone directly
All unmarked dirty buffers should be in the AIL and have log items
attached to them. Hence when they are written, we will run a
callback to remove the item from the AIL if appropriate. Now that
we've handled inode and dquot buffers, all remaining calls are to
xfs_buf_iodone() and so we can hard code this rather than use an
indirect call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-06 10:46:58 -07:00
Dave Chinner 9fe5c77cbe xfs: mark log recovery buffers for completion
Log recovery has its own buffer write completion handler for
buffers that it directly recovers. Convert these to direct calls by
flagging these buffers as being log recovery buffers. The flag will
get cleared by the log recovery IO completion routine, so it will
never leak out of log recovery.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-06 10:46:58 -07:00
Dave Chinner 0c7e5afbea xfs: mark dquot buffers in cache
dquot buffers always have write IO callbacks, so by marking them
directly we can avoid needing to attach ->b_iodone functions to
them. This avoids an indirect call, and makes future modifications
much simpler.

This is largely a rearrangement of the code at this point - no IO
completion functionality changes, only how the code is run.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-06 10:46:58 -07:00
Dave Chinner f593bf144c xfs: mark inode buffers in cache
Inode buffers always have write IO callbacks, so by marking them
directly we can avoid needing to attach ->b_iodone functions to
them. This avoids an indirect call, and makes future modifications
much simpler.

While this is largely a refactor of existing functionality, we
broaden the scope of the flag to beyond where inodes are explicitly
attached because future changes need to know what type of log items
are attached to the buffer. Adding this buffer flag may invoke the
inode iodone callback in cases where it wouldn't have been
previously, but this is not a functional change because the callback
is identical to the normal buffer write iodone callback when inodes
are not attached.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-06 10:46:58 -07:00
Brian Foster f9bccfcc3b xfs: refactor ratelimited buffer error messages into helper
XFS has some inconsistent log message rate limiting with respect to
buffer alerts. The metadata I/O error notification uses the generic
ratelimited alert, the buffer push code uses a custom rate limit and
the similar quiesce time failure checks are not rate limited at all
(when they should be).

The custom rate limit defined in the buf item code is specifically
crafted for buffer alerts. It is more aggressive than generic rate
limiting code because it must accommodate a high frequency of I/O
error events in a relatively short timeframe.

Factor out the custom rate limit state from the buf item code into a
per-buftarg rate limit so various alerts are limited based on the
target. Define a buffer alert helper function and use it for the
buffer alerts that are already ratelimited.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:46 -07:00
Brian Foster 54b3b1f619 xfs: factor out buffer I/O failure code
We use the same buffer I/O failure code in a few different places.
It's not much code, but it's not necessarily self-explanatory.
Factor it into a helper and document it in one place.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:45 -07:00
Darrick J. Wong 8d57c21600 xfs: add a function to deal with corrupt buffers post-verifiers
Add a helper function to get rid of buffers that we have decided are
corrupt after the verifiers have run.  This function is intended to
handle metadata checks that can't happen in the verifiers, such as
inter-block relationship checking.  Note that we now mark the buffer
stale so that it will not end up on any LRU and will be purged on
release.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-12 07:58:12 -07:00
Darrick J. Wong cdbcf82b86 xfs: fix xfs_buf_ioerror_alert location reporting
Instead of passing __func__ to the error reporting function, let's use
the return address builtins so that the messages actually tell you which
higher level function called the buffer functions.  This was previously
true for the xfs_buf_read callers, but not for the xfs_trans_read_buf
callers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-26 14:32:27 -08:00
Darrick J. Wong 0e3eccce5e xfs: make xfs_buf_read return an error code
Convert xfs_buf_read() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong 2842b6db3d xfs: make xfs_buf_get_uncached return an error code
Convert xfs_buf_get_uncached() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong 841263e933 xfs: make xfs_buf_get return an error code
Convert xfs_buf_get() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong 4ed8e27b4f xfs: make xfs_buf_read_map return an error code
Convert xfs_buf_read_map() to return numeric error codes like most
everywhere else in xfs.  This involves moving the open-coded logic that
reports metadata IO read / corruption errors and stales the buffer into
xfs_buf_read_map so that the logic is all in one place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong 3848b5f670 xfs: make xfs_buf_get_map return an error code
Convert xfs_buf_get_map() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:25 -08:00
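
The five "return an error code" conversions above share one calling convention: the function returns 0 or a negative errno, and the buffer comes back through an output parameter. The sketch below is a minimal userspace illustration of that convention only. The `buf_sketch` type and both functions are hypothetical and do not reflect the real xfs_buf signatures.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer type for illustration. */
struct buf_sketch {
	unsigned long long	blkno;
	size_t			len;
	void			*data;
};

/*
 * Returns 0 and sets *bpp on success, or a negative errno and leaves
 * *bpp NULL on failure.  A pointer-returning version could only signal
 * failure with NULL, which hides the reason from the caller.
 */
static int buf_get_sketch(unsigned long long blkno, size_t len,
			  struct buf_sketch **bpp)
{
	struct buf_sketch *bp;

	*bpp = NULL;
	if (len == 0)
		return -EINVAL;

	bp = calloc(1, sizeof(*bp));
	if (!bp)
		return -ENOMEM;

	bp->data = calloc(1, len);
	if (!bp->data) {
		free(bp);
		return -ENOMEM;
	}

	bp->blkno = blkno;
	bp->len = len;
	*bpp = bp;
	return 0;
}

static void buf_release_sketch(struct buf_sketch *bp)
{
	if (bp) {
		free(bp->data);
		free(bp);
	}
}
```

A caller can then propagate the error unchanged (`error = buf_get_sketch(...); if (error) return error;`) rather than guessing at a cause when it sees NULL.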
Christoph Hellwig 25a409572b xfs: mark xfs_buf_free static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-28 08:37:54 -07:00
Dave Chinner d916275aa4 xfs: get allocation alignment from the buftarg
Needed to feed into the allocation routine to guarantee the memory
buffers we add to bios are correctly aligned to the underlying
device.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26 17:43:14 -07:00
Christoph Hellwig dbd329f1e4 xfs: add struct xfs_mount pointer to struct xfs_buf
We need to derive the mount pointer from a buffer in a lot of places.
Add a direct pointer to shortcut the pointer chasing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:29 -07:00
Christoph Hellwig 8124b9b601 xfs: remove the b_io_length field in struct xfs_buf
This field is now always identical to b_length.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:28 -07:00
Christoph Hellwig e99b4bd0cb xfs: properly type the b_log_item field in struct xfs_buf
Now that the log code doesn't abuse this field any more we can
declare it as a struct xfs_buf_log_item pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:28 -07:00
Christoph Hellwig 0564501ff5 xfs: remove unused buffer cache APIs
Now that the log code uses bios directly we can drop various special
cases in the buffer cache code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:27 -07:00
Christoph Hellwig ce89755cdf xfs: renumber XBF_WRITE_FAIL
Assigning a numerical value that is not close to the flags
defined nearby is just asking for conflicts later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:18 -07:00
Christoph Hellwig 153fd7b57c xfs: remove the never used _XBF_COMPOUND flag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:18 -07:00
Eric Sandeen f5b999c03f xfs: remove unused flag arguments
There are several functions which take a flag argument that is
only ever passed as "0," so remove these arguments.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-12 09:00:00 -07:00
Christoph Hellwig f9a196ee5a xfs: merge xfs_buf_zero and xfs_buf_iomove
xfs_buf_zero is the only caller of xfs_buf_iomove.  Remove support
for copying from or to the buffer in xfs_buf_iomove and merge the
two functions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-12 08:59:59 -07:00
Darrick J. Wong 15baadf72c xfs: fix xfs_buf magic number endian checks
Create a separate magic16 check function so that we don't run afoul of
static checkers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-18 09:38:41 -08:00
Brian Foster 8473fee340 xfs: distinguish between inobt and finobt magic values
The inode btree verifier code is shared between the inode btree and
free inode btree because the underlying metadata formats are
essentially equivalent. A side effect of this is that the verifier
cannot determine whether a particular btree block should have an
inobt or finobt magic value.

This logic allows an unfortunate xfs_repair bug to escape detection
where certain level > 0 nodes of the finobt are stamped with inobt
magic by xfs_repair finobt reconstruction. This is fortunately not a
severe problem since the inode btree magic values do not contribute
to any changes in kernel behavior, but we do need a means to detect
and prevent this problem in the future.

Add a field to xfs_buf_ops to store the v4 and v5 superblock magic
values expected by a particular verifier. Add a helper to check an
on-disk magic value against the value expected by the verifier. Call
the helper from the shared [f]inobt verifier code for magic value
verification. This ensures that the inode btree blocks each have the
appropriate magic value based on specific tree type and superblock
version.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:01 -08:00
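
The per-verifier magic check described in 8473fee340 above can be sketched roughly as follows. This is a simplified userspace illustration, not the kernel implementation. The struct, the helper, and the `MAGIC4` macro are hypothetical names, and the magic values are built from the four-character tags "IABT"/"IAB3" (inobt) and "FIBT"/"FIB3" (finobt).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 32-bit magic from its four ASCII characters. */
#define MAGIC4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical verifier ops: one expected magic per superblock version. */
struct buf_ops_sketch {
	const char	*name;
	uint32_t	magic[2];	/* [0] = v4, [1] = v5 */
};

static const struct buf_ops_sketch inobt_ops_sketch = {
	"inobt",  { MAGIC4('I', 'A', 'B', 'T'), MAGIC4('I', 'A', 'B', '3') },
};

static const struct buf_ops_sketch finobt_ops_sketch = {
	"finobt", { MAGIC4('F', 'I', 'B', 'T'), MAGIC4('F', 'I', 'B', '3') },
};

/* On-disk values are big-endian; decode byte by byte to stay portable. */
static uint32_t be32_decode(const uint8_t b[4])
{
	return MAGIC4(b[0], b[1], b[2], b[3]);
}

/* Check an on-disk magic against what this verifier expects. */
static bool verify_magic_sketch(const struct buf_ops_sketch *ops,
				bool is_v5, const uint8_t disk_magic[4])
{
	uint32_t expected = ops->magic[is_v5 ? 1 : 0];

	if (expected == 0)
		return false;	/* verifier declares no magic for this version */
	return be32_decode(disk_magic) == expected;
}
```

Because each verifier carries its own expected values, a finobt block stamped with an inobt magic fails the check even though the two trees share the rest of the verifier code.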
Brian Foster 75d0230314 xfs: clarify documentation for the function to reverify buffers
Improve the documentation around xfs_buf_ensure_ops, which is the
function that is responsible for cleaning up the b_ops state of buffers
that go through xrep_findroot_block but don't match anything.  Rename
the function to xfs_buf_reverify.

[darrick: this started off as bfoster mods of a previous patch of mine,
but the renaming part is now this separate patch.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:07:01 -08:00
Darrick J. Wong 1aff5696f3 xfs: always assign buffer verifiers when one is provided
If a caller supplies buffer ops when trying to read a buffer and the
buffer doesn't already have buf ops assigned, ensure that the ops are
assigned to the buffer and the verifier is run on that buffer.

Note that current XFS code is careful to assign buffer ops after a
xfs_{trans_,}buf_read call in which ops were not supplied.  However, we
should apply ops defensively in case there is ever a coding mistake; and
an upcoming repair patch will need to be able to read a buffer without
assigning buf ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:30 +11:00
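
A minimal sketch of the defensive behaviour described in 1aff5696f3 above, assuming a simplified buffer that is already in memory. All names are hypothetical, and the real kernel code handles more state than this.

```c
#include <stddef.h>

struct buf_sketch;

/* Hypothetical verifier ops. */
struct buf_ops_sketch {
	const char	*name;
	void		(*verify_read)(struct buf_sketch *bp);
};

struct buf_sketch {
	const struct buf_ops_sketch	*ops;	/* NULL until a verifier is attached */
	int				error;
	int				verify_count;	/* demo bookkeeping only */
};

/* Demo verifier that just counts how often it runs. */
static void count_verify(struct buf_sketch *bp)
{
	bp->verify_count++;
}

static const struct buf_ops_sketch demo_ops = { "demo", count_verify };

/*
 * If the caller supplied ops and the buffer has none yet, attach them
 * and run the read verifier once.  Otherwise leave the buffer alone.
 */
static int buf_attach_ops_sketch(struct buf_sketch *bp,
				 const struct buf_ops_sketch *ops)
{
	if (!ops || bp->ops)
		return 0;

	bp->ops = ops;
	bp->error = 0;
	bp->ops->verify_read(bp);
	return bp->error;
}
```

A read that passes no ops leaves the buffer unverified, which is the case the message says an upcoming repair patch depends on.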
Eric Sandeen fa6c668d80 xfs: remove b_last_holder & associated macros
The old lock tracking infrastructure in xfs using the b_last_holder
field seems to only be useful if you can get into the system with a
debugger; it seems that the existing tracepoints would be the way to
go these days, and this old infrastructure can be removed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12 08:37:31 -07:00
Brian Foster 6af88cda00 xfs: combine [a]sync buffer submission apis
The buffer I/O submission path consists of separate function calls
per type. The buffer I/O type is already controlled via buffer
state (XBF_ASYNC), however, so there is no real need for separate
submission functions.

Combine the buffer submission functions into a single function that
processes the buffer appropriately based on XBF_ASYNC. Retain an
internal helper with a conditional wait parameter to continue to
support batched !XBF_ASYNC submission/completion required by delwri
queues.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:35 -07:00
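
The shape of the combined submission path described in 6af88cda00 above might look roughly like the sketch below. It is a userspace illustration with hypothetical names: `BUF_ASYNC` stands in for XBF_ASYNC, and I/O is simulated as completing immediately. It shows only the control flow: one public entry point keyed off the flag, plus an internal helper whose wait parameter lets a batched submitter start every buffer before waiting on any of them.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_ASYNC	(1u << 0)	/* hypothetical stand-in for XBF_ASYNC */

struct buf_sketch {
	unsigned int	flags;
	int		error;
	bool		io_started;
	int		waits;	/* times a submitter waited on this buffer */
};

/* Simulated I/O start; a real implementation would issue bios here. */
static void buf_start_io(struct buf_sketch *bp)
{
	bp->io_started = true;
	bp->error = 0;
}

/* Simulated wait for completion; returns the buffer's I/O error. */
static int buf_iowait(struct buf_sketch *bp)
{
	bp->waits++;
	return bp->error;
}

/* Internal helper: the caller decides whether to wait for completion. */
static int buf_submit_internal(struct buf_sketch *bp, bool wait)
{
	buf_start_io(bp);
	return wait ? buf_iowait(bp) : 0;
}

/* Single public entry point: behaviour follows the buffer's own state. */
static int buf_submit(struct buf_sketch *bp)
{
	return buf_submit_internal(bp, !(bp->flags & BUF_ASYNC));
}

/* Batched synchronous submission: start everything, then wait on each. */
static int delwri_submit_sketch(struct buf_sketch **bufs, size_t n)
{
	int error = 0;
	size_t i;

	for (i = 0; i < n; i++)
		buf_submit_internal(bufs[i], false);
	for (i = 0; i < n; i++) {
		int e = buf_iowait(bufs[i]);

		if (e && !error)
			error = e;
	}
	return error;
}
```

Starting every buffer before the first wait lets the I/Os overlap, which is why the batched path needs the no-wait variant even for buffers that are not marked asynchronous.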