Commit Graph

186 Commits

Author SHA1 Message Date
Bill O'Donnell 16bfa41bdf xfs: repair inode fork block mapping data structures
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 8f71bede8efd820627ac05c19eac2758214bc896
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Dec 15 10:03:39 2023 -0800

    xfs: repair inode fork block mapping data structures

    Use the reverse-mapping btree information to rebuild an inode block map.
    Update the btree bulk loading code as necessary to support inode rooted
    btrees and fix some bitrot problems.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:07 -06:00
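The rebuild described in "xfs: repair inode fork block mapping data structures" can be pictured as a filter, sort and merge over reverse-mapping records. The sketch below is a minimal illustration under stated assumptions: `struct rmap_rec`, `struct bmap_ext` and `rebuild_fork()` are hypothetical, simplified stand-ins, not the XFS repair code, which queries the rmap btrees and feeds the btree bulk loader. It keeps only the records owned by one inode fork, reorders them by file offset, and merges neighbours that are contiguous both logically and physically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified reverse-mapping record: who owns a run of blocks. */
struct rmap_rec {
	uint64_t owner;       /* inode number that owns the blocks */
	int      attr_fork;   /* 0 = data fork, 1 = attr fork */
	uint64_t offset;      /* file offset of the mapping, in blocks */
	uint64_t startblock;  /* first physical block */
	uint64_t blockcount;  /* length in blocks */
};

/* Hypothetical in-core extent: one entry of the rebuilt block map. */
struct bmap_ext {
	uint64_t offset;
	uint64_t startblock;
	uint64_t blockcount;
};

static int cmp_offset(const void *a, const void *b)
{
	const struct bmap_ext *x = a, *y = b;

	return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Rebuild the mapping of one inode fork from rmap records. Returns the number
 * of extents written to 'out' (at most 'max'), or SIZE_MAX if two mappings
 * overlap in file offset space, i.e. the rmap data cannot be trusted.
 */
static size_t rebuild_fork(const struct rmap_rec *recs, size_t nr, uint64_t ino,
			   int attr_fork, struct bmap_ext *out, size_t max)
{
	size_t n = 0, merged = 0;

	assert(recs != NULL || nr == 0);

	/* 1. Keep only the records owned by this inode fork. */
	for (size_t i = 0; i < nr && n < max; i++) {
		if (recs[i].owner != ino || recs[i].attr_fork != attr_fork)
			continue;
		out[n++] = (struct bmap_ext){ recs[i].offset, recs[i].startblock,
					      recs[i].blockcount };
	}

	/* 2. rmap records are keyed by physical block; a block map by file offset. */
	qsort(out, n, sizeof(*out), cmp_offset);

	/* 3. Merge neighbours that are contiguous both logically and physically. */
	for (size_t i = 0; i < n; i++) {
		struct bmap_ext *prev = merged ? &out[merged - 1] : NULL;

		if (prev && prev->offset + prev->blockcount > out[i].offset)
			return SIZE_MAX;  /* overlapping mappings */
		if (prev && prev->offset + prev->blockcount == out[i].offset &&
		    prev->startblock + prev->blockcount == out[i].startblock) {
			prev->blockcount += out[i].blockcount;
			continue;
		}
		out[merged++] = out[i];
	}
	return merged;
}
```

Given data-fork records for inode 128 at offsets 10, 0 and 4, where the last two are physically adjacent, the sketch yields two extents: offsets 0-9 at block 100 and offsets 10-14 at block 500.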
Bill O'Donnell c7fd21a7b1 xfs: move ->iop_relog to struct xfs_defer_op_type
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit a49c708f9a445457f6a5905732081871234f61c6
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Nov 30 12:31:30 2023 -0800

    xfs: move ->iop_relog to struct xfs_defer_op_type

    The only log items that need relogging are the ones created for deferred
    work operations, and the only part of the code base that relogs log
    items is the deferred work machinery.  Move the function pointers.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:51 -06:00
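The idea behind "xfs: move ->iop_relog to struct xfs_defer_op_type" is that a hook only the deferred-work machinery calls belongs in the deferred-work type, not in every log item's ops. The sketch below assumes heavily simplified, hypothetical types (`struct defer_op_type`, `struct defer_pending`, `efi_relog()`, `defer_relog()`); it only shows the dispatch shape, not the real relog logic, which logs a done item and a fresh intent in the new transaction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down log item. */
struct log_item {
	unsigned int generation;  /* bumped each time the intent is relogged */
};

/* Hypothetical deferred-work op table; relog now lives here, not in item ops. */
struct defer_op_type {
	const char *name;
	/* Returns the replacement intent logged in the new transaction. */
	struct log_item *(*relog)(struct log_item *intent, struct log_item *fresh);
};

/* Hypothetical pending deferred work: an intent plus the type that owns it. */
struct defer_pending {
	const struct defer_op_type *ops;
	struct log_item *intent;
};

static struct log_item *efi_relog(struct log_item *intent, struct log_item *fresh)
{
	/* A real implementation would log a done item for 'intent' first. */
	fresh->generation = intent->generation + 1;
	return fresh;
}

static const struct defer_op_type efi_defer_type = {
	.name  = "extent_free",
	.relog = efi_relog,
};

/* The defer machinery is the only caller, so it dispatches via dfp->ops. */
static void defer_relog(struct defer_pending *dfp, struct log_item *fresh)
{
	assert(dfp->ops != NULL && dfp->intent != NULL);
	if (!dfp->ops->relog)
		return;
	dfp->intent = dfp->ops->relog(dfp->intent, fresh);
}
```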
Bill O'Donnell 6ed6cafcf8 xfs: use xfs_defer_create_done for the relogging operation
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit bd3a88f6b71c7509566b44b7021581191cc11ae3
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Nov 30 11:44:56 2023 -0800

    xfs: use xfs_defer_create_done for the relogging operation

    Now that we have a helper to handle creating a log intent done item and
    updating all the necessary state flags, use it to reduce boilerplate in
    the ->iop_relog implementations.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:50 -06:00
Bill O'Donnell 2990c8ef58 xfs: move ->iop_recover to xfs_defer_op_type
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit db7ccc0bac2add5a41b66578e376b49328fc99d0
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Wed Nov 22 13:39:25 2023 -0800

    xfs: move ->iop_recover to xfs_defer_op_type

    Finish off the series by moving the intent item recovery function
    pointer to the xfs_defer_op_type struct, since this is really a deferred
    work function now.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:49 -06:00
Bill O'Donnell d3b894d85d xfs: pass the xfs_defer_pending object to iop_recover
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit a050acdfa8003a44eae4558fddafc7afb1aef458
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Wed Nov 22 10:38:10 2023 -0800

    xfs: pass the xfs_defer_pending object to iop_recover

    Now that log intent item recovery recreates the xfs_defer_pending state,
    we should pass that into the ->iop_recover routines so that the intent
    item can finish the recreation work.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:47 -06:00
Bill O'Donnell 21482343ad xfs: t_firstblock is tracking AGs not blocks
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 692b6cddeb65a5170c1e63d25b1ffb7822e80f7d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Sat Feb 11 04:11:06 2023 +1100

    xfs: t_firstblock is tracking AGs not blocks

    The tp->t_firstblock field is now really tracking the highest AG we
    have locked, not the block number of the highest allocation we've
    made. Its purpose is to prevent AGF locking deadlocks, so rename it
    to "highest AG" and simplify the implementation to just track the
    agno rather than a fsbno.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:20 -06:00
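The deadlock-avoidance rule behind "xfs: t_firstblock is tracking AGs not blocks" can be sketched as follows: a transaction remembers the highest AG whose AGF it has locked and may only lock AGs at or above it, so every transaction takes AGF locks in ascending order. The struct, field and helper names below are hypothetical; the sketch assumes a sentinel meaning "no AG locked yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NULLAGNUMBER ((uint32_t)-1)  /* sentinel: no AGF locked yet */

/* Hypothetical transaction holding only the AG-ordering state. */
struct trans {
	uint32_t highest_agno;  /* highest AG whose AGF this transaction holds */
};

static void trans_init(struct trans *tp)
{
	tp->highest_agno = NULLAGNUMBER;
}

/* May this transaction lock the AGF of 'agno' without risking ABBA deadlock? */
static bool can_lock_agf(const struct trans *tp, uint32_t agno)
{
	return tp->highest_agno == NULLAGNUMBER || agno >= tp->highest_agno;
}

/* Record that the AGF of 'agno' is now locked (an agno, not a block number). */
static void note_agf_locked(struct trans *tp, uint32_t agno)
{
	assert(can_lock_agf(tp, agno));
	if (tp->highest_agno == NULLAGNUMBER || agno > tp->highest_agno)
		tp->highest_agno = agno;
}
```

Tracking only the AG number, rather than a full filesystem block number, is enough because the ordering rule never looks at anything finer than the AG.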
Bill O'Donnell 5b5b4424f2 xfs: add log item precommit operation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit fad743d7cd8bd92d03c09e71f29eace860f50415
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 14 11:47:26 2022 +1000

    xfs: add log item precommit operation

    For inodes that are dirty, we have an attached cluster buffer that
    we want to use to track the dirty inode through the AIL.
    Unfortunately, locking the cluster buffer and adding it to the
    transaction when the inode is first logged in a transaction leads to
    buffer lock ordering inversions.

    The specific problem is ordering against the AGI buffer. When
    modifying unlinked lists, the buffer lock order is AGI -> inode
    cluster buffer as the AGI buffer lock serialises all access to the
    unlinked lists. Unfortunately, functionality like xfs_droplink()
    logs the inode before calling xfs_iunlink(), as do various directory
    manipulation functions. The inode can be logged way down in the
    stack as far as the bmapi routines and hence, without a major
    rewrite of lots of APIs there's no way we can avoid the inode being
    logged by something until after the AGI has been logged.

    As we are going to be using ordered buffers for inode AIL tracking,
    there isn't a need to actually lock that buffer against modification
    as all the modifications are captured by logging the inode item
    itself. Hence we don't actually need to join the cluster buffer into
    the transaction until just before it is committed. This means we do
    not perturb any of the existing buffer lock orders in transactions,
    and the inode cluster buffer is always locked last in a transaction
    that doesn't otherwise touch inode cluster buffers.

    We do this by introducing a precommit log item method.  This commit
    just introduces the mechanism; the inode item implementation is in
    followup commits.

    The precommit items need to be sorted into consistent order as we
    may be locking multiple items here. Hence if we have two dirty
    inodes in cluster buffers A and B, and some other transaction has
    two separate dirty inodes in the same cluster buffers, locking them
    in different orders opens us up to ABBA deadlocks. Hence we sort the
    items on the transaction based on the presence of a sort log item
    method.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:45 -05:00
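The sorting step in "xfs: add log item precommit operation" is what prevents the ABBA deadlock it describes: if every transaction sorts its precommit items by the same key before locking cluster buffers, two transactions touching buffers A and B always lock them in the same order. The sketch below is a hypothetical model; `sort_key` stands in for whatever a sort method returns (for example a cluster buffer's disk address), and pushing items without a sort method to the end is an assumption about tie handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical log item with an optional sort key. */
struct log_item {
	uint64_t sort_key;  /* e.g. disk address of the inode cluster buffer */
	int      has_sort;  /* does this item type implement a sort method? */
};

static int precommit_cmp(const void *a, const void *b)
{
	const struct log_item *x = *(struct log_item *const *)a;
	const struct log_item *y = *(struct log_item *const *)b;

	if (!x->has_sort && !y->has_sort)
		return 0;
	if (!x->has_sort)
		return 1;   /* non-sortable items go to the end */
	if (!y->has_sort)
		return -1;
	return (x->sort_key > y->sort_key) - (x->sort_key < y->sort_key);
}

/* Sort precommit items so every transaction locks buffers in the same order. */
static void sort_precommit_items(struct log_item **items, size_t nr)
{
	qsort(items, nr, sizeof(*items), precommit_cmp);
	for (size_t i = 1; i < nr; i++)
		if (items[i]->has_sort)
			assert(items[i - 1]->has_sort &&
			       items[i - 1]->sort_key <= items[i]->sort_key);
}
```

Two transactions that joined the same two buffers in opposite orders both end up locking the lower-keyed buffer first after sorting.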
Bill O'Donnell a98fa29c6e xfs: Add order IDs to log items in CIL
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 016a23388cdcb2740deb1379dc408f21c84efb11
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Jul 7 18:53:59 2022 +1000

    xfs: Add order IDs to log items in CIL

    Before we split the ordered CIL up into per cpu lists, we need a
    mechanism to track the order of the items in the CIL. We need to do
    this because there are rules around the order in which related items
    must physically appear in the log even inside a single checkpoint
    transaction.

    An example of this is intents - an intent must appear in the log
    before its intent done record so that log recovery can cancel the
    intent correctly. If we have these two records misordered in the
    CIL, then they will not be recovered correctly by journal replay.

    We also will not be able to move items to the tail of
    the CIL list when they are relogged, hence the log items will need
    some mechanism to allow the correct log item order to be recreated
    before we write log items to the journal.

    Hence we need to have a mechanism for recording global order of
    transactions in the log items so that we can recover that order
    from un-ordered per-cpu lists.

    Do this with a simple monotonically increasing commit counter in the CIL
    context. Each log item in the transaction gets stamped with the
    current commit order ID before it is added to the CIL. If the item
    is already in the CIL, leave it where it is instead of moving it to
    the tail of the list and instead sort the list before we start the
    push work.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:36 -05:00
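The ordering scheme in "xfs: Add order IDs to log items in CIL" can be modeled in a few lines: each commit takes the next value of a monotonic counter, stamps every item it logs (including items already in the CIL, instead of moving them to the tail), and the push sorts by that stamp before writing. Everything below is a hypothetical model with a fixed-size array standing in for the CIL lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical log item carrying its commit order stamp. */
struct log_item {
	uint32_t order_id;  /* stamp of the last commit that logged this item */
	int      in_cil;
};

/* Hypothetical CIL context. */
struct cil_ctx {
	uint32_t order_id;           /* monotonically increasing commit counter */
	struct log_item *items[16];  /* may be gathered out of order, e.g. per-cpu */
	size_t nr;
};

static void cil_insert_items(struct cil_ctx *ctx, struct log_item **items, size_t nr)
{
	uint32_t order = ++ctx->order_id;  /* one stamp per transaction commit */

	for (size_t i = 0; i < nr; i++) {
		/* Relogged items get the new stamp instead of moving to the tail. */
		items[i]->order_id = order;
		if (items[i]->in_cil)
			continue;
		assert(ctx->nr < sizeof(ctx->items) / sizeof(ctx->items[0]));
		items[i]->in_cil = 1;
		ctx->items[ctx->nr++] = items[i];
	}
}

static int cmp_order(const void *a, const void *b)
{
	const struct log_item *x = *(struct log_item *const *)a;
	const struct log_item *y = *(struct log_item *const *)b;

	return (x->order_id > y->order_id) - (x->order_id < y->order_id);
}

/* Restore global commit order before the push formats items into the log. */
static void cil_push_sort(struct cil_ctx *ctx)
{
	qsort(ctx->items, ctx->nr, sizeof(ctx->items[0]), cmp_order);
}
```

With an intent committed in one transaction and its done item in the next, the intent carries the lower stamp, so sorting restores intent-before-done order no matter how the list was gathered.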
Bill O'Donnell c84b4e1db6 xfs: intent item whiteouts
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 0d227466be84332d1888724e1e74dac34bff6d71
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed May 4 11:50:29 2022 +1000

    xfs: intent item whiteouts

    When we log modifications based on intents, we add both intent
    and intent done items to the modification being made. These get
    written to the log to ensure that the operation is re-run if the
    intent done is not found in the log.

    However, for operations that complete wholly within a single
    checkpoint, the change in the checkpoint is atomic and will never
    need replay. In this case, we don't need to actually write the
    intent and intent done items to the journal because log recovery
    will never need to manually restart this modification.

    Log recovery currently handles intent/intent done matching by
    inserting the intent into the AIL, then removing it when a matching
    intent done item is found. Hence for all the intent-based operations
    that complete within a checkpoint, we spend all that time parsing
    the intent/intent done items just to cancel them and do nothing with
    them.

    Hence it follows that the only time we actually need intents in the
    log is when the modification crosses checkpoint boundaries in the
    log and so may only be partially complete in the journal. Hence if
    we commit an intent done item to the CIL and the intent item is in
    the same checkpoint, we don't actually have to write them to the
    journal because log recovery will always cancel the intents.

    We've never really worried about the overhead of logging intents
    unnecessarily like this because the intents we log are generally
    very much smaller than the change being made. e.g. freeing an extent
    involves modifying at least two freespace btree blocks and the AGF,
    so the EFI/EFD overhead is only a small increase in space and
    processing time compared to the overall cost of freeing an extent.

    However, delayed attributes change this cost equation dramatically,
    especially for inline attributes. In the case of adding an inline
    attribute, we only log the inode core and attribute fork at present.
    With delayed attributes, we now log the attr intent which includes
    the name and value, the inode core and attr fork, and finally the
    attr intent done item. We increase the number of items we log from 1
    to 3, and the number of log vectors (regions) goes up from 3 to 7.
    Hence we triple the number of objects that the CIL has to process,
    and more than double the number of log vectors that need to be
    written to the journal.

    At scale, this means delayed attributes cause a non-pipelined CIL to
    become CPU bound processing all the extra items, resulting in a > 40%
    performance degradation on 16-way file+xattr create workloads.
    Pipelining the CIL (as per 5.15) reduces the performance degradation
    to 20%, but now the limitation is the rate at which the log items
    can be written to the iclogs and iclogs be dispatched for IO and
    completed.

    Even log IO completion is slowed down by these intents, because it
    now has to process 3x the number of items in the checkpoint.
    Processing completed intents is especially inefficient here, because
    we first insert the intent into the AIL, then remove it from the AIL
    when the intent done is processed. IOWs, we are also doing expensive
    operations in log IO completion we could completely avoid if we
    didn't log completed intent/intent done pairs.

    Enter log item whiteouts.

    When an intent done is committed, we can check to see if the
    associated intent is in the same checkpoint as we are currently
    committing the intent done to. If so, we can mark the intent log
    item with a whiteout and immediately free the intent done item
    rather than committing it to the CIL. We can basically skip the
    entire formatting and CIL insertion steps for the intent done item.

    However, we cannot remove the intent item from the CIL at this point
    because the unlocked per-cpu CIL item lists do not permit removal
    without holding the CIL context lock exclusively. Transaction commit
    only holds the context lock shared, hence the best we can do is mark
    the intent item with a whiteout so that the CIL push can release it
    rather than writing it to the log.

    This means we never write the intent to the log if the intent done
    has also been committed to the same checkpoint, but we'll always
    write the intent if the intent done has not been committed or has
    been committed to a different checkpoint. This will result in
    correct log recovery behaviour in all cases, without the overhead of
    logging unnecessary intents.

    This intent whiteout concept is generic - we can apply it to all
    intent/intent done pairs that have a direct 1:1 relationship. The
    way deferred ops iterate and relog intents means that all intents
    currently have a 1:1 relationship with their done intent, and hence
    we can apply this cancellation to all existing intent/intent done
    implementations.

    For delayed attributes with a 16-way 64kB xattr create workload,
    whiteouts reduce the amount of journalled metadata from ~2.5GB/s
    down to ~600MB/s and improve the creation rate from 9000/s to
    14000/s.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:14 -05:00
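The whiteout decision in "xfs: intent item whiteouts" reduces to one comparison at done-item commit time: if the intent sits in the checkpoint currently being built, mark it as a whiteout and drop the done item; otherwise commit the done item normally. The sketch below is a hypothetical model: the `intent` pointer on the done item stands in for the related-intent lookup method, and `cil_seq` is a simplified checkpoint sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log item; done items point at the intent they complete. */
struct log_item {
	bool     is_intent;
	bool     whiteout;        /* set: CIL push releases it instead of writing it */
	uint64_t cil_seq;         /* checkpoint the item was committed to; 0 = none */
	struct log_item *intent;  /* for done items: the related intent */
};

/* Hypothetical CIL state: only the sequence of the checkpoint being built. */
struct cil {
	uint64_t current_seq;
};

/*
 * Commit a done item. Returns true if it must go into the CIL, false if it
 * was cancelled against an intent in the same checkpoint.
 */
static bool cil_commit_done(struct cil *cil, struct log_item *done)
{
	struct log_item *intent = done->intent;

	assert(!done->is_intent && intent && intent->is_intent);
	if (intent->cil_seq == cil->current_seq) {
		intent->whiteout = true;  /* cannot unlink it here; mark for the push */
		return false;             /* done item is freed, never formatted */
	}
	done->cil_seq = cil->current_seq;
	return true;
}

/* During the CIL push, whiteouts are released instead of written. */
static size_t cil_push_count_written(struct log_item **items, size_t nr)
{
	size_t written = 0;

	for (size_t i = 0; i < nr; i++)
		if (!items[i]->whiteout)
			written++;
	return written;
}
```

The intent is only marked, not removed, mirroring the commit's point that transaction commit holds the CIL context lock shared and so cannot unlink items from the per-cpu lists.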
Bill O'Donnell 3d69dd80ab xfs: add log item method to return related intents
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit c23ab603e3d6557bd15e672fdbcbba4b28d08921
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed May 4 11:46:39 2022 +1000

    xfs: add log item method to return related intents

    To apply a whiteout to an intent item when an intent done item is
    committed, we need to be able to retrieve the intent item from the
    intent done item. Add a log item op method for doing this, and
    wire all the intent done items up to it.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:14 -05:00
Bill O'Donnell 5024d03194 xfs: add log item flags to indicate intents
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit f5b81200b6c166f78b73b3e2ca3e8f0c34c9daaf
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed May 4 11:46:09 2022 +1000

    xfs: add log item flags to indicate intents

    We currently have a couple of helper functions that try to infer
    whether the log item is an intent or intent done item from the
    combinations of operations it supports.  This is incredibly fragile
    and not very efficient as it requires checking specific combinations
    of ops.

    We need to be able to identify intent and intent done items quickly
    and easily in upcoming patches, so simply add intent and intent done
    type flags to the log item ops flags. These are static flags to
    begin with, so intent items should have been typed like this from
    the start.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:13 -05:00
Bill O'Donnell b4d91db63e xfs: convert log item tracepoint flags to unsigned.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 22d53f480c56e34316d2e5f3757ba1839d47008b
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Apr 21 10:47:07 2022 +1000

    xfs: convert log item tracepoint flags to unsigned.

    5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
    fields to be unsigned.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:07 -05:00
Bill O'Donnell ca371f43d0 xfs: convert buffer flags to unsigned.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit b9b3fe152e4966cf8562630de67aa49e2f9c9222
Author: Dave Chinner <david@fromorbit.com>
Date:   Thu Apr 21 08:44:59 2022 +1000

    xfs: convert buffer flags to unsigned.

    5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
    fields to be unsigned. This manifests as a compiler error such as:

    /kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk'
      TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d "
      ^
    /kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags'
         __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
         ^
    /kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED'
      { XBF_UNMAPPED,  "UNMAPPED" }
        ^
    /kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS'
         __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
                                            ^
    /kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class':
    /kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant
     #define XBF_UNMAPPED  (1 << 31)/* do not map the buffer */

    as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an
    unsigned long for the flag. Since this results in the value of
    XBF_UNMAPPED causing a signed integer overflow, the result is
    technically undefined behavior, which gcc-5 does not accept as an
    integer constant.

    This is based on a patch from Arnd Bergmann <arnd@arndb.de>.

    Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:00 -05:00
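The compiler error quoted in "xfs: convert buffer flags to unsigned" comes from `(1 << 31)`: with a 32-bit `int`, shifting 1 into the sign bit overflows, which is undefined behaviour in C99/C11, so gcc-5 would not accept it as a constant initializer. The sketch below shows the unsigned spelling; the exact form of the upstream fix is assumed, and `struct flag_name` is a simplified stand-in for the table `__print_flags()` builds. It also assumes a 32-bit `unsigned int`, as on Linux.

```c
#include <assert.h>

/*
 * Before: #define XBF_UNMAPPED (1 << 31)   -- signed overflow, undefined.
 * After: an unsigned constant, which is well defined and a valid
 * integer constant expression.
 */
#define XBF_UNMAPPED (1u << 31)

/* Stand-in for the flag/name table built by __print_flags(). */
struct flag_name {
	unsigned long mask;
	const char   *name;
};

static const struct flag_name xfs_buf_flag_names[] = {
	{ XBF_UNMAPPED, "UNMAPPED" },
};

/* Compile-time check that the constant really is bit 31. */
static_assert(XBF_UNMAPPED == 0x80000000u, "XBF_UNMAPPED must be bit 31");
```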
Carlos Maiolino 25a40d32f8 xfs: rename _zone variables to _cache
Bugzilla: https://bugzilla.redhat.com/2125724

Conflicts:
	Small conflict at xfs_inode_alloc() due to out of order
	backport. Inode alloc using kmem_cache_alloc() has been
	converted to use alloc_inode_sb() before this patch.

Now that we've gotten rid of the kmem_zone_t typedef, rename the
variables to _cache since that's what they are.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 182696fb021fc196e5cbe641565ca40fcf0f885a)
2022-10-21 12:50:46 +02:00
Carlos Maiolino d912d565bb xfs: remove kmem_zone typedef
Bugzilla: https://bugzilla.redhat.com/2125724

Remove these typedefs by referencing kmem_cache directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit e7720afad068a6729d9cd3aaa08212f2f5a7ceff)
2022-10-21 12:50:46 +02:00
Carlos Maiolino f713a1da33 xfs: formalize the process of holding onto resources across a defer roll
Bugzilla: https://bugzilla.redhat.com/2125724

Transaction users are allowed to flag up to two buffers and two inodes
for ownership preservation across a deferred transaction roll.  Hoist
the variables and code responsible for this out of xfs_defer_trans_roll
so that we can use it for the defer capture mechanism.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit c5db9f937b2971c78d6c6bbaa61a6450efa8b845)
2022-10-21 12:50:46 +02:00
Brian Foster f24fb058dd xfs: log items should have a xlog pointer, not a mount
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit d86142dd7c4e10e50bdb3679b405d748214b2c28
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Mar 17 09:09:12 2022 -0700

    xfs: log items should have a xlog pointer, not a mount

    Log items belong to the log, not the xfs_mount. Convert the mount
    pointer in the log item to a xlog pointer in preparation for
    upcoming log centric changes to the log items.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:37 -04:00
Andrey Albershteyn d28d9b8e65 xfs: reserve quota for dir expansion when linking/unlinking files
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2106569

commit 871b9316e7a778ff97bdc34fdb2f2977f616651d
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Feb 25 16:18:41 2022 -0800

xfs: reserve quota for dir expansion when linking/unlinking files

XFS does not reserve quota for directory expansion when linking or
unlinking children from a directory.  This means that we don't reject
the expansion with EDQUOT when we're at or near a hard limit, which
means that unprivileged userspace can use link()/unlink() to exceed
quota.

The fix for this is nuanced -- link operations don't always expand the
directory, and we allow a link to proceed with no space reservation if
we don't need to add a block to the directory to handle the addition.
Unlink operations generally do not expand the directory (you'd have to
free a block and then cause a btree split) and we can defer the
directory block freeing if there is no space reservation.

Moreover, there is a further bug in that we do not trigger the blockgc
workers to try to clear space when we're out of quota.

To fix both cases, create a new xfs_trans_alloc_dir function that
allocates the transaction, locks and joins the inodes, and reserves
quota for the directory.  If there isn't sufficient space or quota,
we'll switch the caller to reservationless mode.  This should prevent
quota usage overruns with the least restriction in functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
2022-07-27 10:52:13 +02:00
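The fallback policy described in "xfs: reserve quota for dir expansion when linking/unlinking files" can be sketched as: try to reserve quota for a possible directory expansion; on failure, give blockgc one chance to free space and retry; if that still fails, continue with no reservation, which only succeeds when the directory needs no new block. The model below is hypothetical: `struct quota`, `trans_alloc_dir()` and `do_link()` are simplified stand-ins, and "reclaimable" blocks model speculative preallocations that blockgc could free.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical quota state for the directory's owner. */
struct quota {
	long used;
	long hard_limit;
	long reclaimable;  /* blocks blockgc could free, counted in 'used' */
};

static bool quota_fits(const struct quota *q, long nblks)
{
	return q->used + nblks <= q->hard_limit;
}

/* Returns the number of blocks reserved; 0 means reservationless mode. */
static long trans_alloc_dir(struct quota *q, long resblks)
{
	assert(resblks >= 0);
	if (!quota_fits(q, resblks) && q->reclaimable > 0) {
		/* Stand-in for kicking the blockgc workers, then retrying once. */
		q->used -= q->reclaimable;
		q->reclaimable = 0;
	}
	if (quota_fits(q, resblks)) {
		q->used += resblks;
		return resblks;
	}
	return 0;
}

/* link(): without a reservation, only an in-place directory update works. */
static int do_link(struct quota *q, long resblks, bool needs_new_dir_block)
{
	long granted = trans_alloc_dir(q, resblks);

	if (granted == 0 && needs_new_dir_block)
		return -EDQUOT;  /* report the quota failure, not a generic error */
	return 0;
}
```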
Dave Chinner 5f9b4b0de8 xfs: xfs_log_force_lsn isn't passed a LSN
In doing an investigation into AIL push stalls, I was looking at the
log force code to see if an async CIL push could be done instead.
This led me to xfs_log_force_lsn() and to looking at how it works.

xfs_log_force_lsn() is only called from inode synchronisation
contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn
value as the LSN to sync the log to. This gets passed to
xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the
journal, and then used by xfs_log_force_lsn() to flush the iclogs to
the journal.

The problem is that ip->i_itemp->ili_last_lsn does not store a
log sequence number. What it stores is passed to it from the
->iop_committing method, which is called by xfs_log_commit_cil().
The value this passes to the iop_committing method is the CIL
context sequence number that the item was committed to.

As it turns out, xlog_cil_force_lsn() converts the sequence to an
actual commit LSN for the related context and returns that to
xfs_log_force_lsn(). xfs_log_force_lsn() overwrites its "lsn"
variable that contained a sequence with an actual LSN and then uses
that to sync the iclogs.

This caused me some confusion for a while, even though I originally
wrote all this code a decade ago. ->iop_committing is only used by
a couple of log item types, and only inode items use the sequence
number it is passed.

Let's clean up the API, CIL structures and inode log item to call it
a sequence number, and make it clear that the high level code is
using CIL sequence numbers and not on-disk LSNs for integrity
synchronisation purposes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-21 10:12:33 -07:00
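The confusion described in "xfs: xfs_log_force_lsn isn't passed a LSN" arises because a CIL sequence and an on-disk LSN share one integer type and one variable. The sketch below goes a step further than renaming and uses distinct wrapper structs so the compiler rejects mixing them up; `cil_seq_t`, `lsn_t`, `struct ckpt` and `cil_force_seq()` are hypothetical names, and the LSN layout (cycle in the high 32 bits, block in the low 32) is the usual XFS encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct struct types: passing one where the other is expected won't compile. */
typedef struct { uint64_t v; } cil_seq_t;  /* in-memory CIL checkpoint sequence */
typedef struct { int64_t v; } lsn_t;       /* on-disk LSN: cycle << 32 | block */

static lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t){ (int64_t)(((uint64_t)cycle << 32) | block) };
}

/* Hypothetical record of a checkpoint and where its commit record landed. */
struct ckpt {
	cil_seq_t seq;
	lsn_t     commit_lsn;
};

/*
 * Translate a CIL sequence into the commit LSN to flush the iclogs up to.
 * The LSN comes back as a separate value instead of overwriting the
 * caller's sequence variable.
 */
static lsn_t cil_force_seq(const struct ckpt *ckpts, int nr, cil_seq_t seq)
{
	assert(nr >= 0);
	for (int i = 0; i < nr; i++)
		if (ckpts[i].seq.v == seq.v)
			return ckpts[i].commit_lsn;
	return (lsn_t){ 0 };  /* unknown sequence: nothing left to force */
}
```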
Darrick J. Wong 1aec7c3d05 xfs: remove obsolete AGF counter debugging
In commit f8f2835a9c we changed the behavior of XFS to use EFIs to
remove blocks from an overfilled AGFL because there were complaints
about transaction overruns that stemmed from trying to free multiple
blocks in a single transaction.

Unfortunately, that commit missed a subtlety in the debug-mode
transaction accounting when a realtime volume is attached.  If a
realtime file undergoes a data fork mapping change such that realtime
extents are allocated (or freed) in the same transaction that a data
device block is also allocated (or freed), we can trip a debugging
assertion.  This can happen (for example) if a realtime extent is
allocated and it is necessary to reshape the bmbt to hold the new
mapping.

When we go to allocate a bmbt block from an AG, the first thing the data
device block allocator does is ensure that the freelist is the proper
length.  If the freelist is too long, it will trim the freelist to the
proper length.

In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to
record the decrement in the AG free list count.  Prior to f8f28 we would
put the free block back in the free space btrees in the same
transaction, which calls xfs_trans_agblocks_delta() to record the
increment in the AG free block count.  Since AGFL blocks are included in
the global free block count (fdblocks), there is no corresponding
fdblocks update, so the AGFL free satisfies the following condition in
xfs_trans_apply_sb_deltas:

	/*
	 * Check that superblock mods match the mods made to AGF counters.
	 */
	ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) ==
	       (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta +
		tp->t_ag_btree_delta));

The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is
the number of blocks that were allocated.

After commit f8f28 we defer the block freeing to the next chained
transaction, which means that the calls to xfs_trans_agflist_delta and
xfs_trans_agblocks_delta occur in separate transactions.  The (first)
transaction that shortens the free list trips on the comparison, which
has now become:

(X + 0) == ((X) + -1 + 0)

because we haven't freed the AGFL block yet; we've only logged an
intention to free it.  When the second transaction (the deferred free)
commits, it will evaluate the expression as:

(0 + 0) == (1 + 0 + 0)

and trip over that in turn.

At this point, the astute reader may note that the two commits tagged by
this patch have been in the kernel for a long time but haven't generated
any bug reports.  How is it that the author became aware of this bug?

This originally surfaced as an intermittent failure when I was testing
realtime rmap, but a different bug report by Zorro Lang reveals the same
assertion occurring on !lazysbcount filesystems.

The common factor to both reports (and why this problem wasn't
previously reported) becomes apparent if we consider when
xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():

	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
		xfs_trans_apply_sb_deltas(tp);

With a modern lazysbcount filesystem, transactions update only the
percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence
xfs_trans_apply_sb_deltas is rarely called.

However, updates to the count of free realtime extents are not part of
lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or
removing data fork mappings to realtime files; similarly,
XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.

Dave mentioned in response to an earlier version of this patch:

"IIUC, what you are saying is that this debug code is simply not
exercised in normal testing and hasn't been for the past decade?  And it
still won't be exercised on anything other than realtime device testing?

"...it was debugging code from 1994 that was largely turned into dead
code when lazysbcounters were introduced in 2007. Hence I'm not sure it
holds any value anymore."

This debugging code isn't especially helpful - you can modify the
flcount on one AG and the freeblks of another AG, and it won't trigger.
Add the fact that nobody noticed for a decade, and let's just get rid of
it (and start testing realtime :P).

This bug was found by running generic/051 on either a V4 filesystem
lacking lazysbcount; or a V5 filesystem with a realtime volume.

Cc: bfoster@redhat.com, zlang@redhat.com
Fixes: f8f2835a9c ("xfs: defer agfl block frees when dfops is available")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-04-29 07:44:18 -07:00
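The arithmetic in "xfs: remove obsolete AGF counter debugging" can be checked directly. The model below mirrors the removed assertion's condition using the five deltas it compares (field names echo the `tp->t_*` counters but the struct is a hypothetical stand-in). Deferring the AGFL free splits one balanced set of deltas across two transactions, so each transaction fails the check on its own while their sum still balances.

```c
#include <assert.h>
#include <stdbool.h>

/* The five per-transaction counters compared by the removed debug check. */
struct sb_deltas {
	long fdblocks;      /* t_fdblocks_delta */
	long res_fdblocks;  /* t_res_fdblocks_delta */
	long ag_freeblks;   /* t_ag_freeblks_delta */
	long ag_flist;      /* t_ag_flist_delta */
	long ag_btree;      /* t_ag_btree_delta */
};

/* Mirror of the condition asserted in xfs_trans_apply_sb_deltas(). */
static bool deltas_balance(const struct sb_deltas *d)
{
	return d->fdblocks + d->res_fdblocks ==
	       d->ag_freeblks + d->ag_flist + d->ag_btree;
}

/* Sum the deltas of two chained transactions. */
static struct sb_deltas deltas_add(struct sb_deltas a, struct sb_deltas b)
{
	return (struct sb_deltas){
		a.fdblocks + b.fdblocks, a.res_fdblocks + b.res_fdblocks,
		a.ag_freeblks + b.ag_freeblks, a.ag_flist + b.ag_flist,
		a.ag_btree + b.ag_btree,
	};
}
```

With X = 8, the single pre-f8f28 transaction is {8, 0, 9, -1, 0} and balances; after the change, the first transaction is {8, 0, 8, -1, 0} and the deferred free is {0, 0, 1, 0, 0}, neither of which balances alone.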
Dave Chinner 756b1c3433 xfs: use current->journal_info for detecting transaction recursion
The iomap code's use of PF_MEMALLOC_NOFS to detect transaction
recursion in XFS is just wrong. Remove it from the iomap code and
replace it with XFS-specific internal checks using
current->journal_info instead.

[djwong: This change also realigns the lifetime of NOFS flag changes to
match the incore transaction, instead of the inconsistent scheme we have
now.]

Fixes: 9070733b4e ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-02-25 08:07:04 -08:00
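The recursion check in "xfs: use current->journal_info for detecting transaction recursion" can be sketched with a thread-local pointer standing in for `current->journal_info`: setting it when a transaction starts and clearing it when the transaction ends lets an assertion catch a second transaction on the same task. The helper names below are hypothetical, and the sketch omits the NOFS allocation scope the kernel ties to the same lifetime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transaction; contents irrelevant here. */
struct trans {
	int dummy;
};

/* Per-thread pointer to the running transaction, like current->journal_info. */
static _Thread_local struct trans *cur_journal_info;

static void trans_set_context(struct trans *tp)
{
	assert(cur_journal_info == NULL);  /* catches transaction recursion */
	cur_journal_info = tp;
	/* the kernel would also enter a NOFS allocation scope here */
}

static void trans_clear_context(struct trans *tp)
{
	assert(cur_journal_info == tp);
	cur_journal_info = NULL;
}

static bool trans_is_running(void)
{
	return cur_journal_info != NULL;
}
```

Because the pointer follows the in-core transaction's lifetime, the check no longer depends on whether some other path happened to set a NOFS flag.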
Darrick J. Wong 7317a03df7 xfs: refactor inode ownership change transaction/inode/quota allocation idiom
For file ownership (uid, gid, prid) changes, create a new helper
xfs_trans_alloc_ichange that allocates a transaction and reserves the
appropriate amount of quota against that transaction in preparation for a
change of user, group, or project id.  Replace all the open-coded idioms
with a single call to this helper so that we can contain the retry loops
in the next patchset.

This changes the locking behavior for ichange transactions slightly.
Since tr_ichange does not have a permanent reservation and cannot roll,
we pass XFS_ILOCK_EXCL to ijoin so that the inode will be unlocked
automatically at commit time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong f2f7b9ff62 xfs: refactor inode creation transaction/inode/quota allocation idiom
For file creation, create a new helper xfs_trans_alloc_icreate that
allocates a transaction and reserves the appropriate amount of quota
against that transaction.  Replace all the open-coded idioms with a
single call to this helper so that we can contain the retry loops in the
next patchset.

This changes the locking behavior for non-tempfile creation slightly, in
that we now make the quota reservation without holding the directory
ILOCK.  While the dquots chosen for inode creation are based on the
directory state at a given point in time, the directory ILOCK was
released as soon as the dquot references were picked up.  Hence it was
never necessary to hold the directory ILOCK for the quota reservation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 3de4eb106f xfs: allow reservation of rtblocks with xfs_trans_alloc_inode
Make it so that we can reserve rt blocks with the xfs_trans_alloc_inode
wrapper function, then convert a few more callsites.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 3a1af6c317 xfs: refactor common transaction/inode/quota allocation idiom
Create a new helper xfs_trans_alloc_inode that allocates a transaction,
locks and joins an inode to it, and then reserves the appropriate amount
of quota against that transaction.  Then replace all the open-coded
idioms with a single call to this helper.
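A rough model of what such a helper consolidates, assuming a simplified flow: allocate a transaction, lock and join the inode, then reserve quota, undoing every acquired step if the reservation fails. `struct toy_ctx` and `toy_trans_alloc_inode()` are hypothetical; the real helper's signature and error handling are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical state: each field records whether one step is in effect. */
struct toy_ctx {
	bool trans_allocated;
	bool inode_locked;
	bool inode_joined;
	bool quota_reserved;
	bool quota_ok;		/* input: whether the quota reservation succeeds */
};

/*
 * One call replaces the open-coded sequence.  On failure, everything
 * acquired so far is released, so callers need no cleanup of their own.
 */
static int toy_trans_alloc_inode(struct toy_ctx *c)
{
	c->trans_allocated = true;	/* allocate the transaction */
	c->inode_locked = true;		/* take the inode lock */
	c->inode_joined = true;		/* join the inode to the transaction */

	if (!c->quota_ok) {
		/* Unwind: cancel the transaction, then drop the inode lock. */
		c->inode_joined = false;
		c->trans_allocated = false;
		c->inode_locked = false;
		return -1;
	}
	c->quota_reserved = true;
	return 0;
}
```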

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 4e919af782 xfs: periodically relog deferred intent items
There's a subtle design flaw in the deferred log item code that can lead
to pinning the log tail.  Taking up the defer ops chain examples from
the previous commit, we can get trapped in sequences like this:

Caller hands us a transaction t0 with D0-D3 attached.  The defer ops
chain will look like the following if the transaction rolls succeed:

t1: D0(t0), D1(t0), D2(t0), D3(t0)
t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

In transaction 9, we finish d9 and try to roll to t10 while holding onto
an intent item for D3 that we logged in t0.

The previous commit changed the order in which we place new defer ops in
the defer ops processing chain to reduce the maximum chain length.  Now
make xfs_defer_finish_noroll capable of relogging the entire chain
periodically so that we can always move the log tail forward.  Most
chains will never get relogged, except for operations that generate very
long chains (large extents containing many blocks with different sharing
levels) or are on filesystems with small logs and a lot of ongoing
metadata updates.

Callers are now required to ensure that the transaction reservation is
large enough to handle logging done items and new intent items for the
maximum possible chain length.  Most callers are careful to keep the
chain lengths low, so the overhead should be minimal.

The decision to relog an intent item is made based on whether the intent
was logged in a previous checkpoint, since there's no point in relogging
an intent into the same checkpoint.
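The relog decision can be sketched as a pass over the chain that relogs only intents logged in an earlier checkpoint. This assumes checkpoints are identified by a monotonically increasing sequence number; `struct toy_intent` and `toy_relog_chain()` are illustrative names, and where the real code logs a done item plus a fresh intent, this model merely updates a field.

```c
#include <stddef.h>

/* Hypothetical model of a pending intent in a defer ops chain. */
struct toy_intent {
	const char *name;
	unsigned long checkpoint_seq;	/* checkpoint it was last logged in */
};

/*
 * Relog every intent logged in an earlier checkpoint than current_seq.
 * Intents already in the current checkpoint are skipped, since relogging
 * them there would not move the log tail.  Returns the number relogged.
 */
static size_t toy_relog_chain(struct toy_intent *chain, size_t n,
			      unsigned long current_seq)
{
	size_t relogged = 0;

	for (size_t i = 0; i < n; i++) {
		if (chain[i].checkpoint_seq < current_seq) {
			/* Real code: log a done item and a new intent here. */
			chain[i].checkpoint_seq = current_seq;
			relogged++;
		}
	}
	return relogged;
}
```

Applied to the t9 state above, D3 (logged in t0) would be relogged, which is what lets the tail move past t0.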

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-10-07 08:40:28 -07:00
Darrick J. Wong e6fff81e48 xfs: proper replay of deferred ops queued during log recovery
When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items.  As outlined in commit 509955823c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items.  Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.

For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items.  After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.

This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations causes subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.

Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent.  Finally, refactor the loop
that replays those chains to do so using one transaction per chain.
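A minimal model of the per-intent capture idea, assuming fixed-size arrays in place of kernel lists: ops created while recovering one intent go into that intent's capture, and replay commits each capture in its own transaction instead of splicing everything into one huge transaction. All `toy_*` names are hypothetical.

```c
#include <stddef.h>

#define TOY_MAX_OPS 8

/* Hypothetical: deferred ops created while recovering ONE log intent. */
struct toy_capture {
	int ops[TOY_MAX_OPS];
	size_t nr_ops;
};

/* Record a new deferred op in the capture for the intent being recovered. */
static int toy_capture_op(struct toy_capture *cap, int op)
{
	if (cap->nr_ops >= TOY_MAX_OPS)
		return -1;
	cap->ops[cap->nr_ops++] = op;
	return 0;
}

/*
 * Replay captures in recovery order, one transaction per non-empty chain,
 * so ops from different intents are never spliced together.  Appends the
 * finished ops to out[] and returns the number of transactions used.
 */
static size_t toy_replay(const struct toy_capture *caps, size_t nr_caps,
			 int *out, size_t *nr_out)
{
	size_t ntrans = 0;

	*nr_out = 0;
	for (size_t i = 0; i < nr_caps; i++) {
		if (caps[i].nr_ops == 0)
			continue;
		ntrans++;	/* allocate a transaction for this chain */
		for (size_t j = 0; j < caps[i].nr_ops; j++)
			out[(*nr_out)++] = caps[i].ops[j];
		/* commit the transaction, finishing this chain */
	}
	return ntrans;
}
```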

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-10-07 08:40:28 -07:00
Darrick J. Wong 901219bb25 xfs: remove XFS_LI_RECOVERED
The ->iop_recover method of a log intent item removes the recovered
intent item from the AIL by logging an intent done item and committing
the transaction, so it's superfluous to have this flag check.  Nothing
else uses it, so get rid of the flag entirely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-10-07 08:40:27 -07:00
Kaixu Xia d6b8fc6c7a xfs: do the assert for all the log done items in xfs_trans_cancel
We should do the assert for all the log intent-done items if they appear
here. This patch detects intent-done items by the fact that their item ops
don't have iop_unpin and iop_push methods, and also moves the helper
xlog_item_is_intent to xfs_trans.h.
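The detection rule stated above reduces to a simple predicate on the ops table. The structures here are hypothetical, trimmed-down stand-ins that keep only the two callbacks the rule inspects; `toy_item_is_intent_done()` is an illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down log item ops table. */
struct toy_item_ops {
	void (*iop_unpin)(void *item);
	void (*iop_push)(void *item);
};

struct toy_log_item {
	const struct toy_item_ops *li_ops;
};

/* Placeholder callback used to build the example ops table. */
static void toy_noop(void *item)
{
	(void)item;
}

/* Example tables: an item with unpin/push, and a done item without them. */
static const struct toy_item_ops toy_buf_ops = {
	.iop_unpin = toy_noop,
	.iop_push = toy_noop,
};
static const struct toy_item_ops toy_done_ops = { NULL, NULL };

/* Intent-done items provide neither ->iop_unpin nor ->iop_push. */
static bool toy_item_is_intent_done(const struct toy_log_item *lip)
{
	return lip->li_ops->iop_unpin == NULL &&
	       lip->li_ops->iop_push == NULL;
}
```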

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-25 11:34:07 -07:00
Christoph Hellwig cead0b10f5 xfs: simplify xfs_trans_getsb
Remove the mp argument as this function is only called in transaction
context, and open code xfs_getsb given that the function already accesses
the buffer pointer in the mount point directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15 20:52:39 -07:00
Dave Chinner 3536b61e74 xfs: unwind log item error flagging
When a buffer IO error occurs, we want to mark all
the log items attached to the buffer as failed. Open code
the error handling loop so that we can modify the flagging for the
different types of objects directly and independently of each other.

This also allows us to remove the ->iop_error method from the log
item operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-07 07:15:07 -07:00
Dave Chinner 2ef3f7f5db xfs: get rid of log item callbacks
They are not used anymore, so remove them from the log item and the
buffer iodone attachment interfaces.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-07 07:15:07 -07:00
Darrick J. Wong 889eb55dd6 xfs: refactor intent item RECOVERED flag into the log item
Rename XFS_{EFI,BUI,RUI,CUI}_RECOVERED to XFS_LI_RECOVERED so that we
track recovery status in the log item, then get rid of the now unused
flags fields in each of those log item types.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:50:01 -07:00
Darrick J. Wong 154c733a33 xfs: refactor releasing finished intents during log recovery
Replace the open-coded AIL item walking with a proper helper when we're
trying to release an intent item that has been finished.  We add a new
->iop_match method to decide if an intent item matches a supplied ID.
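The AIL walk with a per-type `->iop_match` hook might look roughly like this. It is a sketch under stated assumptions: the AIL is modeled as a singly linked list, an item's type is identified by its ops table pointer, and all `toy_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_log_item;

/* Hypothetical ops table: only the ->iop_match hook is modeled. */
struct toy_item_ops {
	bool (*iop_match)(const struct toy_log_item *lip, uint64_t id);
};

struct toy_log_item {
	const struct toy_item_ops *li_ops;
	uint64_t intent_id;		/* id recorded in the intent */
	bool released;
	struct toy_log_item *next;	/* next item in the toy "AIL" */
};

static bool toy_intent_match(const struct toy_log_item *lip, uint64_t id)
{
	return lip->intent_id == id;
}

static const struct toy_item_ops toy_intent_ops = {
	.iop_match = toy_intent_match,
};

/*
 * Walk the toy AIL and release the first unreleased intent of the given
 * type whose ->iop_match accepts the id carried by the done item.
 */
static bool toy_release_intent(struct toy_log_item *ail,
			       const struct toy_item_ops *type, uint64_t id)
{
	for (struct toy_log_item *lip = ail; lip != NULL; lip = lip->next) {
		if (lip->li_ops != type || lip->released)
			continue;
		if (lip->li_ops->iop_match(lip, id)) {
			lip->released = true;
			return true;
		}
	}
	return false;
}
```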

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:50:00 -07:00
Darrick J. Wong 10d0c6e06f xfs: refactor recovered EFI log item playback
Move the code that processes the log items created from the recovered
log items into the per-item source code files and use dispatch functions
to call them.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:50:00 -07:00
Darrick J. Wong ce92464c18 xfs: make xfs_trans_get_buf return an error code
Convert xfs_trans_get_buf() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong 9676b54e6e xfs: make xfs_trans_get_buf_map return an error code
Convert xfs_trans_get_buf_map() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Christoph Hellwig caeaea9858 xfs: merge xfs_trans_bmap.c into xfs_bmap_item.c
Keep all bmap item related code together.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:29:42 -07:00
Christoph Hellwig 3cfce1e3ce xfs: merge xfs_trans_rmap.c into xfs_rmap_item.c
Keep all rmap item related code together in one file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:29:41 -07:00
Christoph Hellwig effd5e96e7 xfs: merge xfs_trans_refcount.c into xfs_refcount_item.c
Keep all the refcount item related code together in one file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:29:41 -07:00
Christoph Hellwig 81f4004173 xfs: merge xfs_trans_extfree.c into xfs_extfree_item.c
Keep all the extfree item related code together in one file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:28:17 -07:00
Christoph Hellwig efe2330fdc xfs: remove the xfs_log_item_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:33 -07:00
Christoph Hellwig 9ce632a28a xfs: add a flag to release log items on commit
We have various items that are released from ->iop_committing.  Add a
flag to just call ->iop_release from the commit path to avoid tons
of boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:32 -07:00
Christoph Hellwig ddf92053e4 xfs: split iop_unlock
The iop_unlock method is called when committing or cancelling a
transaction.  In the latter case, the transaction may or may not be
aborted.  While there is no known problem with the current code in
practice, this implementation is limited in that any log item
implementation that might want to differentiate between a commit and a
cancellation must rely on the aborted state.  The aborted bit is only
set when the cancelled transaction is dirty, however.  This means that
there is no way to distinguish between a commit and a clean transaction
cancellation.

For example, intent log items currently rely on this distinction.  The
log item is either transferred to the CIL on commit or released on
transaction cancel. There is currently no possibility for a clean intent
log item in a transaction, but if that state is ever introduced a cancel
of such a transaction will immediately result in memory leaks of the
associated log item(s).  This is an interface deficiency and landmine.

To clean this up, replace the iop_unlock method with an iop_release
method that is specific to transaction cancel.  The existing
iop_committing method occurs at the same time as iop_unlock in the
commit path and there is no need for two separate callbacks here.
Overload the iop_committing method with the current commit time
iop_unlock implementations to eliminate the need for the latter and
further simplify the interface.
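A toy model of the split interface: commit and cancel invoke different hooks, so an item can tell a clean cancellation apart from a commit without consulting the aborted flag. `struct toy_item_ops` and the `toy_*` functions are hypothetical and keep only the two callbacks discussed.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_log_item;

/* Hypothetical ops table after the split: one hook per outcome. */
struct toy_item_ops {
	void (*iop_committing)(struct toy_log_item *lip);	/* commit path */
	void (*iop_release)(struct toy_log_item *lip);		/* cancel path */
};

struct toy_log_item {
	const struct toy_item_ops *li_ops;
	int committed;
	int released;
};

static void toy_intent_committing(struct toy_log_item *lip)
{
	lip->committed++;
}

static void toy_intent_release(struct toy_log_item *lip)
{
	lip->released++;
}

static const struct toy_item_ops toy_intent_ops = {
	.iop_committing = toy_intent_committing,
	.iop_release = toy_intent_release,
};

/* Commit and cancel call different hooks, even for a clean transaction. */
static void toy_trans_finish(struct toy_log_item *lip, bool commit)
{
	if (commit) {
		if (lip->li_ops->iop_committing)
			lip->li_ops->iop_committing(lip);
	} else if (lip->li_ops->iop_release) {
		lip->li_ops->iop_release(lip);
	}
}
```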

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:32 -07:00
Eric Sandeen 8c9ce2f707 xfs: remove unused flags arg from getsb interfaces
The flags value is always passed as 0 so remove the argument.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-12 08:59:58 -07:00
Darrick J. Wong 66e3237e72 xfs: const-ify xfs_owner_info arguments
Only certain functions actually change the contents of an
xfs_owner_info; the rest can accept a const struct pointer.  This will
enable us to save stack space by hoisting static owner info types to
be const global variables.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-12-12 08:47:16 -08:00
Darrick J. Wong bc9f2b7c8a xfs: idiotproof defer op type configuration
Recently, we forgot to port a new defer op type to xfsprogs, which
caused us some userspace pain.  Reorganize the way we make libxfs
clients supply defer op type information so that all type information
has to be provided at build time instead of risky runtime dynamic
configuration.
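One way to make such configuration build-time only is a designated-initializer table indexed by the type enum, with a static assertion tying the table length to the enum. This is an illustrative sketch with hypothetical `toy_*` names, not the libxfs table. The assertion only catches a missing trailing entry, so the lookup also rejects gaps in the middle.

```c
#include <stddef.h>

/* Hypothetical list of deferred op types, fixed at build time. */
enum toy_defer_type {
	TOY_DEFER_BMAP,
	TOY_DEFER_REFCOUNT,
	TOY_DEFER_RMAP,
	TOY_DEFER_FREE,
	TOY_DEFER_MAX
};

struct toy_defer_op_type {
	const char *name;
};

/* Every type gets its entry here; no runtime registration call exists. */
static const struct toy_defer_op_type toy_defer_ops[] = {
	[TOY_DEFER_BMAP]	= { "bmap" },
	[TOY_DEFER_REFCOUNT]	= { "refcount" },
	[TOY_DEFER_RMAP]	= { "rmap" },
	[TOY_DEFER_FREE]	= { "extent_free" },
};

/* Fails the build if the table is shorter than the enum. */
_Static_assert(sizeof(toy_defer_ops) / sizeof(toy_defer_ops[0]) == TOY_DEFER_MAX,
	       "every defer op type needs a table entry");

/* Look up a type; out-of-range or unfilled entries yield NULL. */
static const struct toy_defer_op_type *toy_defer_type(enum toy_defer_type t)
{
	if ((unsigned int)t >= (unsigned int)TOY_DEFER_MAX ||
	    toy_defer_ops[t].name == NULL)
		return NULL;
	return &toy_defer_ops[t];
}
```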

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-12-12 08:47:16 -08:00
Darrick J. Wong 38b6238eb6 xfs: fix buffer state management in xrep_findroot_block
We don't handle buffer state properly in online repair's findroot
routine.  If a buffer already has b_ops set, we don't ever want to touch
that, and we don't want to call the read verifiers on a buffer that
could be dirty (CRCs are only recomputed during log checkpoints).

Therefore, be more careful about what we do with a buffer -- if someone
else already attached ops that are not the ones for this btree type,
just ignore the buffer.  We only attach our btree type's buf ops if it
matches the magic/uuid and structure checks.

We also modify xfs_buf_read_map to allow callers to set buffer ops on a
DONE buffer with NULL ops so that repair doesn't leave behind buffers
which won't have buffer ops attached to them.
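The buffer-ops rules in the preceding two paragraphs can be sketched as a single check. Assumptions: `struct toy_buf` and `struct toy_buf_ops` are hypothetical stand-ins, and only the magic number is compared where the real code also verifies the uuid and block structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for buffer ops and a buffer. */
struct toy_buf_ops {
	const char *name;
	uint32_t magic;
};

struct toy_buf {
	const struct toy_buf_ops *b_ops;
	uint32_t magic;		/* magic number read from the block header */
};

/*
 * Decide whether bp could be a root block for the btree described by ops.
 * Never override ops that someone else attached, and attach ours only
 * after the block passes the (here: magic-only) checks.
 */
static bool toy_findroot_check(struct toy_buf *bp,
			       const struct toy_buf_ops *ops)
{
	if (bp->b_ops != NULL && bp->b_ops != ops)
		return false;		/* belongs to another btree type: ignore */
	if (bp->magic != ops->magic)
		return false;
	if (bp->b_ops == NULL)
		bp->b_ops = ops;
	return true;
}
```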

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:35 +11:00
Brian Foster 9d9e623385 xfs: fold dfops into the transaction
struct xfs_defer_ops has now been reduced to a single list_head. The
external dfops mechanism is unused and thus everywhere a (permanent)
transaction is accessible the associated dfops structure is as well.

Remove the xfs_defer_ops structure and fold the list_head into the
transaction. Also remove the last remnant of external dfops in
xfs_trans_dup().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02 23:05:14 -07:00
Brian Foster 1ae093cbea xfs: replace xfs_defer_ops ->dop_pending with on-stack list
The xfs_defer_ops ->dop_pending list is used to track active
deferred operations once intents are logged. These items must be
aborted in the event of an error. The list is populated as intents
are logged and items are removed as they complete (or are aborted).

Now that xfs_defer_finish() cancels on error, there is no need to
ever access ->dop_pending outside of xfs_defer_finish(). The list is
only ever populated after xfs_defer_finish() begins and is either
completed or cancelled before it returns.

Remove ->dop_pending from xfs_defer_ops and replace it with a local
list in the xfs_defer_finish() path. Pass the local list to the
various helpers now that it is not accessible via dfops. Note that
we have to check for NULL in the abort case as the final tx roll
occurs outside of the scope of the new local list (once the dfops
has completed and thus drained the list).
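A toy model of the on-stack pending list: the list lives in the finish routine, helpers receive a pointer to it, and the abort path tolerates NULL for the final roll that happens after the list has drained. All `toy_*` names are hypothetical, and a singly linked list replaces `struct list_head`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical deferred work item. */
struct toy_pending {
	struct toy_pending *next;
	bool done;
	bool aborted;
};

/* Abort everything still pending; NULL means no list is in scope. */
static void toy_defer_abort(struct toy_pending **pending)
{
	if (pending == NULL)
		return;
	for (struct toy_pending *p = *pending; p != NULL; p = p->next)
		p->aborted = true;
	*pending = NULL;
}

/* Roll the transaction; on failure abort whatever is still pending. */
static int toy_trans_roll(struct toy_pending **pending, bool fail)
{
	if (fail) {
		toy_defer_abort(pending);
		return -1;
	}
	return 0;
}

/* fail_roll: index of the roll that fails, or -1 for none. */
static int toy_defer_finish(struct toy_pending *items, size_t n, long fail_roll)
{
	struct toy_pending *pending = NULL;	/* on-stack, was ->dop_pending */
	long roll = 0;

	/* "Log" an intent for each item and queue it on the local list. */
	for (size_t i = n; i-- > 0; ) {
		items[i].next = pending;
		pending = &items[i];
	}

	/* Finish items one at a time, rolling the transaction before each. */
	while (pending != NULL) {
		if (toy_trans_roll(&pending, roll++ == fail_roll) != 0)
			return -1;
		pending->done = true;
		pending = pending->next;
	}

	/* The final roll runs after the list has drained, so pass NULL. */
	return toy_trans_roll(NULL, roll == fail_roll);
}
```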

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02 23:05:14 -07:00