Commit Graph

8 Commits

Author SHA1 Message Date
Bill O'Donnell d7fddd5eaa mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind
JIRA: https://issues.redhat.com/browse/RHEL-12888

Conflicts: difference from upstream mm/memory-failure.c

commit fa422b353d212373fb2b2857a5ea5a6fa4876f9c
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Mon Oct 23 15:20:46 2023 +0800

    mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind

    Currently, if we suddenly remove a PMEM device (by calling unbind) that
    contains an FSDAX filesystem while programs are still accessing data on
    the device, e.g.:
    ```
     $FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 &
     # $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 &
     echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
    ```
    the system can end up in an unacceptable state:
      1. the device is gone but the mount point still exists, and umount
           fails with "target is busy"
      2. programs hang and cannot be killed
      3. the kernel may crash with a NULL pointer dereference

    To fix this, introduce an MF_MEM_PRE_REMOVE flag that tells the
    filesystem the whole device is about to be removed, and make sure all
    related processes are notified so that they can exit gracefully.

    This patch is inspired by Dan's "mm, dax, pmem: Introduce
    dev_pagemap_failure()"[1].  With the help of the dax_holder and
    ->notify_failure() mechanisms, the pmem driver is able to ask the
    filesystem on it to unmap all files in use and to notify the processes
    that are using those files.

    Call trace:
    trigger unbind
     -> unbind_store()
      -> ... (skip)
       -> devres_release_all()
        -> kill_dax()
         -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
          -> xfs_dax_notify_failure()
          `-> freeze_super()             // freeze (kernel call)
          `-> do xfs rmap
          ` -> mf_dax_kill_procs()
          `  -> collect_procs_fsdax()    // all associated processes
          `  -> unmap_and_kill()
          ` -> invalidate_inode_pages2_range() // drop file's cache
          `-> thaw_super()               // thaw (both kernel & user call)

    Introduce MF_MEM_PRE_REMOVE to let the filesystem know this is a remove
    event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
    new dax mappings from being created.  Do not shut down the filesystem
    directly if the configuration is not supported, or if the failure range
    includes the metadata area.  Make sure all files and processes (not only
    the current process) are handled correctly.  Also drop the cache of
    associated files before pmem is removed.

    [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
    [2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/

    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-04-05 11:58:49 -05:00
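The ordering shown in the call trace of the commit above can be sketched in user-space C. This is a minimal illustration and not the kernel code: `SKETCH_PRE_REMOVE`, `enum step`, `struct trace`, and `notify_failure_sketch()` are hypothetical names, the flag bit is illustrative and not the kernel's value, and the assumption that freeze, cache drop, and thaw happen only on the pre-remove path is ours.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bit; NOT the kernel's MF_MEM_PRE_REMOVE value. */
#define SKETCH_PRE_REMOVE (1u << 0)

/* Hypothetical step names mirroring the call trace. */
enum step { STEP_FREEZE, STEP_KILL_PROCS, STEP_DROP_CACHE, STEP_THAW };

struct trace {
    enum step steps[4];
    size_t count;
};

static void record(struct trace *t, enum step s)
{
    assert(t->count < 4);
    t->steps[t->count++] = s;
}

/*
 * Assumption: on a remove event the filesystem is frozen first so no new
 * dax mappings appear, then processes are notified, then the file cache is
 * dropped, and finally the filesystem is thawed.
 */
static void notify_failure_sketch(struct trace *t, unsigned int flags)
{
    int pre_remove = (flags & SKETCH_PRE_REMOVE) != 0;

    if (pre_remove)
        record(t, STEP_FREEZE);
    record(t, STEP_KILL_PROCS);
    if (pre_remove) {
        record(t, STEP_DROP_CACHE);
        record(t, STEP_THAW);
    }
}
```

Calling `notify_failure_sketch()` with `SKETCH_PRE_REMOVE` set records all four steps in trace order; without the flag, only the process-notification step is recorded.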
Bill O'Donnell a4a194c4b0 xfs: correct calculation for agend and blockcount
JIRA: https://issues.redhat.com/browse/RHEL-12888

commit 3c90c01e49342b166e5c90ec2c85b220be15a20e
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Wed Sep 13 18:29:42 2023 +0800

    xfs: correct calculation for agend and blockcount

    The agend should be "start + length - 1", and blockcount should then be
    "end + 1 - start".  Correct both calculation mistakes.

    Also, rename "agend" to "range_agend" because it's not the end of the AG
    per se; it's the end of the dead region within an AG's agblock space.

    Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"")
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-04-05 11:58:48 -05:00
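The inclusive-range arithmetic described in the commit above can be checked with a small stand-alone sketch. The helper names `range_end()` and `range_blockcount()` are hypothetical and do not appear in the kernel; only the two formulas come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive end of a range: "start + length - 1". */
static uint64_t range_end(uint64_t start, uint64_t length)
{
    assert(length > 0);            /* an empty range has no last block */
    return start + length - 1;
}

/* Number of blocks in the inclusive range [start, end]: "end + 1 - start". */
static uint64_t range_blockcount(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end + 1 - start;
}
```

For start = 100 and length = 8, the inclusive end is 107, and the block count derived from that end is 8 again. Dropping the "- 1" or the "+ 1" would make each result off by one block.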
Bill O'Donnell 6e5032d556 xfs: fix the calculation for "end" and "length"
JIRA: https://issues.redhat.com/browse/RHEL-12888

commit 5cf32f63b0f4c520460c1a5dd915dc4f09085f29
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Jun 29 17:40:30 2023 -0700

    xfs: fix the calculation for "end" and "length"

    The value of "end" should be "start + length - 1".

    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-04-05 11:45:50 -05:00
Bill O'Donnell 26a4936de4 xfs: fix up for "xfs: pass perag to xfs_alloc_read_agf()"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218635

Conflicts: pick only the change to xfs_notify_failure.c that
	   was included in this merge commit.

commit 6614a3c3164a5df2b54abb0b3559f51041cf705b
Merge: 74cae210a335 360614c01f81
    ...
    [ XFS merge from hell as per Darrick Wong in
    https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-07-05 15:34:25 -05:00
Bill O'Donnell 3871255f2c xfs: on memory failure, only shut down fs after scanning all mappings
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit e033f40be262c4d227f8fbde52856e1d8646872b
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Oct 4 16:40:01 2022 +1100

    xfs: on memory failure, only shut down fs after scanning all mappings

    xfs_dax_failure_fn is used to scan the filesystem during a memory
    failure event to look for memory mappings to revoke.  Unfortunately,
    if it encounters an rmap record for filesystem metadata, it will
    immediately shut down the filesystem and abort the scan.  This means that
    we don't complete the mapping revocation scan and instead leave live
    mappings to failed memory.  Fix the function to defer the shutdown
    until after we've finished culling mappings.

    While we're at it, add the usual "xfs_" prefix to struct
    failure_info, and actually initialize mf_flags.

    Fixes: 6f643c57d57c ("xfs: implement ->notify_failure() for XFS")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:48 -05:00
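The "defer the shutdown until the scan is finished" pattern from the commit above can be sketched as follows. This is a simplified user-space model, not the XFS implementation: `struct rec`, `struct scan_result`, and `scan_all()` are hypothetical, and a boolean stands in for the real mapping revocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one rmap record. */
struct rec {
    bool is_metadata;   /* record describes filesystem metadata */
    bool revoked;       /* set once the mapping has been revoked */
};

struct scan_result {
    size_t revoked;       /* number of data mappings revoked */
    bool want_shutdown;   /* shutdown requested, to be done after the scan */
};

static struct scan_result scan_all(struct rec *recs, size_t n)
{
    struct scan_result res = { 0, false };

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (recs[i].is_metadata) {
            /* Remember the request but keep scanning. */
            res.want_shutdown = true;
            continue;
        }
        recs[i].revoked = true;
        res.revoked++;
    }
    /* The caller acts on want_shutdown only now, after every record. */
    return res;
}
```

Returning the flag instead of stopping inside the loop is what guarantees that records after the first metadata hit are still processed.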
Bill O'Donnell be9bfce15f xfs: fix SB_BORN check in xfs_dax_notify_failure()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit fd63612ae81159bd7e59762de478889315463ee8
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Fri Aug 26 10:18:01 2022 -0700

    xfs: fix SB_BORN check in xfs_dax_notify_failure()

    The SB_BORN flag is stored in the vfs superblock, not xfs_sb.

    Link: https://lkml.kernel.org/r/166153428094.2758201.7936572520826540019.stgit@dwillia2-xfh.jf.intel.com
    Fixes: 6f643c57d57c ("xfs: implement ->notify_failure() for XFS")
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
Bill O'Donnell 5c90d99915 xfs: quiet notify_failure EOPNOTSUPP cases
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit b14d067e850c19921cec2200bd8d179edf6a1aa6
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Fri Aug 26 10:17:54 2022 -0700

    xfs: quiet notify_failure EOPNOTSUPP cases

    Patch series "mm, xfs, dax: Fixes for memory_failure() handling".

    I failed to run the memory error injection section of the ndctl test suite
    on linux-next prior to the merge window and as a result some bugs were
    missed.  While the new enabling targeted reflink-enabled XFS filesystems,
    the bugs cropped up in the surrounding cases of DAX error injection on
    ext4-fsdax and device-dax.

    One new assumption / clarification in this set is the notion that if a
    filesystem's ->notify_failure() handler returns -EOPNOTSUPP, then it must
    be the case that the fsdax usage of page->index and page->mapping are
    valid.  I am fairly certain this is true for xfs_dax_notify_failure(), but
    would appreciate another set of eyes.

    This patch (of 4):

    XFS always registers dax_holder_operations regardless of whether the
    filesystem is capable of handling the notifications.  The expectation is
    that if the notify_failure handler cannot run then there are no scenarios
    where it needs to run.  In other words the expected semantic is that
    page->index and page->mapping are valid for memory_failure() when the
    conditions that cause -EOPNOTSUPP in xfs_dax_notify_failure() are present.

    A fallback to the generic memory_failure() path is expected so do not warn
    when that happens.

    Link: https://lkml.kernel.org/r/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com
    Link: https://lkml.kernel.org/r/166153427440.2758201.6709480562966161512.stgit@dwillia2-xfh.jf.intel.com
    Fixes: 6f643c57d57c ("xfs: implement ->notify_failure() for XFS")
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
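The fallback semantic described in the commit above (treat -EOPNOTSUPP from the handler as an expected, quiet fallback to the generic path) can be sketched like this. The names `enum mf_path`, `struct mf_outcome`, and `dispatch_failure()` are hypothetical, and the handling of other error codes is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of dispatching a failure notification. */
enum mf_path { MF_PATH_HANDLED, MF_PATH_GENERIC, MF_PATH_FAILED };

struct mf_outcome {
    enum mf_path path;
    bool warned;        /* whether a warning would be logged */
};

static struct mf_outcome dispatch_failure(int handler_rc)
{
    struct mf_outcome out = { MF_PATH_HANDLED, false };

    assert(handler_rc <= 0);       /* kernel style: 0 or negative errno */
    if (handler_rc == 0)
        return out;
    if (handler_rc == -EOPNOTSUPP) {
        /* Expected case: fall back to the generic path without warning. */
        out.path = MF_PATH_GENERIC;
        return out;
    }
    /* Assumption: any other error is a real failure worth a warning. */
    out.path = MF_PATH_FAILED;
    out.warned = true;
    return out;
}
```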
Bill O'Donnell c6108fc126 xfs: implement ->notify_failure() for XFS
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

Conflicts: the xfs_perag_get() and xfs_perag_put() API change from a previous
	   out-of-order upstream patch, fa044ae70 ("xfs: pass perag to
	   xfs_read_agf"), required changes to xfs_notify_failure.c

commit 6f643c57d57c56d4677bc05f1fca2ef3f249797c
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:30 2022 +0800

    xfs: implement ->notify_failure() for XFS

    Introduce xfs_notify_failure.c to handle failure-related work, such as
    implementing ->notify_failure() and registering/unregistering the dax
    holder in xfs.

    If the rmap feature of XFS is enabled, we can query it to find files and
    metadata which are associated with the corrupt data.  For now all we do is
    kill processes with that file mapped into their address spaces, but future
    patches could actually do something about corrupt metadata.

    After that, the memory failure code needs to notify the processes that
    are using those files.

    Link: https://lkml.kernel.org/r/20220603053738.1218681-7-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:34:27 -05:00