277 Commits

Author SHA1 Message Date
Augusto Caringi 20564463f1 Merge: [xfstests xfs/017] xfs_repair fails and hit XFS: Assertion failed: 0, file: fs/xfs/xfs_icache.c, line: 1840
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6047

JIRA: https://issues.redhat.com/browse/RHEL-56816

```
xfs: fix freeing speculative preallocations for preallocated files

xfs_can_free_eofblocks returns false for files that have persistent
preallocations unless the force flag is passed and there are delayed
blocks.  This means it won't free delalloc reservations for files
with persistent preallocations unless the force flag is set, and it
will also free the persistent preallocations if the force flag is
set and the file happens to have delayed allocations.

Both of these are bad, so do away with the force flag and always free
only post-EOF delayed allocations for files with the XFS_DIFLAG_PREALLOC
or APPEND flags set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
(cherry picked from commit 610b29161b0aa9feb59b78dc867553274f17fb01)
```

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>

---

<small>Created 2024-12-17 14:02 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small>

Approved-by: Brian Foster <bfoster@redhat.com>
Approved-by: Carlos Maiolino <cmaiolino@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Augusto Caringi <acaringi@redhat.com>
2025-03-31 16:55:06 -03:00
CKI Backport Bot c549212983 xfs: fix freeing speculative preallocations for preallocated files
JIRA: https://issues.redhat.com/browse/RHEL-56816

commit 610b29161b0aa9feb59b78dc867553274f17fb01
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Jun 19 10:32:43 2024 -0700

    xfs: fix freeing speculative preallocations for preallocated files

    xfs_can_free_eofblocks returns false for files that have persistent
    preallocations unless the force flag is passed and there are delayed
    blocks.  This means it won't free delalloc reservations for files
    with persistent preallocations unless the force flag is set, and it
    will also free the persistent preallocations if the force flag is
    set and the file happens to have delayed allocations.

    Both of these are bad, so do away with the force flag and always free
    only post-EOF delayed allocations for files with the XFS_DIFLAG_PREALLOC
    or APPEND flags set.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-12-17 14:02:09 +00:00
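The policy change described in this commit can be sketched as a toy decision function. This is a simplified model written for illustration, not the kernel's `xfs_can_free_eofblocks`: the `toy_*` names, the struct fields, and the three-way result are all assumptions introduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of the toy_* names are kernel identifiers. */
enum toy_free_action {
	TOY_FREE_NOTHING,
	TOY_FREE_DELALLOC_ONLY,	/* drop only post-EOF delalloc reservations */
	TOY_FREE_ALL_POSTEOF,	/* drop every block past EOF */
};

struct toy_inode {
	bool prealloc_or_append;	/* XFS_DIFLAG_PREALLOC or APPEND is set */
	uint64_t delayed_blocks;	/* post-EOF delayed allocation blocks */
	uint64_t posteof_real_blocks;	/* post-EOF blocks already allocated */
};

/* The behaviour the commit message describes as broken. */
static enum toy_free_action toy_old_policy(const struct toy_inode *ip, bool force)
{
	if (ip->prealloc_or_append) {
		if (force && ip->delayed_blocks > 0)
			return TOY_FREE_ALL_POSTEOF;	/* also frees the preallocation */
		return TOY_FREE_NOTHING;		/* delalloc reservations stay behind */
	}
	if (ip->delayed_blocks > 0 || ip->posteof_real_blocks > 0)
		return TOY_FREE_ALL_POSTEOF;
	return TOY_FREE_NOTHING;
}

/* The behaviour after the fix: the force flag is gone. */
static enum toy_free_action toy_new_policy(const struct toy_inode *ip)
{
	if (ip->prealloc_or_append)
		return ip->delayed_blocks > 0 ? TOY_FREE_DELALLOC_ONLY
					      : TOY_FREE_NOTHING;
	if (ip->delayed_blocks > 0 || ip->posteof_real_blocks > 0)
		return TOY_FREE_ALL_POSTEOF;
	return TOY_FREE_NOTHING;
}
```

In this model a preallocated file with pending delalloc blocks gets `TOY_FREE_NOTHING` or `TOY_FREE_ALL_POSTEOF` from the old policy depending on `force`, and always `TOY_FREE_DELALLOC_ONLY` from the new one.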
Bill O'Donnell fd1badcae4 xfs: move xfs_bmap_rtalloc to xfs_rtalloc.c
JIRA: https://issues.redhat.com/browse/RHEL-65728

Conflicts: context diffs

commit 152e21235727bbfe50ddc79a2d60f6bcf19d1640
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Dec 18 05:57:21 2023 +0100

    xfs: move xfs_bmap_rtalloc to xfs_rtalloc.c

    xfs_bmap_rtalloc is currently in xfs_bmap_util.c, which is a somewhat
    odd spot for it, given that it is only called from xfs_bmap.c and calls
    into xfs_rtalloc.c to do the actual work.  Move xfs_bmap_rtalloc to
    xfs_rtalloc.c and mark xfs_rtpick_extent and xfs_rtallocate_extent
    static now that they aren't called from outside of xfs_rtalloc.c.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:12 -06:00
Bill O'Donnell b97b0b0701 xfs: also use xfs_bmap_btalloc_accounting for RT allocations
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 58643460546da1dc61593fc6fd78762798b4534f
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Dec 18 05:57:20 2023 +0100

    xfs: also use xfs_bmap_btalloc_accounting for RT allocations

    Make xfs_bmap_btalloc_accounting more generic by handling the RT quota
    reservations and then also use it from xfs_bmap_rtalloc instead of
    open coding the accounting logic there.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:12 -06:00
Bill O'Donnell e601fc184a xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit 35dc55b9e80cb9ec4bcb969302000b002b2ed850
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Oct 11 07:16:26 2023 +0200

    xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space

    If xfs_bmapi_write finds a delalloc extent at the requested range, it
    tries to convert the entire delalloc extent to a real allocation.

    But if the allocator cannot find a single free extent large enough to
    cover the start block of the requested range, xfs_bmapi_write will
    return 0 but leave *nimaps set to 0.

    In that case we simply need to keep looping with the same startoffset_fsb
    so that one of the following allocations will eventually reach the
    requested range.

    Note that this could affect any caller of xfs_bmapi_write that covers
    an existing delayed allocation.  As far as I can tell we do not have
    any other such caller, though - the regular writeback path uses
    xfs_bmapi_convert_delalloc to convert delayed allocations to real ones,
    and direct I/O invalidates the page cache first.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:43 -06:00
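The retry behaviour described above can be sketched with a small user-space model. Everything here is hypothetical: `toy_bmapi_write` stands in for an allocator that converts a delalloc extent from its front in chunks no larger than the biggest free extent, and `toy_alloc_file_space` is the caller that keeps the same start offset whenever no mapping comes back.

```c
#include <stdint.h>

/* Hypothetical model of a file with one delalloc extent. */
struct toy_file {
	uint64_t delalloc_start;	/* first block that is still delalloc */
	uint64_t delalloc_end;		/* one past the last delalloc block */
	uint64_t max_free_extent;	/* largest contiguous free extent */
};

/*
 * Convert up to max_free_extent blocks from the front of the delalloc
 * extent.  Returns 0 on success; *nimaps is 1 only when the converted
 * range covers the requested offset, otherwise it is left at 0.
 */
static int toy_bmapi_write(struct toy_file *f, uint64_t off,
			   uint64_t *mapped_end, int *nimaps)
{
	uint64_t start = f->delalloc_start;
	uint64_t len = f->delalloc_end - start;

	if (len > f->max_free_extent)
		len = f->max_free_extent;
	if (len == 0)
		return -1;		/* nothing left to convert */

	f->delalloc_start = start + len;
	*nimaps = 0;
	if (off >= start && off < start + len) {
		*nimaps = 1;
		*mapped_end = start + len;
	}
	return 0;
}

/* Caller loop: on nimaps == 0, retry with the same offset. */
static int toy_alloc_file_space(struct toy_file *f, uint64_t off,
				uint64_t end, int *calls)
{
	*calls = 0;
	while (off < end) {
		uint64_t mapped_end = 0;
		int nimaps = 0;
		int error = toy_bmapi_write(f, off, &mapped_end, &nimaps);

		(*calls)++;
		if (error)
			return error;
		if (nimaps == 0)
			continue;	/* allocation has not reached off yet */
		off = mapped_end;
	}
	return 0;
}
```

With a 100-block delalloc extent, a 10-block free-extent limit, and a request for blocks 35 to 40, the model makes three calls that return no mapping before the fourth one reaches the requested range.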
Bill O'Donnell 0c25cef455 xfs: create rt extent rounding helpers for realtime extent blocks
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit 5f57f7309d9ab9d24d50c5707472b1ed8af4eabc
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:38:28 2023 -0700

    xfs: create rt extent rounding helpers for realtime extent blocks

    Create a pair of functions to round rtblock numbers up or down to the
    nearest rt extent.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:37 -06:00
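As a rough illustration of what such a pair of rounding functions computes, here is a standalone sketch. The function names and the explicit `rextsize` parameter (rt extent size in blocks, assumed to be at least 1) are assumptions made for this example; the kernel helpers take their own arguments.

```c
#include <stdint.h>

/* Round an rt block number down to the start of its rt extent. */
static uint64_t toy_rtb_rounddown_rtx(uint64_t rtbno, uint32_t rextsize)
{
	return rtbno - (rtbno % rextsize);
}

/* Round an rt block number up to the next rt extent boundary. */
static uint64_t toy_rtb_roundup_rtx(uint64_t rtbno, uint32_t rextsize)
{
	uint64_t rem = rtbno % rextsize;

	return rem ? rtbno + (rextsize - rem) : rtbno;
}
```

The modulo form is used because an rt extent size need not be a power of two, so masking with `rextsize - 1` would not be correct in general.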
Bill O'Donnell d97dd3291d xfs: convert do_div calls to xfs_rtb_to_rtx helper calls
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit 055641248f649b52620a5fe8774bea253690e057
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:37:47 2023 -0700

    xfs: convert do_div calls to xfs_rtb_to_rtx helper calls

    Convert these calls to use the helpers, and clean up all these places
    where the same variable can have different units depending on where it
    is in the function.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:37 -06:00
Bill O'Donnell 520fae453c xfs: create a helper to convert extlen to rtextlen
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit 2c2b981b737a519907429f62148bbd9e40e01132
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:35:23 2023 -0700

    xfs: create a helper to convert extlen to rtextlen

    Create a helper to compute the realtime extent (xfs_rtxlen_t) from an
    extent length (xfs_extlen_t) value.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:36 -06:00
Bill O'Donnell 0a4322cf76 xfs: create a helper to compute leftovers of realtime extents
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit 68db60bf01c131c09bbe35adf43bd957a4c124bc
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:34:39 2023 -0700

    xfs: create a helper to compute leftovers of realtime extents

    Create a helper to compute the misalignment between a file extent
    (xfs_extlen_t) and a realtime extent.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:36 -06:00
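The length conversion and the leftover computation are a quotient and a remainder by the rt extent size. A minimal sketch, with hypothetical function names and an explicit `rextsize` parameter (blocks per rt extent, assumed nonzero):

```c
#include <stdint.h>

/* Number of whole rt extents covered by a length given in fs blocks. */
static uint32_t toy_extlen_to_rtxlen(uint32_t fsblocks, uint32_t rextsize)
{
	return fsblocks / rextsize;
}

/* Blocks left over after the last whole rt extent (the misalignment). */
static uint32_t toy_extlen_to_rtxmod(uint32_t fsblocks, uint32_t rextsize)
{
	return fsblocks % rextsize;
}
```

For a 30-block extent and 7-block rt extents this gives 4 whole rt extents with 2 blocks left over.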
Bill O'Donnell 3ec04f5ca0 xfs: create a helper to convert rtextents to rtblocks
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit fa5a387230861116c2434c20d29fc4b3fd077d24
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:32:54 2023 -0700

    xfs: create a helper to convert rtextents to rtblocks

    Create a helper to convert a realtime extent to a realtime block.  Later
    on we'll change the helper to use bit shifts when possible.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:36 -06:00
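One way the conversion could use a bit shift "when possible" is to shift when the rt extent size is a power of two and multiply otherwise. The sketch below is an assumption about the general idea, not the kernel helper; the names and the explicit `rextsize` parameter (assumed to be at least 1) are hypothetical.

```c
#include <stdint.h>

/* log2 of a power-of-two value; only called when v is a power of two. */
static unsigned int toy_log2_u32(uint32_t v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Convert an rt extent number to the rt block number where it starts. */
static uint64_t toy_rtx_to_rtb(uint64_t rtx, uint32_t rextsize)
{
	if ((rextsize & (rextsize - 1)) == 0)	/* power of two: shift */
		return rtx << toy_log2_u32(rextsize);
	return rtx * rextsize;			/* general case: multiply */
}
```

A real implementation would more likely precompute the shift once per mount instead of recomputing it on every call.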
Bill O'Donnell bf3789f89b xfs: convert rt extent numbers to xfs_rtxnum_t
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit 2d5f216b77e33f9b503bd42998271da35d4b7055
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:32:45 2023 -0700

    xfs: convert rt extent numbers to xfs_rtxnum_t

    Further disambiguate the xfs_rtblock_t uses by creating a new type,
    xfs_rtxnum_t, to store the position of an extent within the realtime
    section, in units of rtextents.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:35 -06:00
Bill O'Donnell c65cfa90d7 xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator
JIRA: https://issues.redhat.com/browse/RHEL-62760

commit a684c538bc14410565e8939393089670fa1e19dd
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Oct 16 09:31:11 2023 -0700

    xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator

    In most of the filesystem, we use xfs_extlen_t to store the length of a
    file (or AG) space mapping in units of fs blocks.  Unfortunately, the
    realtime allocator also uses it to store the length of a rt space
    mapping in units of rt extents.  This is confusing, since one rt extent
    can consist of many fs blocks.

    Separate the two by introducing a new type (xfs_rtxlen_t) to store the
    length of a space mapping (in units of realtime extents) that would be
    found in a file.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:34 -06:00
Bill O'Donnell 409a121df2 xfs: fix negative array access in xfs_getbmap
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 1bba82fe1afac69c85c1f5ea137c8e73de3c8032
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue May 2 09:15:01 2023 +1000

    xfs: fix negative array access in xfs_getbmap

    In commit 8ee81ed581ff, Ye Bin complained about an ASSERT in the bmapx
    code that trips if we encounter a delalloc extent after flushing the
    pagecache to disk.  The ioctl code does not hold MMAPLOCK so it's
    entirely possible that a racing write page fault can create a delalloc
    extent after the file has been flushed.  The proposed solution was to
    replace the assertion with an early return that avoids filling out the
    bmap recordset with a delalloc entry if the caller didn't ask for it.

    At the time, I recall thinking that the forward logic sounded ok, but
    felt hesitant because I suspected that changing this code would cause
    something /else/ to burst loose due to some other subtlety.

    syzbot of course found that subtlety.  If all the extent mappings found
    after the flush are delalloc mappings, we'll reach the end of the data
    fork without ever incrementing bmv->bmv_entries.  This is new, since
    before we'd have emitted the delalloc mappings even though the caller
    didn't ask for them.  Once we reach the end, we'll try to set
    BMV_OF_LAST on the -1st entry (because bmv_entries is zero) and go
    corrupt something else in memory.  Yay.

    I really dislike all these stupid patches that fiddle around with debug
    code and break things that otherwise worked well enough.  Nobody was
    complaining that calling XFS_IOC_BMAPX without BMV_IF_DELALLOC would
    return BMV_OF_DELALLOC records, and now we've gone from "weird behavior
    that nobody cared about" to "bad behavior that must be addressed
    immediately".

    Maybe I'll just ignore anything from Huawei from now on for my own sake.

    Reported-by: syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/
    Fixes: 8ee81ed581ff ("xfs: fix BUG_ON in xfs_getbmap()")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:50 -05:00
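The out-of-bounds write described above comes from indexing the output array with `entries - 1` when no entries were emitted. A minimal sketch of a guard against that, using hypothetical stand-in types (`toy_hdr`, `toy_rec`, `TOY_OF_LAST`) in place of the real getbmap structures and flag:

```c
#define TOY_OF_LAST 0x4u	/* stand-in for the "last record" output flag */

struct toy_rec {
	unsigned int oflags;
};

struct toy_hdr {
	int entries;		/* number of records written so far */
};

/* Flag the final record as last, but only if at least one was emitted. */
static void toy_mark_last(const struct toy_hdr *hdr, struct toy_rec *out)
{
	if (hdr->entries > 0)
		out[hdr->entries - 1].oflags |= TOY_OF_LAST;
}
```

Without the `entries > 0` check, a run in which every mapping was skipped would write to `out[-1]`.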
Bill O'Donnell 3eb11f8212 xfs: fix BUG_ON in xfs_getbmap()
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 8ee81ed581ff35882b006a5205100db0b57bf070
Author: Ye Bin <yebin10@huawei.com>
Date:   Wed Apr 12 15:49:44 2023 +1000

    xfs: fix BUG_ON in xfs_getbmap()

    The following issue was observed:
    XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329
    ------------[ cut here ]------------
    kernel BUG at fs/xfs/xfs_message.c:102!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422
    RIP: 0010:assfail+0x96/0xa0
    RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000
    RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002
    RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000
    R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0
    R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c
    FS:  00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     xfs_getbmap+0x1a5b/0x1e40
     xfs_ioc_getbmap+0x1fd/0x5b0
     xfs_file_ioctl+0x2cb/0x1d50
     __x64_sys_ioctl+0x197/0x210
     do_syscall_64+0x39/0xb0
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    The above issue may happen as follows:
             ThreadA                       ThreadB
    do_shared_fault
     __do_fault
      xfs_filemap_fault
       __xfs_filemap_fault
        filemap_fault
                                 xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag
                                  xfs_getbmap
                                   xfs_ilock(ip, XFS_IOLOCK_SHARED);
                                   filemap_write_and_wait
     do_page_mkwrite
      xfs_filemap_page_mkwrite
       __xfs_filemap_fault
        xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
        iomap_page_mkwrite
         ...
         xfs_buffered_write_iomap_begin
          xfs_bmapi_reserve_delalloc -> Allocate delay extent
                                  xfs_ilock_data_map_shared(ip)
                                  xfs_getbmap_report_one
                                   ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0)
                                    -> trigger BUG_ON

    As xfs_filemap_page_mkwrite() only holds the XFS_MMAPLOCK_SHARED lock,
    there is a small window in which mkwrite can create a delalloc extent
    after the file has been written back in xfs_getbmap().
    To solve the above issue, just skip delalloc extents.

    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:48 -05:00
Bill O'Donnell 21482343ad xfs: t_firstblock is tracking AGs not blocks
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 692b6cddeb65a5170c1e63d25b1ffb7822e80f7d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Sat Feb 11 04:11:06 2023 +1100

    xfs: t_firstblock is tracking AGs not blocks

    The tp->t_firstblock field is now really tracking the highest AG we
    have locked, not the block number of the highest allocation we've
    made. Its purpose is to prevent AGF locking deadlocks, so rename it
    to "highest AG" and simplify the implementation to just track the
    agno rather than a fsbno.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-10 07:22:20 -06:00
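Tracking only an AG number is enough if deadlock avoidance relies on taking AGF locks in ascending AG order. That ordering rule is an assumption brought in for this sketch and is not stated in the commit message; all `toy_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NULLAGNO UINT32_MAX	/* no AG locked yet */

struct toy_trans {
	uint32_t highest_agno;	/* highest AG this transaction has locked */
};

static void toy_trans_init(struct toy_trans *tp)
{
	tp->highest_agno = TOY_NULLAGNO;
}

/* Blocking on an AG lock is only safe at or above the highest AG held. */
static bool toy_may_block_on_ag(const struct toy_trans *tp, uint32_t agno)
{
	return tp->highest_agno == TOY_NULLAGNO || agno >= tp->highest_agno;
}

/* Record a newly locked AG, keeping only the maximum. */
static void toy_note_ag_locked(struct toy_trans *tp, uint32_t agno)
{
	if (tp->highest_agno == TOY_NULLAGNO || agno > tp->highest_agno)
		tp->highest_agno = agno;
}
```

Under this model no block offset within the AG is needed, which is the simplification the commit message describes.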
Bill O'Donnell f11725a9b3 xfs: fix NULL pointer dereference in xfs_getbmap()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 001c179c4e26d04db8c9f5e3fef9558b58356be6
Author: ChenXiaoSong <chenxiaosong2@huawei.com>
Date:   Wed Jul 27 17:21:52 2022 -0700

    xfs: fix NULL pointer dereference in xfs_getbmap()

    Reproducer:
     1. fallocate -l 100M image
     2. mkfs.xfs -f image
     3. mount image /mnt
     4. setxattr("/mnt", "trusted.overlay.upper", NULL, 0, XATTR_CREATE)
     5. char arg[32] = "\x01\xff\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00"
                       "\x00\x00\x00\x00\x00\x08\x00\x00\x00\xc6\x2a\xf7";
        fd = open("/mnt", O_RDONLY|O_DIRECTORY);
        ioctl(fd, _IOC(_IOC_READ|_IOC_WRITE, 0x58, 0x2c, 0x20), arg);

    A NULL pointer dereference occurs when xfs_getbmap() races with
    xfs_bmap_set_attrforkoff():

             ioctl               |       setxattr
     ----------------------------|---------------------------
     xfs_getbmap                 |
       xfs_ifork_ptr             |
         xfs_inode_has_attr_fork |
           ip->i_forkoff == 0    |
         return NULL             |
       ifp == NULL               |
                                 | xfs_bmap_set_attrforkoff
                                 |   ip->i_forkoff > 0
       xfs_inode_has_attr_fork   |
         ip->i_forkoff > 0       |
       ifp == NULL               |
       ifp->if_format            |

    Fix this by locking i_lock before xfs_ifork_ptr().

    Fixes: abbf9e8a45 ("xfs: rewrite getbmap using the xfs_iext_* helpers")
    Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
    Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    [djwong: added fixes tag]
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:49 -05:00
Bill O'Donnell 3219617b1b xfs: replace inode fork size macros with functions
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit c01147d929899f02a0a8b15e406d12784768ca72
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:07 2022 -0700

    xfs: replace inode fork size macros with functions

    Replace the shouty macros here with typechecked helper functions.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:43 -05:00
Bill O'Donnell f77675b5d0 xfs: replace XFS_IFORK_Q with a proper predicate function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 932b42c66cb5d0ca9800b128415b4ad6b1952b3e
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:06 2022 -0700

    xfs: replace XFS_IFORK_Q with a proper predicate function

    Replace this shouty macro with a real C function that has a more
    descriptive name.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:43 -05:00
Bill O'Donnell a2d362f29a xfs: make inode attribute forks a permanent part of struct xfs_inode
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

Conflicts: previous out of order application of 5625ea0 requires minor adjust to xfs_iomap.c

commit 2ed5b09b3e8fc274ae8fecd6ab7c5106a364bed1
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:06 2022 -0700

    xfs: make inode attribute forks a permanent part of struct xfs_inode

    Syzkaller reported a UAF bug a while back:

    ==================================================================
    BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
    Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958

    CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
    5.15.0-0.30.3-20220406_1406 #3
    Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
    04/01/2014
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
     print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
     __kasan_report mm/kasan/report.c:442 [inline]
     kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
     xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
     xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
     xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
     __vfs_getxattr+0xdf/0x13d fs/xattr.c:399
     cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
     security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
     dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
     dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
     do_truncate+0xc3/0x1e0 fs/open.c:56
     handle_truncate fs/namei.c:3084 [inline]
     do_open fs/namei.c:3432 [inline]
     path_openat+0x30ab/0x396d fs/namei.c:3561
     do_filp_open+0x1c4/0x290 fs/namei.c:3588
     do_sys_openat2+0x60d/0x98c fs/open.c:1212
     do_sys_open+0xcf/0x13c fs/open.c:1228
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0x0
    RIP: 0033:0x7f7ef4bb753d
    Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
    89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
    01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
    RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
    RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
    RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
    RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
    R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
     </TASK>

    Allocated by task 2953:
     kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
     kasan_set_track mm/kasan/common.c:46 [inline]
     set_alloc_info mm/kasan/common.c:434 [inline]
     __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
     kasan_slab_alloc include/linux/kasan.h:254 [inline]
     slab_post_alloc_hook mm/slab.h:519 [inline]
     slab_alloc_node mm/slub.c:3213 [inline]
     slab_alloc mm/slub.c:3221 [inline]
     kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
     kmem_cache_zalloc include/linux/slab.h:711 [inline]
     xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
     xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
     xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
     xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
     __vfs_setxattr+0x11b/0x177 fs/xattr.c:180
     __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
     __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
     vfs_setxattr+0x154/0x33d fs/xattr.c:301
     setxattr+0x216/0x29f fs/xattr.c:575
     __do_sys_fsetxattr fs/xattr.c:632 [inline]
     __se_sys_fsetxattr fs/xattr.c:621 [inline]
     __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0x0

    Freed by task 2949:
     kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
     kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
     kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
     ____kasan_slab_free mm/kasan/common.c:366 [inline]
     ____kasan_slab_free mm/kasan/common.c:328 [inline]
     __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
     kasan_slab_free include/linux/kasan.h:230 [inline]
     slab_free_hook mm/slub.c:1700 [inline]
     slab_free_freelist_hook mm/slub.c:1726 [inline]
     slab_free mm/slub.c:3492 [inline]
     kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
     xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
     xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
     xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
     xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
     xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
     xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
     __vfs_removexattr+0x106/0x16a fs/xattr.c:468
     cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
     security_inode_killpriv+0x54/0xa1 security/security.c:1414
     setattr_prepare+0x1a6/0x897 fs/attr.c:146
     xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
     xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
     xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
     notify_change+0xae5/0x10a1 fs/attr.c:410
     do_truncate+0x134/0x1e0 fs/open.c:64
     handle_truncate fs/namei.c:3084 [inline]
     do_open fs/namei.c:3432 [inline]
     path_openat+0x30ab/0x396d fs/namei.c:3561
     do_filp_open+0x1c4/0x290 fs/namei.c:3588
     do_sys_openat2+0x60d/0x98c fs/open.c:1212
     do_sys_open+0xcf/0x13c fs/open.c:1228
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0x0

    The buggy address belongs to the object at ffff88802cec9188
     which belongs to the cache xfs_ifork of size 40
    The buggy address is located 20 bytes inside of
     40-byte region [ffff88802cec9188, ffff88802cec91b0)
    The buggy address belongs to the page:
    page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
    index:0x0 pfn:0x2cec9
    flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
    raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
    raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
     ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
    >ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                                ^
     ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
     ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
    ==================================================================

    The root cause of this bug is the unlocked access to xfs_inode.i_afp
    from the getxattr code paths while trying to determine which ILOCK mode
    to use to stabilize the xattr data.  Unfortunately, the VFS does not
    acquire i_rwsem when vfs_getxattr (or listxattr) call into the
    filesystem, which means that getxattr can race with a removexattr that's
    tearing down the attr fork and crash:

    xfs_attr_set:                          xfs_attr_get:
    xfs_attr_fork_remove:                  xfs_ilock_attr_map_shared:

    xfs_idestroy_fork(ip->i_afp);
    kmem_cache_free(xfs_ifork_cache, ip->i_afp);

                                           if (ip->i_afp &&

    ip->i_afp = NULL;

                                               xfs_need_iread_extents(ip->i_afp))
                                           <KABOOM>

    ip->i_forkoff = 0;

    Regrettably, the VFS is much more lax about i_rwsem and getxattr than
    is immediately obvious -- not only does it not guarantee that we hold
    i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
    The getxattr system call won't acquire the lock before calling XFS, but
    the file capabilities code calls getxattr with and without i_rwsem held
    to determine if the "security.capabilities" xattr is set on the file.

    Fixing the VFS locking requires a treewide investigation into every code
    path that could touch an xattr and what i_rwsem state it expects or sets
    up.  That could take years or even prove impossible; fortunately, we
    can fix this UAF problem inside XFS.

    An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
    ensure that i_forkoff is always zeroed before i_afp is set to null and
    changed the read paths to use smp_rmb before accessing i_forkoff and
    i_afp, which avoided these UAF problems.  However, the patch author was
    too busy dealing with other problems in the meantime, and by the time he
    came back to this issue, the situation had changed a bit.

    On a modern system with selinux, each inode will always have at least
    one xattr for the selinux label, so it doesn't make much sense to keep
    incurring the extra pointer dereference.  Furthermore, Allison's
    upcoming parent pointer patchset will also cause nearly every inode in
    the filesystem to have extended attributes.  Therefore, make the inode
    attribute fork structure part of struct xfs_inode, at a cost of 40 more
    bytes.

    This patch adds a clunky if_present field where necessary to maintain
    the existing logic of xattr fork null pointer testing in the existing
    codebase.  The next patch switches the logic over to XFS_IFORK_Q and it
    all goes away.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:42 -05:00
Bill O'Donnell 08529f7680 xfs: convert XFS_IFORK_PTR to a static inline helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 732436ef916b4f338d672ea56accfdb11e8d0732
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:05 2022 -0700

    xfs: convert XFS_IFORK_PTR to a static inline helper

    We're about to make this logic do a bit more, so convert the macro to a
    static inline function for better typechecking and fewer shouty macros.
    No functional changes here.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:42 -05:00
Bill O'Donnell b1e7d509d1 xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 8944c6fb8add384154b784a90ceca88a51a8c364
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jun 25 10:47:45 2022 -0700

    xfs: dont treat rt extents beyond EOF as eofblocks to be cleared

    On a system with a realtime volume and a 28k realtime extent,
    generic/491 fails because the test opens a file on a frozen filesystem
    and closing it causes xfs_release -> xfs_can_free_eofblocks to
    mistakenly think that the blocks of the realtime extent beyond EOF
    are posteof blocks to be freed.  Realtime extents cannot be partially
    unmapped, so this is pointless.  Worse yet, this triggers posteof
    cleanup, which stalls on a transaction allocation, which is why the test
    fails.

    Teach the predicate to account for realtime extents properly.

    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
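
Note: one way to read "account for realtime extents properly" is to round the EOF block up to a realtime extent boundary before looking for post-EOF mappings. The sketch below assumes that reading; the function names, parameters, and the 4k block size used in the usage note are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of step (step must be nonzero). */
static uint64_t toy_roundup(uint64_t x, uint64_t step)
{
	return ((x + step - 1) / step) * step;
}

/*
 * Hypothetical predicate: are there mapped blocks past EOF that could be
 * freed?  eof_fsb is the first block past EOF, mapped_end_fsb is the end
 * (exclusive) of the last mapping, rextsize is the realtime extent size
 * in filesystem blocks.
 */
static bool
toy_has_posteof_blocks(uint64_t eof_fsb, uint64_t mapped_end_fsb,
		       bool is_realtime, uint64_t rextsize)
{
	uint64_t end_fsb = eof_fsb;

	/* A realtime extent cannot be partially unmapped, so round up. */
	if (is_realtime && rextsize > 1)
		end_fsb = toy_roundup(end_fsb, rextsize);

	return mapped_end_fsb > end_fsb;
}
```

With an assumed 4k block size, a 28k realtime extent is 7 blocks. A file whose EOF falls at block 3 inside that extent reports no post-EOF blocks (`toy_has_posteof_blocks(3, 7, true, 7)` is false), whereas the same layout on the data device would.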
2023-05-18 11:11:34 -05:00
Bill O'Donnell cf7ff3302c xfs: Conditionally upgrade existing inodes to use large extent counters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 4f86bb4b66c999ad9ddcfd49fec93992eeba2715
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Wed Mar 9 07:49:36 2022 +0000

    xfs: Conditionally upgrade existing inodes to use large extent counters

    This commit enables upgrading existing inodes to use large extent counters
    provided that the underlying filesystem's superblock has the large extent
    counter feature enabled.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
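
Note: the conditional upgrade can be sketched as a small check-then-set helper. The types, field names, and the choice of `-EFBIG` as the failure code are assumptions made for illustration; the message above does not spell them out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the mount and inode structures. */
struct toy_mount {
	bool has_large_extent_counts;   /* superblock feature enabled? */
};

struct toy_inode {
	const struct toy_mount *mount;
	bool large_extent_counts;       /* per-inode flag */
};

/*
 * Upgrade an inode to large extent counters if the filesystem allows it.
 * Returns 0 on success, or a negative errno (assumed: -EFBIG) if the
 * feature is disabled or the inode was already upgraded.
 */
static int toy_iext_count_upgrade(struct toy_inode *ip)
{
	if (!ip->mount->has_large_extent_counts || ip->large_extent_counts)
		return -EFBIG;

	ip->large_extent_counts = true;
	return 0;
}
```

In this sketch a caller would attempt the upgrade only after a small-counter overflow check has failed, so an inode that is already upgraded has nowhere further to go.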
2023-05-18 11:10:59 -05:00
Bill O'Donnell 79e16d14ed xfs: Define max extent length based on on-disk format definition
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 95f0b95e2b686ceaa3f465e9fa079f22e0fe7665
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Mon Aug 9 12:05:22 2021 +0530

    xfs: Define max extent length based on on-disk format definition

    The maximum extent length depends on the maximum block count that can be
    stored in a BMBT record. Hence this commit defines MAXEXTLEN based on
    BMBT_BLOCKCOUNT_BITLEN.

    While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.

    Suggested-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
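
Note: deriving the maximum extent length from the width of the on-disk block count field can be sketched as follows. The `TOY_` names are hypothetical, and the 21-bit width is an assumption about the BMBT record layout, not something stated in the message above.

```c
#include <stdint.h>

typedef uint32_t toy_extlen_t;

/* Assumed width, in bits, of the block count field in a BMBT record. */
#define TOY_BMBT_BLOCKCOUNT_BITLEN	21

/* All-ones mask covering the block count field. */
#define TOY_BMBT_BLOCKCOUNT_MASK \
	((1ULL << TOY_BMBT_BLOCKCOUNT_BITLEN) - 1)

/* Largest extent length the field can represent. */
#define TOY_MAX_BMBT_EXTLEN \
	((toy_extlen_t)(TOY_BMBT_BLOCKCOUNT_MASK))
```

Defining the limit this way ties it to the format definition: if the bit length is what the sketch assumes, the result is 0x1fffff blocks, and nobody has to keep a separate magic constant in sync.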
2023-05-18 11:10:55 -05:00
Jeff Moyer 358fa83614 Merge branch 'main' into 'guilt/pmem-9.2'
Several patches to this file were backported out of order.  The result of this merge resolution matches upstream after the inclusion of all of the patches we have backported.

# Conflicts:
#   fs/iomap/buffered-io.c
2023-03-30 20:35:46 +00:00
Chris von Recklinghausen c52a60a8f1 xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls
Conflicts: fs/xfs/xfs_ioctl.c - We already have
	472c6e46f589 ("xfs: remove XFS_PREALLOC_SYNC")
	so there is a difference in deleted code

Bugzilla: https://bugzilla.redhat.com/2160210

commit 4d1b97f9ce7c0d2af2bb85b12d48e6902172a28e
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Jan 7 17:45:51 2022 -0800

    xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls

    According to the glibc compat header for Irix 4, these ioctls originated
    in April 1991 as a (somewhat clunky) way to preallocate space at the end
    of a file on an EFS filesystem.  XFS, which was released in Irix 5.3 in
    December 1993, picked up these ioctls to maintain compatibility and they
    were ported to Linux in the early 2000s.

    Recently it was pointed out to me they still lurk in the kernel, even
    though the Linux fallocate syscall supplanted the functionality a long
    time ago.  fstests doesn't seem to include any real functional or stress
    tests for these ioctls, which means that the code quality is ... very
    questionable.  Most notably, it was a stale disk block exposure vector
    for 21 years and nobody noticed or complained.  As mature programmers
    say, "If you're not testing it, it's broken."

    Given all that, let's withdraw these ioctls from the XFS userspace API.
    Normally we'd set a long deprecation process, but I estimate that there
    aren't any real users, so let's trigger a warning in dmesg and return
    -ENOTTY.

    See: CVE-2021-4155

    Augments: 983d8e60f508 ("xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:45 -04:00
Jeff Moyer 7af7c9943b xfs: add xfs_zero_range and xfs_truncate_page helpers
Bugzilla: https://bugzilla.redhat.com/2162211

commit f1ba5fafba9bfde4b040cd0d14256aed25a35c5e
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Mon Nov 29 11:21:49 2021 +0100

    xfs: add xfs_zero_range and xfs_truncate_page helpers
    
    Add helpers to prepare for using different DAX operations.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    [hch: split from a larger patch + slight cleanups]
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-16-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:57:06 -05:00
Carlos Maiolino 48ccb79c6d xfs: xfs_bmap_punch_delalloc_range() should take a byte range
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2155605
Tested: With xfstests and bz reproducer

Conflicts:
	- We still use xfs_discard_page()

All the callers of xfs_bmap_punch_delalloc_range() jump through
hoops to convert a byte range to filesystem blocks before calling
xfs_bmap_punch_delalloc_range(). Instead, pass the byte range to
xfs_bmap_punch_delalloc_range() and have it do the conversion to
filesystem blocks internally.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 7348b322332d8602a4133f0b861334ea021b134a)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
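
Note: the byte-to-block conversion that moves inside the callee can be sketched like this. The struct and function names are hypothetical, and rounding the start down and the end up is an assumption about how a byte range is widened to whole filesystem blocks.

```c
#include <stdint.h>

/* Hypothetical block range; end_fsb is exclusive. */
struct toy_fsb_range {
	uint64_t start_fsb;
	uint64_t end_fsb;
};

/*
 * Convert the byte range [start_byte, end_byte) into the filesystem
 * blocks that cover it.  blocklog is log2 of the block size.
 */
static struct toy_fsb_range
toy_bytes_to_fsb_range(uint64_t start_byte, uint64_t end_byte,
		       unsigned int blocklog)
{
	uint64_t blockmask = (1ULL << blocklog) - 1;
	struct toy_fsb_range r;

	r.start_fsb = start_byte >> blocklog;              /* round down */
	r.end_fsb = (end_byte + blockmask) >> blocklog;    /* round up */
	return r;
}
```

With 4k blocks (`blocklog` of 12), the bytes [100, 5000) map to blocks [0, 2). Doing this once in the callee means each caller passes the byte range it already has.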
2023-02-06 11:03:27 +01:00
Brian Foster d179379de4 xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 75c8c50fa16a23f8ac89ea74834ae8ddd1558d75
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:53 2021 -0700

    xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown

    Remove the shouty macro and instead use the inline function that
    matches other state/feature check wrapper naming. This conversion
    was done with sed.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
Brian Foster 6def1029c3 xfs: convert mount flags to features
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git
Conflicts: Work around out of order backport in xfs_fs_fill_super().

commit 0560f31a09e523090d1ab2bfe21c69d028c2bdf2
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:52 2021 -0700

    xfs: convert mount flags to features

    Replace m_flags feature checks with xfs_has_<feature>() calls and
    rework the setup code to set flags in m_features.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2022-08-25 08:11:34 -04:00
Brian Foster d54a790d1d xfs: replace xfs_sb_version checks with feature flag checks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit 38c26bfd90e1999650d5ef40f90d721f05916643
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 18 18:46:37 2021 -0700

    xfs: replace xfs_sb_version checks with feature flag checks

    Convert the xfs_sb_version_hasfoo() to checks against
    mp->m_features. Checks of the superblock itself during disk
    operations (e.g. in the read/write verifiers and the to/from disk
    formatters) are not converted - they operate purely on the
    superblock state. Everything else should use the mount features.

    Large parts of this conversion were done with sed with commands like
    this:

    for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
            sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
    done

    With manual cleanups for things like "xfs_has_extflgbit" and other
    little inconsistencies in naming.

    The result is a lot less typing to check features and an XFS binary
    size reduced by a bit over 3kB:

    $ size -t fs/xfs/built-in.a
            text       data     bss     dec     hex filename
    before  1130866  311352     484 1442702  16038e (TOTALS)
    after   1127727  311352     484 1439563  15f74b (TOTALS)

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
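
Note: the shape of a feature check against `m_features` can be sketched as below. The flag values, the `toy_` names, and the generator macro are hypothetical; only the idea of one inline predicate per feature bit, taking the mount pointer, comes from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits. */
#define TOY_FEAT_REFLINK	(1ULL << 0)
#define TOY_FEAT_RMAPBT		(1ULL << 1)

/* Hypothetical mount structure holding the feature mask. */
struct toy_mount {
	uint64_t m_features;
};

/* Generate one inline predicate per feature bit. */
#define TOY_HAS_FEAT(name, NAME) \
static inline bool toy_has_##name(const struct toy_mount *mp) \
{ \
	return (mp->m_features & TOY_FEAT_##NAME) != 0; \
}

TOY_HAS_FEAT(reflink, REFLINK)
TOY_HAS_FEAT(rmapbt, RMAPBT)
```

A call site then reads `toy_has_reflink(mp)` rather than passing the address of the superblock embedded in the mount, which is the kind of shortening the sed expression above performs.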
2022-08-25 08:11:34 -04:00
Brian Foster 7e66118a74 xfs: Convert double locking of MMAPLOCK to use VFS helpers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143
Upstream Status: linux.git

commit d2c292d84c4983424938f32c9c247f6ab8719769
Author: Jan Kara <jack@suse.cz>
Date:   Mon May 24 13:17:49 2021 +0200

    xfs: Convert double locking of MMAPLOCK to use VFS helpers

    Convert places in XFS that take MMAPLOCK for two inodes to use the helper
    the VFS provides for it (filemap_invalidate_down_write_two()). Note that
    this changes lock ordering for MMAPLOCK from inode number based ordering
    to the pointer based ordering the VFS generally uses.

    CC: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jan Kara <jack@suse.cz>

Signed-off-by: Brian Foster <bfoster@redhat.com>
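
Note: pointer based lock ordering can be sketched with a toy lock type. `struct toy_lock` and both functions are hypothetical and take no real locks; the sketch only shows the ordering rule (lower address first, never the same lock twice) that lets two tasks locking the same pair avoid an ABBA deadlock.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock; a real implementation would block when contended. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	l->held = 1;
}

/*
 * Acquire two locks in ascending address order, whatever order the
 * caller passed them in.  order[] records the sequence actually used;
 * order[1] is NULL when both arguments name the same lock.
 */
static void
toy_lock_two(struct toy_lock *a, struct toy_lock *b,
	     struct toy_lock *order[2])
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct toy_lock *tmp = a;

		a = b;
		b = tmp;
	}

	toy_lock_acquire(a);
	order[0] = a;
	order[1] = NULL;

	if (a != b) {
		toy_lock_acquire(b);
		order[1] = b;
	}
}
```

Any total order shared by all lockers works; the change noted above is only about which key (inode number or pointer) defines that order.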
2022-08-25 08:11:20 -04:00
Pavel Reichl 58d7677561 xfs: set prealloc flag in xfs_alloc_file_space()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2085722
Tested: xfstests
Upstream Status: upstream

Now that we only call xfs_update_prealloc_flags() from
xfs_file_fallocate() in the case where we need to set the
preallocation flag, do this in xfs_alloc_file_space() where we
already have the inode joined into a transaction and get
rid of the call to xfs_update_prealloc_flags() from the fallocate
code.

This also means that we now correctly avoid setting the
XFS_DIFLAG_PREALLOC flag when xfs_is_always_cow_inode() is true, as
these inodes will never have preallocated extents.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 0b02c8c0d75a738c98c35f02efb36217c170d78c)
Signed-off-by: Pavel Reichl <preichl@redhat.com>
2022-07-20 16:01:32 +02:00
Linus Torvalds 9f7b640f00 New code for 5.14:
- Refactor the buffer cache to use bulk page allocation
 - Convert agnumber-based AG iteration to walk per-AG structures
 - Clean up some unit conversions and other code warts
 - Reduce spinlock contention in the directio fastpath
 - Collapse all the inode cache walks into a single function
 - Remove indirect function calls from the inode cache walk code
 - Dramatically reduce the number of cache flushes sent when writing log
   buffers
 - Preserve inode sickness reports for longer
 - Rename xfs_eofblocks since it controls inode cache walks
 - Refactor the extended attribute code to prepare it for the addition
   of log intent items to make xattrs fully transactional
 - A few fixes to earlier large patchsets
 - Log recovery fixes so that we don't accidentally mark the log clean
   when log intent recovery fails
 - Fix some latent SOB errors
 - Clean up shutdown messages that get logged to dmesg
 - Fix a regression in the online shrink code
 - Fix a UAF in the buffer logging code if the fs goes offline
 - Fix uninitialized error variables
 - Fix a UAF in the CIL when committed log item callbacks race with a
   shutdown
 - Fix a bug where the CIL could hang trying to push part of the log ring
   buffer that hasn't been filled yet

Merge tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "Most of the work this cycle has been on refactoring various parts of
  the codebase. The biggest non-cleanup changes are (1) reducing the
  number of cache flushes sent when writing the log; (2) a substantial
  number of log recovery fixes; and (3) I started accepting pull
  requests from contributors if the commits in their branches match
  what's been sent to the list.

  For a week or so I /had/ staged a major cleanup of the logging code
  from Dave Chinner, but it exposed so many lurking bugs in other parts
  of the logging and log recovery code that I decided to defer that
  patchset until we can address those latent bugs.

  Larger cleanups this time include walking the incore inode cache (me)
  and rework of the extended attribute code (Allison) to prepare it for
  adding logged xattr updates (and directory tree parent pointers) in
  future releases.

  Summary:

   - Refactor the buffer cache to use bulk page allocation

   - Convert agnumber-based AG iteration to walk per-AG structures

   - Clean up some unit conversions and other code warts

   - Reduce spinlock contention in the directio fastpath

   - Collapse all the inode cache walks into a single function

   - Remove indirect function calls from the inode cache walk code

   - Dramatically reduce the number of cache flushes sent when writing
     log buffers

   - Preserve inode sickness reports for longer

   - Rename xfs_eofblocks since it controls inode cache walks

   - Refactor the extended attribute code to prepare it for the addition
     of log intent items to make xattrs fully transactional

   - A few fixes to earlier large patchsets

   - Log recovery fixes so that we don't accidentally mark the log clean
     when log intent recovery fails

   - Fix some latent SOB errors

   - Clean up shutdown messages that get logged to dmesg

   - Fix a regression in the online shrink code

   - Fix a UAF in the buffer logging code if the fs goes offline

   - Fix uninitialized error variables

   - Fix a UAF in the CIL when committed log item callbacks race with a
     shutdown

   - Fix a bug where the CIL could hang trying to push part of the log
     ring buffer that hasn't been filled yet"

* tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (102 commits)
  xfs: don't wait on future iclogs when pushing the CIL
  xfs: Fix a CIL UAF by getting get rid of the iclog callback lock
  xfs: remove callback dequeue loop from xlog_state_do_iclog_callbacks
  xfs: don't nest icloglock inside ic_callback_lock
  xfs: Initialize error in xfs_attr_remove_iter
  xfs: fix endianness issue in xfs_ag_shrink_space
  xfs: remove dead stale buf unpin handling code
  xfs: hold buffer across unpin and potential shutdown processing
  xfs: force the log offline when log intent item recovery fails
  xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes
  xfs: shorten the shutdown messages to a single line
  xfs: print name of function causing fs shutdown instead of hex pointer
  xfs: fix type mismatches in the inode reclaim functions
  xfs: separate primary inode selection criteria in xfs_iget_cache_hit
  xfs: refactor the inode recycling code
  xfs: add iclog state trace events
  xfs: xfs_log_force_lsn isn't passed a LSN
  xfs: Fix CIL throttle hang when CIL space used going backwards
  xfs: journal IO cache flush reductions
  xfs: remove need_start_rec parameter from xlog_write()
  ...
2021-07-02 14:30:27 -07:00
Linus Torvalds 8ec035ac4a fallthrough fixes for Clang for 5.14-rc1
Hi Linus,
 
 Please, pull the following patches that fix many fall-through warnings
 when building with Clang 12.0.0 and this[1] change reverted. Notice
 that in order to enable -Wimplicit-fallthrough for Clang, such change[1]
 is meant to be reverted at some point. So, these patches help to move
 in that direction.
 
 Thanks!
 
 [1] commit e2079e93f5 ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")

Merge tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull fallthrough fixes from Gustavo Silva:
 "Fix many fall-through warnings when building with Clang 12.0.0 and
  '-Wimplicit-fallthrough' so that we at some point will be able to
  enable that warning by default"

* tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (26 commits)
  rxrpc: Fix fall-through warnings for Clang
  drm/nouveau/clk: Fix fall-through warnings for Clang
  drm/nouveau/therm: Fix fall-through warnings for Clang
  drm/nouveau: Fix fall-through warnings for Clang
  xfs: Fix fall-through warnings for Clang
  xfrm: Fix fall-through warnings for Clang
  tipc: Fix fall-through warnings for Clang
  sctp: Fix fall-through warnings for Clang
  rds: Fix fall-through warnings for Clang
  net/packet: Fix fall-through warnings for Clang
  net: netrom: Fix fall-through warnings for Clang
  ide: Fix fall-through warnings for Clang
  hwmon: (max6621) Fix fall-through warnings for Clang
  hwmon: (corsair-cpro) Fix fall-through warnings for Clang
  firewire: core: Fix fall-through warnings for Clang
  braille_console: Fix fall-through warnings for Clang
  ipv4: Fix fall-through warnings for Clang
  qlcnic: Fix fall-through warnings for Clang
  bnxt_en: Fix fall-through warnings for Clang
  netxen_nic: Fix fall-through warnings for Clang
  ...
2021-06-28 20:03:38 -07:00
Darrick J. Wong 20bd8e63f3 xfs: remove unnecessary shifts
The superblock verifier already validates that (1 << blocklog) ==
blocksize, so use the value directly instead of doing math.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
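
Note: the reasoning above (a verified superblock guarantees `1 << blocklog == blocksize`, so the stored value can be used directly) can be sketched as below. The struct and function names are hypothetical stand-ins for the superblock fields and verifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock geometry fields. */
struct toy_sb {
	uint32_t blocksize;   /* block size in bytes */
	uint8_t  blocklog;    /* log2 of blocksize */
};

/* Verifier-style check: the two fields must agree. */
static bool toy_sb_geometry_ok(const struct toy_sb *sb)
{
	return sb->blocklog < 32 &&
	       (UINT32_C(1) << sb->blocklog) == sb->blocksize;
}

/* Once verified, read the size instead of recomputing it with a shift. */
static uint32_t toy_block_bytes(const struct toy_sb *sb)
{
	return sb->blocksize;
}
```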
2021-06-01 12:53:59 -07:00
Gustavo A. R. Silva 53004ee78d xfs: Fix fall-through warnings for Clang
In preparation for enabling -Wimplicit-fallthrough for Clang, fix
the following warnings by replacing /* fall through */ comments,
and their variants, with the new pseudo-keyword macro fallthrough:

fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]

Notice that Clang doesn't recognize /* fall through */ comments as
implicit fall-through markings, so in order to globally enable
-Wimplicit-fallthrough for Clang, these comments need to be
replaced with fallthrough; in the whole codebase.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
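
Note: a minimal sketch of the `fallthrough` pseudo-keyword in use. The macro definition below is an assumption modelled on the statement attribute that GCC and Clang provide, with a no-op fallback for other compilers; `toy_count_steps` is a hypothetical function written only to show annotated fall-through between switch labels.

```c
/* Use the compiler's statement attribute when it is available. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Count how many stages run when starting at a given stage. */
static int toy_count_steps(int start)
{
	int steps = 0;

	switch (start) {
	case 0:
		steps++;
		fallthrough;
	case 1:
		steps++;
		fallthrough;
	case 2:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Unlike a comment, the macro expands to something the compiler parses, so an accidental fall-through elsewhere still triggers the warning while the intentional ones here do not.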
2021-05-26 14:51:26 -05:00
Darrick J. Wong 676a659b60 xfs: retry allocations when locality-based search fails
If a realtime allocation fails because we can't find a sufficiently
large free extent satisfying locality rules, relax the locality rules
and try again.  This reduces the occurrence of short writes to realtime
files when the write size is large and the free space is fragmented.

This was originally discovered by running generic/186 with the realtime
reflink patchset and a 128k cow extent size hint, but the short write
symptoms can manifest with a 128k extent size hint and no reflink, so
apply the fix now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
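
Note: the retry idea (search with locality rules first, then relax them) can be sketched with a toy free-space list. The extent struct, the distance window used as the "locality rule", and both functions are hypothetical; the real realtime allocator is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Return the first free extent with at least `want` blocks.  When
 * near_only is set, only extents starting within `window` blocks of
 * `target` qualify.
 */
static bool
toy_search(const struct toy_extent *freesp, size_t nr, uint64_t want,
	   bool near_only, uint64_t target, uint64_t window,
	   struct toy_extent *found)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t start = freesp[i].start;
		uint64_t dist = start > target ? start - target
					       : target - start;

		if (freesp[i].len < want)
			continue;
		if (near_only && dist > window)
			continue;

		found->start = start;
		found->len = want;
		return true;
	}
	return false;
}

/* Try a locality-constrained search first, then retry without it. */
static bool
toy_alloc(const struct toy_extent *freesp, size_t nr, uint64_t want,
	  uint64_t target, uint64_t window, struct toy_extent *found)
{
	if (toy_search(freesp, nr, want, true, target, window, found))
		return true;
	return toy_search(freesp, nr, want, false, target, window, found);
}
```

With fragmented space near the target and one large extent far away, a large request fails the first pass and succeeds on the second instead of coming back short.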
2021-05-20 08:28:34 -07:00
Darrick J. Wong 9d5e8492ee xfs: adjust rt allocation minlen when extszhint > rtextsize
xfs_bmap_rtalloc doesn't handle realtime extent files with extent size
hints larger than the rt volume's extent size properly, because
xfs_bmap_extsize_align can adjust the offset/length parameters to try to
fit the extent size hint.

Under these conditions, minlen has to be large enough so that any
allocation returned by xfs_rtallocate_extent will be large enough to
cover at least one of the blocks that the caller asked for.  If the
allocation is too short, bmapi_write will return no mapping for the
requested range, which causes ENOSPC errors in other parts of the
filesystem.

Therefore, adjust minlen upwards to fix this.  This can be found by
running generic/263 (g/127 or g/522) with a realtime extent size hint
that's larger than the rt volume extent size.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
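
Note: a sketch of the minlen adjustment, assuming the simplest reading of the text above: if alignment to the extent size hint moved the start of the request downward, grow minlen by the same distance so that any allocation reaches the block the caller originally asked for. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * orig_offset:    file block the caller asked for
 * aligned_offset: start of the request after extent size hint alignment
 * minlen:         minimum acceptable allocation length, in blocks
 */
static uint64_t
toy_adjust_minlen(uint64_t minlen, uint64_t orig_offset,
		  uint64_t aligned_offset)
{
	/* Cover the gap between the aligned start and the wanted block. */
	if (aligned_offset < orig_offset)
		minlen += orig_offset - aligned_offset;
	return minlen;
}
```

For example, if the caller wants block 10 and alignment pulls the start down to block 8, a minlen of 1 becomes 3, so an allocation covering blocks 8 through 10 is the shortest one accepted.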
2021-05-16 18:45:03 -07:00
Christoph Hellwig b2197a36c0 xfs: remove XFS_IFEXTENTS
The in-memory XFS_IFEXTENTS is now only used to check if an inode with
extents still needs the extents to be read into memory before doing
operations that need the extent map.  Add a new xfs_need_iread_extents
helper that returns true for btree format forks that do not have any
entries in the in-memory extent btree, and use that instead of checking
the XFS_IFEXTENTS flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
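
Note: the helper described above can be sketched as a two-condition predicate. The enum, struct, and field names are hypothetical; in particular, representing "no entries in the in-memory extent btree" as a tree height of zero is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical fork formats. */
enum toy_fork_format { TOY_FMT_LOCAL, TOY_FMT_EXTENTS, TOY_FMT_BTREE };

/* Hypothetical in-memory fork state. */
struct toy_ifork {
	enum toy_fork_format format;
	int incore_height;   /* 0 means the in-memory extent tree is empty */
};

/* True only for btree format forks whose extents are not yet in memory. */
static bool toy_need_iread_extents(const struct toy_ifork *ifp)
{
	return ifp->format == TOY_FMT_BTREE && ifp->incore_height == 0;
}
```

Deriving the answer from existing state in this way is what makes a separate "extents loaded" flag unnecessary.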
2021-04-15 09:35:51 -07:00
Christoph Hellwig 862a804aae xfs: move the XFS_IFEXTENTS check into xfs_iread_extents
Move the XFS_IFEXTENTS check from the callers into xfs_iread_extents to
simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:50 -07:00
Darrick J. Wong 7d88329e5b xfs: move the check for post-EOF mappings into xfs_can_free_eofblocks
Fix the weird split of responsibilities between xfs_can_free_eofblocks
and xfs_free_eofblocks by moving the chunk of code that looks for any
actual post-EOF space mappings from the second function into the first.

This clears the way for deferred inode inactivation to be able to decide
if an inode needs inactivation work before committing the released inode
to the inactivation code paths (vs. marking it for reclaim).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-07 14:38:21 -07:00
Christoph Hellwig 3e09ab8fdc xfs: move the di_flags2 field to struct xfs_inode
In preparation of removing the historic icinode struct, move the flags2
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:05 -07:00
Christoph Hellwig db07349da2 xfs: move the di_flags field to struct xfs_inode
In preparation of removing the historic icinode struct, move the flags
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:05 -07:00
Christoph Hellwig 6e73a545f9 xfs: move the di_nblocks field to struct xfs_inode
In preparation of removing the historic icinode struct, move the nblocks
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Christoph Hellwig 13d2c10b05 xfs: move the di_size field to struct xfs_inode
In preparation of removing the historic icinode struct, move the on-disk
size field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Christoph Hellwig ceaf603c70 xfs: move the di_projid field to struct xfs_inode
In preparation of removing the historic icinode struct, move the projid
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Darrick J. Wong 3de4eb106f xfs: allow reservation of rtblocks with xfs_trans_alloc_inode
Make it so that we can reserve rt blocks with the xfs_trans_alloc_inode
wrapper function, then convert a few more callsites.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 3a1af6c317 xfs: refactor common transaction/inode/quota allocation idiom
Create a new helper xfs_trans_alloc_inode that allocates a transaction,
locks and joins an inode to it, and then reserves the appropriate amount
of quota against that transaction.  Then replace all the open-coded
idioms with a single call to this helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 02b7ee4eb6 xfs: reserve data and rt quota at the same time
Modify xfs_trans_reserve_quota_nblks so that we can reserve data and
realtime blocks from the dquot at the same time.  This change has the
theoretical side effect that for allocations to realtime files we will
reserve from the dquot both the number of rtblocks being allocated and
the number of bmbt blocks that might be needed to add the mapping.
However, since the mount code disables quota if it finds a realtime
device, this should not result in any behavior changes.

Now that we've moved the inode creation callers away from using the
_nblks function, we can repurpose the (now unused) ninos argument for
realtime blocks, so make that change.  This also replaces the flags
argument with a boolean parameter to force the reservation since we
don't need to distinguish between data and rt quota reservations any
more, and the only flag being passed in was FORCE_RES.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong 35b1101099 xfs: remove xfs_trans_unreserve_quota_nblks completely
xfs_trans_cancel will release all the quota resources that were reserved
on behalf of the transaction, so get rid of the explicit unreserve step.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00