
104 Commits

Author SHA1 Message Date
Bill O'Donnell e21282d525 xfs: remove struct xfs_attr_shortform
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 414147225400a0c4562ebfb0fdd40f065099ede4
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Dec 20 07:35:01 2023 +0100

    xfs: remove struct xfs_attr_shortform

    sparse complains about struct xfs_attr_shortform because it embeds a
    structure with a variable sized array in a variable sized array.

    Given that xfs_attr_shortform is not a very useful structure, and the
    dir2 equivalent was removed a long time ago, remove it as well.

    Provide an xfs_attr_sf_firstentry helper that returns the first
    xfs_attr_sf_entry behind an xfs_attr_sf_hdr to replace the structure
    dereference.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:19 -06:00
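The xfs_attr_sf_firstentry helper from this commit replaces a structure dereference with pointer arithmetic on the header. A minimal userspace sketch of the idea follows; the demo_* types and functions are hypothetical, simplified stand-ins and not the real XFS definitions (the on-disk header, for instance, stores totsize big-endian).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the shortform attr header/entry. */
struct demo_sf_hdr {
	uint16_t totsize;	/* total bytes used, header included */
	uint8_t  count;		/* number of entries that follow */
	uint8_t  padding;
};

struct demo_sf_entry {
	uint8_t namelen;
	uint8_t valuelen;
	uint8_t flags;
	uint8_t nameval[];	/* name bytes, then value bytes */
};

/* The first entry starts right behind the fixed-size header. */
static inline struct demo_sf_entry *demo_sf_firstentry(struct demo_sf_hdr *hdr)
{
	return (struct demo_sf_entry *)(hdr + 1);
}

/* Entries are variable sized: 3 fixed bytes plus the name and the value. */
static inline struct demo_sf_entry *demo_sf_nextentry(struct demo_sf_entry *sfe)
{
	return (struct demo_sf_entry *)((uint8_t *)sfe + sizeof(*sfe) +
					 sfe->namelen + sfe->valuelen);
}

/* Walk every entry and report how many bytes header plus entries use. */
static size_t demo_sf_used_bytes(struct demo_sf_hdr *hdr)
{
	struct demo_sf_entry *sfe = demo_sf_firstentry(hdr);

	for (int i = 0; i < hdr->count; i++)
		sfe = demo_sf_nextentry(sfe);
	return (size_t)((uint8_t *)sfe - (uint8_t *)hdr);
}
```

Walking the entries needs only the header's address and its entry count; no containing structure with a nested variable-sized array is involved.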
Bill O'Donnell 5718b6bb4d xfs: return if_data from xfs_idata_realloc
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 45c76a2add55b332d965c901e14004ae0134a67e
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Dec 20 07:34:56 2023 +0100

    xfs: return if_data from xfs_idata_realloc

    Many of the xfs_idata_realloc callers need to set a local pointer to the
    just reallocated if_data memory.  Return the pointer to simplify them a
    bit and use the opportunity to re-use krealloc for freeing if_data if the
    size hits 0.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:18 -06:00
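The commit above changes xfs_idata_realloc to return the reallocated buffer and to let krealloc handle the zero-size case. The following userspace sketch shows the same calling convention; demo_ifork and demo_idata_realloc are hypothetical names, and because realloc(p, 0) is implementation-defined in userspace, the sketch frees explicitly where the kernel relies on krealloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified in-core fork: an inline data buffer and its size. */
struct demo_ifork {
	void   *if_data;
	size_t  if_bytes;
};

/*
 * Grow or shrink the inline data by byte_diff bytes and return the (possibly
 * moved) buffer, so callers can refresh a local pointer in one statement.
 * Shrinking to zero frees the buffer and returns NULL.
 */
static void *demo_idata_realloc(struct demo_ifork *ifp, long byte_diff)
{
	long new_size = (long)ifp->if_bytes + byte_diff;

	assert(new_size >= 0);
	if (new_size == 0) {
		/* userspace realloc(p, 0) is implementation-defined; free explicitly */
		free(ifp->if_data);
		ifp->if_data = NULL;
	} else {
		void *p = realloc(ifp->if_data, (size_t)new_size);

		if (!p)
			abort();	/* sketch: treat allocation failure as fatal */
		ifp->if_data = p;
	}
	ifp->if_bytes = (size_t)new_size;
	return ifp->if_data;
}
```

A caller that previously had to call the function and then reload ifp->if_data can now write d = demo_idata_realloc(ifp, diff) in one step.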
Bill O'Donnell 86c0442471 xfs: make if_data a void pointer
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 6e145f943bd86be47e54101fa5939f9ed0cb73e5
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Dec 20 07:34:55 2023 +0100

    xfs: make if_data a void pointer

    The xfs_ifork structure currently has a union of the if_root void pointer
    and the if_data char pointer.  In either case it is an opaque pointer
    that depends on the fork format.  Replace the union with a single if_data
    void pointer as that is what almost all callers want.  Only the symlink
    NULL termination code in xfs_init_local_fork actually needs a new local
    variable now.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:17 -06:00
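Replacing the if_root/if_data union with a single void pointer means code that needs byte-level access must go through a typed local, as the commit notes for symlink NUL termination. A hypothetical userspace sketch of that pattern follows; the demo_* names are illustrative and this is not the kernel's xfs_init_local_fork.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified fork: one opaque pointer instead of a union. */
struct demo_ifork {
	void   *if_data;	/* extent tree root or inline data, by format */
	size_t  if_bytes;
};

/*
 * Copy inline fork data into a fresh buffer.  Symlink targets get a
 * trailing NUL; that is the one spot needing a typed (char *) local,
 * because a void * cannot be indexed directly.
 */
static void demo_init_local_fork(struct demo_ifork *ifp, const void *data,
				 size_t size, bool zero_terminate)
{
	size_t mem_size = size + (zero_terminate ? 1 : 0);
	char *new_data = NULL;

	if (mem_size) {
		new_data = malloc(mem_size);
		assert(new_data != NULL);
		memcpy(new_data, data, size);
		if (zero_terminate)
			new_data[size] = '\0';
	}
	ifp->if_data = new_data;
	ifp->if_bytes = size;
}
```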
Bill O'Donnell 16bfa41bdf xfs: repair inode fork block mapping data structures
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit 8f71bede8efd820627ac05c19eac2758214bc896
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Dec 15 10:03:39 2023 -0800

    xfs: repair inode fork block mapping data structures

    Use the reverse-mapping btree information to rebuild an inode block map.
    Update the btree bulk loading code as necessary to support inode rooted
    btrees and fix some bitrot problems.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:07 -06:00
Bill O'Donnell 070bdf384b xfs: zap broken inode forks
JIRA: https://issues.redhat.com/browse/RHEL-65728

commit e744cef206055954517648070d2b3aaa3d2515ba
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Dec 15 10:03:37 2023 -0800

    xfs: zap broken inode forks

    Determine if inode fork damage is responsible for the inode being unable
    to pass the ifork verifiers in xfs_iget and zap the fork contents if
    this is true.  Once this is done the fork will be empty but we'll be
    able to construct an in-core inode, and a subsequent call to the inode
    fork repair ioctl will search the rmapbt to rebuild the records that
    were in the fork.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:26:06 -06:00
Bill O'Donnell a8cc7b7360 xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit c95356ca884885db702670e24933ee7f2b9f1754
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Wed Apr 12 15:49:10 2023 +1000

    xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done

    While fuzzing the data fork extent count on a btree-format directory
    with xfs/375, I observed the following (excerpted) splat:

    XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
    Call Trace:
     <TASK>
     xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
     __x64_sys_ioctl+0x82/0xa0
     do_syscall_64+0x2b/0x80
     entry_SYSCALL_64_after_hwframe+0x46/0xb0

    The cause of this is a race condition in xfs_ilock_data_map_shared,
    which performs an unlocked access to the data fork to guess which lock
    mode it needs:

    Thread 0                          Thread 1

    xfs_need_iread_extents
    <observe no iext tree>
    xfs_ilock(..., ILOCK_EXCL)
    xfs_iread_extents
    <observe no iext tree>
    <check ILOCK_EXCL>
    <load bmbt extents into iext>
    <notice iext size doesn't
     match nextents>
                                      xfs_need_iread_extents
                                      <observe iext tree>
                                      xfs_ilock(..., ILOCK_SHARED)
    <tear down iext tree>
    xfs_iunlock(..., ILOCK_EXCL)
                                      xfs_iread_extents
                                      <observe no iext tree>
                                      <check ILOCK_EXCL>
                                      *BOOM*

    Fix this race by adding a flag to the xfs_ifork structure to indicate
    that we have not yet read in the extent records and changing the
    predicate to look at the flag state, not if_height.  The memory barrier
    ensures that the flag will not be set until the very end of the
    function.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-06 10:32:48 -05:00
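The fix in this commit hinges on an ordering guarantee: the flag that says "extents still need reading" changes state only at the very end, after the loaded records are in place, and the lock-mode guess looks at that flag instead of the tree height. A minimal userspace model of that pattern follows, using C11 release/acquire atomics in place of the kernel's memory barriers. All demo_* names are hypothetical, and the mismatch check stands in for the real validation in xfs_iread_extents.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_lock_mode { DEMO_ILOCK_SHARED, DEMO_ILOCK_EXCL };

/* Hypothetical fork state: the flag, not the tree height, is the predicate. */
struct demo_ifork {
	atomic_bool needextents;	/* true until extents are fully loaded */
	int         nrecs;		/* stand-in for the in-core extent tree */
};

/* Unlocked check used to pick a lock mode (acquire pairs with release below). */
static bool demo_need_iread_extents(struct demo_ifork *ifp)
{
	return atomic_load_explicit(&ifp->needextents, memory_order_acquire);
}

static enum demo_lock_mode demo_ilock_data_map_shared(struct demo_ifork *ifp)
{
	return demo_need_iread_extents(ifp) ? DEMO_ILOCK_EXCL : DEMO_ILOCK_SHARED;
}

/*
 * Caller holds the lock exclusively.  Records may be loaded and then torn
 * down again if validation fails; only a fully successful load clears the
 * flag, and the release store makes the loaded records visible first.
 */
static int demo_iread_extents(struct demo_ifork *ifp, int ondisk_nrecs,
			      int loaded_nrecs)
{
	ifp->nrecs = loaded_nrecs;		/* load records */
	if (loaded_nrecs != ondisk_nrecs) {
		ifp->nrecs = 0;			/* tear down; flag stays set */
		return -1;
	}
	atomic_store_explicit(&ifp->needextents, false, memory_order_release);
	return 0;
}
```

Because a torn-down load never clears the flag, a second thread that sampled it during the failed load still chooses the exclusive lock, which closes the window shown in the race diagram.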
Bill O'Donnell f3724d2a82 xfs: complain about bad file mapping records in the ondisk bmbt
JIRA: https://issues.redhat.com/browse/RHEL-25419

commit 6a3bd8fcf9afb47c703cb268f30f60aa2e7af86a
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Apr 11 19:00:05 2023 -0700

    xfs: complain about bad file mapping records in the ondisk bmbt

    Similar to what we've just done for the other btrees, create a function
    to log corrupt bmbt records and call it whenever we encounter a bad
    record in the ondisk btree.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-06-05 16:56:18 -05:00
Bill O'Donnell f82d4529ed xfs: clean up "%Ld/%Lu" which doesn't meet C standard
JIRA: https://issues.redhat.com/browse/RHEL-2002

commit 78b0f58bdfef45aa9f3c7fbbd9b4d41abad6d85f
Author: Zeng Heng <zengheng4@huawei.com>
Date:   Mon Sep 19 06:47:14 2022 +1000

    xfs: clean up "%Ld/%Lu" which doesn't meet C standard

    The "%Ld" specifier, which is meant for long long, doesn't meet
    the C language standard; moreover, it is easily mistaken for
    "%ld", which represents long. So replace "%Ld" with "%lld".

    Do the same with "%Lu".

    Signed-off-by: Zeng Heng <zengheng4@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-11-06 19:27:41 -06:00
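A small userspace illustration of the standard length modifiers the commit switches to. demo_format_extent is a hypothetical helper; the explicit casts are there because userspace int64_t/uint64_t are not guaranteed to be long long types, whereas the kernel's s64/u64 are.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a signed and an unsigned 64-bit quantity with the standard "ll"
 * length modifier.  Cast explicitly so the arguments match %lld / %llu
 * even where int64_t/uint64_t are defined as long.
 */
static int demo_format_extent(char *buf, size_t len, int64_t off, uint64_t blk)
{
	return snprintf(buf, len, "off %lld blk %llu",
			(long long)off, (unsigned long long)blk);
}
```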
Bill O'Donnell ba8109db31 xfs: don't leak memory when attr fork loading fails
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit c78c2d0903183a41beb90c56a923e30f90fa91b9
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Jul 19 09:14:55 2022 -0700

    xfs: don't leak memory when attr fork loading fails

    I observed the following evidence of a memory leak while running xfs/399
    from the xfs fsck test suite (edited for brevity):

    XFS (sde): Metadata corruption detected at xfs_attr_shortform_verify_struct.part.0+0x7b/0xb0 [xfs], inode 0x1172 attr fork
    XFS: Assertion failed: ip->i_af.if_u1.if_data == NULL, file: fs/xfs/libxfs/xfs_inode_fork.c, line: 315
    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 91635 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
    CPU: 2 PID: 91635 Comm: xfs_scrub Tainted: G        W         5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
    RIP: 0010:assfail+0x46/0x4a [xfs]
    Call Trace:
     <TASK>
     xfs_ifork_zap_attr+0x7c/0xb0
     xfs_iformat_attr_fork+0x86/0x110
     xfs_inode_from_disk+0x41d/0x480
     xfs_iget+0x389/0xd70
     xfs_bulkstat_one_int+0x5b/0x540
     xfs_bulkstat_iwalk+0x1e/0x30
     xfs_iwalk_ag_recs+0xd1/0x160
     xfs_iwalk_run_callbacks+0xb9/0x180
     xfs_iwalk_ag+0x1d8/0x2e0
     xfs_iwalk+0x141/0x220
     xfs_bulkstat+0x105/0x180
     xfs_ioc_bulkstat.constprop.0.isra.0+0xc5/0x130
     xfs_file_ioctl+0xa5f/0xef0
     __x64_sys_ioctl+0x82/0xa0
     do_syscall_64+0x2b/0x80
     entry_SYSCALL_64_after_hwframe+0x46/0xb0

    This newly-added assertion checks that there aren't any incore data
    structures hanging off the incore fork when we're trying to reset its
    contents.  From the call trace, it is evident that iget was trying to
    construct an incore inode from the ondisk inode, but the attr fork
    verifier failed and we were trying to undo all the memory allocations
    that we had done earlier.

    The three assertions in xfs_ifork_zap_attr check that the caller has
    already called xfs_idestroy_fork, which clearly has not been done here.
    As the zap function then zeroes the pointers, we've effectively leaked
    the memory.

    The shortest change would have been to insert an extra call to
    xfs_idestroy_fork, but it makes more sense to bundle the _idestroy_fork
    call into _zap_attr, since all other callsites call _idestroy_fork
    immediately prior to calling _zap_attr.  IOWs, it eliminates one way to
    fail.

    Note: This change only applies cleanly to 2ed5b09b3e8f, since we just
    reworked the attr fork lifetime.  However, I think this memory leak has
    existed since 0f45a1b20c, since the chain xfs_iformat_attr_fork ->
    xfs_iformat_local -> xfs_init_local_fork will allocate
    ifp->if_u1.if_data, but if xfs_ifork_verify_local_attr fails,
    xfs_iformat_attr_fork will free i_afp without freeing any of the stuff
    hanging off i_afp.  The solution for older kernels I think is to add the
    missing call to xfs_idestroy_fork just prior to calling kmem_cache_free.

    Found by fuzzing a.sfattr.hdr.totsize = lastbit in xfs/399.

    Fixes: 2ed5b09b3e8f ("xfs: make inode attribute forks a permanent part of struct xfs_inode")
    Probably-Fixes: 0f45a1b20c ("xfs: improve local fork verification")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:48 -05:00
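The fix folds the xfs_idestroy_fork call into xfs_ifork_zap_attr so that zeroing the fork can no longer strand attached memory. A hypothetical userspace sketch of that "destroy, then zap" ordering follows; the demo_* names and the live-buffer counter exist only to make a leak observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified fork. */
struct demo_ifork {
	void   *if_data;
	size_t  if_bytes;
	int     if_format;
};

static int demo_live_bufs;	/* buffers allocated but not yet freed */

/* Attach a data buffer to an empty fork. */
static void demo_fork_set_data(struct demo_ifork *ifp, size_t size)
{
	assert(ifp->if_data == NULL);
	ifp->if_data = malloc(size);
	assert(ifp->if_data != NULL);
	ifp->if_bytes = size;
	demo_live_bufs++;
}

/* Free whatever hangs off the fork (stand-in for xfs_idestroy_fork). */
static void demo_idestroy_fork(struct demo_ifork *ifp)
{
	if (ifp->if_data) {
		free(ifp->if_data);
		demo_live_bufs--;
	}
	ifp->if_data = NULL;
	ifp->if_bytes = 0;
}

/*
 * Reset the attr fork.  Destroying first, inside the zap helper, means no
 * caller can zero the pointers while memory is still attached.
 */
static void demo_ifork_zap_attr(struct demo_ifork *ifp)
{
	demo_idestroy_fork(ifp);
	memset(ifp, 0, sizeof(*ifp));
}
```

Zapping an already-empty fork is harmless, which matches the commit's point that the bundled call removes one way to fail.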
Bill O'Donnell c8ddf398ff xfs: delete unnecessary NULL checks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 3f52e016af600982989b5dee958d313c52483c92
Author: Dan Carpenter <error27@gmail.com>
Date:   Mon Jul 18 10:13:48 2022 -0700

    xfs: delete unnecessary NULL checks

    These NULL checks are no longer needed after commit 2ed5b09b3e8f ("xfs:
    make inode attribute forks a permanent part of struct xfs_inode").

    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:47 -05:00
Bill O'Donnell 3219617b1b xfs: replace inode fork size macros with functions
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit c01147d929899f02a0a8b15e406d12784768ca72
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:07 2022 -0700

    xfs: replace inode fork size macros with functions

    Replace the shouty macros here with typechecked helper functions.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:43 -05:00
Bill O'Donnell f77675b5d0 xfs: replace XFS_IFORK_Q with a proper predicate function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 932b42c66cb5d0ca9800b128415b4ad6b1952b3e
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:06 2022 -0700

    xfs: replace XFS_IFORK_Q with a proper predicate function

    Replace this shouty macro with a real C function that has a more
    descriptive name.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:43 -05:00
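The two commits above replace XFS_IFORK_Q and the fork size macros with typechecked inline functions. The sketch below shows that style on a hypothetical, simplified inode; the names, the assumption that i_forkoff counts 8-byte units, and the litino field are illustrative rather than copied from XFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inode: literal area size and fork offset. */
struct demo_inode {
	unsigned int litino;	/* bytes available for both forks */
	uint8_t      i_forkoff;	/* attr fork offset in 8-byte units; 0 = none */
};

/* Typechecked replacement for an XFS_IFORK_Q-style macro. */
static inline bool demo_inode_has_attr_fork(const struct demo_inode *ip)
{
	return ip->i_forkoff > 0;
}

/* Data fork gets everything before the attr fork, or the whole area. */
static inline unsigned int demo_inode_data_fork_size(const struct demo_inode *ip)
{
	if (demo_inode_has_attr_fork(ip))
		return ip->i_forkoff << 3;
	return ip->litino;
}

/* Attr fork gets whatever remains after the fork offset. */
static inline unsigned int demo_inode_attr_fork_size(const struct demo_inode *ip)
{
	if (!demo_inode_has_attr_fork(ip))
		return 0;
	return ip->litino - (ip->i_forkoff << 3);
}
```

Unlike a macro, passing anything other than a struct demo_inode pointer here is a compile-time error.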
Bill O'Donnell 0036098801 xfs: use XFS_IFORK_Q to determine the presence of an xattr fork
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit e45d7cb2356e6b59fe64da28324025cc6fcd3fbd
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:06 2022 -0700

    xfs: use XFS_IFORK_Q to determine the presence of an xattr fork

    Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
    attribute fork but i_forkoff is zero.  This eliminates the ambiguity
    between i_forkoff and i_af.if_present, which should make it easier to
    understand the lifetime of attr forks.

    While we're at it, remove the if_present checks around calls to
    xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
    forks that have already been torn down.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:43 -05:00
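With this commit, xfs_ifork_ptr decides whether an attr fork exists from i_forkoff rather than from a pointer. A hypothetical sketch of that lookup follows, assuming the attr fork is embedded in the inode as the "make inode attribute forks a permanent part of struct xfs_inode" commit describes; the demo_* names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_DATA_FORK, DEMO_ATTR_FORK, DEMO_COW_FORK };

struct demo_ifork { int if_format; };

/* Hypothetical inode: the attr fork is embedded, not separately allocated. */
struct demo_inode {
	struct demo_ifork  i_df;
	struct demo_ifork  i_af;	/* always present in memory */
	struct demo_ifork *i_cowfp;	/* may be NULL */
	uint8_t            i_forkoff;	/* 0 means "no attr fork" */
};

static inline struct demo_ifork *demo_ifork_ptr(struct demo_inode *ip, int whichfork)
{
	switch (whichfork) {
	case DEMO_DATA_FORK:
		return &ip->i_df;
	case DEMO_ATTR_FORK:
		/* i_forkoff, not a pointer, says whether the fork exists */
		return ip->i_forkoff ? &ip->i_af : NULL;
	case DEMO_COW_FORK:
		return ip->i_cowfp;
	default:
		assert(0 && "bad fork");
		return NULL;
	}
}
```

Since the embedded i_af is never freed, a racing reader can at worst see a stale answer about whether the fork exists, never a pointer into freed memory.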
Bill O'Donnell a2d362f29a xfs: make inode attribute forks a permanent part of struct xfs_inode
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

Conflicts: previous out-of-order application of 5625ea0 requires a minor adjustment to xfs_iomap.c

commit 2ed5b09b3e8fc274ae8fecd6ab7c5106a364bed1
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:06 2022 -0700

    xfs: make inode attribute forks a permanent part of struct xfs_inode

    Syzkaller reported a UAF bug a while back:

    ==================================================================
    BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
    Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958

    CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
    5.15.0-0.30.3-20220406_1406 #3
    Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
    04/01/2014
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
     print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
     __kasan_report mm/kasan/report.c:442 [inline]
     kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
     xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
     xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
     xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
     __vfs_getxattr+0xdf/0x13d fs/xattr.c:399
     cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
     security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
     dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
     dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
     do_truncate+0xc3/0x1e0 fs/open.c:56
     handle_truncate fs/namei.c:3084 [inline]
     do_open fs/namei.c:3432 [inline]
     path_openat+0x30ab/0x396d fs/namei.c:3561
     do_filp_open+0x1c4/0x290 fs/namei.c:3588
     do_sys_openat2+0x60d/0x98c fs/open.c:1212
     do_sys_open+0xcf/0x13c fs/open.c:1228
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0x0
    RIP: 0033:0x7f7ef4bb753d
    Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
    89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
    01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
    RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
    RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
    RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
    RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
    R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
     </TASK>

    Allocated by task 2953:
     kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
     kasan_set_track mm/kasan/common.c:46 [inline]
     set_alloc_info mm/kasan/common.c:434 [inline]
     __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
     kasan_slab_alloc include/linux/kasan.h:254 [inline]
     slab_post_alloc_hook mm/slab.h:519 [inline]
     slab_alloc_node mm/slub.c:3213 [inline]
     slab_alloc mm/slub.c:3221 [inline]
     kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
     kmem_cache_zalloc include/linux/slab.h:711 [inline]
     xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
     xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
     xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
     xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
     __vfs_setxattr+0x11b/0x177 fs/xattr.c:180
     __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
     __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
     vfs_setxattr+0x154/0x33d fs/xattr.c:301
     setxattr+0x216/0x29f fs/xattr.c:575
     __do_sys_fsetxattr fs/xattr.c:632 [inline]
     __se_sys_fsetxattr fs/xattr.c:621 [inline]
     __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0x0

    Freed by task 2949:
     kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
     kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
     kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
     ____kasan_slab_free mm/kasan/common.c:366 [inline]
     ____kasan_slab_free mm/kasan/common.c:328 [inline]
     __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
     kasan_slab_free include/linux/kasan.h:230 [inline]
     slab_free_hook mm/slub.c:1700 [inline]
     slab_free_freelist_hook mm/slub.c:1726 [inline]
     slab_free mm/slub.c:3492 [inline]
     kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
     xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
     xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
     xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
     xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
     xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
     xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
     __vfs_removexattr+0x106/0x16a fs/xattr.c:468
     cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
     security_inode_killpriv+0x54/0xa1 security/security.c:1414
     setattr_prepare+0x1a6/0x897 fs/attr.c:146
     xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
     xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
     xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
     notify_change+0xae5/0x10a1 fs/attr.c:410
     do_truncate+0x134/0x1e0 fs/open.c:64
     handle_truncate fs/namei.c:3084 [inline]
     do_open fs/namei.c:3432 [inline]
     path_openat+0x30ab/0x396d fs/namei.c:3561
     do_filp_open+0x1c4/0x290 fs/namei.c:3588
     do_sys_openat2+0x60d/0x98c fs/open.c:1212
     do_sys_open+0xcf/0x13c fs/open.c:1228
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0x0

    The buggy address belongs to the object at ffff88802cec9188
     which belongs to the cache xfs_ifork of size 40
    The buggy address is located 20 bytes inside of
     40-byte region [ffff88802cec9188, ffff88802cec91b0)
    The buggy address belongs to the page:
    page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
    index:0x0 pfn:0x2cec9
    flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
    raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
    raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
     ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
    >ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                                ^
     ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
     ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
    ==================================================================

    The root cause of this bug is the unlocked access to xfs_inode.i_afp
    from the getxattr code paths while trying to determine which ILOCK mode
    to use to stabilize the xattr data.  Unfortunately, the VFS does not
    acquire i_rwsem when vfs_getxattr (or listxattr) call into the
    filesystem, which means that getxattr can race with a removexattr that's
    tearing down the attr fork and crash:

    xfs_attr_set:                          xfs_attr_get:
    xfs_attr_fork_remove:                  xfs_ilock_attr_map_shared:

    xfs_idestroy_fork(ip->i_afp);
    kmem_cache_free(xfs_ifork_cache, ip->i_afp);

                                           if (ip->i_afp &&

    ip->i_afp = NULL;

                                               xfs_need_iread_extents(ip->i_afp))
                                           <KABOOM>

    ip->i_forkoff = 0;

    Regrettably, the VFS is much more lax about i_rwsem and getxattr than
    is immediately obvious -- not only does it not guarantee that we hold
    i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
    The getxattr system call won't acquire the lock before calling XFS, but
    the file capabilities code calls getxattr with and without i_rwsem held
    to determine if the "security.capabilities" xattr is set on the file.

    Fixing the VFS locking requires a treewide investigation into every code
    path that could touch an xattr and what i_rwsem state it expects or sets
    up.  That could take years or even prove impossible; fortunately, we
    can fix this UAF problem inside XFS.

    An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
    ensure that i_forkoff is always zeroed before i_afp is set to null and
    changed the read paths to use smp_rmb before accessing i_forkoff and
    i_afp, which avoided these UAF problems.  However, the patch author was
    too busy dealing with other problems in the meantime, and by the time he
    came back to this issue, the situation had changed a bit.

    On a modern system with selinux, each inode will always have at least
    one xattr for the selinux label, so it doesn't make much sense to keep
    incurring the extra pointer dereference.  Furthermore, Allison's
    upcoming parent pointer patchset will also cause nearly every inode in
    the filesystem to have extended attributes.  Therefore, make the inode
    attribute fork structure part of struct xfs_inode, at a cost of 40 more
    bytes.

    This patch adds a clunky if_present field where necessary to maintain
    the existing logic of xattr fork null pointer testing in the existing
    codebase.  The next patch switches the logic over to XFS_IFORK_Q and it
    all goes away.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:42 -05:00
Bill O'Donnell 08529f7680 xfs: convert XFS_IFORK_PTR to a static inline helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 732436ef916b4f338d672ea56accfdb11e8d0732
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Sat Jul 9 10:56:05 2022 -0700

    xfs: convert XFS_IFORK_PTR to a static inline helper

    We're about to make this logic do a bit more, so convert the macro to a
    static inline function for better typechecking and fewer shouty macros.
    No functional changes here.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:42 -05:00
Bill O'Donnell 70a12f1b9f xfs: hide log iovec alignment constraints
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit b2c28035cea290edbcec697504e5b7a4b1e023e7
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed May 4 11:45:50 2022 +1000

    xfs: hide log iovec alignment constraints

    Callers currently have to round out the size of buffers to match the
    alignment constraints of log iovecs and xlog_write(). They should not
    need to know this detail, so introduce a new function to calculate
    the iovec length (for use in ->iop_size implementations). Also
    modify xlog_finish_iovec() to round up the length to the correct
    alignment so the callers don't need to do this, either.

    Convert the only user - inode forks - of this alignment rounding to
    use the new interface.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:12 -05:00
Bill O'Donnell 5f92f7b858 xfs: zero inode fork buffer at allocation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit cb512c921639613ce03f87e62c5e93ed9fe8c84d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed May 4 11:44:55 2022 +1000

    xfs: zero inode fork buffer at allocation

    When we first allocate or resize an inline inode fork, we round up
    the allocation to 4-byte alignment to meet journal alignment
    constraints. We don't clear the unused bytes, so we can copy up to
    three uninitialised bytes into the journal. Zero those bytes so we
    only ever copy zeros into the journal.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:11:12 -05:00
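The sketch below illustrates the zeroing fix for the allocation path only: the buffer is rounded up to a 4-byte multiple and the padding is zero-filled. demo_roundup4 and demo_alloc_inline_fork are hypothetical helpers, and calloc stands in for whatever zeroing allocator the kernel code uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round up to the 4-byte granularity used when copying into the journal. */
static size_t demo_roundup4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Allocate a rounded-up inline fork buffer and copy size bytes into it.
 * calloc zeroes the whole allocation, so the up-to-3 pad bytes at the end
 * hold zeros rather than stale heap contents when they reach the journal.
 */
static unsigned char *demo_alloc_inline_fork(const void *data, size_t size,
					     size_t *alloc_len)
{
	unsigned char *buf;

	*alloc_len = demo_roundup4(size);
	buf = calloc(1, *alloc_len ? *alloc_len : 1);
	if (!buf)
		abort();
	memcpy(buf, data, size);
	return buf;
}
```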
Bill O'Donnell cf7ff3302c xfs: Conditionally upgrade existing inodes to use large extent counters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 4f86bb4b66c999ad9ddcfd49fec93992eeba2715
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Wed Mar 9 07:49:36 2022 +0000

    xfs: Conditionally upgrade existing inodes to use large extent counters

    This commit enables upgrading existing inodes to use large extent counters
    provided that the underlying filesystem's superblock has the large extent counter
    feature enabled.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:59 -05:00
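A hypothetical userspace model of the conditional upgrade: an inode switches to large extent counters only when adding extents would overflow the small counter and the filesystem has the feature. The limits (2^31 - 1 and 2^48 - 1) and all demo_* names are assumptions for illustration; this is not the kernel code and it omits transaction handling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_DATA_SMALL	((1ULL << 31) - 1)	/* assumed legacy limit */
#define DEMO_MAX_DATA_LARGE	((1ULL << 48) - 1)	/* assumed large limit */

/* Hypothetical state: a superblock feature bit and a per-inode flag. */
struct demo_mount { bool has_large_extent_counts; };

struct demo_inode {
	struct demo_mount *mp;
	bool               nrext64;	/* inode already uses large counters */
	uint64_t           nextents;
};

static uint64_t demo_max_nextents(const struct demo_inode *ip)
{
	return ip->nrext64 ? DEMO_MAX_DATA_LARGE : DEMO_MAX_DATA_SMALL;
}

/* Add extents, upgrading the inode only when it is both needed and allowed. */
static bool demo_add_extents(struct demo_inode *ip, uint64_t nr_to_add)
{
	uint64_t want = ip->nextents + nr_to_add;

	if (want > DEMO_MAX_DATA_SMALL && !ip->nrext64 &&
	    ip->mp->has_large_extent_counts)
		ip->nrext64 = true;		/* one-way, per-inode upgrade */
	if (want > demo_max_nextents(ip))
		return false;			/* refuse: counter would overflow */
	ip->nextents = want;
	return true;
}
```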
Bill O'Donnell 014ed3670e xfs: Introduce macros to represent new maximum extent counts for data/attr forks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit df9ad5cc7a524048ea7ff983d6feeb6d8c47a761
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Tue Nov 16 09:54:37 2021 +0000

    xfs: Introduce macros to represent new maximum extent counts for data/attr forks

    This commit defines new macros to represent maximum extent counts allowed by
    filesystems which have support for large per-inode extent counters.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:58 -05:00
Bill O'Donnell 0dc526ef61 xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

commit 755c38ffe1a5937d8fa03419018f49f3a23fa9a7
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Tue Nov 16 07:28:40 2021 +0000

    xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively

    A future commit will introduce a 64-bit on-disk data extent counter and a
    32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
    xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
    of these quantities.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:56 -05:00
Bill O'Donnell 02dc3bf866 xfs: Introduce xfs_dfork_nextents() helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

Conflicts: minor line adjustment in xfs_inode_buf.c (previous out-of-order patch application)

commit dd95a6ce31d6441dfd5fd3aa5d7208b0fc61782f
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Thu Aug 27 15:34:34 2020 +0530

    xfs: Introduce xfs_dfork_nextents() helper

    This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function
    xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same
    value as XFS_DFORK_NEXTENTS(). A future commit which extends inode's extent
    counter fields will add more logic to this helper.

    This commit also replaces direct accesses to xfs_dinode->di_[a]nextents
    with calls to xfs_dfork_nextents().

    No functional changes have been made.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:56 -05:00
Bill O'Donnell 3ec2354520 xfs: Use xfs_extnum_t instead of basic data types
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

Conflicts: minor line adjustment in xfs_inode_buf.c (previous out-of-order patch application)

commit bb1d50494cbdd9c5991ddc7feeeb14982872b2a8
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Fri Feb 26 11:24:31 2021 +0530

    xfs: Use xfs_extnum_t instead of basic data types

    xfs_extnum_t is the type to use to declare variables which have values
    obtained from xfs_dinode->di_[a]nextents. This commit replaces basic
    types (e.g. uint32_t) with xfs_extnum_t for such variables.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:56 -05:00
Bill O'Donnell 118a1c9d62 xfs: Introduce xfs_iext_max_nextents() helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832

Conflicts: minor line adjustment in xfs_inode_buf.c (previous out-of-order patch application)

commit 9feb8f19665c8ba051c6a81aa7897149e7748e1e
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Thu Aug 27 15:09:10 2020 +0530

    xfs: Introduce xfs_iext_max_nextents() helper

    xfs_iext_max_nextents() returns the maximum number of extents possible for
    the data, CoW or attribute fork. This helper will be extended further in a
    future commit when maximum extent counts associated with data/attribute forks
    are increased.
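    A sketch of what such a helper could look like, with hypothetical
    names. The limits are assumptions matching the on-disk counter widths
    of the time (a signed 32-bit data fork counter and a signed 16-bit attr
    fork counter); the CoW fork, which exists only in memory, is assumed to
    share the data fork limit.

```c
#include <stdint.h>

/* Hypothetical stand-ins for XFS_DATA_FORK, XFS_ATTR_FORK and XFS_COW_FORK. */
enum { DATA_FORK_SKETCH, ATTR_FORK_SKETCH, COW_FORK_SKETCH };

/* Assumed limits: signed 32-bit data fork, signed 16-bit attr fork counters. */
#define MAX_DATA_EXTENTS_SKETCH ((uint64_t)INT32_MAX)  /* 2**31 - 1 */
#define MAX_ATTR_EXTENTS_SKETCH ((uint64_t)INT16_MAX)  /* 2**15 - 1 */

/* Data and CoW forks share one limit; the attr fork has a smaller one. */
static uint64_t iext_max_nextents_sketch(int whichfork)
{
    if (whichfork == DATA_FORK_SKETCH || whichfork == COW_FORK_SKETCH)
        return MAX_DATA_EXTENTS_SKETCH;
    return MAX_ATTR_EXTENTS_SKETCH;
}
```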

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-05-18 11:10:55 -05:00
Carlos Maiolino 25a40d32f8 xfs: rename _zone variables to _cache
Bugzilla: https://bugzilla.redhat.com/2125724

Conflicts:
	Small conflict at xfs_inode_alloc() due to an out-of-order
	backport. Inode alloc using kmem_cache_alloc() has been
	converted to use alloc_inode_sb() before this patch.

Now that we've gotten rid of the kmem_zone_t typedef, rename the
variables to _cache since that's what they are.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit 182696fb021fc196e5cbe641565ca40fcf0f885a)
2022-10-21 12:50:46 +02:00
Carlos Maiolino d912d565bb xfs: remove kmem_zone typedef
Bugzilla: https://bugzilla.redhat.com/2125724

Remove these typedefs by referencing kmem_cache directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit e7720afad068a6729d9cd3aaa08212f2f5a7ceff)
2022-10-21 12:50:46 +02:00
Carlos Maiolino f4f8d445c0 xfs: remove the xfs_dinode_t typedef
Bugzilla: https://bugzilla.redhat.com/2125724

Remove the few leftover instances of the xfs_dinode_t typedef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
(cherry picked from commit de38db7239c4bd2f37ebfcb8a5f22b4e8e657737)
2022-10-21 12:50:46 +02:00
Christoph Hellwig b2197a36c0 xfs: remove XFS_IFEXTENTS
The in-memory XFS_IFEXTENTS flag is now only used to check if an inode with
extents still needs the extents to be read into memory before doing
operations that need the extent map.  Add a new xfs_need_iread_extents
helper that returns true for btree format forks that do not have any
entries in the in-memory extent btree, and use that instead of checking
the XFS_IFEXTENTS flag.
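A minimal sketch of the new check, assuming a hypothetical fork structure
with a format field and an if_height field recording the height of the
in-memory extent tree (0 while nothing has been read in yet).

```c
#include <stdbool.h>

/* Hypothetical subset of the on-disk fork formats. */
enum fork_format_sketch { FMT_LOCAL, FMT_EXTENTS, FMT_BTREE };

/* Hypothetical in-memory fork holding only what the check needs. */
struct ifork_sketch {
    enum fork_format_sketch if_format;
    int if_height;  /* assumed: in-memory extent tree height, 0 when empty */
};

/*
 * Extents still have to be read from disk only for a btree format fork
 * whose in-memory extent tree has not been populated yet.
 */
static bool need_iread_extents_sketch(const struct ifork_sketch *ifp)
{
    return ifp->if_format == FMT_BTREE && ifp->if_height == 0;
}
```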

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig 0779f4a68d xfs: remove XFS_IFINLINE
Just check for an inline format fork instead of using the equivalent
in-memory XFS_IFINLINE flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig ac1e067211 xfs: remove XFS_IFBROOT
Just check for a btree format fork instead of using the equivalent
in-memory XFS_IFBROOT flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig 0eba048dd3 xfs: only look at the fork format in xfs_idestroy_fork
Stop using the XFS_IFEXTENTS flag, and instead switch on the fork format
in xfs_idestroy_fork to decide how to clean up.
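A userspace sketch of format-driven cleanup, with hypothetical field
names and free() standing in for the kernel allocator. The fork format
alone decides which buffers to release, so no flag has to be consulted,
and because free(NULL) is a no-op no NULL checks are needed.

```c
#include <stdlib.h>

/* Hypothetical subset of the fork formats. */
enum fork_format_sketch { FMT_LOCAL, FMT_EXTENTS, FMT_BTREE };

/* Hypothetical fork: which pointer is live depends on if_format. */
struct ifork_sketch {
    enum fork_format_sketch if_format;
    void *if_data;     /* inline data, used by FMT_LOCAL */
    void *if_extents;  /* in-memory extent list, FMT_EXTENTS / FMT_BTREE */
    void *if_broot;    /* btree root block, FMT_BTREE */
};

static void idestroy_fork_sketch(struct ifork_sketch *ifp)
{
    /* free(NULL) does nothing, so this is safe for every format. */
    free(ifp->if_broot);
    ifp->if_broot = NULL;

    switch (ifp->if_format) {
    case FMT_LOCAL:
        free(ifp->if_data);
        ifp->if_data = NULL;
        break;
    case FMT_EXTENTS:
    case FMT_BTREE:
        free(ifp->if_extents);
        ifp->if_extents = NULL;
        break;
    }
}
```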

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:50 -07:00
Christoph Hellwig 6e73a545f9 xfs: move the di_nblocks field to struct xfs_inode
In preparation for removing the historic icinode struct, move the nblocks
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Christoph Hellwig 13d2c10b05 xfs: move the di_size field to struct xfs_inode
In preparation for removing the historic icinode struct, move the on-disk
size field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Dave Chinner e6a688c332 xfs: initialise attr fork on inode create
When we allocate a new inode, we often need to add an attribute to
the inode as part of the create. This can happen as a result of
needing to add default ACLs or security labels before the inode is
made visible to userspace.

This is highly inefficient right now. We do the create transaction
to allocate the inode, then we do an "add attr fork" transaction to
modify the just created empty inode to set the inode fork offset to
allow attributes to be stored, then we go and do the attribute
creation.

This means 3 transactions instead of 1 to allocate an inode, and
this greatly increases the load on the CIL commit code, resulting in
excessive contention on the CIL spin locks and performance
degradation:

 18.99%  [kernel]                [k] __pv_queued_spin_lock_slowpath
  3.57%  [kernel]                [k] do_raw_spin_lock
  2.51%  [kernel]                [k] __raw_callee_save___pv_queued_spin_unlock
  2.48%  [kernel]                [k] memcpy
  2.34%  [kernel]                [k] xfs_log_commit_cil

Running fsmark on an SELinux-enabled filesystem typically adds this
overhead to the create path:

  - 15.30% xfs_init_security
     - 15.23% security_inode_init_security
	- 13.05% xfs_initxattrs
	   - 12.94% xfs_attr_set
	      - 6.75% xfs_bmap_add_attrfork
		 - 5.51% xfs_trans_commit
		    - 5.48% __xfs_trans_commit
		       - 5.35% xfs_log_commit_cil
			  - 3.86% _raw_spin_lock
			     - do_raw_spin_lock
				  __pv_queued_spin_lock_slowpath
		 - 0.70% xfs_trans_alloc
		      0.52% xfs_trans_reserve
	      - 5.41% xfs_attr_set_args
		 - 5.39% xfs_attr_set_shortform.constprop.0
		    - 4.46% xfs_trans_commit
		       - 4.46% __xfs_trans_commit
			  - 4.33% xfs_log_commit_cil
			     - 2.74% _raw_spin_lock
				- do_raw_spin_lock
				     __pv_queued_spin_lock_slowpath
			       0.60% xfs_inode_item_format
		      0.90% xfs_attr_try_sf_addname
	- 1.99% selinux_inode_init_security
	   - 1.02% security_sid_to_context_force
	      - 1.00% security_sid_to_context_core
		 - 0.92% sidtab_entry_to_string
		    - 0.90% sidtab_sid2str_get
			 0.59% sidtab_sid2str_put.part.0
	   - 0.82% selinux_determine_inode_label
	      - 0.77% security_transition_sid
		   0.70% security_compute_sid.part.0

And fsmark creation rate performance drops by ~25%. The key point to
note here is that half the additional overhead comes from adding the
attribute fork to the newly created inode. That's crazy, considering
we can do this same thing at inode create time with a couple of
lines of code and no extra overhead.

So, if we know we are going to add an attribute immediately after
creating the inode, let's just initialise the attribute fork inside
the create transaction and chop that whole chunk of code out of
the create fast path. This completely removes the performance
drop caused by enabling SELinux, and the profile looks like:

     - 8.99% xfs_init_security
         - 9.00% security_inode_init_security
            - 6.43% xfs_initxattrs
               - 6.37% xfs_attr_set
                  - 5.45% xfs_attr_set_args
                     - 5.42% xfs_attr_set_shortform.constprop.0
                        - 4.51% xfs_trans_commit
                           - 4.54% __xfs_trans_commit
                              - 4.59% xfs_log_commit_cil
                                 - 2.67% _raw_spin_lock
                                    - 3.28% do_raw_spin_lock
                                         3.08% __pv_queued_spin_lock_slowpath
                                   0.66% xfs_inode_item_format
                        - 0.90% xfs_attr_try_sf_addname
                  - 0.60% xfs_trans_alloc
            - 2.35% selinux_inode_init_security
               - 1.25% security_sid_to_context_force
                  - 1.21% security_sid_to_context_core
                     - 1.19% sidtab_entry_to_string
                        - 1.20% sidtab_sid2str_get
                           - 0.86% sidtab_sid2str_put.part.0
                              - 0.62% _raw_spin_lock_irqsave
                                 - 0.77% do_raw_spin_lock
                                      __pv_queued_spin_lock_slowpath
               - 0.84% selinux_determine_inode_label
                  - 0.83% security_transition_sid
                       0.86% security_compute_sid.part.0

Which indicates the XFS overhead of creating the selinux xattr has
been halved. This doesn't fix the CIL lock contention problem; it just
means it's not a limiting factor for this workload. Lock contention
in the security subsystems is going to be an issue soon, though...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: fix compilation error when CONFIG_SECURITY=n]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
2021-03-25 16:47:51 -07:00
Darrick J. Wong 973975b72a xfs: validate ag btree levels using the precomputed values
Use the AG btree height limits that we precomputed into the xfs_mount to
validate the AG headers instead of using XFS_BTREE_MAXLEVELS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-03-25 16:47:50 -07:00
Chandan Babu R f9fa87169d xfs: Introduce error injection to reduce maximum inode fork extent count
This commit adds XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag which enables
userspace programs to test "Inode fork extent count overflow detection"
by reducing maximum possible inode fork extent count to 10.
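A sketch of how such an error tag can cap the limit, using a
hypothetical global flag in place of an armed XFS_ERRTAG_REDUCE_MAX_IEXTENTS
tag; the normal per-fork maximum is passed in by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for "XFS_ERRTAG_REDUCE_MAX_IEXTENTS is armed". */
static bool errtag_reduce_max_iextents_sketch;

/* Effective limit: the normal maximum, or 10 while the tag is armed. */
static uint64_t effective_max_extents_sketch(uint64_t normal_max)
{
    if (errtag_reduce_max_iextents_sketch)
        return 10;
    return normal_max;
}
```

With the limit forced this low, a test only has to add a handful of
extents to reach the overflow path.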

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:48 -08:00
Chandan Babu R b9b7e1dc56 xfs: Add helper for checking per-inode extent count overflow
XFS does not check for possible overflow of per-inode extent counter
fields when adding extents to either data or attr fork.

For example:
1. Insert 5 million xattrs (each having a value size of 255 bytes) and
   then delete 50% of them in an alternating manner.

2. On a 4k block sized XFS filesystem instance, the above causes 98511
   extents to be created in the attr fork of the inode.

   xfsaild/loop0  2008 [003]  1475.127209: probe:xfs_inode_to_disk: (ffffffffa43fb6b0) if_nextents=98511 i_ino=131

3. The incore inode fork extent counter is a signed 32-bit
   quantity. However the on-disk extent counter is an unsigned 16-bit
   quantity and hence cannot hold 98511 extents.

4. The following incorrect value is stored in the attr extent counter:
   # xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0
   core.naextents = -32561
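The arithmetic behind step 4 can be checked with a small C sketch
(the helper name is hypothetical). Storing 98511 in a 16-bit field keeps
only 98511 mod 65536 = 32975, which is above 32767 and therefore reads
back as 32975 - 65536 = -32561 when interpreted as signed.

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep only the low 16 bits of the in-core count,
 * as a 16-bit on-disk field does, then read the result as a signed
 * 16-bit value (how xfs_db printed it above). The signed reading is
 * done explicitly to avoid implementation-defined conversions.
 */
static int32_t stored_naextents_sketch(int32_t incore_count)
{
    uint16_t low = (uint16_t)incore_count;  /* reduces modulo 2**16 */

    return low > INT16_MAX ? (int32_t)low - 65536 : (int32_t)low;
}
```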

This commit adds a new helper function (i.e.
xfs_iext_count_may_overflow()) to check for overflow of the per-inode
data and xattr extent counters. Future patches will use this function to
make sure that an FS operation won't cause the extent counter to
overflow.
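A sketch of such an overflow check under stated assumptions: the helper
name and signature are hypothetical, the caller supplies the per-fork
maximum, and -EFBIG is assumed as the error returned.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical check: return 0 if adding nr_to_add extents keeps the
 * fork within max_exts, -EFBIG (assumed error code) otherwise.
 */
static int iext_count_may_overflow_sketch(uint64_t cur_nextents,
                                          uint64_t nr_to_add,
                                          uint64_t max_exts)
{
    uint64_t nr_exts = cur_nextents + nr_to_add;

    /* The first test catches unsigned wrap-around of the addition itself. */
    if (nr_exts < cur_nextents || nr_exts > max_exts)
        return -EFBIG;
    return 0;
}
```

Checking before the operation lets it fail cleanly instead of silently
corrupting the on-disk counter, as in step 4 above.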

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Carlos Maiolino 771915c4f6 xfs: remove kmem_realloc()
Remove kmem_realloc() function and convert its users to use MM API
directly (krealloc())

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-06 18:05:51 -07:00
Carlos Maiolino 32a2b11f46 xfs: Remove kmem_zone_zalloc() usage
Use kmem_cache_zalloc() directly.

The exception is xlog_ticket_alloc(), which will be dealt with in the
next patch for readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-07-28 20:24:14 -07:00
Christoph Hellwig ef8385128d xfs: cleanup xfs_idestroy_fork
Move the freeing of the dynamically allocated attr and COW forks, as
well as the zeroing of the pointers where actually needed, into the
callers, and just pass the xfs_ifork structure to xfs_idestroy_fork.  Also simplify
the kmem_free calls by not checking for NULL first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:59 -07:00
Christoph Hellwig f7e67b20ec xfs: move the fork format fields into struct xfs_ifork
Both the data and attr forks have a format that is stored in the legacy
icdinode.  Move it into the xfs_ifork structure instead, where it uses
up padding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig daf83964a3 xfs: move the per-fork nextents fields into struct xfs_ifork
There are three extent counters per inode, one for each of
the forks.  Two are in the legacy icdinode and one is directly in
struct xfs_inode.  Switch to a single counter in the xfs_ifork structure
where it uses up padding at the end of the structure.  This simplifies
various bits of code that just want the extent count and
can now directly dereference it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig 0f45a1b20c xfs: improve local fork verification
Call the data/attr local fork verifiers as soon as we are ready for them.
This keeps them close to the code setting up the forks, and avoids a
few branches later on.  Also open code xfs_inode_verify_forks in the
only remaining caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig 7c7ba21863 xfs: refactor xfs_inode_verify_forks
The split between xfs_inode_verify_forks and the two helpers
implementing the actual functionality is a little strange.  Reshuffle
it so that xfs_inode_verify_forks verifies if the data and attr forks
are actually in local format and only call the low-level helpers if
that is the case.  Handle the actual error reporting in the low-level
handlers to streamline the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig 1934c8bd81 xfs: remove xfs_ifork_ops
xfs_ifork_ops adds up to two indirect calls per inode read and flush,
despite having just a single instance in the kernel.  In xfsprogs,
phase 6 of xfs_repair overrides the verify_dir method to deal with
inodes that do not have a valid parent, but that can be fixed pretty
easily by ensuring they always have a valid looking parent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig 9229d18e80 xfs: split xfs_iformat_fork
xfs_iformat_fork is a weird catchall.  Split it into one helper for
the data fork and one for the attr fork, and then call both helpers
as well as the COW fork initialization from xfs_inode_from_disk.  Since
the COW fork initialization can't fail, order it after the attr fork
initialization to simplify the error handling.

Note that the newly split helpers are moved down the file in
xfs_inode_fork.c to avoid the need for forward declarations.
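The error-handling benefit of running the infallible COW fork setup last
can be sketched in C. All names are hypothetical and each fork is reduced
to a flag; only the ordering matters: if the attr fork fails, only the
data fork has to be unwound, and no path ever tears down the COW fork.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inode: each fork's state is reduced to a flag. */
struct inode_sketch {
    bool data_ok, attr_ok, cow_ok;
    bool attr_should_fail;  /* knob to simulate a corrupt attr fork */
};

static int iformat_data_fork_sketch(struct inode_sketch *ip)
{
    ip->data_ok = true;
    return 0;
}

static int iformat_attr_fork_sketch(struct inode_sketch *ip)
{
    if (ip->attr_should_fail)
        return -EINVAL;  /* stand-in for a corruption error */
    ip->attr_ok = true;
    return 0;
}

static void idestroy_data_fork_sketch(struct inode_sketch *ip)
{
    ip->data_ok = false;
}

/* Infallible, as the commit message states for the COW fork setup. */
static void ifork_init_cow_sketch(struct inode_sketch *ip)
{
    ip->cow_ok = true;
}

static int inode_from_disk_sketch(struct inode_sketch *ip)
{
    int error;

    error = iformat_data_fork_sketch(ip);
    if (error)
        return error;

    error = iformat_attr_fork_sketch(ip);
    if (error) {
        /* Only the data fork has been set up, so only it needs undoing. */
        idestroy_data_fork_sketch(ip);
        return error;
    }

    /* Runs last: it cannot fail, so no error path has to tear it down. */
    ifork_init_cow_sketch(ip);
    return 0;
}
```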

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig fd9cbe5121 xfs: remove the xfs_inode_log_item_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig e9e2eae89d xfs: only check the superblock version for dinode size calculation
The size of the dinode structure is only dependent on the file system
version, so instead of checking the individual inode version just use
the newly added xfs_sb_version_has_large_dinode helper, and simplify
various calling conventions.
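A sketch of the simplified calculation: the size is derived from the
superblock alone, never from the individual inode. The sizes used here
(176 bytes for the v3 inode core, 100 bytes for older cores) are
assumptions, and the superblock is reduced to one hypothetical flag
standing in for xfs_sb_version_has_large_dinode().

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes: v3 (CRC-enabled) inode core vs the older v1/v2 core. */
#define DINODE_SIZE_V3_SKETCH  176
#define DINODE_SIZE_OLD_SKETCH 100

/* Hypothetical superblock reduced to the one feature bit that matters. */
struct sb_sketch {
    bool has_large_dinode;
};

static size_t dinode_size_sketch(const struct sb_sketch *sb)
{
    return sb->has_large_dinode ? DINODE_SIZE_V3_SKETCH
                                : DINODE_SIZE_OLD_SKETCH;
}
```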

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:48:47 -07:00
Carlos Maiolino 377bcd5f3b xfs: Remove kmem_zone_free() wrapper
We can remove it now, without needing to rework the KM_ flags.

Use kmem_cache_free() directly.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-18 08:40:44 -08:00
Darrick J. Wong a5155b870d xfs: always log corruption errors
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-04 13:55:54 -08:00
Dave Chinner 3f8a4f1d87 xfs: fix inode fork extent count overflow
[commit message is verbose for discussion purposes - will trim it
down later. Some questions about implementation details at the end.]

Zorro Lang recently ran a new test to stress single inode extent
counts now that they are no longer limited by memory allocation.
The test was simply:

# xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file
# ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file

This test uncovered a problem where the hole punching operation
appeared to finish with no error, but apparently only created 268M
extents instead of the 10 billion it was supposed to.

Further, trying to punch out extents that should have been present
resulted in success, but no change in the extent count. It looked
like a silent failure.

While running the test and observing the behaviour in real time,
I observed the extent count growing at ~2M extents/minute, and saw
this after about an hour:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \
> sleep 60 ; \
> xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 127657993
fsxattr.nextents = 129683339
#

And a few minutes later this:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 4177861124
#

Ah, what? Where did that 4 billion extra extents suddenly come from?

Stop the workload, unmount, mount:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 166044375
#

And it's back at the expected number. i.e. the extent count is
correct on disk, but it's screwed up in memory. I loaded up the
extent list, and immediately:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 4192576215
#

It's bad again. So, where does that number come from?
xfs_fill_fsxattr():

                if (ip->i_df.if_flags & XFS_IFEXTENTS)
                        fa->fsx_nextents = xfs_iext_count(&ip->i_df);
                else
                        fa->fsx_nextents = ip->i_d.di_nextents;

And that's the behaviour I just saw in a nutshell. The on disk count
is correct, but once the tree is loaded into memory, it goes wacky.
Clearly there's something wrong with xfs_iext_count():

inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp)
{
        return ifp->if_bytes / sizeof(struct xfs_iext_rec);
}

Simple enough, but 134M extents is 2**27, and that's right about
where things went wrong. A struct xfs_iext_rec is 16 bytes in size,
which means 2**27 * 2**4 = 2**31 and we're right on target for an
integer overflow. And, sure enough:

struct xfs_ifork {
        int                     if_bytes;       /* bytes in if_u1 */
....

Once we get 2**27 extents in a file, we overflow if_bytes and the
in-core extent count goes wrong. And when we reach 2**28 extents,
if_bytes wraps back to zero and things really start to go wrong
there. This is where the silent failure comes from - only the first
2**28 extents can be looked up directly due to the overflow, all the
extents above this index wrap back to somewhere in the first 2**28
extents. Hence with a regular pattern, trying to punch a hole in the
range that didn't have holes mapped to a hole in the first 2**28
extents and so "succeeded" without changing anything. Hence "silent
failure"...
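The wrap can be reproduced in userspace. This hypothetical helper
derives the extent count from a byte count held in 32 signed bits, as the
old "int if_bytes" did, using explicit modulo arithmetic because letting a
signed int overflow is undefined behaviour in C. At 2**27 extents the
count turns negative, and at 2**28 + 5 extents it reads back as 5, i.e.
indices wrap into the first 2**28 extents.

```c
#include <stdint.h>

#define IEXT_REC_SIZE_SKETCH 16  /* sizeof(struct xfs_iext_rec), per the text */

/*
 * Hypothetical helper: the extent count obtained when nextents * 16 is
 * stored in a 32-bit signed int. The product is formed in 64 bits and
 * reduced modulo 2**32 by hand to stay within defined behaviour.
 */
static int64_t count_via_int32_bytes_sketch(int64_t nextents)
{
    uint32_t bytes = (uint32_t)((uint64_t)nextents * IEXT_REC_SIZE_SKETCH);
    int64_t as_int = bytes > INT32_MAX ? (int64_t)bytes - 4294967296LL
                                       : (int64_t)bytes;

    return as_int / IEXT_REC_SIZE_SKETCH;
}
```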

Fix this by converting if_bytes to an int64_t and converting all the
index variables and size calculations to use int64_t types to avoid
overflows in future. Signed integers are still used to enable easy
detection of extent count underflows. This enables scalability of
extent counts to the limits of the on-disk format - MAXEXTNUM
(2**31) extents.
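A sketch of the fixed arithmetic, with hypothetical helper names. With a
64-bit signed byte count, even the MAXEXTNUM limit (taken here as
INT32_MAX, an assumption) times the 16-byte record size converts back
exactly, and a negative byte count still flags an underflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define IEXT_REC_SIZE_SKETCH 16                     /* bytes per extent record */
#define MAXEXTNUM_SKETCH ((int64_t)INT32_MAX)       /* assumed 2**31 - 1 limit */

/* Byte count for n extents, held in a signed 64-bit type as in the fix. */
static int64_t if_bytes_sketch(int64_t nextents)
{
    return nextents * IEXT_REC_SIZE_SKETCH;
}

/* Converting back is exact for every count up to the on-disk limit. */
static int64_t iext_count_sketch(int64_t if_bytes)
{
    return if_bytes / IEXT_REC_SIZE_SKETCH;
}

/* Signedness keeps underflows visible: a negative byte count is a bug. */
static bool if_bytes_underflowed_sketch(int64_t if_bytes)
{
    return if_bytes < 0;
}
```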

Current testing is at over 500M extents and still going:

fsxattr.nextents = 517310478

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00