Commit Graph

227 Commits

Author SHA1 Message Date
Patrick Talbert fdb3eab93b Merge: XFS: Update #3 for RHEL9.6 (upstream v6.7-6.8)
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5785

XFS: update #3 for RHEL9.6. Backport upstream v6.7-6.8, including fixes patches post v6.8.

JIRA: https://issues.redhat.com/browse/RHEL-65728

Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5633

Omitted-fix:  a18a69bbec083 ("xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock")

Missing several dependency patches that merged upstream in Aug 2024, well beyond this update
(e.g. 7996f10ce6c xfs: factor out a xfs_growfs_rt_bmblock helper).

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>

Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Brian Foster <bfoster@redhat.com>
Approved-by: Eric Sandeen <esandeen@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Patrick Talbert <ptalbert@redhat.com>
2024-12-30 07:30:09 -05:00
Bill O'Donnell cf65fb211c bdev: rename freeze and thaw helpers
JIRA: https://issues.redhat.com/browse/RHEL-65728

Conflicts: limit coverage to xfs and ext4.

commit 982c3b3058433f20aba9fb032599cee5dfc17328
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Oct 24 15:01:08 2023 +0200

    bdev: rename freeze and thaw helpers

    We have bdev_mark_dead() etc and we're going to move block device
    freezing to holder ops in the next patch. Make the naming consistent:

    * freeze_bdev() -> bdev_freeze()
    * thaw_bdev()   -> bdev_thaw()

    Also document the return code.

    Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-2-599c19f4faac@kernel.org
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-20 11:25:46 -06:00
Ian Kent dcafde7644 fs: port inode_owner_or_capable() to mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
        CentOS Stream and other backports also drop such hunks.
        CentOS Stream does not have upstream commit 3db1de0e582c3 ("f2fs:
        change the current atomic write way") which results in a reject for
        hunk #1 in f2fs_ioc_start_atomic_write() of fs/f2fs/file.c, so
        make the required change manually.
	Upstream commit 7bc155fec5b371 ("f2fs: kill volatile write
	support") is not present in CentOS Stream so make the additional
	changes needed.

commit 01beba7957a26f9b7179127e8ad56bb5a0f56138
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:26 2023 +0100

    fs: port inode_owner_or_capable() to mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevent on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:27 +08:00
Ian Kent 060dc0b240 fs: port ->fileattr_set() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.

commit 8782a9aea3ab4d697ad67d1f8ebca38a4e1c24ab
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:21 2023 +0100

    fs: port ->fileattr_set() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevent on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:18 +08:00
Carlos Maiolino e17a65d449 ext4: initialize quota before expanding inode in setproject ioctl
JIRA: https://issues.redhat.com/browse/RHEL-5335

Make sure we initialize quotas before possibly expanding inode space
(and thus maybe needing to allocate external xattr block) in
ext4_ioctl_setproject(). This prevents not accounting the necessary
block allocation.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20221207115937.26601-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 1485f726c6dec1a1f85438f2962feaa3d585526f)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:19 +01:00
Carlos Maiolino 2e401f1634 ext4: don't fail GETFSUUID when the caller provides a long buffer
JIRA: https://issues.redhat.com/browse/RHEL-5335

If userspace provides a longer UUID buffer than is required, we
shouldn't fail the call with EINVAL -- rather, we can fill the caller's
buffer with the bytes we /can/ fill, and update the length field to
reflect what we copied.  This doesn't break the UAPI since we're
enabling a case that currently fails, and so far Ted hasn't released a
version of e2fsprogs that uses the new ext4 ioctl.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
Link: https://lore.kernel.org/r/166811139478.327006.13879198441587445544.stgit@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit a7e9d977e031fceefe1e7cd69ebd7202d5758b56)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:17 +01:00
Carlos Maiolino b5100deb93 ext4: dont return EINVAL from GETFSUUID when reporting UUID length
JIRA: https://issues.redhat.com/browse/RHEL-5335

If userspace calls this ioctl with fsu_length (the length of the
fsuuid.fsu_uuid array) set to zero, ext4 copies the desired uuid length
out to userspace.  The kernel call returned a result from a valid input,
so the return value here should be zero, not EINVAL.

While we're at it, fix the copy_to_user call to make it clear that we're
only copying out fsu_len.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
Link: https://lore.kernel.org/r/166811138914.327006.9241306894437166566.stgit@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit b76abb5157468756163fe7e3431c9fe32cba57ca)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:17 +01:00
Carlos Maiolino 87b6f504f2 ext4: fix bug_on in __es_tree_search caused by bad boot loader inode
JIRA: https://issues.redhat.com/browse/RHEL-5335

We got a issue as fllows:
==================================================================
 kernel BUG at fs/ext4/extents_status.c:203!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 1 PID: 945 Comm: cat Not tainted 6.0.0-next-20221007-dirty #349
 RIP: 0010:ext4_es_end.isra.0+0x34/0x42
 RSP: 0018:ffffc9000143b768 EFLAGS: 00010203
 RAX: 0000000000000000 RBX: ffff8881769cd0b8 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffffffff8fc27cf7 RDI: 00000000ffffffff
 RBP: ffff8881769cd0bc R08: 0000000000000000 R09: ffffc9000143b5f8
 R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881769cd0a0
 R13: ffff8881768e5668 R14: 00000000768e52f0 R15: 0000000000000000
 FS: 00007f359f7f05c0(0000)GS:ffff88842fd00000(0000)knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f359f5a2000 CR3: 000000017130c000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  __es_tree_search.isra.0+0x6d/0xf5
  ext4_es_cache_extent+0xfa/0x230
  ext4_cache_extents+0xd2/0x110
  ext4_find_extent+0x5d5/0x8c0
  ext4_ext_map_blocks+0x9c/0x1d30
  ext4_map_blocks+0x431/0xa50
  ext4_mpage_readpages+0x48e/0xe40
  ext4_readahead+0x47/0x50
  read_pages+0x82/0x530
  page_cache_ra_unbounded+0x199/0x2a0
  do_page_cache_ra+0x47/0x70
  page_cache_ra_order+0x242/0x400
  ondemand_readahead+0x1e8/0x4b0
  page_cache_sync_ra+0xf4/0x110
  filemap_get_pages+0x131/0xb20
  filemap_read+0xda/0x4b0
  generic_file_read_iter+0x13a/0x250
  ext4_file_read_iter+0x59/0x1d0
  vfs_read+0x28f/0x460
  ksys_read+0x73/0x160
  __x64_sys_read+0x1e/0x30
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  </TASK>
==================================================================

In the above issue, ioctl invokes the swap_inode_boot_loader function to
swap inode<5> and inode<12>. However, inode<5> contain incorrect imode and
disordered extents, and i_nlink is set to 1. The extents check for inode in
the ext4_iget function can be bypassed bacause 5 is EXT4_BOOT_LOADER_INO.
While links_count is set to 1, the extents are not initialized in
swap_inode_boot_loader. After the ioctl command is executed successfully,
the extents are swapped to inode<12>, in this case, run the `cat` command
to view inode<12>. And Bug_ON is triggered due to the incorrect extents.

When the boot loader inode is not initialized, its imode can be one of the
following:
1) the imode is a bad type, which is marked as bad_inode in ext4_iget and
   set to S_IFREG.
2) the imode is good type but not S_IFREG.
3) the imode is S_IFREG.

The BUG_ON may be triggered by bypassing the check in cases 1 and 2.
Therefore, when the boot loader inode is bad_inode or its imode is not
S_IFREG, initialize the inode to avoid triggering the BUG.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221026042310.3839669-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit 991ed014de0840c5dc405b679168924afb2952ac)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:16 +01:00
Carlos Maiolino 5b9384cc2d ext4: add EXT4_IGET_BAD flag to prevent unexpected bad inode
JIRA: https://issues.redhat.com/browse/RHEL-5335

There are many places that will get unhappy (and crash) when ext4_iget()
returns a bad inode. However, if iget the boot loader inode, allows a bad
inode to be returned, because the inode may not be initialized. This
mechanism can be used to bypass some checks and cause panic. To solve this
problem, we add a special iget flag EXT4_IGET_BAD. Only with this flag
we'd be returning bad inode from ext4_iget(), otherwise we always return
the error code if the inode is bad inode.(suggested by Jan Kara)

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221026042310.3839669-4-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit 63b1e9bccb71fe7d7e3ddc9877dbdc85e5d2d023)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:16 +01:00
Chris von Recklinghausen 1f619343f6 treewide: use get_random_u32() when possible
Conflicts:
	drivers/gpu/drm/tests/drm_buddy_test.c
	drivers/gpu/drm/tests/drm_mm_test.c - We already have
		ce28ab1380e8 ("drm/tests: Add back seed value information")
		so keep calls to kunit_info.
	drop changes to drivers/misc/habanalabs/gaudi2/gaudi2.c
		fs/ntfs3/fslog.c - files not in CS9
	net/sunrpc/auth_gss/gss_krb5_wrap.c - We already have
		7f675ca7757b ("SUNRPC: Improve Kerberos confounder generation")
		so code to change is gone.
	drivers/gpu/drm/i915/i915_gem_gtt.c
	drivers/gpu/drm/i915/selftests/i915_selftest.c
	drivers/gpu/drm/tests/drm_buddy_test.c
	drivers/gpu/drm/tests/drm_mm_test.c
		change added under
		4cb818386e ("Merge DRM changes from upstream v6.0.8..v6.1")

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit a251c17aa558d8e3128a528af5cf8b9d7caae4fd
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Wed Oct 5 17:43:22 2022 +0200

    treewide: use get_random_u32() when possible

    The prandom_u32() function has been a deprecated inline wrapper around
    get_random_u32() for several releases now, and compiles down to the
    exact same code. Replace the deprecated wrapper with a direct call to
    the real function. The same also applies to get_random_int(), which is
    just a wrapper around get_random_u32(). This was done as a basic find
    and replace.

    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Yury Norov <yury.norov@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
    Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
    Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbol
t
    Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
    Acked-by: Helge Deller <deller@gmx.de> # for parisc
    Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:03 -04:00
Ming Lei 1633e7ded8 ext4: split ext4_shutdown
JIRA: https://issues.redhat.com/browse/RHEL-1516

commit 97524b454bc562f4052751f0e635a61dad78f1b2
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 1 11:44:57 2023 +0200

    ext4: split ext4_shutdown

    Split ext4_shutdown into a low-level helper that will be reused for
    implementing the shutdown super operation and a wrapper for the ioctl
    handling.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Link: https://lore.kernel.org/r/20230601094459.1350643-15-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 15:59:32 +08:00
Carlos Maiolino 3164bd4a17 ext4: fix i_version handling in ext4
Bugzilla: https://bugzilla.redhat.com/2107587
Tested: BZ reproducer and xfstests

ext4 currently updates the i_version counter when the atime is updated
during a read. This is less than ideal as it can cause unnecessary cache
invalidations with NFSv4 and unnecessary remeasurements for IMA.

The increment in ext4_mark_iloc_dirty is also problematic since it can
corrupt the i_version counter for ea_inodes. We aren't bumping the file
times in ext4_mark_iloc_dirty, so changing the i_version there seems
wrong, and is the cause of both problems.

Remove that callsite and add increments to the setattr, setxattr and
ioctl codepaths, at the same times that we update the ctime. The
i_version bump that already happens during timestamp updates should take
care of the rest.

In ext4_move_extents, increment the i_version on both inodes, and also
add in missing ctime updates.

[ Some minor updates since we've already enabled the i_version counter
  unconditionally already via another patch series. -- TYT ]

Cc: stable@kernel.org
Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20220908172448.208585-3-jlayton@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit a642c2c0827f5604a93f9fa1e5701eecdce4ae22)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-06-13 14:44:49 +02:00
Carlos Maiolino 5d0a89f107 ext4: zero i_disksize when initializing the bootloader inode
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188241
Tested: With xfstests

If the boot loader inode has never been used before, the
EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the
i_size to 0.  However, if the "never before used" boot loader has a
non-zero i_size, then i_disksize will be non-zero, and the
inconsistency between i_size and i_disksize can trigger a kernel
warning:

 WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
 RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
 Call Trace:
  vfs_write+0x3b1/0x5c0
  ksys_write+0x77/0x160
  __x64_sys_write+0x22/0x30
  do_syscall_64+0x39/0x80

Reproducer:
 1. create corrupted image and mount it:
       mke2fs -t ext4 /tmp/foo.img 200
       debugfs -wR "sif <5> size 25700" /tmp/foo.img
       mount -t ext4 /tmp/foo.img /mnt
       cd /mnt
       echo 123 > file
 2. Run the reproducer program:
       posix_memalign(&buf, 1024, 1024)
       fd = open("file", O_RDWR | O_DIRECT);
       ioctl(fd, EXT4_IOC_SWAP_BOOT);
       write(fd, buf, 1024);

Fix this by setting i_disksize as well as i_size to zero when
initiaizing the boot loader inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit f5361da1e60d54ec81346aee8e3d8baf1be0b762)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-04-20 11:01:18 +02:00
Carlos Maiolino 5582f303d3 ext4: remove dead code in updating backup sb
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188241
Tested: With xfstests

ext4_update_backup_sb checks for err having some value
after unlocking buffer. But err has not been updated
till that point in any code which will lead execution
of the code in question.

Signed-off-by: Tanmay Bhushan <007047221b@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221230141858.3828-1-007047221b@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 08abd0466ec9113908e674d042ec2a36dfc2875c)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-04-20 10:58:08 +02:00
Lukas Czerner 68d8a632c1 ext4: update the backup superblock's at the end of the online resize
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4
Author: Theodore Ts'o <tytso@mit.edu>
    
    When expanding a file system using online resize, various fields in
    the superblock (e.g., s_blocks_count, s_inodes_count, etc.) change.
    To update the backup superblocks, the online resize uses the function
    update_backups() in fs/ext4/resize.c.  This function was not updating
    the checksum field in the backup superblocks.  This wasn't a big deal
    previously, because e2fsck didn't care about the checksum field in the
    backup superblock.  (And indeed, update_backups() goes all the way
    back to the ext3 days, well before we had support for metadata
    checksums.)
    
    However, there is an alternate, more general way of updating
    superblock fields, ext4_update_primary_sb() in fs/ext4/ioctl.c.  This
    function does check the checksum of the backup superblock, and if it
    doesn't match will mark the file system as corrupted.  That was
    clearly not the intent, so avoid to aborting the resize when a bad
    superblock is found.
    
    In addition, teach update_backups() to properly update the checksum in
    the backup superblocks.  We will eventually want to unify
    updapte_backups() with the infrasture in ext4_update_primary_sb(), but
    that's for another day.
    
    Note: The problem has been around for a while; it just didn't really
    matter until ext4_update_primary_sb() was added by commit bbc605cdb1e1
    ("ext4: implement support for get/set fs label").  And it became
    trivially easy to reproduce after commit 827891a38acc ("ext4: update
    the s_overhead_clusters in the backup sb's when resizing") in v6.0.
    
    Cc: stable@kernel.org # 5.17+
    Fixes: bbc605cdb1e1 ("ext4: implement support for get/set fs label")
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:36 +01:00
Lukas Czerner 0fbd0f6bb5 ext4: remove redundant checking in ext4_ioctl_checkpoint
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 78ed9354c57f31000e88783496fc84dce7c8021c
Author: Guoqing Jiang <guoqing.jiang@linux.dev>
    
    It is already checked after comment "check for invalid bits set",
    so let's remove this one.
    
    Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
    Link: https://lore.kernel.org/r/20220918115219.12407-1-guoqing.jiang@linux.dev
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 78ed9354c57f31000e88783496fc84dce7c8021c)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:36 +01:00
Lukas Czerner de6bb15fd8 ext4: add ioctls to get/set the ext4 superblock uuid
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit d95efb14c0b81b684deb32ba10cdb78b4178ae5b
Author: Jeremy Bongio <bongiojp@gmail.com>
    
    This fixes a race between changing the ext4 superblock uuid and operations
    like mounting, resizing, changing features, etc.
    
    Reviewed-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Jeremy Bongio <bongiojp@gmail.com>
    Link: https://lore.kernel.org/r/20220721224422.438351-1-bongiojp@gmail.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit d95efb14c0b81b684deb32ba10cdb78b4178ae5b)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:34 +01:00
Lukas Czerner a49d3244ad ext4: update the s_overhead_clusters in the backup sb's when resizing
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 827891a38accfb4e04dbcdefe710f8746c6ad16d
Author: Theodore Ts'o <tytso@mit.edu>
    
    When the EXT4_IOC_RESIZE_FS ioctl is complete, update the backup
    superblocks.  We don't do this for the old-style resize ioctls since
    they are quite ancient, and only used by very old versions of
    resize2fs --- and we don't want to update the backup superblocks every
    time EXT4_IOC_GROUP_ADD is called, since it might get called a lot.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Andreas Dilger <adilger@dilger.ca>
    Link: https://lore.kernel.org/r/20220629040026.112371-2-tytso@mit.edu
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 827891a38accfb4e04dbcdefe710f8746c6ad16d)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:33 +01:00
Lukas Czerner 2c4a78af6e ext4: refactor and move ext4_ioctl_get_encryption_pwsalt()
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 72f63f4a770310233dba3bff8fc6e41660ebaa27
Author: Ritesh Harjani <ritesh.list@gmail.com>
    
    This patch move code for FS_IOC_GET_ENCRYPTION_PWSALT case into
    ext4's crypto.c file, i.e. ext4_ioctl_get_encryption_pwsalt()
    and uuid_is_zero(). This is mostly refactoring logic and should
    not affect any functionality change.
    
    Suggested-by: Eric Biggers <ebiggers@google.com>
    Reviewed-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Ritesh Harjani <ritesh.list@gmail.com>
    Link: https://lore.kernel.org/r/5af98b17152a96b245b4f7d2dfb8607fc93e36aa.1652595565.git.ritesh.list@gmail.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 72f63f4a770310233dba3bff8fc6e41660ebaa27)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:32 +01:00
Ming Lei b144a611f6 block: remove QUEUE_FLAG_DISCARD
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917
Conflicts: deal with xfs conflict because we don't backport commit ("
0560f31a09e5 xfs: convert mount flags to features"), and what we need
is just to replace blk_queue_discard() with bdev_max_discard_sectors().

commit 70200574cc229f6ba038259e8142af2aa09e6976
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Apr 15 06:52:55 2022 +0200

    block: remove QUEUE_FLAG_DISCARD

    Just use a non-zero max_discard_sectors as an indicator for discard
    support, similar to what is done for write zeroes.

    The only places where needs special attention is the RAID5 driver,
    which must clear discard support for security reasons by default,
    even if the default stacking rules would allow for it.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
    Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
    Acked-by: Coly Li <colyli@suse.de> [bcache]
    Acked-by: David Sterba <dsterba@suse.com> [btrfs]
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:58:02 +08:00
Lukas Czerner 29abb211ac ext4: update the cached overhead value in the superblock
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit eb7054212eac8b451d727bf079eae3db8c88f9d3
Author: Theodore Ts'o <tytso@mit.edu>
    
    If we (re-)calculate the file system overhead amount and it's
    different from the on-disk s_overhead_clusters value, update the
    on-disk version since this can take potentially quite a while on
    bigalloc file systems.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 11:28:02 +02:00
Lukas Czerner 2755b012bb ext4: fix kernel doc warnings
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit 919adbfec29d5b89b3e45620653cbeeb0d42e6fd
Author: Theodore Ts'o <tytso@mit.edu>
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 10:57:32 +02:00
Lukas Czerner 3ef03ed998 ext4: fast commit may not fallback for ineligible commit
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit e85c81ba8859a4c839bcd69c5d83b32954133a5b
Author: Xin Yin <yinxin.x@bytedance.com>
    
    For the follow scenario:
    1. jbd start commit transaction n
    2. task A get new handle for transaction n+1
    3. task A do some ineligible actions and mark FC_INELIGIBLE
    4. jbd complete transaction n and clean FC_INELIGIBLE
    5. task A call fsync
    
    In this case fast commit will not fallback to full commit and
    transaction n+1 also not handled by jbd.
    
    Make ext4_fc_mark_ineligible() also record transaction tid for
    latest ineligible case, when call ext4_fc_cleanup() check
    current transaction tid, if small than latest ineligible tid
    do not clear the EXT4_MF_FC_INELIGIBLE.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Suggested-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
    Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
    Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 10:57:30 +02:00
Lukas Czerner fcaed67b6a ext4: implement support for get/set fs label
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit bbc605cdb1e15aafaec899fedc385dc75dddac0e
Author: Lukas Czerner <lczerner@redhat.com>
    
    Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for
    online reading and setting of file system label.
    
    ext4_ioctl_getlabel() is simple, just get the label from the primary
    superblock. This might not be the first sb on the file system if
    'sb=' mount option is used.
    
    In ext4_ioctl_setlabel() we update what ext4 currently views as a
    primary superblock and then proceed to update backup superblocks. There
    are two caveats:
     - the primary superblock might not be the first superblock and so it
       might not be the one used by userspace tools if read directly
       off the disk.
     - because the primary superblock might not be the first superblock we
       potentialy have to update it as part of backup superblock update.
       However the first sb location is a bit more complicated than the rest
       so we have to account for that.
    
    The superblock modification is created generic enough so the
    infrastructure can be used for other potential superblock modification
    operations, such as chaning UUID.
    
    Tested with generic/492 with various configurations. I also checked the
    behavior with 'sb=' mount options, including very large file systems
    with and without sparse_super/sparse_super2.
    
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:31:00 +01:00
Lukas Czerner 3df7707443 ext4: avoid trim error on fs with small groups
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 173b6e383d2a204c9921ffc1eca3b87aa2106c33
Author: Jan Kara <jack@suse.cz>
    
    A user reported FITRIM ioctl failing for him on ext4 on some devices
    without apparent reason.  After some debugging we've found out that
    these devices (being LVM volumes) report rather large discard
    granularity of 42MB and the filesystem had 1k blocksize and thus group
    size of 8MB. Because ext4 FITRIM implementation puts discard
    granularity into minlen, ext4_trim_fs() declared the trim request as
    invalid. However just silently doing nothing seems to be a more
    appropriate reaction to such combination of parameters since user did
    not specify anything wrong.
    
    CC: Lukas Czerner <lczerner@redhat.com>
    Fixes: 5c2ed62fd4 ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20211112152202.26614-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:31:00 +01:00
Lukas Czerner 5ebd9167cf ext4: drop ineligible txn start stop APIs
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 7bbbe241ec7ce0def9f71464c878fdbd2b0dcf37
Author: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
    
    This patch drops ext4_fc_start_ineligible() and
    ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions
    should simply call ext4_fc_mark_ineligible() after starting the
    trasaction.
    
    Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
    Link: https://lore.kernel.org/r/20211223202140.2061101-3-harshads@google.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:57 +01:00
Lukas Czerner b61d0a6eac ext4: use ext4_journal_start/stop for fast commit transactions
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 2729cfdcfa1cc49bef5a90d046fa4a187fdfcc69
Author: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
    
    This patch drops all calls to ext4_fc_start_update() and
    ext4_fc_stop_update(). To ensure that there are no ongoing journal
    updates during fast commit, we also make jbd2_fc_begin_commit() lock
    journal for updates. This way we don't have to maintain two different
    transaction start stop APIs for fast commit and full commit. This
    patch doesn't remove the functions altogether since in future we want
    to have inode level locking for fast commits.
    
    Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
    Link: https://lore.kernel.org/r/20211223202140.2061101-2-harshads@google.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:57 +01:00
Lukas Czerner e3ca8adeb8 ext4: Support for checksumming from journal triggers
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 188c299e2a26cc33747187f87c9e044dfd85a782
Author: Jan Kara <jack@suse.cz>
    
    JBD2 layer support triggers which are called when journaling layer moves
    buffer to a certain state. We can use the frozen trigger, which gets
    called when buffer data is frozen and about to be written out to the
    journal, to compute block checksums for some buffer types (similarly as
    does ocfs2). This avoids unnecessary repeated recomputation of the
    checksum (at the cost of larger window where memory corruption won't be
    caught by checksumming) and is even necessary when there are
    unsynchronized updaters of the checksummed data.
    
    So add superblock and journal trigger type arguments to
    ext4_journal_get_write_access() and ext4_journal_get_create_access() so
    that frozen triggers can be set accordingly. Also add inode argument to
    ext4_walk_page_buffers() and all the callbacks used with that function
    for the same purpose. This patch is mostly only a change of prototype of
    the above mentioned functions and a few small helpers. Real checksumming
    will come later.
    
    Reviewed-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:48 +01:00
Lukas Czerner 92386e70ae ext4: Convert to use mapping->invalidate_lock
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit d4f5258eae7b38c2a28d0a7b28a6d0a8c1f9fe8e
Author: Jan Kara <jack@suse.cz>
    
    Convert ext4 to use mapping->invalidate_lock instead of its private
    EXT4_I(inode)->i_mmap_sem. This is mostly search-and-replace. By this
    conversion we fix a long standing race between hole punching and read(2)
    / readahead(2) paths that can lead to stale page cache contents.
    
    CC: <linux-ext4@vger.kernel.org>
    CC: Ted Tso <tytso@mit.edu>
    Acked-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Jan Kara <jack@suse.cz>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:46 +01:00
Theodore Ts'o 0955901908 ext4: fix flags validity checking for EXT4_IOC_CHECKPOINT
Use the correct bitmask when checking for any not-yet-supported flags.

Link: https://lore.kernel.org/r/20210702173425.1276158-1-tytso@mit.edu
Fixes: 351a0a3fbc ("ext4: add ioctl EXT4_IOC_CHECKPOINT")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Leah Rumancik <leah.rumancik@gmail.com>
2021-07-08 08:37:31 -04:00
Theodore Ts'o 8813587a99 Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"
The function ext4_resize_begin() gets called from three different
places, and online resize for bigalloc file systems is disallowed from
the old-style online resize (EXT4_IOC_GROUP_ADD and
EXT4_IOC_GROUP_EXTEND), but it *is* supposed to be allowed via
EXT4_IOC_RESIZE_FS.

This reverts commit e9f9f61d0c.
2021-06-30 20:54:22 -04:00
Josh Triplett e9f9f61d0c ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
Two different places checked for attempts to resize a filesystem with
the bigalloc feature. Move the check into ext4_resize_begin, which both
places already call.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/bee03303d999225ecb3bfa5be8576b2f4c6edbe6.1623093259.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:22:36 -04:00
Leah Rumancik 351a0a3fbc ext4: add ioctl EXT4_IOC_CHECKPOINT
ioctl EXT4_IOC_CHECKPOINT checkpoints and flushes the journal. This
includes forcing all the transactions to the log, checkpointing the
transactions, and flushing the log to disk. This ioctl takes u32 "flags"
as an argument. Three flags are supported. EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN
can be used to verify input to the ioctl. It returns error if there is any
invalid input, otherwise it returns success without performing
any checkpointing. The other two flags, EXT4_IOC_CHECKPOINT_FLAG_DISCARD
and EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT, can be used to issue requests to
discard or zeroout the journal logs blocks, respectively. At this
point, EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT is primarily added to enable
testing of this codepath on devices that don't support discard.
EXT4_IOC_CHECKPOINT_FLAG_DISCARD and EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT
cannot both be set.

Systems that wish to achieve content deletion SLO can set up a daemon
that calls this ioctl at a regular interval such that it matches with the
SLO requirement. Thus, with this patch, the ext4_dir_entry2 wipeout
patch[1], and the Ext4 "-o discard" mount option set, Ext4 can now
guarantee that all file contents, file metatdata, and filenames will not
be accessible through the filesystem and will have had discard or
zeroout requests issued for corresponding device blocks.

The __jbd2_journal_erase function could also be used to discard or
zero-fill the journal during journal load after recovery. This would
provide a potential solution to a journal replay bug reported earlier this
year[2]. After a successful journal recovery, e2fsck can call this ioctl to
discard the journal as well.

[1] https://lore.kernel.org/linux-ext4/YIHknqxngB1sUdie@mit.edu/
[2] https://lore.kernel.org/linux-ext4/YDZoaacIYStFQT8g@mit.edu/

Link: https://lore.kernel.org/r/20210518151327.130198-2-leah.rumancik@gmail.com
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 21:34:08 -04:00
Leah Rumancik 01d5d96542 ext4: add discard/zeroout flags to journal flush
Add a flags argument to jbd2_journal_flush to enable discarding or
zero-filling the journal blocks while flushing the journal.

Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Link: https://lore.kernel.org/r/20210518151327.130198-1-leah.rumancik@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 19:27:10 -04:00
Jiapeng Chong 1fc57ca5a2 ext4: remove redundant assignment to error
Variable error is set to zero but this value is never read as it's not
used later on, hence it is a redundant assignment and can be removed.

Cleans up the following clang-analyzer warning:

fs/ext4/ioctl.c:657:3: warning: Value stored to 'error' is never read
[clang-analyzer-deadcode.DeadStores].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619691409-83160-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:19 -04:00
Linus Torvalds 9f67672a81 New features for ext4 this cycle include support for encrypted
casefold, ensure that deleted file names are cleared in directory
 blocks by zeroing directory entries when they are unlinked or moved as
 part of a hash tree node split.  We also improve the block allocator's
 performance on a freshly mounted file system by prefetching block
 bitmaps.
 
 There are also the usual cleanups and bug fixes, including fixing a
 page cache invalidation race when there is mixed buffered and direct
 I/O and the block size is less than page size, and allow the dax flag
 to be set and cleared on inline directories.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmCLei4ACgkQ8vlZVpUN
 gaPZkgf/VH08xjMf3VthC+BpvVmChQXfV4yjigHbO2pmPyYWZhyJzkEGCQD8u2eB
 b7ShW+B1NCifcTU34xAkKHwEtakzzEv3WIMrT1oZNWrpfo8tt850EkwQggaGGDpd
 /HnP1/wLtziJ5hE6DwutmX7qB4VFghVj898MjDrEPSOBqItOjWps9mn/JWL7SHyI
 Dqzhf5XZTYPaXWuJmSmKw3q8O70JDHnZe/rRWlfX1jLI5KDtqp71Nw1B+gszUB66
 IUdncyZKvInsyjYhkbCQ8U6WFih82MrbKeuGYDp/RFvg5eMELEYkwT9j0ofuDHq8
 zn62sAlbOXv1DiqkPDHKVm9GkHx8/g==
 =UpnH
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "New features for ext4 this cycle include support for encrypted
  casefold, ensure that deleted file names are cleared in directory
  blocks by zeroing directory entries when they are unlinked or moved as
  part of a hash tree node split. We also improve the block allocator's
  performance on a freshly mounted file system by prefetching block
  bitmaps.

  There are also the usual cleanups and bug fixes, including fixing a
  page cache invalidation race when there is mixed buffered and direct
  I/O and the block size is less than page size, and allow the dax flag
  to be set and cleared on inline directories"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
  ext4: wipe ext4_dir_entry2 upon file deletion
  ext4: Fix occasional generic/418 failure
  fs: fix reporting supported extra file attributes for statx()
  ext4: allow the dax flag to be set and cleared on inline directories
  ext4: fix debug format string warning
  ext4: fix trailing whitespace
  ext4: fix various seppling typos
  ext4: fix error return code in ext4_fc_perform_commit()
  ext4: annotate data race in jbd2_journal_dirty_metadata()
  ext4: annotate data race in start_this_handle()
  ext4: fix ext4_error_err save negative errno into superblock
  ext4: fix error code in ext4_commit_super
  ext4: always panic when errors=panic is specified
  ext4: delete redundant uptodate check for buffer
  ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
  ext4: make prefetch_block_bitmaps default
  ext4: add proc files to monitor new structures
  ext4: improve cr 0 / cr 1 group scanning
  ext4: add MB_NUM_ORDERS macro
  ext4: add mballoc stats proc file
  ...
2021-04-30 15:35:30 -07:00
Theodore Ts'o 4811d9929c ext4: allow the dax flag to be set and cleared on inline directories
This is needed to allow generic/607 to pass for file systems with the
inline data_feature enabled, and it allows the use of file systems
where the directories use inline_data, while the files are accessed
via DAX.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-12 23:33:01 -04:00
Miklos Szeredi 4db5c2e623 ext4: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
2021-04-12 15:04:29 +02:00
Linus Torvalds 7d6beb71da idmapped-mounts-v5.12
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
 ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
 =yPaw
 -----END PGP SIGNATURE-----

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Eric Biggers e17fe6579d fs-verity: add FS_IOC_READ_VERITY_METADATA ioctl
Add an ioctl FS_IOC_READ_VERITY_METADATA which will allow reading verity
metadata from a file that has fs-verity enabled, including:

- The Merkle tree
- The fsverity_descriptor (not including the signature if present)
- The built-in signature, if present

This ioctl has similar semantics to pread().  It is passed the type of
metadata to read (one of the above three), and a buffer, offset, and
size.  It returns the number of bytes read or an error.

Separate patches will add support for each of the above metadata types.
This patch just adds the ioctl itself.

This ioctl doesn't make any assumption about where the metadata is
stored on-disk.  It does assume the metadata is in a stable format, but
that's basically already the case:

- The Merkle tree and fsverity_descriptor are defined by how fs-verity
  file digests are computed; see the "File digest computation" section
  of Documentation/filesystems/fsverity.rst.  Technically, the way in
  which the levels of the tree are ordered relative to each other wasn't
  previously specified, but it's logical to put the root level first.

- The built-in signature is the value passed to FS_IOC_ENABLE_VERITY.

This ioctl is useful because it allows writing a server program that
takes a verity file and serves it to a client program, such that the
client can do its own fs-verity compatible verification of the file.
This only makes sense if the client doesn't trust the server and if the
server needs to provide the storage for the client.

More concretely, there is interest in using this ability in Android to
export APK files (which are protected by fs-verity) to "protected VMs".
This would use Protected KVM (https://lwn.net/Articles/836693), which
provides an isolated execution environment without having to trust the
traditional "host".  A "guest" VM can boot from a signed image and
perform specific tasks in a minimum trusted environment using files that
have fs-verity enabled on the host, without trusting the host or
requiring that the guest has its own trusted storage.

Technically, it would be possible to duplicate the metadata and store it
in separate files for serving.  However, that would be less efficient
and would require extra care in userspace to maintain file consistency.

In addition to the above, the ability to read the built-in signatures is
useful because it allows a system that is using the in-kernel signature
verification to migrate to userspace signature verification.

Link: https://lore.kernel.org/r/20210115181819.34732-4-ebiggers@kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-07 14:51:11 -08:00
Christian Brauner 14f3db5542
ext4: support idmapped mounts
Enable idmapped mounts for ext4. All dedicated helpers we need for this
exist. So this basically just means we're passing down the
user_namespace argument from the VFS methods to the relevant helpers.

Let's create simple example where we idmap an ext4 filesystem:

 root@f2-vm:~# truncate -s 5G ext4.img

 root@f2-vm:~# mkfs.ext4 ./ext4.img
 mke2fs 1.45.5 (07-Jan-2020)
 Discarding device blocks: done
 Creating filesystem with 1310720 4k blocks and 327680 inodes
 Filesystem UUID: 3fd91794-c6ca-4b0f-9964-289a000919cf
 Superblock backups stored on blocks:
         32768, 98304, 163840, 229376, 294912, 819200, 884736

 Allocating group tables: done
 Writing inode tables: done
 Creating journal (16384 blocks): done
 Writing superblocks and filesystem accounting information: done

 root@f2-vm:~# losetup -f --show ./ext4.img
 /dev/loop0

 root@f2-vm:~# mount /dev/loop0 /mnt

 root@f2-vm:~# ls -al /mnt/
 total 24
 drwxr-xr-x  3 root root  4096 Oct 28 13:34 .
 drwxr-xr-x 30 root root  4096 Oct 28 13:22 ..
 drwx------  2 root root 16384 Oct 28 13:34 lost+found

 # Let's create an idmapped mount at /idmapped1 where we map uid and gid
 # 0 to uid and gid 1000
 root@f2-vm:/# ./mount-idmapped --map-mount b:0:1000:1 /mnt/ /idmapped1/

 root@f2-vm:/# ls -al /idmapped1/
 total 24
 drwxr-xr-x  3 ubuntu ubuntu  4096 Oct 28 13:34 .
 drwxr-xr-x 30 root   root    4096 Oct 28 13:22 ..
 drwx------  2 ubuntu ubuntu 16384 Oct 28 13:34 lost+found

 # Let's create an idmapped mount at /idmapped2 where we map uid and gid
 # 0 to uid and gid 2000
 root@f2-vm:/# ./mount-idmapped --map-mount b:0:2000:1 /mnt/ /idmapped2/

 root@f2-vm:/# ls -al /idmapped2/
 total 24
 drwxr-xr-x  3 2000 2000  4096 Oct 28 13:34 .
 drwxr-xr-x 31 root root  4096 Oct 28 13:39 ..
 drwx------  2 2000 2000 16384 Oct 28 13:34 lost+found

Let's create another example where we idmap the rootfs filesystem
without a mapping for uid 0 and gid 0:

 # Create an idmapped mount of for a full POSIX range of rootfs under
 # /mnt but without a mapping for uid 0 to reduce attack surface

 root@f2-vm:/# ./mount-idmapped --map-mount b:1:1:65536 / /mnt/

 # Since we don't have a mapping for uid and gid 0 all files owned by
 # uid and gid 0 should show up as uid and gid 65534:
 root@f2-vm:/# ls -al /mnt/
 total 664
 drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 .
 drwxr-xr-x 31 root   root      4096 Oct 28 13:39 ..
 lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 bin -> usr/bin
 drwxr-xr-x  4 nobody nogroup   4096 Oct 28 13:17 boot
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:48 dev
 drwxr-xr-x 81 nobody nogroup   4096 Oct 28 04:00 etc
 drwxr-xr-x  4 nobody nogroup   4096 Oct 28 04:00 home
 lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 lib -> usr/lib
 lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib32 -> usr/lib32
 lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib64 -> usr/lib64
 lrwxrwxrwx  1 nobody nogroup     10 Aug 25 07:44 libx32 -> usr/libx32
 drwx------  2 nobody nogroup  16384 Aug 25 07:47 lost+found
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 media
 drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 mnt
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 opt
 drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 proc
 drwx--x--x  6 nobody nogroup   4096 Oct 28 13:34 root
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:46 run
 lrwxrwxrwx  1 nobody nogroup      8 Aug 25 07:44 sbin -> usr/sbin
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 srv
 drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 sys
 drwxrwxrwt 10 nobody nogroup   4096 Oct 28 13:19 tmp
 drwxr-xr-x 14 nobody nogroup   4096 Oct 20 13:00 usr
 drwxr-xr-x 12 nobody nogroup   4096 Aug 25 07:45 var

 # Since we do have a mapping for uid and gid 1000 all files owned by
 # uid and gid 1000 should simply show up as uid and gid 1000:
 root@f2-vm:/# ls -al /mnt/home/ubuntu/
 total 40
 drwxr-xr-x 3 ubuntu ubuntu  4096 Oct 28 00:43 .
 drwxr-xr-x 4 nobody nogroup 4096 Oct 28 04:00 ..
 -rw------- 1 ubuntu ubuntu  2936 Oct 28 12:26 .bash_history
 -rw-r--r-- 1 ubuntu ubuntu   220 Feb 25  2020 .bash_logout
 -rw-r--r-- 1 ubuntu ubuntu  3771 Feb 25  2020 .bashrc
 -rw-r--r-- 1 ubuntu ubuntu   807 Feb 25  2020 .profile
 -rw-r--r-- 1 ubuntu ubuntu     0 Oct 16 16:11 .sudo_as_admin_successful
 -rw------- 1 ubuntu ubuntu  1144 Oct 28 00:43 .viminfo

Link: https://lore.kernel.org/r/20210121131959.646623-39-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:43:46 +01:00
Christian Brauner 21cb47be6f
inode: make init and permission helpers idmapped mount aware
The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Linus Torvalds 0bc9bc1d8b A number of bug fixes for ext4:
* For the new fast_commit feature
    * Fix some error handling codepaths in whiteout handling and
      mountpoint sampling
    * Fix how we write ext4_error information so it goes through the journal
      when journalling is active, to avoid races that can lead to lost
      error information, superblock checksum failures, or DIF/DIX features.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmAB8eMACgkQ8vlZVpUN
 gaMUxAf+MW22dceTto2RO0ox9OEBNoZDFiVnlEuUaIOxkqOlovIWaqX7wwuF/121
 +FaNeDVzqNSS/QjQSB5lHF5OfHCD2u1Ef/bGzCm9cQyeN2/n0sCsStfPCcyLHy/0
 4R8PsjF0xhhbCETLcAc0U/YBFEoqSn1i7DG5nnpx63Wt1S/SSMmTAXzafWbzisEZ
 XNsz3CEPCDDSmSzOt3qMMHxkSoOZhYcLe7fCoKkhZ2pvTyrQsHrne6NNLtxc+sDL
 AcKkaI0EWFiFRhebowQO/5ouq6nnGKLCsukuZN9//Br8ht5gNcFpuKNVFl+LOiM6
 ud4H3qcRokcdPPAn3uwI0AJKFXqLvg==
 =Dgdj
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "A number of bug fixes for ext4:

   - Fix for the new fast_commit feature

   - Fix some error handling codepaths in whiteout handling and
     mountpoint sampling

   - Fix how we write ext4_error information so it goes through the
     journal when journalling is active, to avoid races that can lead to
     lost error information, superblock checksum failures, or DIF/DIX
     features"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: remove expensive flush on fast commit
  ext4: fix bug for rename with RENAME_WHITEOUT
  ext4: fix wrong list_splice in ext4_fc_cleanup
  ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
  ext4: don't leak old mountpoint samples
  ext4: drop ext4_handle_dirty_super()
  ext4: fix superblock checksum failure when setting password salt
  ext4: use sbi instead of EXT4_SB(sb) in ext4_update_super()
  ext4: save error info to sb through journal if available
  ext4: protect superblock modifications with a buffer lock
  ext4: drop sync argument of ext4_commit_super()
  ext4: combine ext4_handle_error() and save_error_info()
2021-01-15 14:54:24 -08:00
Jan Kara dfd56c2c0c ext4: fix superblock checksum failure when setting password salt
When setting password salt in the superblock, we forget to recompute the
superblock checksum so it will not match until the next superblock
modification which recomputes the checksum. Fix it.

CC: Michael Halcrow <mhalcrow@google.com>
Reported-by: Andreas Dilger <adilger@dilger.ca>
Fixes: 9bd8212f98 ("ext4 crypto: add encryption policy and password salt support")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-22 13:08:46 -05:00
Christoph Hellwig 040f04bd2e fs: simplify freeze_bdev/thaw_bdev
Store the frozen superblock in struct block_device to avoid the awkward
interface that can return a sb only used a cookie, an ERR_PTR or NULL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com>		[f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:38 -07:00
Harshad Shirwadkar 8016e29f43 ext4: fast commit recovery path
This patch adds fast commit recovery path support for Ext4 file
system. We add several helper functions that are similar in spirit to
e2fsprogs journal recovery path handlers. Example of such functions
include - a simple block allocator, idempotent block bitmap update
function etc. Using these routines and the fast commit log in the fast
commit area, the recovery path (ext4_fc_replay()) performs fast commit
log recovery.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-8-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:38 -04:00
Harshad Shirwadkar aa75f4d3da ext4: main fast-commit commit path
This patch adds main fast commit commit path handlers. The overall
patch can be divided into two inter-related parts:

(A) Metadata updates tracking

    This part consists of helper functions to track changes that need
    to be committed during a commit operation. These updates are
    maintained by Ext4 in different in-memory queues. Following are
    the APIs and their short description that are implemented in this
    patch:

    - ext4_fc_track_link/unlink/creat() - Track unlink. link and creat
      operations
    - ext4_fc_track_range() - Track changed logical block offsets
      inodes
    - ext4_fc_track_inode() - Track inodes
    - ext4_fc_mark_ineligible() - Mark file system fast commit
      ineligible()
    - ext4_fc_start_update() / ext4_fc_stop_update() /
      ext4_fc_start_ineligible() / ext4_fc_stop_ineligible() These
      functions are useful for co-ordinating inode updates with
      commits.

(B) Main commit Path

    This part consists of functions to convert updates tracked in
    in-memory data structures into on-disk commits. Function
    ext4_fc_commit() is the main entry point to commit path.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-6-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:37 -04:00
brookxu 27bc446e2d ext4: limit the length of per-inode prealloc list
In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.

After patching, we observed that the sys ratio of cpu has dropped, and
the system throughput has increased significantly. We created a process
to write the sparse file, and the running time of the process on the
fixed kernel was significantly reduced, as follows:

Running time on unfixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m2.051s
user    0m0.008s
sys     0m2.026s

Running time on fixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m0.471s
user    0m0.004s
sys     0m0.395s

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
Eric Biggers cb29a02d3a ext4: use generic names for generic ioctls
Don't define EXT4_IOC_* aliases to ioctls that already have a generic
FS_IOC_* name.  These aliases are unnecessary, and they make it unclear
which ioctls are ext4-specific and which are generic.

Exception: leave EXT4_IOC_GETVERSION_OLD and EXT4_IOC_SETVERSION_OLD
as-is for now, since renaming them to FS_IOC_GETVERSION and
FS_IOC_SETVERSION would probably make them more likely to be confused
with EXT4_IOC_GETVERSION and EXT4_IOC_SETVERSION which also exist.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200714230909.56349-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-06 01:35:05 -04:00
Theodore Ts'o 68cd44920d Enable ext4 support for per-file/directory dax operations
This adds the same per-file/per-directory DAX support for ext4 as was
done for xfs, now that we finally have consensus over what the
interface should be.
2020-06-11 10:51:44 -04:00