216 Commits

Author SHA1 Message Date
Pavel Reichl ae01072673 ext4: fix off by one issue in alloc_flex_gd()
JIRA: https://issues.redhat.com/browse/RHEL-61252

Wesley reported an issue:

==================================================================
EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
------------[ cut here ]------------
kernel BUG at fs/ext4/resize.c:324!
CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
RIP: 0010:ext4_resize_fs+0x1212/0x12d0
Call Trace:
 __ext4_ioctl+0x4e0/0x1800
 ext4_ioctl+0x12/0x20
 __x64_sys_ioctl+0x99/0xd0
 x64_sys_call+0x1206/0x20d0
 do_syscall_64+0x72/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================

While reviewing the patch, Honza found that when adjusting resize_bg in
alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than
flexbg_size.

The reproduction of the problem requires the following:

 o_group = flexbg_size * 2 * n;
 o_size = (o_group + 1) * group_size;
 n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)
 n_size = (n_group + 1) * group_size;

Take n=0,flexbg_size=16 as an example:

              last:15
|o---------------|--------------n-|
o_group:0    resize to      n_group:30

The corresponding reproducer is:

img=test.img
rm -f $img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 8M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 248M
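
The conditions listed above can be checked against the reproducer's sizes with a little arithmetic. This is a minimal sketch, not kernel code: the helper names are hypothetical, and it assumes the mke2fs default of 8 * block_size blocks per group, which gives 8 MiB groups for `-b 1024`.

```python
MIB = 1024 * 1024

def last_group(fs_bytes, group_bytes):
    """Index of the last block group for a filesystem of fs_bytes (hypothetical helper)."""
    return fs_bytes // group_bytes - 1

def in_trigger_window(o_group, n_group, flexbg_size):
    """True when (o_group, n_group) match the conditions listed in the commit message."""
    if o_group % (flexbg_size * 2) != 0:
        return False
    return o_group + flexbg_size <= n_group < o_group + flexbg_size * 2

# Assumption: default blocks per group = 8 * block_size, so -b 1024 gives 8 MiB groups.
block_size = 1024
group_bytes = 8 * block_size * block_size

o_group = last_group(8 * MIB, group_bytes)     # mkfs.ext4 ... 8M
n_group = last_group(248 * MIB, group_bytes)   # resize2fs ... 248M
print(o_group, n_group, in_trigger_window(o_group, n_group, 16))
```

Under these assumptions the reproducer lands on o_group 0 and n_group 30, the same pair shown in the diagram.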

Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE()
to prevent the issue from happening again.

[ Note: another reproducer which this commit fixes is:

  img=test.img
  rm -f $img
  truncate -s 25MiB $img
  mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img
  truncate -s 3GiB $img
  dev=`losetup -f --show $img`
  mkdir -p /tmp/test
  mount $dev /tmp/test
  resize2fs $dev 3G
  umount $dev
  losetup -d $dev

  -- TYT ]

Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://lore.kernel.org/all/20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com/
Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Eric Sandeen <sandeen@redhat.com>
Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240927133329.1015041-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 6121258c2b33ceac3d21f6a221452692c465df88)
Signed-off-by: Pavel Reichl <preichl@redhat.com>
2024-10-11 02:00:27 +02:00
Carlos Maiolino 71458e9a5c ext4: enable meta_bg only when new desc blocks are needed
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

This patch addresses an issue observed when resize_inode is disabled
and an online extension of a filesystem is performed. When a filesystem
is expanded to a size that does not require the addition of a new
descriptor block, the meta_bg feature is being enabled even though no
part of the filesystem uses this layout.

This patch ensures that the meta_bg feature is only enabled if
any of the added block groups utilize meta_bg layout.

Signed-off-by: Srivathsa Dara <srivathsa.d.dara@oracle.com>
Link: https://lore.kernel.org/r/20240227131329.2608466-1-srivathsa.d.dara@oracle.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 07be778c70149321f785611a9c50125b904b0508)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-06 18:23:30 +02:00
Carlos Maiolino 8ad9ebe7ec ext4: fix corruption during on-line resize
JIRA: https://issues.redhat.com/browse/RHEL-36976
CVE: CVE-2024-35807
Tested: with xfstests

We observed a corruption during on-line resize of a file system that is
larger than 16 TiB with 4k block size. With more than 2^32 blocks,
resize_inode is turned off by default by mke2fs. The issue can be
reproduced on a smaller file system for convenience by explicitly
turning off resize_inode. An on-line resize across an 8 GiB boundary (the
size of a meta block group in this setup) then leads to a corruption:

  dev=/dev/<some_dev> # should be >= 16 GiB
  mkdir -p /corruption
  /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
  mount -t ext4 $dev /corruption

  dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
  sha1sum /corruption/test
  # 79d2658b39dcfd77274e435b0934028adafaab11  /corruption/test

  /sbin/resize2fs $dev $((2*2**21))
  # drop page cache to force reload the block from disk
  echo 1 > /proc/sys/vm/drop_caches

  sha1sum /corruption/test
  # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3  /corruption/test

2^21 = 2^15*2^6 blocks equals 8 GiB, where 2^15 is the number of blocks per
block group and 2^6 is the number of block groups that make up a meta
block group.

The last checksum might be different depending on how the file is laid
out across the physical blocks. The actual corruption occurs at physical
block 63*2^15 = 2064384 which would be the location of the backup of the
meta block group's block descriptor. During the on-line resize the file
system will be converted to meta_bg starting at s_first_meta_bg which is
2 in the example - meaning all block groups after 16 GiB. However, in
ext4_flex_group_add we might add block groups that are not part of the
first meta block group yet. In the reproducer we achieved this by
subtracting the size of a whole block group from the point where the
meta block group would start. This must be considered when updating the
backup block group descriptors to follow the non-meta_bg layout. The fix
is to add a test whether the group to add is already part of the meta
block group or not.
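
The numbers used in the reproducer and explanation above can be re-derived as follows. This is only a worked-arithmetic sketch; the variable names are illustrative, and the value of s_first_meta_bg (2) is taken from the commit message rather than computed.

```python
BLOCK_SIZE = 4096            # -b 4096 in the reproducer
BLOCKS_PER_GROUP = 2**15     # blocks per block group, per the commit message
GROUPS_PER_META_BG = 2**6    # block groups per meta block group, per the commit message
S_FIRST_META_BG = 2          # value quoted in the commit message

meta_bg_blocks = BLOCKS_PER_GROUP * GROUPS_PER_META_BG
meta_bg_bytes = meta_bg_blocks * BLOCK_SIZE

mkfs_blocks = 2 * 2**21 - 2**15      # size passed to mke2fs
resize_blocks = 2 * 2**21            # size passed to resize2fs
corrupt_block = 63 * BLOCKS_PER_GROUP

# The first group added by the resize, and the meta block group it falls in.
first_new_group = mkfs_blocks // BLOCKS_PER_GROUP
first_new_meta_bg = first_new_group // GROUPS_PER_META_BG

print(meta_bg_blocks == 2**21, meta_bg_bytes // 2**30)   # True 8
print(corrupt_block)                                     # 2064384
print(first_new_group, first_new_meta_bg < S_FIRST_META_BG)
```

The last line shows the situation the fix has to handle: the added group sits below s_first_meta_bg, so its backup descriptors still follow the non-meta_bg layout.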

Fixes: 01f795f9e0 ("ext4: add online resizing support for meta_bg and 64-bit file systems")
Cc:  <stable@vger.kernel.org>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Tested-by: Srivathsa Dara <srivathsa.d.dara@oracle.com>
Reviewed-by: Srivathsa Dara <srivathsa.d.dara@oracle.com>
Link: https://lore.kernel.org/r/20240215155009.94493-1-mheyne@amazon.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit a6b3bfe176e8a5b05ec4447404e412c2a3fc92cc)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-06 18:23:30 +02:00
Carlos Maiolino 3d1a6876fe ext4: remove unnecessary initialization of count2 in set_flexbg_block_bitmap
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

We always overwrite count2 with "EXT4_CLUSTERS_PER_GROUP(sb) -
(first_cluster - start)" after it is initialized in the for loop's
initialization statement.
Just remove the unnecessary initialization of count2.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-14-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 248b45b621a77155f81129e6b572ec833edb4cf4)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino ee7134e077 ext4: remove unnecessary check to avoid repeat update_backups for the same gdb
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

sbi->s_group_desc contains an array of bh's for block group descriptors,
and EXT4_DESC_PER_BLOCK(sb) contiguous bg descriptors in a single block
share the same bh.
Simply calling update_backups for each gdb_bh in sbi->s_group_desc will not
update the same group descriptor block multiple times.

Commit 0acdb8876f ("ext4: don't call update_backups() multiple times for
the same bg") wrongly assumed that each block group descriptor in the same
block has an individual bh, and an unnecessary check was added.
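
The sharing described above can be illustrated with a small index calculation. This is a hypothetical sketch, not the kernel code: `gdb_index` is an illustrative helper, and 64 descriptors per block is an assumed value (for example 4096-byte blocks with 64-byte descriptors).

```python
def gdb_index(group, desc_per_block):
    """Index of the descriptor block (and thus the bh) that holds this group's descriptor."""
    return group // desc_per_block

DESC_PER_BLOCK = 64  # assumed value, for illustration only

# 130 block groups map onto only three descriptor blocks, so iterating over
# descriptor blocks visits each one exactly once.
blocks_touched = sorted({gdb_index(g, DESC_PER_BLOCK) for g in range(130)})
print(blocks_touched)
```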

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230826174712.4059355-13-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 350bb48b84b8f4ad4ea179dbb97f568d12626188)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino 10fcf5bd5d ext4: simplify the gdbblock calculation in add_new_gdb_meta_bg
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

We always call add_new_gdb_meta_bg with the first group in meta_bg. Remove the
unnecessary ext4_meta_bg_first_group conversion to simplify the gdbblock
calculation.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230826174712.4059355-12-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 9dca529bdaad7a7242a36d04f73cb998b817ab48)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino a4a8953ff7 ext4: use saved local variable sbi instead of EXT4_SB(sb)
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

We save EXT4_SB(sb) to local variable sbi at beginning of function
ext4_resize_begin. Use sbi directly instead of EXT4_SB(sb) to
remove unnecessary pointer dereference.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-11-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 70cbfd257995b3f23c2408fd893cc18b61e58b4a)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino b5dff72ec3 ext4: remove EXT4FS_DEBUG defination in resize.c
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

Remove the EXT4FS_DEBUG definition in resize.c for the following reasons:
1. EXT4FS_DEBUG enables debug messages; it should only be defined
when debugging.
2. ext4.h, included from ext4_jbd2.h after the EXT4FS_DEBUG definition, will
"#undef EXT4FS_DEBUG", so the EXT4FS_DEBUG definition in resize.c can't
actually turn on ext4_debug messages.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-10-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 95b635689b58e0ebe5197bf99c82c681eabe17ee)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino 270faf2f3b ext4: calculate free_clusters_count in cluster unit in verify_group_input
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

The field free_cluster_count in struct ext4_new_group_data should be
in units of clusters.  In verify_group_input() this field is being
filled in units of blocks.  Fortunately, we don't support online
resizing of bigalloc file systems, and for non-bigalloc file systems,
the cluster size == block size.  But fix this in case we do support
online resizing of bigalloc file systems in the future.
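
The unit difference can be sketched as below. This is an illustration only: `blocks_to_clusters` is a hypothetical helper, and it assumes the cluster size is the block size shifted left by a cluster-bits value, with zero bits meaning a non-bigalloc file system.

```python
def blocks_to_clusters(blocks, cluster_bits):
    """Convert a block count to a cluster count (hypothetical helper)."""
    return blocks >> cluster_bits

free_blocks = 32768

# Non-bigalloc: cluster size == block size, so both units agree.
print(blocks_to_clusters(free_blocks, 0))
# Bigalloc with 16 blocks per cluster: storing a block count would overstate
# the number of free clusters by a factor of 16.
print(blocks_to_clusters(free_blocks, 4))
```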

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-9-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 1fc1bd2d18bbade157f7b14270f509ebbd89881b)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino da6fa48c15 ext4: remove commented code in reserve_backup_gdb
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

Remove commented code in reserve_backup_gdb

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-8-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 31458077273b5f883d99bee33a7fb295f155712d)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino 8293033298 ext4: remove redundant check of count
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

Remove zero check of count which is always non-zero.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-7-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 7d4cd3b45af025befe3bca94f87359a6603b6e95)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino 6c1e083f34 ext4: fix typo in setup_new_flex_group_blocks
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

grop -> group

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit e44fc921b84ff08a9e2fb827a146fa4021d016f3)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:51 +02:00
Carlos Maiolino 562f29eec5 ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

The check for a gdb backup in meta bg is wrong, as follows:
first_group is the first group of the meta_bg which contains the target
group, so the target group is always >= first_group. We check whether the
target group has a gdb backup by comparing first_group with [group + 1] and
[group + EXT4_DESC_PER_BLOCK(sb) - 1]. As group >= first_group, [group + N]
is > first_group. So no copy of the gdb backup in meta bg is ever done in
setup_new_flex_group_blocks.

There is no need to do the gdb backup copy in meta bg from
setup_new_flex_group_blocks, as we always copy the updated gdb block to the
backups at the end of ext4_flex_group_add, as follows:

ext4_flex_group_add
  /* no gdb backup copy for meta bg any more */
  setup_new_flex_group_blocks

  /* update current group number */
  ext4_update_super
    sbi->s_groups_count += flex_gd->count;

  /*
   * if group in meta bg contains backup is added, the primary gdb block
   * of the meta bg will be copy to backup in new added group here.
   */
  for (; gdb_num <= gdb_num_end; gdb_num++)
    update_backups(...)

In summary, we can remove wrong gdb backup copy code in
setup_new_flex_group_blocks.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit 40dd7953f4d606c280074f10d23046b6812708ce)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:50 +02:00
Carlos Maiolino 387de1f30b ext4: correct return value of ext4_convert_meta_bg
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

Avoid ignoring the error in "err".

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230826174712.4059355-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit 48f1551592c54f7d8e2befc72a99ff4e47f7dca0)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:50 +02:00
Carlos Maiolino 8264bbe1b3 ext4: add missed brelse in update_backups
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

Add the missing brelse in update_backups.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit 9adac8b01f4be28acd5838aade42b8daa4f0b642)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:50 +02:00
Carlos Maiolino bc5d98a398 ext4: correct offset of gdb backup in non meta_bg group to update_backups
JIRA: https://issues.redhat.com/browse/RHEL-36282
Tested: with xfstests

Commit 0aeaa2559d6d5 ("ext4: fix corruption when online resizing a 1K
bigalloc fs") found that primary superblock's offset in its group is
not equal to offset of backup superblock in its group when block size
is 1K and bigalloc is enabled. As group descriptor blocks are right
after superblock, we can't pass block number of gdb to update_backups
for the same reason.

The root cause of the issue above is that the leading 1K padding block is
counted as a data block offset for the primary block, while the backup
block has no padding block offset in its group.

Remove the padding data block count to fix the issue for gdb backups.

For the meta_bg case, update_backups treats blk_off as a block number, so
do no conversion in this case.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230826174712.4059355-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
(cherry picked from commit 31f13421c004a420c0e9d288859c9ea9259ea0cc)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2024-06-05 16:34:50 +02:00
Pavel Reichl 3bd573a318 ext4: reduce unnecessary memory allocation in alloc_flex_gd()
JIRA: https://issues.redhat.com/browse/RHEL-30509
CVE: CVE-2023-52622

When a large flex_bg file system is resized, the number of groups to be
added may be small, and a large amount of memory that will never be used
is allocated. Therefore, resize_bg can be set to the number of
new_group_data entries actually needed, rounded up to a power of 2. This
does not affect the disk layout after online resize and saves some memory.
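
The rounding idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the logic of alloc_flex_gd(), which also has to account for where the old last group sits within its flex group; `sketch_resize_bg` is a hypothetical name.

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()

def sketch_resize_bg(groups_to_add, flexbg_size):
    """Array size for the groups actually needed, never larger than flexbg_size."""
    return min(round_up_pow2(groups_to_add), flexbg_size)

print(sketch_resize_bg(3, 1024))      # 4 entries instead of 1024
print(sketch_resize_bg(5000, 1024))   # capped at flexbg_size: 1024
```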

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231023013057.2117948-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 665d3e0af4d35acf9a5f58dfd471bc27dbf55880)
Signed-off-by: Pavel Reichl <preichl@redhat.com>
2024-04-30 10:08:34 +02:00
Pavel Reichl 6dda8325cf ext4: avoid online resizing failures due to oversized flex bg
JIRA: https://issues.redhat.com/browse/RHEL-30509
CVE: CVE-2023-52622

When we online resize an ext4 filesystem with an oversized flexbg_size,

     mkfs.ext4 -F -G 67108864 $dev -b 4096 100M
     mount $dev $dir
     resize2fs $dev 16G

the following WARN_ON is triggered:
==================================================================
WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550
Modules linked in: sg(E)
CPU: 0 PID: 427 Comm: resize2fs Tainted: G  E  6.6.0-rc5+ #314
RIP: 0010:__alloc_pages+0x411/0x550
Call Trace:
 <TASK>
 __kmalloc_large_node+0xa2/0x200
 __kmalloc+0x16e/0x290
 ext4_resize_fs+0x481/0xd80
 __ext4_ioctl+0x1616/0x1d90
 ext4_ioctl+0x12/0x20
 __x64_sys_ioctl+0xf0/0x150
 do_syscall_64+0x3b/0x90
==================================================================

This is because flexbg_size is too large and the size of the new_group_data
array to be allocated exceeds MAX_ORDER. Currently, the minimum value of
MAX_ORDER is 8 and the minimum value of PAGE_SIZE is 4096, so the
corresponding maximum number of groups that can be allocated is:

 (PAGE_SIZE << MAX_ORDER) / sizeof(struct ext4_new_group_data) ≈ 21845

And the value that is down-aligned to the power of 2 is 16384. Therefore,
this value is defined as MAX_RESIZE_BG, and the number of groups added
each time does not exceed this value during resizing, and is added multiple
times to complete the online resizing. The difference is that the metadata
in a flex_bg may be more dispersed.
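
The derivation of 16384 can be reproduced with a short calculation. This is a sketch: the 48-byte entry size is an assumption inferred from the ≈ 21845 figure above, not a value read from the kernel headers.

```python
PAGE_SIZE = 4096    # minimum PAGE_SIZE, per the commit message
MAX_ORDER = 8       # minimum MAX_ORDER, per the commit message
ENTRY_SIZE = 48     # assumed sizeof(struct ext4_new_group_data), inferred from ≈ 21845

max_entries = (PAGE_SIZE << MAX_ORDER) // ENTRY_SIZE
# Round down to the nearest power of two.
max_resize_bg = 1 << (max_entries.bit_length() - 1)

print(max_entries, max_resize_bg)
```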

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231023013057.2117948-4-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 5d1935ac02ca5aee364a449a35e2977ea84509b0)
Signed-off-by: Pavel Reichl <preichl@redhat.com>
2024-04-30 10:08:33 +02:00
Pavel Reichl 002b4ebc98 ext4: remove unnecessary check from alloc_flex_gd()
JIRA: https://issues.redhat.com/browse/RHEL-30509
CVE: CVE-2023-52622

In commit 967ac8af44 ("ext4: fix potential integer overflow in
alloc_flex_gd()"), an overflow check is added to alloc_flex_gd() to
prevent the allocated memory from being smaller than expected due to
the overflow. However, after kmalloc() is replaced with kmalloc_array()
in commit 6da2ec5605 ("treewide: kmalloc() -> kmalloc_array()"), the
kmalloc_array() function has an overflow check, so the above problem
will not occur. Therefore, the extra check is removed.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231023013057.2117948-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit b099eb87de105cf07cad731ded6fb40b2675108b)
Signed-off-by: Pavel Reichl <preichl@redhat.com>
2024-04-30 10:08:33 +02:00
Pavel Reichl 57a45bc274 ext4: unify the type of flexbg_size to unsigned int
JIRA: https://issues.redhat.com/browse/RHEL-30509
CVE: CVE-2023-52622

The maximum value of flexbg_size is 2^31, but the maximum value of int
is (2^31 - 1), so overflow may occur when the type of flexbg_size is
declared as int.

For example, when uninit_mask is initialized in ext4_alloc_group_tables(),
if flexbg_size == 2^31, the initialized uninit_mask is incorrect, and this
may cause set_flexbg_block_bitmap() to trigger a BUG_ON().

Therefore, the flexbg_size type is declared as unsigned int to avoid
overflow and memory waste.
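
The overflow can be illustrated by modelling 32-bit integers in Python. This is a sketch of the wrap-around only, under the assumption of a 32-bit two's-complement int; the helper names are hypothetical and no kernel code is reproduced.

```python
def as_int32(x):
    """Interpret x as a signed 32-bit two's-complement value."""
    return (x + 2**31) % 2**32 - 2**31

def as_uint32(x):
    """Interpret x as an unsigned 32-bit value."""
    return x % 2**32

flexbg_size = 1 << 31   # the maximum value named in the commit message

print(as_int32(flexbg_size))    # wraps to a negative number
print(as_uint32(flexbg_size))   # representable as unsigned int
```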

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231023013057.2117948-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 658a52344fb139f9531e7543a6e0015b630feb38)
Signed-off-by: Pavel Reichl <preichl@redhat.com>
2024-04-30 10:08:33 +02:00
Carlos Maiolino 219d03447d ext4: remove unused group parameter in ext4_block_bitmap_csum_set
JIRA: https://issues.redhat.com/browse/RHEL-5335

Remove unused group parameter in ext4_block_bitmap_csum_set. After this,
group parameter in ext4_set_bitmap_checksums is also not used, just
remove it too.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230221203027.2359920-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 1df9bde48fc6e8efe520f4d89ff35769b2c56b8b)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:20 +01:00
Carlos Maiolino 7d703c4dce ext4: remove unused group parameter in ext4_inode_bitmap_csum_set
JIRA: https://issues.redhat.com/browse/RHEL-5335

Remove unused group parameter in ext4_inode_bitmap_csum_set.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230221203027.2359920-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 4fd873c8175dd695aa3de2473951c51715b64d8c)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:20 +01:00
Carlos Maiolino 1607927d67 ext4: remove redundant variable err
JIRA: https://issues.redhat.com/browse/RHEL-5335

Return value directly from ext4_group_extend_no_check()
instead of getting value from redundant variable err.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn>
Link: https://lore.kernel.org/r/20220831160843.305836-1-cui.jinpeng2@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 71df9683827a17fb6279fcc2e52efdc7062a03b9)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:16 +01:00
Lukas Czerner bde2abc98e ext4: fix corruption when online resizing a 1K bigalloc fs
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 0aeaa2559d6d53358fca3e3fce73807367adca74
Author: Baokun Li <libaokun1@huawei.com>
    
    When a backup superblock is updated in update_backups(), the primary
    superblock's offset in the group (that is, sbi->s_sbh->b_blocknr) is used
    as the backup superblock's offset in its group. However, when the block
    size is 1K and bigalloc is enabled, the two offsets are not equal. This
    causes the backup group descriptors to be overwritten by the superblock
    in update_backups(). Moreover, if meta_bg is enabled, the file system will
    be corrupted because this feature uses backup group descriptors.
    
    To solve this issue, we use a more accurate ext4_group_first_block_no() as
    the offset of the backup superblock in its group.
    
    Fixes: d77147ff44 ("ext4: add support for online resizing with bigalloc")
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20221117040341.1380702-4-libaokun1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    (cherry picked from commit 0aeaa2559d6d53358fca3e3fce73807367adca74)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:37 +01:00
Lukas Czerner 6cd3e263e5 ext4: fix corrupt backup group descriptors after online resize
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 8f49ec603ae3e213bfab2799182724e3abac55a1
Author: Baokun Li <libaokun1@huawei.com>
    
    In commit 9a8c5b0d0615 ("ext4: update the backup superblock's at the end
    of the online resize"), it is assumed that update_backups() only updates
    backup superblocks, so each b_data is treated as a backup superblock to
    update its s_block_group_nr and s_checksum. However, update_backups()
    also updates the backup group descriptors, which causes the backup group
    descriptors to be corrupted.
    
    The above commit fixes the problem of invalid checksum of the backup
    superblock. The root cause of this problem is that the checksum of
    ext4_update_super() is not set correctly. This problem has been fixed
    in the previous patch ("ext4: fix bad checksum after online resize").
    
    However, we do need to set block_group_nr for the backup superblock in
    update_backups(). When a block is in a group that contains a backup
    superblock, and the block is the first block in the group, the block is
    definitely a superblock. We add a helper function that includes setting
    s_block_group_nr and updating checksum, and then call it only when the
    above conditions are met to prevent the backup group descriptors from
    being incorrectly modified.
    
    Fixes: 9a8c5b0d0615 ("ext4: update the backup superblock's at the end of the online resize")
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20221117040341.1380702-3-libaokun1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    (cherry picked from commit 8f49ec603ae3e213bfab2799182724e3abac55a1)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:37 +01:00
Lukas Czerner 10ed471082 ext4: fix bad checksum after online resize
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit a408f33e895e455f16cf964cb5cd4979b658db7b
Author: Baokun Li <libaokun1@huawei.com>
    
    When online resizing is performed twice consecutively, the error message
    "Superblock checksum does not match superblock" is displayed for the
    second time. Here's the reproducer:
    
    	mkfs.ext4 -F /dev/sdb 100M
    	mount /dev/sdb /tmp/test
    	resize2fs /dev/sdb 5G
    	resize2fs /dev/sdb 6G
    
    To solve this issue, we move the checksum update to after
    es->s_overhead_clusters is updated.
    
    Fixes: 026d0d27c488 ("ext4: reduce computation of overhead during resize")
    Fixes: de394a86658f ("ext4: update s_overhead_clusters in the superblock during an on-line resize")
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20221117040341.1380702-2-libaokun1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    (cherry picked from commit a408f33e895e455f16cf964cb5cd4979b658db7b)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:37 +01:00
Lukas Czerner 68d8a632c1 ext4: update the backup superblock's at the end of the online resize
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4
Author: Theodore Ts'o <tytso@mit.edu>
    
    When expanding a file system using online resize, various fields in
    the superblock (e.g., s_blocks_count, s_inodes_count, etc.) change.
    To update the backup superblocks, the online resize uses the function
    update_backups() in fs/ext4/resize.c.  This function was not updating
    the checksum field in the backup superblocks.  This wasn't a big deal
    previously, because e2fsck didn't care about the checksum field in the
    backup superblock.  (And indeed, update_backups() goes all the way
    back to the ext3 days, well before we had support for metadata
    checksums.)
    
    However, there is an alternate, more general way of updating
    superblock fields, ext4_update_primary_sb() in fs/ext4/ioctl.c.  This
    function does check the checksum of the backup superblock, and if it
    doesn't match will mark the file system as corrupted.  That was
    clearly not the intent, so avoid aborting the resize when a bad
    superblock is found.
    
    In addition, teach update_backups() to properly update the checksum in
    the backup superblocks.  We will eventually want to unify
    update_backups() with the infrastructure in ext4_update_primary_sb(), but
    that's for another day.
    
    Note: The problem has been around for a while; it just didn't really
    matter until ext4_update_primary_sb() was added by commit bbc605cdb1e1
    ("ext4: implement support for get/set fs label").  And it became
    trivially easy to reproduce after commit 827891a38acc ("ext4: update
    the s_overhead_clusters in the backup sb's when resizing") in v6.0.
    
    Cc: stable@kernel.org # 5.17+
    Fixes: bbc605cdb1e1 ("ext4: implement support for get/set fs label")
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:36 +01:00
Lukas Czerner 96e2d608b4 ext4: continue to expand file system when the target size doesn't reach
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit df3cb754d13d2cd5490db9b8d536311f8413a92e
Author: Jerry Lee 李修賢 <jerrylee@qnap.com>
    
    When expanding a file system from (16TiB-2MiB) to 18TiB, the operation
    exits early, which leads to inconsistent results between resize2fs and
    the ext4 kernel driver.
    
    === before ===
    ○ → resize2fs /dev/mapper/thin
    resize2fs 1.45.5 (07-Jan-2020)
    Filesystem at /dev/mapper/thin is mounted on /mnt/test; on-line resizing required
    old_desc_blocks = 2048, new_desc_blocks = 2304
    The filesystem on /dev/mapper/thin is now 4831837696 (4k) blocks long.
    
    [  865.186308] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
    [  912.091502] dm-4: detected capacity change from 34359738368 to 38654705664
    [  970.030550] dm-5: detected capacity change from 34359734272 to 38654701568
    [ 1000.012751] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
    [ 1000.012878] EXT4-fs (dm-5): resized filesystem to 4294967296
    
    === after ===
    [  129.104898] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
    [  143.773630] dm-4: detected capacity change from 34359738368 to 38654705664
    [  198.203246] dm-5: detected capacity change from 34359734272 to 38654701568
    [  207.918603] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
    [  207.918754] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
    [  207.918758] EXT4-fs (dm-5): Converting file system to meta_bg
    [  207.918790] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
    [  221.454050] EXT4-fs (dm-5): resized to 4658298880 blocks
    [  227.634613] EXT4-fs (dm-5): resized filesystem to 4831837696
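The block counts in the logs above can be checked by hand. The following is a short arithmetic sketch, assuming the 4 KiB block size reported by resize2fs and assuming the dm "capacity change" lines count 512-byte sectors. The variable names are illustrative only.

```python
BLOCK_SIZE = 4096          # "(4k) blocks" per the resize2fs output
SECTOR_SIZE = 512          # assumption: dm capacities are in 512-byte sectors
MiB = 1024 ** 2
TiB = 1024 ** 4

old_blocks = 4294966784    # starting size from the kernel log
stall_blocks = 4294967296  # where the unpatched kernel stopped
new_blocks = 4831837696    # requested target size

# The starting size is 2 MiB short of 16 TiB.
start_bytes = old_blocks * BLOCK_SIZE
# The early exit lands exactly on 2**32 blocks, which is 16 TiB.
stall_bytes = stall_blocks * BLOCK_SIZE
# The target is 2 MiB short of 18 TiB and matches the dm-5 capacity.
target_bytes = new_blocks * BLOCK_SIZE
dm5_bytes = 38654701568 * SECTOR_SIZE

print(start_bytes == 16 * TiB - 2 * MiB)
print(stall_blocks == 2 ** 32, stall_bytes == 16 * TiB)
print(target_bytes == 18 * TiB - 2 * MiB, target_bytes == dm5_bytes)
```

Under these assumptions the unpatched run stops exactly at the 2**32-block mark, while the patched run continues to the full device size.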
    
    Signed-off-by: Jerry Lee <jerrylee@qnap.com>
    Link: https://lore.kernel.org/r/PU1PR04MB22635E739BD21150DC182AC6A18C9@PU1PR04MB2263.apcprd04.prod.outlook.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit df3cb754d13d2cd5490db9b8d536311f8413a92e)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:34 +01:00
Lukas Czerner 1b017d27ff ext4: avoid resizing to a partial cluster size
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 69cb8e9d8cd97cdf5e293b26d70a9dee3e35e6bd
Author: Kiselev, Oleg <okiselev@amazon.com>
    
    This patch avoids an attempt to resize the filesystem to an
    unaligned cluster boundary.  An online resize to a size that is not
    a multiple of the cluster size results in the last iteration attempting
    to grow the fs by a negative amount, which trips a BUG_ON and leaves the
    fs with a corrupted in-memory superblock.
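As a minimal sketch of the alignment idea (not the kernel code): assuming the cluster ratio, i.e. the number of blocks per cluster, is a power of two, a requested block count can be rounded down to a whole number of clusters with a mask. The function name and the sample numbers are hypothetical.

```python
def round_down_to_cluster(n_blocks: int, cluster_ratio: int) -> int:
    """Round a block count down to a multiple of cluster_ratio.

    Assumes cluster_ratio is a positive power of two.
    """
    if cluster_ratio <= 0 or cluster_ratio & (cluster_ratio - 1):
        raise ValueError("cluster_ratio must be a positive power of two")
    # Clearing the low bits drops any partial trailing cluster.
    return n_blocks & ~(cluster_ratio - 1)


# A request of 1,000,005 blocks with 16 blocks per cluster leaves
# 5 blocks that do not fill a cluster, so they are dropped.
aligned = round_down_to_cluster(1_000_005, 16)
print(aligned)
```

With an aligned target, the final resize iteration always has a non-negative number of whole clusters left to add.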
    
    Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
    Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 69cb8e9d8cd97cdf5e293b26d70a9dee3e35e6bd)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:34 +01:00
Lukas Czerner f8c55b9ef8 ext4: reduce computation of overhead during resize
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 026d0d27c4882303e4b071ca6d996640cc2932c3
Author: Kiselev, Oleg <okiselev@amazon.com>
    
    This patch avoids doing an O(n**2)-complexity walk through every flex group.
    Instead, it uses the already computed overhead information for the newly
    allocated space, and simply adds it to the previously calculated
    overhead stored in the superblock.  This drastically reduces the time
    taken to resize very large bigalloc filesystems (from 3+ hours for a
    64TB fs down to milliseconds).
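The difference between the two approaches can be sketched as follows. This is an illustration only, assuming the overhead can be expressed as a per-group number; the function names and the sample values are made up and do not come from the kernel.

```python
def recompute_overhead(per_group_overhead):
    """Full recomputation: walk every group in the file system."""
    return sum(per_group_overhead)


def grow_overhead(cached_overhead, new_group_overheads):
    """Incremental update: add only the overhead of the new groups."""
    return cached_overhead + sum(new_group_overheads)


existing = [260, 4, 4, 260]   # made-up overhead of the existing groups
added = [4, 260]              # made-up overhead of the groups being added

cached = recompute_overhead(existing)      # value already in the superblock
updated = grow_overhead(cached, added)     # touches only the new groups
print(updated == recompute_overhead(existing + added))
```

The incremental form does work proportional to the number of added groups instead of the total number of groups, which is where the saving comes from.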
    
    Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
    Link: https://lore.kernel.org/r/CE4F359F-4779-45E6-B6A9-8D67FDFF5AE2@amazon.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 026d0d27c4882303e4b071ca6d996640cc2932c3)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:34 +01:00
Lukas Czerner a49d3244ad ext4: update the s_overhead_clusters in the backup sb's when resizing
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 827891a38accfb4e04dbcdefe710f8746c6ad16d
Author: Theodore Ts'o <tytso@mit.edu>
    
    When the EXT4_IOC_RESIZE_FS ioctl is complete, update the backup
    superblocks.  We don't do this for the old-style resize ioctls since
    they are quite ancient, and only used by very old versions of
    resize2fs --- and we don't want to update the backup superblocks every
    time EXT4_IOC_GROUP_ADD is called, since it might get called a lot.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Andreas Dilger <adilger@dilger.ca>
    Link: https://lore.kernel.org/r/20220629040026.112371-2-tytso@mit.edu
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 827891a38accfb4e04dbcdefe710f8746c6ad16d)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:33 +01:00
Lukas Czerner d55eb0db17 ext4: update s_overhead_clusters in the superblock during an on-line resize
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit de394a86658ffe4e89e5328fd4993abfe41b7435
Author: Theodore Ts'o <tytso@mit.edu>
    
    When doing an online resize, the on-disk superblock wasn't
    updated.  This means that when the file system is unmounted and
    remounted, and the on-disk overhead value is non-zero, the results
    of statfs(2) would be incorrect.
    
    This was partially fixed by commits 10b01ee92df5 ("ext4: fix overhead
    calculation to account for the reserved gdt blocks"), 85d825dbf489
    ("ext4: force overhead calculation if the s_overhead_cluster makes no
    sense"), and eb7054212eac ("ext4: update the cached overhead value in
    the superblock").
    
    However, since it was too expensive to forcibly recalculate the
    overhead for bigalloc file systems at every mount, this didn't fix the
    problem for bigalloc file systems.  This commit should address the
    problem when resizing file systems with the bigalloc feature enabled.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org
    Reviewed-by: Andreas Dilger <adilger@dilger.ca>
    Link: https://lore.kernel.org/r/20220629040026.112371-1-tytso@mit.edu
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit de394a86658ffe4e89e5328fd4993abfe41b7435)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:33 +01:00
Lukas Czerner 3c99fd9426 ext4: add reserved GDT blocks check
Bugzilla: https://bugzilla.redhat.com/2099577
Tested: xfstests
Upstream Status: upstream

commit b55c3cd102a6f48b90e61c44f7f3dda8c290c694
Author: Zhang Yi <yi.zhang@huawei.com>
    
    We hit a NULL pointer dereference when resizing a corrupt ext4 image
    whose resize_inode feature had just been cleared (without running
    e2fsck). It can be reproduced with the steps below. The problem is that,
    because the resize_inode feature was cleared, ext4_resize_fs() converts
    the filesystem to meta_bg mode, but es->s_reserved_gdt_blocks was not
    reduced to zero, so we could mistakenly call reserve_backup_gdb() and
    pass an uninitialized resize_inode to it when adding new group
    descriptors.
    
     mkfs.ext4 /dev/sda 3G
     tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
     mount /dev/sda /mnt
     resize2fs /dev/sda 8G
    
     ========
     BUG: kernel NULL pointer dereference, address: 0000000000000028
     CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748
     ...
     RIP: 0010:ext4_flex_group_add+0xe08/0x2570
     ...
     Call Trace:
      <TASK>
      ext4_resize_fs+0xbec/0x1660
      __ext4_ioctl+0x1749/0x24e0
      ext4_ioctl+0x12/0x20
      __x64_sys_ioctl+0xa6/0x110
      do_syscall_64+0x3b/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae
     RIP: 0033:0x7f2dd739617b
     ========
    
    The fix is simple: add a check in ext4_resize_begin() to make sure that
    es->s_reserved_gdt_blocks is zero when the resize_inode feature is
    disabled.
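The consistency rule described above can be sketched in a few lines. This is a model of the check only, not the kernel implementation; the function name, parameters, and the exception type are hypothetical.

```python
def resize_begin_check(has_resize_inode: bool, reserved_gdt_blocks: int) -> None:
    """Reject a resize when reserved GDT blocks exist without resize_inode.

    Hypothetical model: the two arguments stand in for the resize_inode
    feature flag and es->s_reserved_gdt_blocks.
    """
    if not has_resize_inode and reserved_gdt_blocks != 0:
        raise ValueError(
            "resize_inode disabled but reserved GDT blocks is non-zero"
        )


# Consistent states are accepted.
resize_begin_check(has_resize_inode=True, reserved_gdt_blocks=256)
resize_begin_check(has_resize_inode=False, reserved_gdt_blocks=0)

# The state produced by the reproducer is rejected before any resize work.
try:
    resize_begin_check(has_resize_inode=False, reserved_gdt_blocks=256)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Failing early here means the resize never reaches the code path that would use the uninitialized resize inode.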
    
    Cc: stable@kernel.org
    Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
    Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    (cherry picked from commit b55c3cd102a6f48b90e61c44f7f3dda8c290c694)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-06-30 14:00:23 +02:00
Lukas Czerner c6518633cb ext4: use time_is_before_jiffies() instead of open coding it
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit a861fb9fa51da7b1957f612b742ce62a95591628
Author: Wang Qing <wangqing@vivo.com>
    
    Use the helper function time_is_{before,after}_jiffies() to improve
    code readability.
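For readers unfamiliar with these helpers, the comparison they wrap is a wraparound-safe signed difference. The sketch below emulates it in Python with a 32-bit counter; the width is an assumption for illustration (the kernel uses the width of long), and here jiffies is passed as a parameter whereas the kernel macro reads the global counter.

```python
BITS = 32                      # assumed counter width for this illustration
MASK = (1 << BITS) - 1


def to_signed(x: int) -> int:
    """Interpret x as a two's-complement signed value of BITS bits."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def time_after(a: int, b: int) -> bool:
    """True if timestamp a is after b, even across counter wraparound."""
    return to_signed(b - a) < 0


def time_is_before_jiffies(t: int, jiffies: int) -> bool:
    """True if timestamp t lies in the past relative to jiffies."""
    return time_after(jiffies, t)


deadline = MASK - 5            # set shortly before the counter wraps
now = 10                       # counter value shortly after wrapping
print(time_is_before_jiffies(deadline, now))   # signed difference is -16
print(now > deadline)                          # the naive comparison
```

The naive comparison reports the deadline as still in the future after the wrap, while the signed-difference form does not.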
    
    Signed-off-by: Wang Qing <wangqing@vivo.com>
    Link: https://lore.kernel.org/r/1646018120-61462-1-git-send-email-wangqing@vivo.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 10:57:31 +02:00
Lukas Czerner 80dc1a0201 ext4: rename ext4_set_bits to mb_set_bits
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit 123e3016ee9b3674a819537bc4c3174e25cd48fc
Author: Ritesh Harjani <riteshh@linux.ibm.com>
    
    ext4_set_bits() should actually be mb_set_bits() for a uniform API naming
    convention.
    The rename was done with the command below:
    
    grep -nr "ext4_set_bits" fs/ext4/ | cut -d ":" -f 1 | xargs sed -i 's/ext4_set_bits/mb_set_bits/g'
    
    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/f1f6ece1405b76a7a987e9145d1adfaf71e30695.1644992610.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 10:57:31 +02:00
Lukas Czerner fcaed67b6a ext4: implement support for get/set fs label
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit bbc605cdb1e15aafaec899fedc385dc75dddac0e
Author: Lukas Czerner <lczerner@redhat.com>
    
    Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for
    online reading and setting of file system label.
    
    ext4_ioctl_getlabel() is simple, just get the label from the primary
    superblock. This might not be the first sb on the file system if
    'sb=' mount option is used.
    
    In ext4_ioctl_setlabel() we update what ext4 currently views as a
    primary superblock and then proceed to update backup superblocks. There
    are two caveats:
     - the primary superblock might not be the first superblock and so it
       might not be the one used by userspace tools if read directly
       off the disk.
     - because the primary superblock might not be the first superblock we
       potentially have to update it as part of backup superblock update.
       However the first sb location is a bit more complicated than the rest
       so we have to account for that.
    
    The superblock modification is made generic enough that the
    infrastructure can be used for other potential superblock modification
    operations, such as changing the UUID.
    
    Tested with generic/492 with various configurations. I also checked the
    behavior with 'sb=' mount options, including very large file systems
    with and without sparse_super/sparse_super2.
    
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:31:00 +01:00
Lukas Czerner e3ca8adeb8 ext4: Support for checksumming from journal triggers
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 188c299e2a26cc33747187f87c9e044dfd85a782
Author: Jan Kara <jack@suse.cz>
    
    The JBD2 layer supports triggers which are called when the journaling
    layer moves a buffer to a certain state. We can use the frozen trigger,
    which gets called when buffer data is frozen and about to be written out
    to the journal, to compute block checksums for some buffer types (as
    ocfs2 does). This avoids unnecessary repeated recomputation of the
    checksum (at the cost of a larger window where memory corruption won't be
    caught by checksumming) and is even necessary when there are
    unsynchronized updaters of the checksummed data.
    
    So add superblock and journal trigger type arguments to
    ext4_journal_get_write_access() and ext4_journal_get_create_access() so
    that frozen triggers can be set accordingly. Also add inode argument to
    ext4_walk_page_buffers() and all the callbacks used with that function
    for the same purpose. This patch is mostly only a change of prototype of
    the above mentioned functions and a few small helpers. Real checksumming
    will come later.
    
    Reviewed-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:48 +01:00
Theodore Ts'o 8813587a99 Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"
The function ext4_resize_begin() gets called from three different
places, and online resize for bigalloc file systems is disallowed from
the old-style online resize (EXT4_IOC_GROUP_ADD and
EXT4_IOC_GROUP_EXTEND), but it *is* supposed to be allowed via
EXT4_IOC_RESIZE_FS.

This reverts commit e9f9f61d0c.
2021-06-30 20:54:22 -04:00
Josh Triplett b1489186cc ext4: add check to prevent attempting to resize an fs with sparse_super2
The in-kernel ext4 resize code doesn't support filesystems with the
sparse_super2 feature. It fails with errors like this and doesn't finish
the resize:
EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks
EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770)
EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize
EXT4-fs (loop0): resized filesystem to 2097152

To reproduce:
mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M
truncate -s 30G ext4.img
mount ext4.img /mnt
python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)'
dmesg | tail
e2fsck ext4.img
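The magic number 0x40086610 in the python3 one-liner is the encoded ioctl request. As a sketch, assuming the common asm-generic ioctl bit layout (used on x86 and arm; some architectures use a different layout), it can be rebuilt as a write-direction request of type 'f', number 16, with an 8-byte argument. The helper name `iow` is illustrative; nothing below issues an ioctl.

```python
import struct

# Assumed asm-generic ioctl encoding: nr in bits 0-7, type in bits 8-15,
# argument size in bits 16-29, direction in bits 30-31.
IOC_NRSHIFT, IOC_TYPESHIFT, IOC_SIZESHIFT, IOC_DIRSHIFT = 0, 8, 16, 30
IOC_WRITE = 1


def iow(type_char: str, nr: int, size: int) -> int:
    """Build a write-direction ioctl request number."""
    return (
        (IOC_WRITE << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (ord(type_char) << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


# 'f', 16, 8-byte unsigned argument: the value used by the reproducer.
resize_request = iow("f", 16, struct.calcsize("Q"))

# The argument is the target size in 4096-byte blocks for a 30 GiB image.
target_blocks = 30 * 1024 * 1024 * 1024 // 4096
payload = struct.pack("Q", target_blocks)

print(hex(resize_request), target_blocks, len(payload))
```

The computed block count equals the 7864320 blocks shown in the "resizing filesystem" log line above.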

The userspace resize2fs tool has a check for this case: it checks if the
filesystem has sparse_super2 set and if the kernel provides
/sys/fs/ext4/features/sparse_super2. However, the former check requires
manually reading and parsing the filesystem superblock.

Detect this case in ext4_resize_begin and error out early with a clear
error message.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.1623093259.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:22:36 -04:00
Josh Triplett e9f9f61d0c ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
Two different places checked for attempts to resize a filesystem with
the bigalloc feature. Move the check into ext4_resize_begin, which both
places already call.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/bee03303d999225ecb3bfa5be8576b2f4c6edbe6.1623093259.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:22:36 -04:00
Jan Kara a3f5cf14ff ext4: drop ext4_handle_dirty_super()
The wrapper is now useless since it does what
ext4_handle_dirty_metadata() does. Just remove it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-22 13:08:46 -05:00
Jan Kara 05c2c00f37 ext4: protect superblock modifications with a buffer lock
Protect all superblock modifications (including checksum computation)
with a superblock buffer lock. That way we are sure computed checksum
matches current superblock contents (a mismatch could cause checksum
failures in nojournal mode or if an unjournalled superblock update races
with a journalled one). Also we avoid modifying superblock contents
while it is being written out (which can cause DIF/DIX failures if we
are running in nojournal mode).

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-22 13:08:46 -05:00
zhangyi (F) 0a846f496d ext4: use ext4_sb_bread() instead of sb_bread()
We have already removed the open-coded calls to helpers provided by
fs/buffer.c in all places that read metadata buffers. This patch replaces
all sb_bread() calls with ext4_sb_bread(), which is backed by the
ext4_read_bh() helper.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200924073337.861472-7-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:37:14 -04:00
zhangyi (F) 2d069c0889 ext4: use common helpers in all places reading metadata buffers
Remove all open-coded reads of metadata buffers and switch to the
ext4_read_bh_*() common helpers.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200924073337.861472-4-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:37:14 -04:00
Dinghao Liu c9e87161cc ext4: fix error handling code in add_new_gdb
When ext4_journal_get_write_access() fails, we should
terminate the execution flow and release n_group_desc,
iloc.bh, dind and gdb_bh.

Cc: stable@kernel.org
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200829025403.3139-1-dinghao.liu@zju.edu.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:14 -04:00
Suraj Jitindar Singh 7c990728b9 ext4: fix potential race between s_flex_groups online resizing and access
During an online resize an array of s_flex_groups structures gets replaced
so it can get enlarged. If there is a concurrent access to the array and
this memory has been reused then this can lead to an invalid memory access.

The s_flex_group array has been converted into an array of pointers rather
than an array of structures. This is to ensure that the information
contained in the structures cannot get out of sync during a resize due to
an accessor updating the value in the old structure after it has been
copied but before the array pointer is updated. Since the structures them-
selves are no longer copied but only the pointers to them this case is
mitigated.
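The lost-update problem described above can be illustrated with a single-threaded sketch that replays the problematic ordering by hand. The class and field names are hypothetical, and this models only the stale-copy issue, not the lifetime of the old array.

```python
class FlexGroup:
    """Hypothetical stand-in for one per-flex-group counter structure."""

    def __init__(self, free_clusters: int):
        self.free_clusters = free_clusters


# Array of structures: enlarging the array copies the values, so an
# accessor that still holds the old array updates a stale copy.
old_values = [{"free_clusters": 10}]
new_values = [dict(g) for g in old_values] + [{"free_clusters": 0}]
old_values[0]["free_clusters"] -= 1          # late update via the old array
stale = new_values[0]["free_clusters"]       # the update is lost

# Array of pointers: enlarging the array copies only the references, so
# both arrays refer to the same structure and the update is visible.
old_ptrs = [FlexGroup(10)]
new_ptrs = list(old_ptrs) + [FlexGroup(0)]
old_ptrs[0].free_clusters -= 1               # late update via the old array
shared = new_ptrs[0].free_clusters           # the update is seen

print(stale, shared)
```

In the pointer-based layout the old and new arrays never disagree about a structure's contents, because there is only one copy of each structure.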

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.edu
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2020-02-21 19:31:46 -05:00
Theodore Ts'o 1d0c3924a9 ext4: fix potential race between online resizing and write operations
During an online resize an array of pointers to buffer heads gets
replaced so it can get enlarged.  If there is a racing block
allocation or deallocation which uses the old array, and the old array
has gotten reused this can lead to a GPF or some other random kernel
memory getting modified.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.edu
Reported-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2020-02-21 00:37:09 -05:00
Theodore Ts'o 71b565ceff ext4: drop ext4_kvmalloc()
As Jan pointed out[1], as of commit 81378da64d ("jbd2: mark the
transaction context with the scope GFP_NOFS context") we use
memalloc_nofs_{save,restore}() while a jbd2 handle is active.  So
ext4_kvmalloc(), which existed so we could allocate using GFP_NOFS, is no
longer necessary.

[1] https://lore.kernel.org/r/20200109100007.GC27035@quack2.suse.cz

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200116155031.266620-1-tytso@mit.edu
Reviewed-by: Jan Kara <jack@suse.cz>
2020-01-17 16:24:55 -05:00
Jan Kara 83448bdfb5 ext4: Reserve revoke credits for freed blocks
So far we have reserved only a relatively high fixed amount of revoke
credits for each transaction. We over-reserved by a large amount for most
cases, but when freeing large directories or files with data journalling,
the fixed amount is not enough. In fact the worst case estimate is
inconveniently large (maximum extent size) for freeing of one extent.

We fix this by making a proper estimate of the number of blocks that need
to be revoked when removing blocks from the inode due to truncate or
hole punching, and otherwise reserve just a small amount of revoke
credits for each transaction to accommodate freeing of an xattr block or
so.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05 16:00:49 -05:00
Jan Kara a413036791 ext4: Provide function to handle transaction restarts
Provide the ext4_journal_ensure_credits_fn() function to ensure a
transaction has a given amount of credits and call a helper function to
prepare for restarting a transaction. This allows us to remove some
boilerplate code from various places, adds proper error handling for the
case where transaction extension or restart fails, and reduces the
follow-up changes needed for proper revoke record reservation tracking.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05 16:00:48 -05:00