123 Commits

Author SHA1 Message Date
Carlos Maiolino 7183df734f ext4: bail out of ext4_xattr_ibody_get() fails for any reason
JIRA: https://issues.redhat.com/browse/RHEL-5335

In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any
reason, it's best if we just fail as opposed to stumbling on,
especially if the failure is EFSCORRUPTED.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 2a534e1d0d1591e951f9ece2fb460b2ff92edabd)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:25 +01:00
Carlos Maiolino 918e3d2027 ext4: add bounds checking in get_max_inline_xattr_value_size()
JIRA: https://issues.redhat.com/browse/RHEL-5335

Normally the extended attributes in the inode body would have been
checked when the inode is first opened, but if someone is writing to
the block device while the file system is mounted, it's possible for
the inode table to get corrupted.  Add bounds checking to avoid
reading beyond the end of allocated memory if this happens.

Reported-by: syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 2220eaf90992c11d888fe771055d4de330385f01)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:25 +01:00
Carlos Maiolino 19d16a471d ext4: fix deadlock when converting an inline directory in nojournal mode
JIRA: https://issues.redhat.com/browse/RHEL-5335

In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock
by calling ext4_handle_dirty_dirblock() when it already has taken the
directory lock.  There is a similar self-deadlock in
ext4_convert_inline_data_nolock() for data files which we'll fix at
the same time.

A simple reproducer demonstrating the problem:

    mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64
    mount -t ext4 -o dirsync /dev/vdc /vdc
    cd /vdc
    mkdir file0
    cd file0
    touch file0
    touch file1
    attr -s BurnSpaceInEA -V abcde .
    touch supercalifragilisticexpialidocious

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu
Reported-by: syzbot+91dccab7c64e2850a4e5@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit f4ce24f54d9cca4f09a395f3eecce20d6bec4663)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:25 +01:00
Carlos Maiolino e48c30b13c fs/ext4: replace ternary operator with min()/max() and min_t()
JIRA: https://issues.redhat.com/browse/RHEL-5335

Fix the following coccicheck warnings:

fs/ext4/inline.c:183: WARNING opportunity for min().
fs/ext4/extents.c:2631: WARNING opportunity for max().
fs/ext4/extents.c:2632: WARNING opportunity for min().
fs/ext4/extents.c:5559: WARNING opportunity for max().
fs/ext4/super.c:6908: WARNING opportunity for min().

The min(), max() and min_t() macros are defined in include/linux/minmax.h.
They avoid multiple evaluations of the arguments when non-constant and
perform strict type-checking.
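
To make the multiple-evaluation problem concrete, here is a small user-space sketch. `NAIVE_MIN`, `min_int()` and the call-counting helpers are hypothetical names for illustration; the kernel macros use a different mechanism, and a plain inline function is substituted here to show the single-evaluation behaviour.

```c
/* Naive ternary-based macro: an argument may be evaluated twice. */
#define NAIVE_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Function-based alternative: each argument is evaluated exactly once. */
static inline int min_int(int a, int b)
{
	return a < b ? a : b;
}

static int calls;	/* counts how often next_value() runs */

static int next_value(void)
{
	return ++calls;
}

/* Number of times next_value() runs inside NAIVE_MIN(next_value(), 10). */
static int naive_min_call_count(void)
{
	calls = 0;
	(void)NAIVE_MIN(next_value(), 10);
	return calls;
}

/* Number of times next_value() runs inside min_int(next_value(), 10). */
static int safe_min_call_count(void)
{
	calls = 0;
	(void)min_int(next_value(), 10);
	return calls;
}
```

With the naive macro the side-effecting argument runs once for the comparison and once more for the selected result; with the function it runs once.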

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220817025928.612851-1-13667453960@163.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 66267814ba0ee0732c69ca87eb1fd6eb63bf0d5f)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-11-06 13:21:16 +01:00
Carlos Maiolino b0bbfd2724 ext4: move where set the MAY_INLINE_DATA flag is set
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188241
Tested: With xfstests

The only caller of ext4_find_inline_data_nolock() that needs setting of
EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode().  In
ext4_write_inline_data_end() we just need to update inode->i_inline_off.
Since we are going to add one more caller that does not need to set
EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
out to ext4_iget_extra_inode().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 1dcdce5919115a471bf4921a57f20050c545a236)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2023-04-20 11:00:42 +02:00
Chris von Recklinghausen 3e0aa0ace0 ext4: fix reading leftover inlined symlinks
Conflicts: fs/ext4/ext4.h - We already have
	ebd5d23e88b7 ("ext4: remove ext4_inline_data_fiemap() declaration")
	so remove ext4_inline_data_fiemap() declaration

Bugzilla: https://bugzilla.redhat.com/2160210

commit 5a57bca9050d740ca37184302e23d0e7633e3ebc
Author: Zhang Yi <yi.zhang@huawei.com>
Date:   Thu Jun 30 17:01:00 2022 +0800

    ext4: fix reading leftover inlined symlinks

    Since commit 6493792d3299 ("ext4: convert symlink external data block
    mapping to bdev"), creating a new symlink with inline_data is not
    supported, but that commit neglected to handle leftover inlined symlinks,
    which can cause the error messages below and a failure to read the symlink.

     ls: cannot read symbolic link 'foo': Structure needs cleaning

     EXT4-fs error (device sda): ext4_map_blocks:605: inode #12: block
     2021161080: comm ls: lblock 0 mapped to illegal pblock 2021161080
     (length 1)

    Fix this regression by adding ext4_read_inline_link(), which reads the
    inline data directly and converts it through a kmalloced buffer.

    Fixes: 6493792d3299 ("ext4: convert symlink external data block mapping to bdev")
    Cc: stable@kernel.org
    Reported-by: Torge Matthies <openglfreak@googlemail.com>
    Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
    Tested-by: Torge Matthies <openglfreak@googlemail.com>
    Link: https://lore.kernel.org/r/20220630090100.2769490-1-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:30 -04:00
Chris von Recklinghausen ef6a91bc18 fs: Remove aop flags parameter from grab_cache_page_write_begin()
Conflicts: drop changes to fs/ntfs3/inode.c, fs/jffs2/file.c -
	unsupported configs

Conflicts: drop changes to fs/ntfs3/inode.c - unsupported config

Bugzilla: https://bugzilla.redhat.com/2160210

commit b7446e7cf15f0926866c8e5de90ab278998bf8c8
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Feb 22 11:25:12 2022 -0500

    fs: Remove aop flags parameter from grab_cache_page_write_begin()

    There are no more aop flags left, so remove the parameter.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:58 -04:00
Chris von Recklinghausen cae92f5bda mm: introduce memalloc_retry_wait()
Conflicts:
	drop changes to fs/f2fs/gc.c fs/f2fs/data.c fs/f2fs/inode.c
	fs/f2fs/node.c fs/f2fs/recovery.c fs/f2fs/segment.c fs/f2fs/super.c -
		unsupported config
	fs/ext4/inline.c - The backport of
		36d116e99da7 ("ext4: Use scoped memory APIs in ext4_da_write_begin()")
		added an include of linux/sched/mm.h, citing the lack of this
		patch in its conflicts section. Keep it.

Bugzilla: https://bugzilla.redhat.com/2160210

commit 4034247a0d6ab281ba3293798ce67af494d86129
Author: NeilBrown <neilb@suse.de>
Date:   Fri Jan 14 14:07:14 2022 -0800

    mm: introduce memalloc_retry_wait()

    Various places in the kernel - largely in filesystems - respond to a
    memory allocation failure by looping around and re-trying.  Some of
    these cannot conveniently use __GFP_NOFAIL, for reasons such as:

     - a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
     - a need to check for the process being signalled between failures
     - the possibility that other recovery actions could be performed
     - the allocation is quite deep in support code, and passing down an
       extra flag to say if __GFP_NOFAIL is wanted would be clumsy.

    Many of these currently use congestion_wait() which (in almost all
    cases) simply waits the given timeout - congestion isn't tracked for
    most devices.

    It isn't clear what the best delay is for loops, but it is clear that
    the various filesystems shouldn't be responsible for choosing a timeout.

    This patch introduces memalloc_retry_wait(), which takes on that
    responsibility.  Code that wants to retry a memory allocation can call
    this function passing the GFP flags that were used.  It will wait
    however long is appropriate.

    For now, it only considers __GFP_NORETRY and whatever
    gfpflags_allow_blocking() tests.  If blocking is allowed without
    __GFP_NORETRY, then alloc_page either made some reclaim progress, or
    waited for a while, before failing.  So there is no need for much
    further waiting.  memalloc_retry_wait() will wait until the current
    jiffy ends.  If this condition is not met, then alloc_page() won't have
    waited much if at all.  In that case memalloc_retry_wait() waits about
    200ms.  This is the delay that most current loops use.
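
The decision described above can be modelled in user space roughly as follows. This is only a sketch: the `DEMO_*` flag values, the `retry_wait` enum and `pick_retry_wait()` are hypothetical stand-ins, and the actual durations are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for allocation flags; not the kernel's values. */
#define DEMO_ALLOW_BLOCKING 0x1u
#define DEMO_NORETRY        0x2u

enum retry_wait { WAIT_BRIEF, WAIT_LONGER };

/* Choose how long a caller should pause before retrying an allocation. */
static enum retry_wait pick_retry_wait(unsigned int flags)
{
	bool may_block = (flags & DEMO_ALLOW_BLOCKING) != 0;
	bool noretry = (flags & DEMO_NORETRY) != 0;

	/* The allocator already reclaimed or waited: a brief pause is enough. */
	if (may_block && !noretry)
		return WAIT_BRIEF;
	/* The allocator probably returned quickly: pause longer before retrying. */
	return WAIT_LONGER;
}
```

Keeping this choice in one helper is what lets individual filesystems stop picking their own timeouts.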

    linux/sched/mm.h needs to be included in some files now,
    but linux/backing-dev.h does not.

    Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
    Signed-off-by: NeilBrown <neilb@suse.de>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:45 -04:00
Lukas Czerner ca963fed2a ext4: correct max_inline_xattr_value_size computing
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit c9fd167d57133c5b748d16913c4eabc55e531c73
Author: Baokun Li <libaokun1@huawei.com>
    
    If the ext4 inode does not have xattr space, 0 is returned in the
    get_max_inline_xattr_value_size function. Otherwise, the function returns
    a negative value when the inode does not contain EXT4_STATE_XATTR.
    
    Cc: stable@kernel.org
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20220616021358.2504451-4-libaokun1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit c9fd167d57133c5b748d16913c4eabc55e531c73)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 16:15:33 +01:00
Lukas Czerner 3e259a7d95 ext4: remove unnecessary type castings
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit c30365b90ab26fa991751119cde047312d370cab
Author: Yu Zhe <yuzhe@nfschina.com>
    
    remove unnecessary void* type castings.
    
    Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
    Link: https://lore.kernel.org/r/20220401081321.73735-1-yuzhe@nfschina.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit c30365b90ab26fa991751119cde047312d370cab)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 13:59:47 +01:00
Lukas Czerner d47f240482 ext4: Use scoped memory APIs in ext4_write_begin()
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 832ee62d992d9b2d599a6dc70ac822dec4557ea4
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
    
    Instead of setting AOP_FLAG_NOFS, use memalloc_nofs_save() and
    memalloc_nofs_restore() to prevent GFP_FS allocations recursing
    into the filesystem with a journal already started.
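
The save/restore pairing can be pictured with a small user-space model. Everything named `demo_*` below is hypothetical; the sketch only shows why returning the previous state from the save call makes nested scopes behave correctly.

```c
#include <stdbool.h>

/* Hypothetical per-task flag standing in for the scope state. */
static bool demo_nofs_active;

/* Enter a no-FS-recursion scope; returns the previous state for restore. */
static bool demo_nofs_save(void)
{
	bool old = demo_nofs_active;

	demo_nofs_active = true;
	return old;
}

/* Leave the scope by putting back whatever state was saved on entry. */
static void demo_nofs_restore(bool old)
{
	demo_nofs_active = old;
}

/* An allocation helper consults the scope instead of a per-call flag. */
static bool demo_alloc_may_enter_fs(void)
{
	return !demo_nofs_active;
}
```

Because the restore call reinstates the saved value rather than clearing the flag, leaving an inner scope does not cancel an outer one.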
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 832ee62d992d9b2d599a6dc70ac822dec4557ea4)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 13:59:47 +01:00
Lukas Czerner d606c48d1d ext4: Use scoped memory APIs in ext4_da_write_begin()
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 36d116e99da7e45c8827a157a0a92da0fbbfcaa2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
    
    RHEL9 change: include linux/sched/mm.h in fs/ext4/inline.c because we're
                  missing upstream commit
    	      4034247a0d6ab281ba3293798ce67af494d86129 that does that.
    
    Instead of setting AOP_FLAG_NOFS, use memalloc_nofs_save() and
    memalloc_nofs_restore() to prevent GFP_FS allocations recursing
    into the filesystem with a journal already started.
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 36d116e99da7e45c8827a157a0a92da0fbbfcaa2)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 13:59:47 +01:00
Lukas Czerner 2616535617 ext4: Allow GFP_FS allocations in ext4_da_convert_inline_data_to_extent()
Bugzilla: https://bugzilla.redhat.com/2145193
Tested: xfstests
Upstream Status: upstream

commit 7333ed3587700680cfcd83a72dabc37ec40f08bf
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
    
    Since commit 8bc1379b82, the transaction is stopped before calling
    ext4_da_convert_inline_data_to_extent(), which means we can do GFP_FS
    allocations and recurse into the filesystem.
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 7333ed3587700680cfcd83a72dabc37ec40f08bf)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2023-01-12 13:59:47 +01:00
Lukas Czerner 0c41dfe5bb ext4: fix bug_on in ext4_writepages
Bugzilla: https://bugzilla.redhat.com/2099577
Tested: xfstests
Upstream Status: upstream

commit ef09ed5d37b84d18562b30cf7253e57062d0db05
Author: Ye Bin <yebin10@huawei.com>
    
    We got an issue as follows:
    EXT4-fs error (device loop0): ext4_mb_generate_buddy:1141: group 0, block bitmap and bg descriptor inconsistent: 25 vs 31513 free cls
    ------------[ cut here ]------------
    kernel BUG at fs/ext4/inode.c:2708!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
    CPU: 2 PID: 2147 Comm: rep Not tainted 5.18.0-rc2-next-20220413+ #155
    RIP: 0010:ext4_writepages+0x1977/0x1c10
    RSP: 0018:ffff88811d3e7880 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88811c098000
    RDX: 0000000000000000 RSI: ffff88811c098000 RDI: 0000000000000002
    RBP: ffff888128140f50 R08: ffffffffb1ff6387 R09: 0000000000000000
    R10: 0000000000000007 R11: ffffed10250281ea R12: 0000000000000001
    R13: 00000000000000a4 R14: ffff88811d3e7bb8 R15: ffff888128141028
    FS:  00007f443aed9740(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020007200 CR3: 000000011c2a4000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     do_writepages+0x130/0x3a0
     filemap_fdatawrite_wbc+0x83/0xa0
     filemap_flush+0xab/0xe0
     ext4_alloc_da_blocks+0x51/0x120
     __ext4_ioctl+0x1534/0x3210
     __x64_sys_ioctl+0x12c/0x170
     do_syscall_64+0x3b/0x90
    
    It may happen as follows:
    1. write inline_data inode
    vfs_write
      new_sync_write
        ext4_file_write_iter
          ext4_buffered_write_iter
            generic_perform_write
              ext4_da_write_begin
                ext4_da_write_inline_data_begin -> if the inline data size is
                too small, a block is allocated for the write, and the mapping
                then has a dirty page
                    ext4_da_convert_inline_data_to_extent ->clear EXT4_STATE_MAY_INLINE_DATA
    2. fallocate
    do_vfs_ioctl
      ioctl_preallocate
        vfs_fallocate
          ext4_fallocate
            ext4_convert_inline_data
              ext4_convert_inline_data_nolock
                ext4_map_blocks -> fail will goto restore data
                ext4_restore_inline_data
                  ext4_create_inline_data
                  ext4_write_inline_data
                  ext4_set_inode_state -> set inode EXT4_STATE_MAY_INLINE_DATA
    3. writepages
    __ext4_ioctl
      ext4_alloc_da_blocks
        filemap_flush
          filemap_fdatawrite_wbc
            do_writepages
              ext4_writepages
                if (ext4_has_inline_data(inode))
                  BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
    
    The root cause of this issue is that we do not destroy the inline data
    until ext4_writepages is called in delayed allocation mode, but the inode
    may already have been converted from inline to extent.  To solve this
    issue, we call filemap_flush first.
    
    Cc: stable@kernel.org
    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20220516122634.1690462-1-yebin10@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    (cherry picked from commit ef09ed5d37b84d18562b30cf7253e57062d0db05)

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-06-30 14:00:23 +02:00
Lukas Czerner 93ae46f1eb ext4: fix fs corruption when tring to remove a non-empty directory with IO error
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit 7aab5c84a0f6ec2290e2ba4a6b245178b1bf949a
Author: Ye Bin <yebin10@huawei.com>
    
    We injected an IO error when running rmdir on a non-empty directory, then got the following issue:
    step1: mkfs.ext4 -F /dev/sda
    step2: mount /dev/sda  test
    step3: cd test
    step4: mkdir -p 1/2
    step5: rmdir 1
    	[  110.920551] ext4_empty_dir: inject fault
    	[  110.921926] EXT4-fs warning (device sda): ext4_rmdir:3113: inode #12:
    	comm rmdir: empty directory '1' has too many links (3)
    step6: cd ..
    step7: umount test
    step8: fsck.ext4 -f /dev/sda
    	e2fsck 1.42.9 (28-Dec-2013)
    	Pass 1: Checking inodes, blocks, and sizes
    	Pass 2: Checking directory structure
    	Entry '..' in .../??? (13) has deleted/unused inode 12.  Clear<y>? yes
    	Pass 3: Checking directory connectivity
    	Unconnected directory inode 13 (...)
    	Connect to /lost+found<y>? yes
    	Pass 4: Checking reference counts
    	Inode 13 ref count is 3, should be 2.  Fix<y>? yes
    	Pass 5: Checking group summary information
    
    	/dev/sda: ***** FILE SYSTEM WAS MODIFIED *****
    	/dev/sda: 12/131072 files (0.0% non-contiguous), 26157/524288 blocks
    
    ext4_rmdir
    	if (!ext4_empty_dir(inode))
    		goto end_rmdir;
    ext4_empty_dir
    	bh = ext4_read_dirblock(inode, 0, DIRENT_HTREE);
    	if (IS_ERR(bh))
    		return true;
    Now if reading the directory block fails, 'ext4_empty_dir' will return true,
    assuming the directory is empty. Obviously, this leads to the above issue.
    To solve this issue, if reading the directory block fails, 'ext4_empty_dir'
    just returns false. To avoid making things worse when the file system is
    already corrupted, 'ext4_empty_dir' also returns false in that case.
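
The behaviour after the fix can be sketched in user space as follows. The `demo_dir` structure and both helpers are hypothetical models, not the kernel functions; a NULL block pointer stands in for a failed directory block read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory model: blk == NULL stands for a failed block read. */
struct demo_dir {
	const int *blk;		/* first directory block, or NULL on I/O error */
	size_t nr_entries;	/* entries other than "." and ".." */
};

/* Returns true only when the directory was read successfully and is empty. */
static bool demo_empty_dir(const struct demo_dir *dir)
{
	if (dir->blk == NULL)
		return false;	/* cannot tell: refuse to treat it as empty */
	return dir->nr_entries == 0;
}

/* Returns 0 on success, -1 when removal is refused. */
static int demo_rmdir(const struct demo_dir *dir)
{
	if (!demo_empty_dir(dir))
		return -1;
	return 0;
}
```

Treating "could not read" as "not empty" means an I/O error makes the removal fail instead of unlinking a directory that may still have children.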
    
    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20220228024815.3952506-1-yebin10@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 10:57:31 +02:00
Lukas Czerner a90307cf11 ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin()
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit 09355d9d038a1590ee055831a4ad3a79952cfa8b
Author: Ritesh Harjani <riteshh@linux.ibm.com>
    
    ext4_prepare_inline_data() already checks for ext4_get_max_inline_size()
    and returns -ENOSPC. So there is no need to check it twice within
    ext4_da_write_inline_data_begin(). This patch removes the extra check.
    
    It also makes the code cleaner.
    
    No functionality change in this patch.
    
    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/cdd1654128d5105550c65fd13ca5da53b2162cc4.1642416995.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 10:57:30 +02:00
Lukas Czerner 30434641ad ext4: fix error handling in ext4_restore_inline_data()
Bugzilla: https://bugzilla.redhat.com/2079868
Tested: xfstests
Upstream Status: upstream

commit 897026aaa73eb2517dfea8d147f20ddb0b813044
Author: Ritesh Harjani <riteshh@linux.ibm.com>
    
    While running "./check -I 200 generic/475", it sometimes triggers the
    kernel BUG() below. Ideally we should not call ext4_write_inline_data() if
    ext4_create_inline_data() has failed.
    
    <log snip>
    [73131.453234] kernel BUG at fs/ext4/inline.c:223!
    
    <code snip>
     212 static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
     213                                    void *buffer, loff_t pos, unsigned int len)
     214 {
    <...>
     223         BUG_ON(!EXT4_I(inode)->i_inline_off);
     224         BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
    
    This patch handles the error and prints an emergency message warning of
    potential data loss for the given inode (since we couldn't restore the
    original inline_data due to some previous error).
    
    [ 9571.070313] EXT4-fs (dm-0): error restoring inline_data for inode -- potential data loss! (inode 1703982, error -30)
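
The control flow being described (skip the write step when the create step failed, and log the failure) might look like this in a simplified user-space model. All `demo_*` names are hypothetical, and the error value -30 is taken from the log line above.

```c
#include <stdio.h>

static int demo_writes;	/* counts how often the write step ran */

/* Hypothetical create step: returns 0 on success or a negative error. */
static int demo_create_inline(int should_fail)
{
	return should_fail ? -30 : 0;
}

/* Hypothetical write step: only valid after a successful create. */
static void demo_write_inline(void)
{
	demo_writes++;
}

/* Restore inline data; returns 0 on success or the create error. */
static int demo_restore_inline(int create_fails, unsigned long ino)
{
	int err = demo_create_inline(create_fails);

	if (err) {
		/* Report the failure instead of writing into missing space. */
		fprintf(stderr,
			"error restoring inline data (inode %lu, error %d)\n",
			ino, err);
		return err;
	}
	demo_write_inline();
	return 0;
}
```

The write step has preconditions that only hold after a successful create, so checking the create result first turns a crash into a logged error.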
    
    Reported-by: Eric Whitney <enwlinux@gmail.com>
    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/9f4cd7dfd54fa58ff27270881823d94ddf78dd07.1642416995.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-05-16 10:57:30 +02:00
Lukas Czerner f859d9c47e ext4: remove extent cache entries when truncating inline data
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 0add491df4e5e2c8cc6eeeaa6dbcca50f932090c
Author: Eric Whitney <enwlinux@gmail.com>
    
    Conditionally remove all cached extents belonging to an inode
    when truncating its inline data.  It's only necessary to attempt to
    remove cached extents when a conversion from inline to extent storage
    has been initiated (!EXT4_STATE_MAY_INLINE_DATA).  This avoids
    unnecessary es lock overhead in the more common inline case.
    
    Signed-off-by: Eric Whitney <enwlinux@gmail.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Link: https://lore.kernel.org/r/20210819144927.25163-2-enwlinux@gmail.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:50 +01:00
Lukas Czerner 01d7d209e1 ext4: factor out write end code of inline file
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 6984aef59814fb5c47b0e30c56e101186b5ebf8c
Author: Zhang Yi <yi.zhang@huawei.com>
    
    Now that the inline_data file write end procedure is folded into the
    common write end functions, it is not clear. Factor it out and do
    some cleanup. This patch also drops ext4_da_write_inline_data_end()
    and switches to ext4_write_inline_data_end() instead, because we also
    need to do the same error processing if we fail to write data into
    the inline entry.
    
    Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Link: https://lore.kernel.org/r/20210716122024.1105856-4-yi.zhang@huawei.com

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:50 +01:00
Lukas Czerner d6e1d9af55 ext4: correct the error path of ext4_write_inline_data_end()
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 55ce2f649b9e88111270333a8127e23f4f8f42d7
Author: Zhang Yi <yi.zhang@huawei.com>
    
    The current error path of ext4_write_inline_data_end() is not correct.
    
    Firstly, it should pass out the error value if ext4_get_inode_loc()
    fails, or else it could trigger an infinite loop if we inject an error
    here. Secondly, it's better to add the inode to the orphan list if
    ext4_journal_stop() fails, otherwise we could not restore the inline xattr
    entry after a power failure. Finally, we need to reset the 'ret' value if
    ext4_write_inline_data_end() returns success in ext4_write_end() and
    ext4_journalled_write_end(), otherwise we could not get the error return
    value of ext4_journal_stop().
    
    Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Link: https://lore.kernel.org/r/20210716122024.1105856-3-yi.zhang@huawei.com

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:50 +01:00
Lukas Czerner e3ca8adeb8 ext4: Support for checksumming from journal triggers
Bugzilla: https://bugzilla.redhat.com/2041486
Tested: xfstests
Upstream Status: upstream

commit 188c299e2a26cc33747187f87c9e044dfd85a782
Author: Jan Kara <jack@suse.cz>
    
    The JBD2 layer supports triggers which are called when the journaling layer
    moves a buffer to a certain state. We can use the frozen trigger, which gets
    called when buffer data is frozen and about to be written out to the
    journal, to compute block checksums for some buffer types (similarly as
    does ocfs2). This avoids unnecessary repeated recomputation of the
    checksum (at the cost of larger window where memory corruption won't be
    caught by checksumming) and is even necessary when there are
    unsynchronized updaters of the checksummed data.
    
    So add superblock and journal trigger type arguments to
    ext4_journal_get_write_access() and ext4_journal_get_create_access() so
    that frozen triggers can be set accordingly. Also add inode argument to
    ext4_walk_page_buffers() and all the callbacks used with that function
    for the same purpose. This patch is mostly only a change of prototype of
    the above mentioned functions and a few small helpers. Real checksumming
    will come later.
    
    Reviewed-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:48 +01:00
Lukas Czerner d42fd7d35c ext4: fix race writing to an inline_data file while its xattrs are changing
Bugzilla: https://bugzilla.redhat.com/2003461
Tested: xfstests
Upstream Status: upstream

commit a54c4613dac1500b40e4ab55199f7c51f028e848
Author: Theodore Ts'o <tytso@mit.edu>

    The location of the system.data extended attribute can change whenever
    xattr_sem is not taken.  So we need to recalculate the i_inline_off
    field since it might have changed between ext4_write_begin() and
    ext4_write_end().

    This means that caching i_inline_off is probably not helpful, so in
    the long run we should probably get rid of it and shrink the in-memory
    ext4 inode slightly, but let's fix the race the simple way for now.

    Cc: stable@kernel.org
    Fixes: f19d5870cb ("ext4: add normal write support for inline data")
    Reported-by: syzbot+13146364637c7363a7de@syzkaller.appspotmail.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2022-01-19 10:30:48 +01:00
Ritesh Harjani 310c097c2b ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
ext4_xattr_ibody_inline_set() & ext4_xattr_ibody_set() have the exact
same definition.  Hence remove ext4_xattr_ibody_inline_set() and all
its call references. Convert the callers of it to call
ext4_xattr_ibody_set() instead.

[ Modified to preserve ext4_xattr_ibody_set() and remove
  ext4_xattr_ibody_inline_set() instead. -- TYT ]

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/fd566b799bbbbe9b668eb5eecde5b5e319e3694f.1622685482.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:09:39 -04:00
Bhaskar Chowdhury 3088e5a515 ext4: fix various seppling typos
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Link: https://lore.kernel.org/r/cover.1616840203.git.unixbhaskar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09 23:14:59 -04:00
Daniel Rosenberg 471fbbea7f ext4: handle casefolding with encryption
This adds support for encryption with casefolding.

Since the name on disk is case preserving, and also encrypted, we can no
longer just recompute the hash on the fly. Additionally, to avoid
leaking extra information from the hash of the unencrypted name, we use
siphash via an fscrypt v2 policy.

The hash is stored at the end of the directory entry for all entries
inside of an encrypted and casefolded directory apart from those that
deal with '.' and '..'. This way, the change is backwards compatible
with existing ext4 filesystems.

[ Changed to advertise this feature via the file:
  /sys/fs/ext4/features/encrypted_casefold -- TYT ]

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210319073414.1381041-2-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-05 22:04:20 -04:00
Joseph Qi 7067b26190 ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
It takes xattr_sem to check inline data again, but does not unlock it
if the inode turns out to have no inline data. So unlock it before returning.
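
A common way to avoid this class of bug is to route every exit through a single unlock label. The sketch below is a user-space model with a hypothetical counting lock (`demo_sem`) and a hypothetical `demo_truncate()`; it is not the kernel function.

```c
/* Hypothetical lock model: held counts outstanding acquisitions. */
struct demo_sem {
	int held;
};

static void demo_down(struct demo_sem *s)
{
	s->held++;
}

static void demo_up(struct demo_sem *s)
{
	s->held--;
}

/* Returns 1 if inline data was truncated, 0 if there was nothing to do.
 * Every path releases the lock before returning. */
static int demo_truncate(struct demo_sem *sem, int has_inline_data)
{
	int ret = 0;

	demo_down(sem);
	if (!has_inline_data)
		goto out_unlock;	/* early exit still drops the lock */

	/* ... the truncate work would happen here ... */
	ret = 1;

out_unlock:
	demo_up(sem);
	return ret;
}
```

After either path the lock count is back to zero, which is the property the early return was violating.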

Fixes: aef1c8513c ("ext4: let ext4_truncate handle inline data correctly")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/1604370542-124630-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2020-11-06 22:52:36 -05:00
Randy Dunlap b483bb7719 ext4: delete duplicated words + other fixes
Delete repeated words in fs/ext4/.
{the, this, of, we, after}

Also change spelling of "xttr" in inline.c to "xattr" in 2 places.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200805024850.12129-1-rdunlap@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:13 -04:00
kyoungho koo 7ca4fcba92 ext4: Fix comment typo "the the".
I found the doubled word "the the" in comments, so I changed it to a
single "the".

Signed-off-by: kyoungho koo <rnrudgh@gmail.com>
Link: https://lore.kernel.org/r/20200424171620.GA11943@koo-Z370-HD3
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:35 -04:00
Kyoungho Koo 2fe34d2938 ext4: remove unused parameter of ext4_generic_delete_entry function
The ext4_generic_delete_entry function does not use the parameter
handle, so it can be removed.

Signed-off-by: Kyoungho Koo <rnrudgh@gmail.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200810080701.GA14160@koo-Z370-HD3
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18 14:25:54 -04:00
Harshad Shirwadkar 4209ae12b1 ext4: handle ext4_mark_inode_dirty errors
ext4_mark_inode_dirty() can fail for real reasons. Ignoring its return
value may lead ext4 to ignore real failures that would result in
corruption / crashes. Harden ext4_mark_inode_dirty error paths to fail
as soon as possible and return errors to the caller whenever
appropriate.

One of the possible scenarios in which this bug could have an effect
is that while creating a new inode, its directory entry gets added
successfully, but while writing the inode itself mark_inode_dirty
returns an error which is ignored. This would result in an inconsistency
where the directory entry points to a non-existent inode.

Ran gce-xfstests smoke tests and verified that there were no
regressions.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20200427013438.219117-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:50 -04:00
Theodore Ts'o 54d3adbc29 ext4: save all error info in save_error_info() and drop ext4_set_errno()
Using a separate function, ext4_set_errno() to set the errno is
problematic because it doesn't do the right thing once
s_last_error_errorcode is non-zero.  It's also less racy to set all of
the error information all at once.  (Also, as a bonus, it shrinks code
size slightly.)

Link: https://lore.kernel.org/r/20200329020404.686965-1-tytso@mit.edu
Fixes: 878520ac45 ("ext4: save the error code which triggered...")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-04-01 17:29:06 -04:00
Ritesh Harjani d3b6f23f71 ext4: move ext4_fiemap to use iomap framework
This patch moves ext4_fiemap to use iomap framework.
For xattr a new 'ext4_iomap_xattr_ops' is added.

Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/b9f45c885814fcdd0631747ff0fe08886270828c.1582880246.git.riteshh@linux.ibm.com
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 14:43:13 -04:00
Shijie Luo 8d6ce13679 ext4,jbd2: fix comment and code style
Fix comment and remove unnecessary blank.

Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200123064325.36358-1-luoshijie1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25 02:24:53 -05:00
Theodore Ts'o 878520ac45 ext4: save the error code which triggered an ext4_error() in the superblock
This allows the cause of an ext4_error() report to be categorized
based on whether it was triggered due to an I/O error, or a memory
allocation error, or other possible causes.  Most errors are caused by
a detected file system inconsistency, so the default code stored in
the superblock will be EXT4_ERR_EFSCORRUPTED.

Link: https://lore.kernel.org/r/20191204032335.7683-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26 11:28:23 -05:00
Colin Ian King 7a14826ede ext4: set error return correctly when ext4_htree_store_dirent fails
Currently, when the call to ext4_htree_store_dirent fails, the error code
is assigned to the variable 'count' instead of the error return variable
'ret', hence the error code is not being returned.  Fix this by assigning
the error code to ret.

Addresses-Coverity: ("Unused value")
Fixes: 8af0f08227 ("ext4: fix readdir error in the case of inline_data+dir_index")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-12 14:29:38 -04:00
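The mistake fixed in the entry above is easy to reproduce in miniature. The sketch below is an illustration only: `demo_store()` is a hypothetical stand-in for ext4_htree_store_dirent(), and the loop shape is assumed rather than copied from the kernel.

```c
#include <errno.h>

/* Hypothetical stand-in for the store call: fails on a negative key. */
static int demo_store(int key)
{
	return key < 0 ? -ENOMEM : 0;
}

/* Returns the number of entries stored, or a negative errno on failure. */
static int demo_fill_tree(const int *keys, int n)
{
	int count = 0;
	int ret = 0;
	int err;

	for (int i = 0; i < n; i++) {
		err = demo_store(keys[i]);
		if (err) {
			/*
			 * The buggy shape wrote the error into 'count',
			 * so 'ret' stayed 0 and the caller saw success.
			 */
			ret = err;
			goto out;
		}
		count++;
	}
	ret = count;
out:
	return ret;
}
```

Keeping the error in its own variable until the failure check, and only then copying it into the return variable, makes this kind of mix-up harder to write.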
Theodore Ts'o 7633b08b27 ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
Clean up namespace pollution by the inline_data code.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-21 21:57:00 -04:00
Theodore Ts'o ddce3b9471 ext4: refactor initialize_dirent_tail()
Move the calculation of the location of the dirent tail into
initialize_dirent_tail().  Also prefix the function with ext4_ to fix
kernel namespace pollution.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-21 16:31:47 -04:00
Theodore Ts'o f036adb399 ext4: rename "dirent_csum" functions to use "dirblock"
Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set()
don't actually operate on a directory entry, but a directory block.
And while they take a struct ext4_dir_entry *dirent as an argument, it
had better be the first directory entry at the beginning of the
directory block, or things will go very wrong.

Rename the following functions so that things make more sense, and
remove a lot of confusing casts along the way:

   ext4_dirent_csum_verify	 -> ext4_dirblock_csum_verify
   ext4_dirent_csum_set		 -> ext4_dirblock_csum_set
   ext4_dirent_csum		 -> ext4_dirblock_csum
   ext4_handle_dirty_dirent_node -> ext4_handle_dirty_dirblock

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-21 15:49:26 -04:00
Gabriel Krisman Bertazi b886ee3e77 ext4: Support case-insensitive file name lookups
This patch implements the actual support for case-insensitive file name
lookups in ext4, based on the feature bit and the encoding stored in the
superblock.

A filesystem that has the casefold feature set is able to configure
directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string.  This operation is called a
case-insensitive file name lookup.

The feature is configured as an inode attribute applied to directories
and inherited by its children.  This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.

* dcache handling:

For a +F directory, Ext4 only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache regardless of which equivalent string was used in
a previous lookup, without having to resort to ->lookup().

d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.

For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries.  This is bad for performance but requires some leveraging of
the vfs layer to fix.  We can live without that for now, and so does
everyone else.

* on-disk data:

Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.

DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware.  The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.

* Dealing with invalid sequences:

By default, when an invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file.  This means that case-insensitive
file name lookup will not work for that file only.  An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding.  When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.

* Normalization algorithm:

The UTF-8 algorithm used to compare strings in ext4 lives in
fs/unicode, and is based on a previous version developed by
SGI.  It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.

NFD seems to be the best normalization method for EXT4 because:

  - It has a lower cost than NFC/NFKC (which requires
    decomposing to NFD as an intermediary step)
  - It doesn't eliminate important semantic meaning like
    compatibility decompositions.

Although:

  - This implementation is not completely linguistically accurate, because
  different languages have conflicting rules, which would require the
  specialization of the filesystem to a given locale, which brings all
  sorts of problems for removable media and for users who use more than
  one language.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-04-25 14:12:08 -04:00
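The dcache handling described in the entry above rests on two properties: the hash is computed over the casefolded name, so all equivalent spellings land in the same bucket, and the comparison treats equivalent spellings as equal. The sketch below illustrates both under loud simplifications. It folds ASCII only (the kernel uses the Unicode casefolding in fs/unicode), the FNV-1a hash is chosen here purely for illustration, and all `demo_*` names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only fold; real casefolding is Unicode-aware and lives in fs/unicode. */
static unsigned char demo_fold(unsigned char c)
{
	return (unsigned char)tolower(c);
}

/* FNV-1a over the folded bytes, so equivalent names share a bucket. */
static uint32_t demo_hash(const char *name)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; name[i] != '\0'; i++) {
		h ^= demo_fold((unsigned char)name[i]);
		h *= 16777619u;
	}
	return h;
}

/* Returns 0 when the names are equal after folding, nonzero otherwise. */
static int demo_compare(const char *a, const char *b)
{
	size_t i = 0;

	for (; a[i] != '\0' && b[i] != '\0'; i++)
		if (demo_fold((unsigned char)a[i]) !=
		    demo_fold((unsigned char)b[i]))
			return 1;
	/* Equal only if both strings ended at the same position. */
	return a[i] != b[i];
}
```

Note that the stored name is never modified: only the hash and the comparison look at the folded form, which is what makes the scheme name-preserving.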
Theodore Ts'o 2b08b1f12c ext4: fix a potential fiemap/page fault deadlock w/ inline_data
The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
while still holding the xattr semaphore.  This is not necessary and it
triggers a circular lockdep warning.  This is because
fiemap_fill_next_extent() could trigger a page fault when it writes
into a page.  If that page is mmaped from the inline file in question,
this could very well result in a deadlock.

This problem can be reproduced using generic/519 with a file system
configuration which has the inline_data feature enabled.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-12-25 00:56:33 -05:00
Maurizio Lombardi 132d00becb ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
In case of error, ext4_try_to_write_inline_data() should unlock
and release the page it holds.

Fixes: f19d5870cb ("ext4: add normal write support for inline data")
Cc: stable@kernel.org # 3.8
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04 00:06:53 -05:00
Lukas Czerner 625ef8a3ac ext4: initialize retries variable in ext4_da_write_inline_data_begin()
Variable retries is not initialized in ext4_da_write_inline_data_begin()
which can lead to nondeterministic number of retries in case we hit
ENOSPC. Initialize retries to zero as we do everywhere else.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: bc0ca9df3b ("ext4: retry allocation when inline->extent conversion failed")
Cc: stable@kernel.org
2018-10-02 21:18:45 -04:00
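The entry above concerns a bounded retry loop whose counter was never initialized. A minimal sketch of such a loop follows. `demo_alloc()`, `DEMO_MAX_RETRIES` and the retry bound are hypothetical and are not taken from the kernel's retry helper.

```c
#include <errno.h>

#define DEMO_MAX_RETRIES 3	/* hypothetical bound, for illustration */

/* Hypothetical allocator: fails with -ENOSPC until *free_after hits zero. */
static int demo_alloc(int *free_after)
{
	if (*free_after > 0) {
		(*free_after)--;
		return -ENOSPC;
	}
	return 0;
}

/* Tries the allocation, retrying on -ENOSPC a bounded number of times. */
static int demo_write_begin(int free_after, int *attempts)
{
	int retries = 0;	/* without this, the bound below is meaningless */
	int ret;

	*attempts = 0;
retry:
	(*attempts)++;
	ret = demo_alloc(&free_after);
	if (ret == -ENOSPC && retries++ < DEMO_MAX_RETRIES)
		goto retry;
	return ret;
}
```

With an uninitialized counter the comparison against the bound reads an indeterminate value, so the loop might never retry or might retry far more often than intended.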
Theodore Ts'o 4d982e25d0 ext4: avoid divide by zero fault when deleting corrupted inline directories
A specially crafted file system can trick empty_inline_dir() into
reading past the last valid entry in an inline directory, and then run
into the end of xattr marker. This will trigger a divide by zero
fault.  Fix this by using the size of the inline directory instead of
dir->i_size.

Also clean up error reporting in __ext4_check_dir_entry so that the
message is clearer and more understandable --- and avoids the division
by zero trap if the size passed in is zero.  (I'm not sure why we
coded it that way in the first place; printing offset % size is
actually more confusing and less useful.)

https://bugzilla.kernel.org/show_bug.cgi?id=200933

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org
2018-08-27 09:22:45 -04:00
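The idea behind the fix in the entry above is to bound the directory walk by the number of bytes actually present in the inline area, and to reject a record length that cannot be valid. The sketch below shows that idea on a deliberately simplified layout. `struct demo_dirent` and `demo_dir_is_empty()` are hypothetical and do not match the on-disk ext4 directory entry format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical entry header: inode number and record length. */
struct demo_dirent {
	uint32_t inode;
	uint16_t rec_len;
};

/*
 * Returns 1 if no live entry exists, 0 if one does, -1 on a corrupt record.
 * The walk is bounded by inline_size, the bytes actually present in buf.
 */
static int demo_dir_is_empty(const unsigned char *buf, size_t inline_size)
{
	size_t off = 0;

	while (off + sizeof(struct demo_dirent) <= inline_size) {
		struct demo_dirent de;

		memcpy(&de, buf + off, sizeof(de));	/* avoid unaligned reads */
		if (de.rec_len < sizeof(de) || de.rec_len > inline_size - off)
			return -1;	/* zero or oversized rec_len */
		if (de.inode != 0)
			return 0;
		off += de.rec_len;
	}
	return 1;
}
```

Because a zero record length is rejected before it is used, the walk always makes progress and no later arithmetic ever sees a zero size.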
Theodore Ts'o 362eca70b5 ext4: fix inline data updates with checksums enabled
The inline data code was updating the raw inode directly; this is
problematic since if metadata checksums are enabled,
ext4_mark_inode_dirty() must be called to update the inode's checksum.
In addition, the jbd2 layer requires that get_write_access() be called
before the metadata buffer is modified.  Fix both of these problems.

https://bugzilla.kernel.org/show_bug.cgi?id=200443

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-07-10 01:07:43 -04:00
Linus Torvalds 70a2dc6abc Bug fixes for ext4; most of which relate to vulnerabilities where a
maliciously crafted file system image can result in a kernel OOPS or
 hang.  At least one fix addresses an inline data bug that could be
 triggered by userspace without the need of a crafted file system
 (although it does require that the inline data feature be enabled).

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Bug fixes for ext4; most of which relate to vulnerabilities where a
  maliciously crafted file system image can result in a kernel OOPS or
  hang.

  At least one fix addresses an inline data bug that could be triggered by
  userspace without the need of a crafted file system (although it does
  require that the inline data feature be enabled)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: check superblock mapped prior to committing
  ext4: add more mount time checks of the superblock
  ext4: add more inode number paranoia checks
  ext4: avoid running out of journal credits when appending to an inline file
  jbd2: don't mark block as modified if the handle is out of credits
  ext4: never move the system.data xattr out of the inode body
  ext4: clear i_data in ext4_inode_info when removing inline data
  ext4: include the illegal physical block in the bad map ext4_error msg
  ext4: verify the depth of extent tree in ext4_find_extent()
  ext4: only look at the bg_flags field if it is valid
  ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
  ext4: always check block group bounds in ext4_init_block_bitmap()
  ext4: always verify the magic number in xattr blocks
  ext4: add corruption check in ext4_xattr_set_entry()
  ext4: add warn_on_error mount option
2018-07-08 11:10:30 -07:00
Theodore Ts'o 8bc1379b82 ext4: avoid running out of journal credits when appending to an inline file
Use a separate journal transaction if it turns out that we need to
convert an inline file to use a data block.  Otherwise we could end
up failing due to not having journal credits.

This addresses CVE-2018-10883.

https://bugzilla.kernel.org/show_bug.cgi?id=200071

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-16 23:41:59 -04:00
Theodore Ts'o 6e8ab72a81 ext4: clear i_data in ext4_inode_info when removing inline data
When converting an inode from storing the data in-line to a data
block, ext4_destroy_inline_data_nolock() was only clearing the on-disk
copy of the i_blocks[] array.  It was not clearing the copy of the
i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually
used by ext4_map_blocks().

This didn't matter much if we are using extents, since the extents
header would be invalid and thus the extents code would re-initialize
the extents tree.  But if we are using indirect blocks, the previous
contents of the i_blocks array will be treated as block numbers, with
potentially catastrophic results to the file system integrity and/or
user data.

This gets worse if the file system is using a 1k block size and
s_first_data is zero, but even without this, the file system can get
quite badly corrupted.

This addresses CVE-2018-10881.

https://bugzilla.kernel.org/show_bug.cgi?id=200015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-15 12:28:16 -04:00
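The entry above describes two copies of the same array, one on disk and one in memory, where only the on-disk copy was being cleared. The sketch below models that situation in userspace. The `demo_*` structures and functions are hypothetical simplifications, not the kernel's definitions.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_N_BLOCKS 15

/* Hypothetical on-disk copy of the block array. */
struct demo_raw_inode {
	uint32_t i_block[DEMO_N_BLOCKS];
};

/* Hypothetical in-memory copy, the one the block-mapping code reads. */
struct demo_inode_info {
	uint32_t i_data[DEMO_N_BLOCKS];
};

static void demo_destroy_inline_data(struct demo_raw_inode *raw,
				     struct demo_inode_info *ei)
{
	memset(raw->i_block, 0, sizeof(raw->i_block));
	memset(ei->i_data, 0, sizeof(ei->i_data));	/* the step the fix adds */
}

/* Returns 1 if any slot would be read as a nonzero block number. */
static int demo_has_stale_blocks(const struct demo_inode_info *ei)
{
	for (int i = 0; i < DEMO_N_BLOCKS; i++)
		if (ei->i_data[i] != 0)
			return 1;
	return 0;
}
```

If the second `memset()` is omitted, leftover inline bytes stay in the in-memory array and a later lookup would interpret them as block numbers.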
Linus Torvalds 6567af78ac Changes for 4.18:
- Strengthen inode number and structure validation when allocating inodes.
 - Reduce pointless buffer allocations during cache miss
 - Use FUA for pure data O_DSYNC directio writes
 - Various iomap refactorings
 - Strengthen quota metadata verification to avoid unfixable broken quota
 - Make AGFL block freeing a deferred operation to avoid blowing out
   transaction reservations when running complex operations
 - Get rid of the log item descriptors to reduce log overhead
 - Fix various reflink bugs where inodes were double-joined to
   transactions
 - Don't issue discards when trimming unwritten extents
 - Refactor incore dquot initialization and retrieval interfaces
 - Fix some locking problems in the quota scrub code
 - Strengthen btree structure checks in scrub code
 - Rewrite swapfile activation to use iomap and support unwritten extents
 - Make scrub exit to userspace sooner when corruptions or
   cross-referencing problems are found
 - Make scrub invoke the data fork scrubber directly on metadata inodes
 - Don't do background reclamation of post-eof and cow blocks when the fs
   is suspended
 - Fix secondary superblock buffer lifespan hinting
 - Refactor growfs to use table-dispatched functions instead of long
   stringy functions
 - Move growfs code to libxfs
 - Implement online fs label getting and setting
 - Introduce online filesystem repair (in a very limited capacity)
 - Fix unit conversion problems in the realtime freemap iteration
   functions
 - Various refactorings and cleanups in preparation to remove buffer
   heads in a future release
 - Reimplement the old bmap call with iomap
 - Remove direct buffer head accesses from seek hole/data
 - Various bug fixes

Merge tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "New features this cycle include the ability to relabel mounted
  filesystems, support for fallocated swapfiles, and using FUA for pure
  data O_DSYNC directio writes. With this cycle we begin to integrate
  online filesystem repair and refactor the growfs code in preparation
  for eventual subvolume support, though the road ahead for both
  features is quite long.

  There are also numerous refactorings of the iomap code to remove
  unnecessary log overhead, to disentangle some of the quota code, and
  to prepare for buffer head removal in a future upstream kernel.

  Metadata validation continues to improve, both in the hot path
  verifiers and the online filesystem check code. I anticipate sending a
  second pull request in a few days with more metadata validation
  improvements.

  This series has been run through a full xfstests run over the weekend
  and through a quick xfstests run against this morning's master, with
  no major failures reported.

  Summary:

   - Strengthen inode number and structure validation when allocating
     inodes.

   - Reduce pointless buffer allocations during cache miss

   - Use FUA for pure data O_DSYNC directio writes

   - Various iomap refactorings

   - Strengthen quota metadata verification to avoid unfixable broken
     quota

   - Make AGFL block freeing a deferred operation to avoid blowing out
     transaction reservations when running complex operations

   - Get rid of the log item descriptors to reduce log overhead

   - Fix various reflink bugs where inodes were double-joined to
     transactions

   - Don't issue discards when trimming unwritten extents

   - Refactor incore dquot initialization and retrieval interfaces

   - Fix some locking problems in the quota scrub code

   - Strengthen btree structure checks in scrub code

   - Rewrite swapfile activation to use iomap and support unwritten
     extents

   - Make scrub exit to userspace sooner when corruptions or
     cross-referencing problems are found

   - Make scrub invoke the data fork scrubber directly on metadata
     inodes

   - Don't do background reclamation of post-eof and cow blocks when the
     fs is suspended

   - Fix secondary superblock buffer lifespan hinting

   - Refactor growfs to use table-dispatched functions instead of long
     stringy functions

   - Move growfs code to libxfs

   - Implement online fs label getting and setting

   - Introduce online filesystem repair (in a very limited capacity)

   - Fix unit conversion problems in the realtime freemap iteration
     functions

   - Various refactorings and cleanups in preparation to remove buffer
     heads in a future release

   - Reimplement the old bmap call with iomap

   - Remove direct buffer head accesses from seek hole/data

   - Various bug fixes"

* tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits)
  fs: use ->is_partially_uptodate in page_cache_seek_hole_data
  fs: remove the buffer_unwritten check in page_seek_hole_data
  fs: move page_cache_seek_hole_data to iomap.c
  xfs: use iomap_bmap
  iomap: add an iomap-based bmap implementation
  iomap: add a iomap_sector helper
  iomap: use __bio_add_page in iomap_dio_zero
  iomap: move IOMAP_F_BOUNDARY to gfs2
  iomap: fix the comment describing IOMAP_NOWAIT
  iomap: inline data should be an iomap type, not a flag
  mm: split ->readpages calls to avoid non-contiguous pages lists
  mm: return an unsigned int from __do_page_cache_readahead
  mm: give the 'ret' variable a better name __do_page_cache_readahead
  block: add a lower-level bio_add_page interface
  xfs: fix error handling in xfs_refcount_insert()
  xfs: fix xfs_rtalloc_rec units
  xfs: strengthen rtalloc query range checks
  xfs: xfs_rtbuf_get should check the bmapi_read results
  xfs: xfs_rtword_t should be unsigned, not signed
  dax: change bdev_dax_supported() to support boolean returns
  ...
2018-06-05 13:24:20 -07:00
Christoph Hellwig 19319b5321 iomap: inline data should be an iomap type, not a flag
Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it
it should be an entirely separate iomap range type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
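The design point in the entry above is that inline data has no block address at all, so it is better modeled as its own range type than as a flag on a mapped range. A minimal sketch of that distinction follows. The `demo_*` enum, struct and helper are hypothetical and only loosely modeled on the idea, not on the kernel's iomap definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range types: inline data is a type of its own, not a flag. */
enum demo_iomap_type {
	DEMO_IOMAP_HOLE,
	DEMO_IOMAP_MAPPED,
	DEMO_IOMAP_INLINE,
};

struct demo_iomap {
	enum demo_iomap_type type;
	uint64_t addr;			/* meaningful only for MAPPED */
	const void *inline_data;	/* meaningful only for INLINE */
};

/* Only a mapped range carries a block address. */
static bool demo_iomap_has_addr(const struct demo_iomap *m)
{
	return m->type == DEMO_IOMAP_MAPPED;
}
```

With a separate type, a consumer that switches on `type` cannot accidentally treat an inline range as mapped and read a meaningless address.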
Theodore Ts'o 117166efb1 ext4: do not allow external inodes for inline data
The inline data feature was implemented before we added support for
external inodes for xattrs.  It makes no sense to support that
combination, but the problem is that there are a number of extended
attribute checks that are skipped if e_value_inum is non-zero.

Unfortunately, the inline data code is completely e_value_inum
unaware, and attempts to interpret the xattr fields as if it were an
inline xattr --- at which point, Hilarity Ensues.

This addresses CVE-2018-11412.

https://bugzilla.kernel.org/show_bug.cgi?id=199803

Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: e50e5129f3 ("ext4: xattr-in-inode support")
Cc: stable@kernel.org
2018-05-22 16:15:24 -04:00