Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Bill O'Donnell	b282be6864	xfs: ensure submit buffers on LSN boundaries in error handlers JIRA: https://issues.redhat.com/browse/RHEL-68860 Conflicts: out of order application required line numbering change. commit e4c3b72a6ea93ed9c1815c74312eee9305638852 Author: Long Li <leo.lilong@huawei.com> Date: Wed Jan 17 20:31:26 2024 +0800 xfs: ensure submit buffers on LSN boundaries in error handlers While performing the IO fault injection test, I caught the following data corruption report: XFS (dm-0): Internal error ltbno + ltlen > bno at line 1957 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x79c/0x1130 CPU: 3 PID: 33 Comm: kworker/3:0 Not tainted 6.5.0-rc7-next-20230825-00001-g7f8666926889 #214 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 Workqueue: xfs-inodegc/dm-0 xfs_inodegc_worker Call Trace: <TASK> dump_stack_lvl+0x50/0x70 xfs_corruption_error+0x134/0x150 xfs_free_ag_extent+0x7d3/0x1130 __xfs_free_extent+0x201/0x3c0 xfs_trans_free_extent+0x29b/0xa10 xfs_extent_free_finish_item+0x2a/0xb0 xfs_defer_finish_noroll+0x8d1/0x1b40 xfs_defer_finish+0x21/0x200 xfs_itruncate_extents_flags+0x1cb/0x650 xfs_free_eofblocks+0x18f/0x250 xfs_inactive+0x485/0x570 xfs_inodegc_worker+0x207/0x530 process_scheduled_works+0x24a/0xe10 worker_thread+0x5ac/0xc60 kthread+0x2cd/0x3c0 ret_from_fork+0x4a/0x80 ret_from_fork_asm+0x11/0x20 </TASK> XFS (dm-0): Corruption detected. Unmount and run xfs_repair After analyzing the disk image, it was found that the corruption was triggered by the fact that extent was recorded in both inode datafork and AGF btree blocks. After a long time of reproduction and analysis, we found that the reason of free space btree corruption was that the AGF btree was not recovered correctly. 
Consider the following situation, Checkpoint A and Checkpoint B are in the same record and share the same start LSN1, buf items of same object (AGF btree block) is included in both Checkpoint A and Checkpoint B. If the buf item in Checkpoint A has been recovered and updates metadata LSN permanently, then the buf item in Checkpoint B cannot be recovered, because log recovery skips items with a metadata LSN >= the current LSN of the recovery item. If there is still an inode item in Checkpoint B that records the Extent X, the Extent X will be recorded in both inode datafork and AGF btree block after Checkpoint B is recovered. Such transaction can be seen when allocating an extent for the inode bmap; it records both the addition of the extent to the inode extent list and the removal of the extent from the AGF. |------------Record (LSN1)------------------|---Record (LSN2)---| |-------Checkpoint A----------|----------Checkpoint B-----------| | Buf Item(Extent X) | Buf Item / Inode item(Extent X) | | Extent X is freed | Extent X is allocated | After commit `12818d24db` ("xfs: rework log recovery to submit buffers on LSN boundaries") was introduced, we submit buffers on lsn boundaries during log recovery. The above problem can be avoided under normal paths, but it's not guaranteed under abnormal paths. Consider the following process, if an error was encountered after recover buf item in Checkpoint A and before recover buf item in Checkpoint B, buffers that have been added to the buffer_list will still be submitted, this violates the submits rule on lsn boundaries. So buf item in Checkpoint B cannot be recovered on the next mount due to current lsn of transaction equal to metadata lsn on disk. The detailed process of the problem is as follows. First Mount: xlog_do_recovery_pass error = xlog_recover_process xlog_recover_process_data xlog_recover_process_ophdr xlog_recovery_process_trans ... 
/* recover buf item in Checkpoint A */ xlog_recover_buf_commit_pass2 xlog_recover_do_reg_buffer /* add buffer of agf btree block to buffer_list */ xfs_buf_delwri_queue(bp, buffer_list) ... ==> Encounter read IO error and return /* submit buffers regardless of error */ if (!list_empty(&buffer_list)) xfs_buf_delwri_submit(&buffer_list); <buf items of agf btree block in Checkpoint A recovery success> Second Mount: xlog_do_recovery_pass error = xlog_recover_process xlog_recover_process_data xlog_recover_process_ophdr xlog_recovery_process_trans ... /* recover buf item in Checkpoint B */ xlog_recover_buf_commit_pass2 /* buffer of agf btree block wouldn't be added to buffer_list due to lsn equal to current_lsn */ if (XFS_LSN_CMP(lsn, current_lsn) >= 0) goto out_release <buf items of agf btree block in Checkpoint B wouldn't be recovered> In order to make sure that buffers are submitted on lsn boundaries in the abnormal paths, we need to check error status before submitting buffers that have been added from the last record processed. If an error status exists, buffers in the buffer_list should not be written to disk. Canceling the buffers in the buffer_list directly isn't correct, unlike any other place where write list was canceled, these buffers have been initialized by xfs_buf_item_init() during recovery and held by buf item, buf items will not be released in xfs_buf_delwri_cancel(), it's not easy to solve. If the filesystem has been shut down, then delwri list submission will error out all buffers on the list via IO submission/completion and do all the correct cleanup automatically. So shutting down the filesystem prevents buffers in the buffer_list from being written to disk. Fixes: `50d5c8d8e9` ("xfs: check LSN ordering for v5 superblocks during recovery") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2025-01-10 16:38:27 -06:00
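The two-mount failure described in the commit above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct disk_block`, `should_replay()` and `recover_record()` are hypothetical names, not kernel functions, and the "do not submit on error" branch stands in for the filesystem shutdown that the real fix relies on. It only illustrates the rule that an item is skipped when the on-disk metadata LSN is >= the item's LSN.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t lsn_t;

enum { EXTENT_FREE = 1, EXTENT_ALLOCATED = 2 };

/* Hypothetical stand-in for an on-disk AGF btree block. */
struct disk_block {
    lsn_t lsn;    /* LSN stamped when the block was last written */
    int   value;  /* state of "Extent X" as seen by this block */
};

/* Replay rule from the commit message: skip if disk LSN >= item LSN. */
static bool should_replay(const struct disk_block *disk, lsn_t item_lsn)
{
    return disk->lsn < item_lsn;
}

/*
 * Recover one log record holding Checkpoint A and Checkpoint B, which
 * both log the same block under the same record LSN.
 * fail_between:    inject an I/O error after A and before B.
 * submit_on_error: old behaviour (true) vs. fixed behaviour (false).
 */
static int recover_record(struct disk_block *disk, lsn_t rec_lsn,
                          bool fail_between, bool submit_on_error)
{
    struct disk_block staged = *disk;   /* in-memory buffer */
    bool dirty = false;
    int error = 0;

    /* Checkpoint A: Extent X is freed. */
    if (should_replay(disk, rec_lsn)) {
        staged.value = EXTENT_FREE;
        dirty = true;
    }

    if (fail_between) {
        error = -5;                     /* -EIO */
    } else if (should_replay(disk, rec_lsn)) {
        /* Checkpoint B: Extent X is allocated. */
        staged.value = EXTENT_ALLOCATED;
        dirty = true;
    }

    /* The LSN reaches disk only when the buffer is submitted. */
    if (dirty && (!error || submit_on_error)) {
        staged.lsn = rec_lsn;
        *disk = staged;
    }
    return error;
}
```

With `submit_on_error` set, a failed first mount leaves the block stamped with the record LSN and the freed state, so the second mount skips Checkpoint B. With it cleared, the second mount replays both checkpoints.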
Bill O'Donnell	642f6fa085	xfs: pass the defer ops instead of type to xfs_defer_start_recovery JIRA: https://issues.redhat.com/browse/RHEL-65728 commit dc22af64368291a86fb6b7eb2adab21c815836b7 Author: Christoph Hellwig <hch@lst.de> Date: Thu Dec 14 06:16:32 2023 +0100 xfs: pass the defer ops instead of type to xfs_defer_start_recovery xfs_defer_start_recovery is only called from xlog_recover_intent_item, and the callers of that all have the actual xfs_defer_ops_type operation vector at hand. Pass that directly instead of looking it up from the defer_op_types table. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:25:59 -06:00
Bill O'Donnell	2990c8ef58	xfs: move ->iop_recover to xfs_defer_op_type JIRA: https://issues.redhat.com/browse/RHEL-65728 commit db7ccc0bac2add5a41b66578e376b49328fc99d0 Author: Darrick J. Wong <djwong@kernel.org> Date: Wed Nov 22 13:39:25 2023 -0800 xfs: move ->iop_recover to xfs_defer_op_type Finish off the series by moving the intent item recovery function pointer to the xfs_defer_op_type struct, since this is really a deferred work function now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:25:49 -06:00
Bill O'Donnell	f2a94e0149	xfs: use xfs_defer_finish_one to finish recovered work items JIRA: https://issues.redhat.com/browse/RHEL-65728 commit e5f1a5146ec35f3ed5d7f5ac7807a10c0062b6b8 Author: Darrick J. Wong <djwong@kernel.org> Date: Wed Nov 22 11:25:45 2023 -0800 xfs: use xfs_defer_finish_one to finish recovered work items Get rid of the open-coded calls to xfs_defer_finish_one. This also means that the recovery transaction takes care of cleaning up the dfp, and we have solved (I hope) all the ownership issues in recovery. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:25:48 -06:00
Bill O'Donnell	75128f30ea	xfs: transfer recovered intent item ownership in ->iop_recover JIRA: https://issues.redhat.com/browse/RHEL-65728 commit deb4cd8ba87f17b12c72b3827820d9c703e9fd95 Author: Darrick J. Wong <djwong@kernel.org> Date: Wed Nov 22 10:47:10 2023 -0800 xfs: transfer recovered intent item ownership in ->iop_recover Now that we pass the xfs_defer_pending object into the intent item recovery functions, we know exactly when ownership of the sole refcount passes from the recovery context to the intent done item. At that point, we need to null out dfp_intent so that the recovery mechanism won't release it. This should fix the UAF problem reported by Long Li. Note that we still want to recreate the full deferred work state. That will be addressed in the next patches. Fixes: `2e76f188fd` ("xfs: cancel intents immediately if process_intents fails") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:25:47 -06:00
Bill O'Donnell	d3b894d85d	xfs: pass the xfs_defer_pending object to iop_recover JIRA: https://issues.redhat.com/browse/RHEL-65728 commit a050acdfa8003a44eae4558fddafc7afb1aef458 Author: Darrick J. Wong <djwong@kernel.org> Date: Wed Nov 22 10:38:10 2023 -0800 xfs: pass the xfs_defer_pending object to iop_recover Now that log intent item recovery recreates the xfs_defer_pending state, we should pass that into the ->iop_recover routines so that the intent item can finish the recreation work. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:25:47 -06:00
Bill O'Donnell	ec6910b028	xfs: use xfs_defer_pending objects to recover intent items JIRA: https://issues.redhat.com/browse/RHEL-65728 commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 Author: Darrick J. Wong <djwong@kernel.org> Date: Wed Nov 22 10:23:23 2023 -0800 xfs: use xfs_defer_pending objects to recover intent items One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed. This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done. Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function. However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed. This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:25:47 -06:00
Bill O'Donnell	164c7ed57d	xfs: abort intent items when recovery intents fail JIRA: https://issues.redhat.com/browse/RHEL-62760 commit f8f9d952e42dd49ae534f61f2fa7ca0876cb9848 Author: Long Li <leo.lilong@huawei.com> Date: Mon Jul 31 20:46:18 2023 +0800 xfs: abort intent items when recovery intents fail When recovering intents, we capture newly created intent items as part of committing recovered intent items. If intent recovery fails at a later point, we forget to remove those newly created intent items from the AIL and hang: [root@localhost ~]# cat /proc/539/stack [<0>] xfs_ail_push_all_sync+0x174/0x230 [<0>] xfs_unmount_flush_inodes+0x8d/0xd0 [<0>] xfs_mountfs+0x15f7/0x1e70 [<0>] xfs_fs_fill_super+0x10ec/0x1b20 [<0>] get_tree_bdev+0x3c8/0x730 [<0>] vfs_get_tree+0x89/0x2c0 [<0>] path_mount+0xecf/0x1800 [<0>] do_mount+0xf3/0x110 [<0>] __x64_sys_mount+0x154/0x1f0 [<0>] do_syscall_64+0x39/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd When newly created intent items fail to commit via transaction, intent recovery hasn't created done items for these newly created intent items, so the capture structure is the sole owner of the captured intent items. We must release them explicitly or else they leak: unreferenced object 0xffff888016719108 (size 432): comm "mount", pid 529, jiffies 4294706839 (age 144.463s) hex dump (first 32 bytes): 08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff ..q.......q..... 18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff ..q.......q..... 
backtrace: [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0 [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150 [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340 [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0 [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980 [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70 [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0 [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0 [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70 [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20 [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0 [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0 [<ffffffff81a9f35f>] path_mount+0xecf/0x1800 [<ffffffff81a9fd83>] do_mount+0xf3/0x110 [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0 [<ffffffff83968739>] do_syscall_64+0x39/0x80 Fix the problem above by abort intent items that don't have a done item when recovery intents fail. Fixes: `e6fff81e48` ("xfs: proper replay of deferred ops queued during log recovery") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-09 10:06:44 -06:00
Bill O'Donnell	6c8f0df398	xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail JIRA: https://issues.redhat.com/browse/RHEL-57114 commit 8b010acb3154b669e52f0eef4a6d925e3cc1db2f Author: Wang Jianchao <jianchwa@outlook.com> Date: Wed Sep 13 09:38:01 2023 +0800 xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail In our production environment, we find that mounting a 500M /boot which is unmounted cleanly needs ~6s. One cause is that ffs() is used by xlog_write_log_records() to decide the buffer size. It can cause a lot of small IO easily when xlog_clear_stale_blocks() needs to wrap around the end of log area and log head block is not power of two. Things are similar in xlog_find_verify_cycle(). The code is able to handle bigger buffers very well, we can use roundup_pow_of_two() to replace ffs() directly to avoid small and synchronous IOs. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Wang Jianchao <wangjc136@midea.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-10-15 10:46:33 -05:00
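The effect of the ffs() sizing described in the commit above can be seen in a small userspace comparison. This is a sketch, assuming the old buffer size was derived as `1 << ffs(n)`; `bufblks_ffs()`, `roundup_pow2()` and `ios_needed()` are hypothetical helpers, with `roundup_pow2()` standing in for the kernel's `roundup_pow_of_two()`.

```c
#include <strings.h>   /* ffs(), POSIX */

/* Buffer size from the lowest set bit; tiny when n is odd. */
static unsigned int bufblks_ffs(unsigned int nbblks)
{
    return 1u << ffs((int)nbblks);
}

/* Userspace stand-in for roundup_pow_of_two(); valid for n <= 2^31. */
static unsigned int roundup_pow2(unsigned int n)
{
    unsigned int p = 1;

    if (n > (1u << 31))
        return 0;           /* not representable in 32 bits */
    while (p < n)
        p <<= 1;
    return p;
}

/* Number of I/Os needed to cover 'total' blocks with a given buffer. */
static unsigned int ios_needed(unsigned int total, unsigned int bufblks)
{
    return (total + bufblks - 1) / bufblks;
}
```

For a block count such as 4097, whose lowest set bit is bit 0, the ffs-based size is 2 blocks, so covering the range takes thousands of small I/Os, while the rounded-up size covers it in one.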
Lucas Zampieri	bfdd109754	Merge: CVE-2024-41014: xfs: add bounds checking to xlog_recover_process_data MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4835 JIRA: https://issues.redhat.com/browse/RHEL-50862 CVE: CVE-2024-41014 ``` xfs: add bounds checking to xlog_recover_process_data There is a lack of verification of the space occupied by fixed members of xlog_op_header in the xlog_recover_process_data. We can create a crafted image to trigger an out of bounds read by following these steps: 1) Mount an image of xfs, and do some file operations to leave records 2) Before umounting, copy the image for subsequent steps to simulate abnormal exit. Because umount will ensure that tail_blk and head_blk are the same, which will result in the inability to enter xlog_recover_process_data 3) Write a tool to parse and modify the copied image in step 2 4) Make the end of the xlog_op_header entries only 1 byte away from xlog_rec_header->h_size 5) xlog_rec_header->h_num_logops++ 6) Modify xlog_rec_header->h_crc Fix: Add a check to make sure there is sufficient space to access fixed members of xlog_op_header. Signed-off-by: lei lu <llfamsec@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> (cherry picked from commit fb63435b7c7dc112b1ae1baea5486e0a6e27b196) ``` Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> Approved-by: Eric Sandeen <esandeen@redhat.com> Approved-by: Andrey Albershteyn <aalbersh@redhat.com> Approved-by: Brian Foster <bfoster@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-08-13 12:44:39 +00:00
CKI Backport Bot	66642cc409	xfs: add bounds checking to xlog_recover_process_data JIRA: https://issues.redhat.com/browse/RHEL-50862 CVE: CVE-2024-41014 commit fb63435b7c7dc112b1ae1baea5486e0a6e27b196 Author: lei lu <llfamsec@gmail.com> Date: Mon Jun 3 17:46:08 2024 +0800 xfs: add bounds checking to xlog_recover_process_data There is a lack of verification of the space occupied by fixed members of xlog_op_header in the xlog_recover_process_data. We can create a crafted image to trigger an out of bounds read by following these steps: 1) Mount an image of xfs, and do some file operations to leave records 2) Before umounting, copy the image for subsequent steps to simulate abnormal exit. Because umount will ensure that tail_blk and head_blk are the same, which will result in the inability to enter xlog_recover_process_data 3) Write a tool to parse and modify the copied image in step 2 4) Make the end of the xlog_op_header entries only 1 byte away from xlog_rec_header->h_size 5) xlog_rec_header->h_num_logops++ 6) Modify xlog_rec_header->h_crc Fix: Add a check to make sure there is sufficient space to access fixed members of xlog_op_header. Signed-off-by: lei lu <llfamsec@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2024-07-29 16:59:35 +00:00
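The kind of check this fix adds can be sketched in userspace as follows. The layout of `struct op_header` and the function `walk_ops()` are hypothetical and simplified (host byte order, no real XFS structures); the point is only that the walker confirms a whole fixed-size header fits before reading any of its fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified operation header. */
struct op_header {
    uint32_t tid;
    uint32_t len;       /* payload bytes that follow the header */
    uint8_t  clientid;
    uint8_t  flags;
    uint16_t pad;
};

/*
 * Walk up to num_ops operations in [dp, dp + size).
 * Returns 0 on success, -1 if a header or payload would run past the end.
 * *seen is set to the number of complete operations walked.
 */
static int walk_ops(const uint8_t *dp, size_t size, unsigned int num_ops,
                    unsigned int *seen)
{
    const uint8_t *end = dp + size;

    *seen = 0;
    while (dp < end && num_ops > 0) {
        struct op_header oh;

        num_ops--;
        /* Bounds check: the fixed members must fit before we read them. */
        if ((size_t)(end - dp) < sizeof(oh))
            return -1;
        memcpy(&oh, dp, sizeof(oh));
        dp += sizeof(oh);

        /* The payload must fit as well. */
        if ((size_t)(end - dp) < oh.len)
            return -1;
        dp += oh.len;
        (*seen)++;
    }
    return 0;
}
```

A crafted record that leaves a single stray byte after the last valid operation while claiming one more operation is rejected instead of being read out of bounds.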
Bill O'Donnell	2a8af76e78	xfs: fix log recovery buffer allocation for the legacy h_size fixup JIRA: https://issues.redhat.com/browse/RHEL-46479 CVE: CVE-2024-39472 Conflicts: diffs from upstream since patches from the same series are deemed unnecessary and skipped here. Also the entire series from dchinner from Jan 16, 2024, beginning with 10634530f7 (xfs: convert kmem_zalloc() to kzalloc()), and ending with 2c1e31ed5c88 (xfs: place intent recovery under NOFS allocation context) includes some dependencies with conversions from kmem_free() to kfree(), etc, that are unnecessary for this fix patch. commit 45cf976008ddef4a9c9a30310c9b4fb2a9a6602a Author: Christoph Hellwig <hch@lst.de> Date: Tue Apr 30 06:07:55 2024 +0200 xfs: fix log recovery buffer allocation for the legacy h_size fixup Commit `a70f9fe52d` ("xfs: detect and handle invalid iclog size set by mkfs") added a fixup for incorrect h_size values used for the initial umount record in old xfsprogs versions. Later commit `0c771b99d6` ("xfs: clean up calculation of LR header blocks") cleaned up the log recovery buffer calculation, but stopped using the fixed up h_size value to size the log recovery buffer, which can lead to an out of bounds access when the incorrect h_size does not come from the old mkfs tool, but a fuzzer. Fix this by open coding xlog_logrec_hblks and taking the fixed h_size into account for this calculation. Fixes: `0c771b99d6` ("xfs: clean up calculation of LR header blocks") Reported-by: Sam Sun <samsun1006219@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-07-15 15:39:39 -05:00
Bill O'Donnell	19fab0b814	xfs: collect errors from inodegc for unlinked inode recovery JIRA: https://issues.redhat.com/browse/RHEL-2002 Conflicts: context differences due to out of order patch application commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f Author: Dave Chinner <dchinner@redhat.com> Date: Mon Jun 5 14:48:15 2023 +1000 xfs: collect errors from inodegc for unlinked inode recovery Unlinked list recovery requires that errors removing the inode from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list. This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down. Fix this by collecting inodegc worker errors and feed them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error. In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain. In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-10 07:22:27 -06:00
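The "capture the first error, gather and clear on flush" behaviour described in the commit above can be sketched without any concurrency. `struct gc_worker`, `gc_worker_fail()` and `gc_flush()` are hypothetical names for illustration, and the single-threaded model is an assumption; it leaves out the per-CPU and workqueue details of the real code.

```c
/* Hypothetical per-worker state: only the first error is kept. */
struct gc_worker {
    int error;
};

/* Called by a worker when inactivating an inode fails. */
static void gc_worker_fail(struct gc_worker *w, int error)
{
    if (error && !w->error)
        w->error = error;
}

/*
 * Gather errors from all workers, clear them, and report only the
 * first one found, as the commit message describes for the flush.
 */
static int gc_flush(struct gc_worker *workers, int nr)
{
    int error = 0;
    int i;

    for (i = 0; i < nr; i++) {
        if (workers[i].error && !error)
            error = workers[i].error;
        workers[i].error = 0;
    }
    return error;
}
```

A second flush with no new failures returns 0, because the first flush cleared every worker's stored error.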
Bill O'Donnell	454ef639aa	xfs: avoid a UAF when log intent item recovery fails JIRA: https://issues.redhat.com/browse/RHEL-2002 commit 97cf79677ecb50a38517253ae2fd705849a7e51a Author: Darrick J. Wong <djwong@kernel.org> Date: Sun Oct 16 17:54:40 2022 -0700 xfs: avoid a UAF when log intent item recovery fails KASAN reported a UAF bug when I was running xfs/235: BUG: KASAN: use-after-free in xlog_recover_process_intents+0xa77/0xae0 [xfs] Read of size 8 at addr ffff88804391b360 by task mount/5680 CPU: 2 PID: 5680 Comm: mount Not tainted 6.0.0-xfsx #6.0.0 77e7b52a4943a975441e5ac90a5ad7748b7867f6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x2cc/0x682 kasan_report+0xa3/0x120 xlog_recover_process_intents+0xa77/0xae0 [xfs fb841c7180aad3f8359438576e27867f5795667e] xlog_recover_finish+0x7d/0x970 [xfs fb841c7180aad3f8359438576e27867f5795667e] xfs_log_mount_finish+0x2d7/0x5d0 [xfs fb841c7180aad3f8359438576e27867f5795667e] xfs_mountfs+0x11d4/0x1d10 [xfs fb841c7180aad3f8359438576e27867f5795667e] xfs_fs_fill_super+0x13d5/0x1a80 [xfs fb841c7180aad3f8359438576e27867f5795667e] get_tree_bdev+0x3da/0x6e0 vfs_get_tree+0x7d/0x240 path_mount+0xdd3/0x17d0 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7ff5bc069eae Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe433fd448 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff5bc069eae RDX: 00005575d7213290 RSI: 00005575d72132d0 RDI: 00005575d72132b0 RBP: 00005575d7212fd0 R08: 00005575d7213230 R09: 00005575d7213fe0 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00005575d7213290 R14: 00005575d72132b0 R15: 00005575d7212fd0 </TASK> Allocated by task 5680: 
kasan_save_stack+0x1e/0x40 __kasan_slab_alloc+0x66/0x80 kmem_cache_alloc+0x152/0x320 xfs_rui_init+0x17a/0x1b0 [xfs] xlog_recover_rui_commit_pass2+0xb9/0x2e0 [xfs] xlog_recover_items_pass2+0xe9/0x220 [xfs] xlog_recover_commit_trans+0x673/0x900 [xfs] xlog_recovery_process_trans+0xbe/0x130 [xfs] xlog_recover_process_data+0x103/0x2a0 [xfs] xlog_do_recovery_pass+0x548/0xc60 [xfs] xlog_do_log_recovery+0x62/0xc0 [xfs] xlog_do_recover+0x73/0x480 [xfs] xlog_recover+0x229/0x460 [xfs] xfs_log_mount+0x284/0x640 [xfs] xfs_mountfs+0xf8b/0x1d10 [xfs] xfs_fs_fill_super+0x13d5/0x1a80 [xfs] get_tree_bdev+0x3da/0x6e0 vfs_get_tree+0x7d/0x240 path_mount+0xdd3/0x17d0 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 5680: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x144/0x1b0 slab_free_freelist_hook+0xab/0x180 kmem_cache_free+0x1f1/0x410 xfs_rud_item_release+0x33/0x80 [xfs] xfs_trans_free_items+0xc3/0x220 [xfs] xfs_trans_cancel+0x1fa/0x590 [xfs] xfs_rui_item_recover+0x913/0xd60 [xfs] xlog_recover_process_intents+0x24e/0xae0 [xfs] xlog_recover_finish+0x7d/0x970 [xfs] xfs_log_mount_finish+0x2d7/0x5d0 [xfs] xfs_mountfs+0x11d4/0x1d10 [xfs] xfs_fs_fill_super+0x13d5/0x1a80 [xfs] get_tree_bdev+0x3da/0x6e0 vfs_get_tree+0x7d/0x240 path_mount+0xdd3/0x17d0 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The buggy address belongs to the object at ffff88804391b300 which belongs to the cache xfs_rui_item of size 688 The buggy address is located 96 bytes inside of 688-byte region [ffff88804391b300, ffff88804391b5b0) The buggy address belongs to the physical page: page:ffffea00010e4600 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888043919320 pfn:0x43918 head:ffffea00010e4600 order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4fff80000010200(slab\|head\|node=1\|zone=1\|lastcpupid=0xfff) raw: 04fff80000010200 
0000000000000000 dead000000000122 ffff88807f0eadc0 raw: ffff888043919320 0000000080140010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88804391b200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88804391b280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88804391b300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88804391b380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88804391b400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== The test fuzzes an rmap btree block and starts writer threads to induce a filesystem shutdown on the corrupt block. When the filesystem is remounted, recovery will try to replay the committed rmap intent item, but the corruption problem causes the recovery transaction to fail. Cancelling the transaction frees the RUD, which frees the RUI that we recovered. When we return to xlog_recover_process_intents, @lip is now a dangling pointer, and we cannot use it to find the iop_recover method for the tracepoint. Hence we must store the item ops before calling ->iop_recover if we want to give it to the tracepoint so that the trace data will tell us exactly which intent item failed. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-11-06 19:42:17 -06:00
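The pattern behind this fix, saving the ops pointer before a call that may free the item, can be shown in a short userspace sketch. All names here (`struct item`, `struct item_ops`, `failing_recover()`, `process_intent()`) are hypothetical; `free()` in the recover callback stands in for the transaction cancel that releases the recovered intent.

```c
#include <stdlib.h>

struct item;

struct item_ops {
    const char *name;
    int (*recover)(struct item *it);
};

struct item {
    const struct item_ops *ops;
};

/* Models a recovery that fails and frees the item on its way out. */
static int failing_recover(struct item *it)
{
    free(it);
    return -5;              /* -EIO */
}

static const struct item_ops rui_ops = { "rui", failing_recover };

/*
 * Returns the name of the ops whose recovery failed, or NULL on success.
 * The ops pointer is read before ->recover() runs, so the item is never
 * dereferenced after it may have been freed.
 */
static const char *process_intent(struct item *it)
{
    const struct item_ops *ops = it->ops;
    int error = ops->recover(it);

    if (error)
        return ops->name;   /* not it->ops->name: 'it' may be dangling */
    return NULL;
}
```

Reading `it->ops` after the call would be the use-after-free that KASAN reported; the saved `ops` pointer refers to static data and stays valid.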
Bill O'Donnell	439ec50781	xfs: double link the unlinked inode list Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 2fd26cc07e9f8050e29bf314cbf1bcb64dbe088c Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 14 11:46:43 2022 +1000 xfs: double link the unlinked inode list Now we have forwards traversal via the incore inode in place, we now need to add back pointers to the incore inode to entirely replace the back reference cache. We use the same lookup semantics and constraints as for the forwards pointer lookups during unlinks, and so we can look up any inode in the unlinked list directly and update the list pointers, forwards or backwards, at any time. The only wrinkle in converting the unlinked list manipulations to use in-core previous pointers is that log recovery doesn't have the incore inode state built up so it can't just read in an inode and release it to finish off the unlink. Hence we need to modify the traversal in recovery to read one inode ahead before we release the inode at the head of the list. This populates the next->prev relationship sufficient to be able to replay the unlinked list and hence greatly simplify the runtime code. This recovery algorithm also requires that we actually remove inodes from the unlinked list one at a time as background inode inactivation will result in unlinked list removal racing with the building of the in-memory unlinked list state. We could serialise this by holding the AGI buffer lock when constructing the in memory state, but all that does is lockstep background processing with list building. It is much simpler to flush the inodegc immediately after releasing the inode so that it is unlinked immediately and there is no races present at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:44 -05:00
Bill O'Donnell	0a26a83d3a	xfs: refactor xlog_recover_process_iunlinks() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 04755d2e5821b3afbaadd09fe5df58d04de36484 Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 14 11:42:39 2022 +1000 xfs: refactor xlog_recover_process_iunlinks() For upcoming changes to the way inode unlinked list processing is done, the structure of recovery needs to change slightly. We also really need to untangle the messy error handling in list recovery so that actions like emptying the bucket on inode lookup failure are associated with the bucket list walk failing, not failing to look up the inode. Refactor the recovery code now to keep the re-organisation separate to the algorithm changes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:44 -05:00
Bill O'Donnell	959addd052	xfs: track the iunlink list pointer in the xfs_inode Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 4fcc94d653270fcc7800dbaf3b11f78cb462b293 Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 14 11:38:54 2022 +1000 xfs: track the iunlink list pointer in the xfs_inode Having direct access to the i_next_unlinked pointer in unlinked inodes greatly simplifies the processing of inodes on the unlinked list. We no longer need to look up the inode buffer just to find next inode in the list if the xfs_inode is in memory. These improvements will be realised over upcoming patches as other dependencies on the inode buffer for unlinked list processing are removed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:44 -05:00
Bill O'Donnell	a27ea962ac	xfs: Pre-calculate per-AG agbno geometry Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 0800169e3e2c97a033e8b7f3d1e6c689e0d71a19 Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 7 19:13:02 2022 +1000 xfs: Pre-calculate per-AG agbno geometry There is a lot of overhead in functions like xfs_verify_agbno() that repeatedly calculate the geometry limits of an AG. These can be pre-calculated as they are static and the verification context has a per-ag context it can quickly reference. In the case of xfs_verify_agbno(), we now always have a perag context handy, so we can store the AG length and the minimum valid block in the AG in the perag. This means we don't have to calculate it on every call and it can be inlined in callers if we move it to xfs_ag.h. Move xfs_ag_block_count() to xfs_ag.c because it's really a per-ag function and not an XFS type function. We need a little bit of rework that is specific to xfs_initialise_perag() to allow growfs to calculate the new perag sizes before we've updated the primary superblock during the grow (chicken/egg situation). Note that we leave the original xfs_verify_agbno in place in xfs_types.c as a static function as other callers in that file do not have per-ag contexts so still need to go the long way. It's been renamed to xfs_verify_agno_agbno() to indicate it takes both an agno and an agbno to differentiate it from new function. Future commits will make similar changes for other per-ag geometry validation functions. Further: $ size --totals fs/xfs/built-in.a text data bss dec hex filename before 1483006 329588 572 1813166 1baaae (TOTALS) after 1482185 329588 572 1812345 1ba779 (TOTALS) This rework reduces the binary size by ~820 bytes, indicating that much less work is being done to bounds check the agbno values against on per-ag geometry information. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:40 -05:00
Bill O'Donnell	7048b0f4f2	xfs: pass perag to xfs_read_agi Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 Conflicts: fix one line error due to out of order rhel patch `13e2b274` commit 61021deb1faa5b2b913bf0ad76e2769276160b04 Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 7 19:07:47 2022 +1000 xfs: pass perag to xfs_read_agi We have the perag in most places we call xfs_read_agi, so pass the perag instead of a mount/agno pair. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:39 -05:00
Bill O'Donnell	e208e0be52	xfs: convert buf_cancel_table allocation to kmalloc_array Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 910bbdf2f4d7df46781bc9b723048f5ebed3d0d7 Author: Darrick J. Wong <djwong@kernel.org> Date: Fri May 27 10:27:19 2022 +1000 xfs: convert buf_cancel_table allocation to kmalloc_array While we're messing around with how recovery allocates and frees the buffer cancellation table, convert the allocation to use kmalloc_array instead of the old kmem_alloc APIs, and make it handle a null return, even though that's not likely. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:31 -05:00
Bill O'Donnell	a245138396	xfs: refactor buffer cancellation table allocation Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 2723234923b3294dbcf6019c288c87465e927ed4 Author: Darrick J. Wong <djwong@kernel.org> Date: Fri May 27 10:26:17 2022 +1000 xfs: refactor buffer cancellation table allocation Move the code that allocates and frees the buffer cancellation tables used by log recovery into the file that actually uses the tables. This is a precursor to some cleanups and a memory leak fix. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:30 -05:00
Bill O'Donnell	7d7d1f5774	xfs: Remove dead code Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit e62c720817597f259b81f1ff004eb042293bf046 Author: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Date: Sun May 22 16:46:57 2022 +1000 xfs: Remove dead code Remove the entire xlog_recover_check_summary() function; it is dead code and has been for 12 years. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:28 -05:00
Bill O'Donnell	db4b5bf1ae	xfs: Set up infrastructure for log attribute replay Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit fd920008784ead369e79c2be2f8d9cc736e306ca Author: Allison Henderson <allison.henderson@oracle.com> Date: Wed May 4 12:41:02 2022 +1000 xfs: Set up infrastructure for log attribute replay Currently attributes are modified directly across one or more transactions. But they are not logged or replayed in the event of an error. The goal of log attr replay is to enable logging and replaying of attribute operations using the existing delayed operations infrastructure. This will later enable the attributes to become part of larger multi part operations that also must first be recorded to the log. This is mostly of interest in the scheme of parent pointers which would need to maintain an attribute containing parent inode information any time an inode is moved, created, or removed. Parent pointers would then be of interest to any feature that would need to quickly derive an inode path from the mount point. Online scrub, nfs lookups and fs grow or shrink operations are all features that could take advantage of this. This patch adds two new log item types for setting or removing attributes as deferred operations. The xfs_attri_log_item will log an intent to set or remove an attribute. The corresponding xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is freed once the transaction is done. Both log items use a generic xfs_attr_log_format structure that contains the attribute name, value, flags, inode, and an op_flag that indicates if the operation is a set or remove. [dchinner: added extra little bits needed for intent whiteouts] Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:16 -05:00
Bill O'Donnell	35f4bdef44	xfs: log shutdown triggers should only shut down the log Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit b5f17bec1213a3ed2f4d79ad4c566e00cabe2a9b Author: Dave Chinner <dchinner@redhat.com> Date: Tue Mar 29 18:22:01 2022 -0700 xfs: log shutdown triggers should only shut down the log We've got a mess on our hands. 1. xfs_trans_commit() cannot cancel transactions because the mount is shut down - that causes dirty, aborted, unlogged log items to sit unpinned in memory and potentially get written to disk before the log is shut down. Hence xfs_trans_commit() can only abort transactions when xlog_is_shutdown() is true. 2. xfs_force_shutdown() is used in places to cause the current modification to be aborted via xfs_trans_commit() because it may be impractical or impossible to cancel the transaction directly, and hence xfs_trans_commit() must cancel transactions when xfs_is_shutdown() is true in this situation. But we can't do that because of #1. 3. Log IO errors cause log shutdowns by calling xfs_force_shutdown() to shut down the mount and then the log from log IO completion. 4. xfs_force_shutdown() can result in a log force being issued, which has to wait for log IO completion before it will mark the log as shut down. If #3 races with some other shutdown trigger that runs a log force, we rely on xfs_force_shutdown() silently ignoring #3 and avoiding shutting down the log until the failed log force completes. 5. To ensure #2 always works, we have to ensure that xfs_force_shutdown() does not return until the log is shut down. But in the case of #4, this will result in a deadlock because the log IO completion will block waiting for a log force to complete which is blocked waiting for log IO to complete.... So the very first thing we have to do here to untangle this mess is dissociate log shutdown triggers from mount shutdowns. 
We already have xlog_force_shutdown(), which will atomically transition the log to a shutdown state. Due to internal asserts it cannot be called multiple times, but was done simply because the only place that could call it was xfs_do_force_shutdown() (i.e. the mount shutdown!) and that could only call it once and once only. So the first thing we do is remove the asserts. We then convert all the internal log shutdown triggers to call xlog_force_shutdown() directly instead of xfs_force_shutdown(). This allows the log shutdown triggers to shut down the log without needing to care about mount based shutdown constraints. This means we shut down the log independently of the mount and the mount may not notice this until its next attempt to read or modify metadata. At that point (e.g. xfs_trans_commit()) it will see that the log is shutdown, error out and shutdown the mount. To ensure that all the unmount behaviours and asserts track correctly as a result of a log shutdown, propagate the shutdown up to the mount if it is not already set. This keeps the mount and log state in sync, and saves a huge amount of hassle where code fails because of a log shutdown but only checks for mount shutdowns and hence ends up doing the wrong thing. Cleaning up that mess is an exercise for another day. This enables us to address the other problems noted above in followup patches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:10:53 -05:00
Bill O'Donnell	79973eeacb	xfs: shutdown in intent recovery has non-intent items in the AIL Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit ab9c81ef321f90dd208b1d4809c196c2794e4b15 Author: Dave Chinner <dchinner@redhat.com> Date: Tue Mar 29 18:22:00 2022 -0700 xfs: shutdown in intent recovery has non-intent items in the AIL generic/388 triggered a failure in RUI recovery due to a corrupted btree record and the system then locked up hard due to a subsequent assert failure while holding a spinlock cancelling intents: XFS (pmem1): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_trans.c:964). Shutting down filesystem. XFS (pmem1): Please unmount the filesystem and rectify the problem(s) XFS: Assertion failed: !xlog_item_is_intent(lip), file: fs/xfs/xfs_log_recover.c, line: 2632 Call Trace: <TASK> xlog_recover_cancel_intents.isra.0+0xd1/0x120 xlog_recover_finish+0xb9/0x110 xfs_log_mount_finish+0x15a/0x1e0 xfs_mountfs+0x540/0x910 xfs_fs_fill_super+0x476/0x830 get_tree_bdev+0x171/0x270 ? xfs_init_fs_context+0x1e0/0x1e0 xfs_fs_get_tree+0x15/0x20 vfs_get_tree+0x24/0xc0 path_mount+0x304/0xba0 ? putname+0x55/0x60 __x64_sys_mount+0x108/0x140 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Essentially, there's dirty metadata in the AIL from intent recovery transactions, so when we go to cancel the remaining intents we assume that all objects after the first non-intent log item in the AIL are not intents. This is not true. Intent recovery can log new intents to continue the operations the original intent could not complete in a single transaction. The new intents are committed before they are deferred, which means if the CIL commits in the background they will get inserted into the AIL at the head. Hence if we shut down the filesystem while processing intent recovery, the AIL may have new intents active at the current head. Hence this check: /* * We're done when we see something other than an intent. 
* There should be no intents left in the AIL now. */ if (!xlog_item_is_intent(lip)) { #ifdef DEBUG for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur)) ASSERT(!xlog_item_is_intent(lip)); #endif break; } in both xlog_recover_process_intents() and xlog_recover_cancel_intents() is simply not valid. It was valid back when we only had EFI/EFD intents and didn't chain intents, but it hasn't been valid ever since intent recovery could create and commit new intents. Given that crashing the mount task like this pretty much prevents diagnosing what went wrong that led to the initial failure that triggered intent cancellation, just remove the checks altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:10:53 -05:00
Bill O'Donnell	96ac087c0d	xfs: Remove redundant assignment of mp Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit f4901a182d33d05a3b7020e2af97c635f6c47959 Author: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Date: Wed Jan 5 11:12:37 2022 -0800 xfs: Remove redundant assignment of mp mp is being initialized to log->l_mp but this is never read as record is overwritten later on. Remove the redundant assignment. Cleans up the following clang-analyzer warning: fs/xfs/xfs_log_recover.c:3543:20: warning: Value stored to 'mp' during its initialization is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:10:46 -05:00
Bill O'Donnell	9e0bb79551	xfs: only run COW extent recovery when there are no live extents Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 7993f1a431bc5271369d359941485a9340658ac3 Author: Darrick J. Wong <djwong@kernel.org> Date: Wed Dec 15 11:52:23 2021 -0800 xfs: only run COW extent recovery when there are no live extents As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount <filesystem> # (0) fsstress <run only readonly ops> & # (1) while true; do fsstress <run all ops> mount -o remount,ro # (2) fsstress <run only readonly ops> mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. In the first patch, we fixed the race condition in (2) so that (A) will always flush the COW fork. In this second patch, we move the _recover_cow call to the initial mount call in (0) for safety. 
As mentioned previously, xfs_reflink_recover_cow walks the refcount btree looking for COW staging extents, and frees them. This was intended to be run at mount time (when we know there are no live inodes) to clean up any leftover staging extents that may have been left behind during an unclean shutdown. As a time "optimization" for readonly mounts, we deferred this to the ro->rw transition, not realizing that any failure to clean all COW forks during a rw->ro transition would result in catastrophic corruption. Therefore, remove this optimization and only run the recovery routine when we're guaranteed not to have any COW staging extents anywhere, which means we always run this at mount time. While we're at it, move the callsite to xfs_log_mount_finish because any refcount btree expansion (however unlikely given that we're removing records from the right side of the index) must be fed by a per-AG reservation, which doesn't exist in its current location. Fixes: `174edb0e46` ("xfs: store in-progress CoW allocations in the refcount btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:10:44 -05:00
Frantisek Hrbata	64e5412cb6	Merge: XFS update to v5.16 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1508 Bugzilla: https://bugzilla.redhat.com/2125724 Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Approved-by: Bill O'Donnell <bodonnel@redhat.com> Approved-by: Brian Foster <bfoster@redhat.com> Approved-by: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-11-23 02:46:01 -05:00
Carlos Maiolino	854effbe4c	xfs: port the defer ops capture and continue to resource capture Bugzilla: https://bugzilla.redhat.com/2125724 When log recovery tries to recover a transaction that had log intent items attached to it, it has to save certain parts of the transaction state (reservation, dfops chain, inodes with no automatic unlock) so that it can finish single-stepping the recovered transactions before finishing the chains. This is done with the xfs_defer_ops_capture and xfs_defer_ops_continue functions. Right now they open-code this functionality, so let's port this to the formalized resource capture structure that we introduced in the previous patch. This enables us to hold up to two inodes and two buffers during log recovery, the same way we do for regular runtime. With this patch applied, we'll be ready to support atomic extent swap which holds two inodes; and logged xattrs which holds one inode and one xattr leaf buffer. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> (cherry picked from commit 512edfac85d243ed6a5a5f42f513ebb7c2d32863)	2022-10-21 12:50:46 +02:00
Ming Lei	f0231c3baa	fs/xfs: Use the enum req_op and blk_opf_t types Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511 commit d03025aef8676e826b69f8e3ec9bb59a5ad0c31d Author: Bart Van Assche <bvanassche@acm.org> Date: Thu Jul 14 11:07:28 2022 -0700 fs/xfs: Use the enum req_op and blk_opf_t types Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for the combination of a request operation with request flags. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-63-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2022-10-12 09:20:22 +08:00
Brian Foster	13e2b27442	xfs: flush inode gc workqueue before clearing agi bucket Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git Conflicts: Pass mp directly due to nonexistent pag. The pag reference comes in subsequent refactoring patches. commit 04a98a036cf8b810dda172a9dcfcbd783bf63655 Author: Zhang Yi <yi.zhang@huawei.com> Date: Thu Jul 14 11:36:36 2022 +1000 xfs: flush inode gc workqueue before clearing agi bucket While recovering the AGI unlinked lists, if something bad happens to one of the unlinked inodes in the bucket list, we call xlog_recover_clear_agi_bucket() to clear the whole unlinked bucket list, not just the unlinked inodes after the bad one. If we have already added some inodes to the gc workqueue before the bad inode in the list, we could get the error below when freeing those inodes, and finally fail to complete the log recovery procedure. XFS (ram0): Internal error xfs_iunlink_remove at line 2456 of file fs/xfs/xfs_inode.c. Caller xfs_ifree+0xb0/0x360 [xfs] The problem is that xlog_recover_clear_agi_bucket() clears the bucket list, so the gc worker fails to check the agino in xfs_verify_agino(). Fix this by flushing the workqueue before clearing the bucket. Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:38 -04:00
Brian Foster	e406884b42	xfs: introduce xfs_sb_is_v5 helper Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit d6837c1aab42e70141fd3875ba05eb69ffb220f0 Author: Dave Chinner <dchinner@redhat.com> Date: Wed Aug 18 18:46:56 2021 -0700 xfs: introduce xfs_sb_is_v5 helper Rather than open coding XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 checks everywhere, add a simple wrapper to encapsulate this and make the code easier to read. This allows us to remove the xfs_sb_version_has_v3inode() wrapper which is only used in xfs_format.h now and is just a version number check. There are a couple of places where we should be checking the mount feature bits rather than the superblock version (e.g. remount), so those are converted to use xfs_has_crc(mp) instead. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:35 -04:00
Brian Foster	a672539203	xfs: convert remaining mount flags to state flags Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 2e973b2cd4cdb993be94cca4c33f532f1ed05316 Author: Dave Chinner <dchinner@redhat.com> Date: Wed Aug 18 18:46:52 2021 -0700 xfs: convert remaining mount flags to state flags The remaining mount flags kept in m_flags are actually runtime state flags. These change dynamically, so they really should be updated atomically so we don't potentially lose an update due to racing modifications. Convert these remaining flags to be stored in m_opstate and use atomic bitops to set and clear the flags. This also adds a couple of simple wrappers for common state checks - read only and shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:34 -04:00
Brian Foster	d54a790d1d	xfs: replace xfs_sb_version checks with feature flag checks Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 38c26bfd90e1999650d5ef40f90d721f05916643 Author: Dave Chinner <dchinner@redhat.com> Date: Wed Aug 18 18:46:37 2021 -0700 xfs: replace xfs_sb_version checks with feature flag checks Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is a lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:34 -04:00
Brian Foster	0cb6373dde	xfs: reflect sb features in xfs_mount Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit a1d86e8dec8c1325d301c9d5594bb794bc428fc3 Author: Dave Chinner <dchinner@redhat.com> Date: Wed Aug 18 18:46:26 2021 -0700 xfs: reflect sb features in xfs_mount Currently on-disk feature checks require decoding the superblock fields and so can be non-trivial. We have almost 400 individual feature checks in the XFS code, so this is a significant amount of code. To reduce runtime check overhead, pre-process all the version flags into a features field in the xfs_mount at mount time so we can convert all the feature checks to a simple flag check. There is also a need to convert the dynamic feature flags to update the m_features field. This is required for attr, attr2 and quota features. New xfs_mount based wrappers are added for this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:34 -04:00
Brian Foster	93c5f50b4c	xfs: convert log flags to an operational state field Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit e1d06e5f668a403f48538f0d6b163edfd4342adf Author: Dave Chinner <dchinner@redhat.com> Date: Tue Aug 10 17:59:02 2021 -0700 xfs: convert log flags to an operational state field log->l_flags doesn't actually contain "flags" as such, it contains operational state information that can change at runtime. For the shutdown state, this at least should be an atomic bit because it is read without holding locks in many places and so using atomic bitops for the state field modifications makes sense. This allows us to use things like test_and_set_bit() on state changes (e.g. setting XLOG_TAIL_WARN) to avoid races in setting the state when we aren't holding locks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:26 -04:00
Brian Foster	b275d24234	xfs: move recovery needed state updates to xfs_log_mount_finish Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit fd67d8a07208ab06560287b7b9334c2d50b7d6d7 Author: Dave Chinner <dchinner@redhat.com> Date: Tue Aug 10 17:59:02 2021 -0700 xfs: move recovery needed state updates to xfs_log_mount_finish xfs_log_mount_finish() needs to know if recovery is needed or not to make decisions on whether to flush the log and AIL. Move the handling of the NEED_RECOVERY state out to this function rather than needing a temporary variable to store this state over the call to xlog_recover_finish(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:25 -04:00
Brian Foster	156272e64a	xfs: convert XLOG_FORCED_SHUTDOWN() to xlog_is_shutdown() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 2039a272300b949c05888428877317b834c0b1fb Author: Dave Chinner <dchinner@redhat.com> Date: Tue Aug 10 17:59:01 2021 -0700 xfs: convert XLOG_FORCED_SHUTDOWN() to xlog_is_shutdown() Make it less shouty and a static inline before adding more calls through the log code. Also convert internal log code that uses XFS_FORCED_SHUTDOWN(mount) to use xlog_is_shutdown(log) as well. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:25 -04:00
Brian Foster	63abcca3f1	xfs: refactor xfs_iget calls from log intent recovery Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 4bc619833f738f4fa8d931a71610795ebf5cec0e Author: Darrick J. Wong <djwong@kernel.org> Date: Sun Aug 8 08:27:13 2021 -0700 xfs: refactor xfs_iget calls from log intent recovery Hoist the code from xfs_bui_item_recover that igets an inode and marks it as being part of log intent recovery. The next patch will want a common function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:25 -04:00
Brian Foster	8207eb529d	xfs: allow setting and clearing of log incompat feature flags Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 908ce71e54f8265fa909200410d6c50ab9a2d302 Author: Darrick J. Wong <djwong@kernel.org> Date: Sun Aug 8 08:27:12 2021 -0700 xfs: allow setting and clearing of log incompat feature flags Log incompat feature flags in the superblock exist for one purpose: to protect the contents of a dirty log from replay on a kernel that isn't prepared to handle those dirty contents. This means that they can be cleared if (a) we know the log is clean and (b) we know that there aren't any other threads in the system that might be setting or relying upon a log incompat flag. Therefore, clear the log incompat flags when we've finished recovering the log, when we're unmounting cleanly, remounting read-only, or freezing; and provide a function so that subsequent patches can start using this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:24 -04:00
Brian Foster	8bf8cc906b	xfs: replace kmem_alloc_large() with kvmalloc() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit d634525db63e9e946c3229fb93c8d9b763afbaf3 Author: Dave Chinner <dchinner@redhat.com> Date: Mon Aug 9 10:10:01 2021 -0700 xfs: replace kmem_alloc_large() with kvmalloc() There is no reason for this wrapper existing anymore. All the places that use KM_NOFS allocation are within transaction contexts and hence covered by memalloc_nofs_save/restore contexts. Hence we don't need any special handling of vmalloc for large IOs anymore and so special casing this code isn't necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:24 -04:00
Brian Foster	7fe76aa101	xfs: remove kmem_alloc_io() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 98fe2c3cef21b784e2efd1d9d891430d95b4f073 Author: Dave Chinner <dchinner@redhat.com> Date: Mon Aug 9 10:10:01 2021 -0700 xfs: remove kmem_alloc_io() Since commit `59bb47985c` ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), the core slab code now guarantees slab alignment in all situations sufficient for IO purposes (i.e. minimum of 512 byte alignment of >= 512 byte sized heap allocations) we no longer need the workaround in the XFS code to provide this guarantee. Replace the use of kmem_alloc_io() with kmem_alloc() or kmem_alloc_large() appropriately, and remove the kmem_alloc_io() interface altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:24 -04:00
Brian Foster	552c0d6db7	xfs: per-cpu deferred inode inactivation queues Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit ab23a7768739a23d21d8a16ca37dff96b1ca957a Author: Dave Chinner <dchinner@redhat.com> Date: Fri Aug 6 11:05:39 2021 -0700 xfs: per-cpu deferred inode inactivation queues Move inode inactivation to background work contexts so that it no longer runs in the context that releases the final reference to an inode. This will allow process work that ends up blocking on inactivation to continue doing work while the filesystem processes the inactivation in the background. A typical demonstration of this is unlinking an inode with lots of extents. The extents are removed during inactivation, so this blocks the process that unlinked the inode from the directory structure. By moving the inactivation to the background process, the userspace application can keep working (e.g. unlinking the next inode in the directory) while the inactivation work on the previous inode is done by a different CPU. The implementation of the queue is relatively simple. We use a per-cpu lockless linked list (llist) to queue inodes for inactivation without requiring serialisation mechanisms, and a work item to allow the queue to be processed by a CPU bound worker thread. We also keep a count of the queue depth so that we can trigger work after a number of deferred inactivations have been queued. The use of a bound workqueue with a single work depth allows the workqueue to run one work item per CPU. We queue the work item on the CPU we are currently running on, and so this essentially gives us affine per-cpu worker threads for the per-cpu queues. This maintains the effective CPU affinity that occurs within XFS at the AG level due to all objects in a directory being local to an AG. 
Hence inactivation work tends to run on the same CPU that last accessed all the objects that inactivation accesses and this maintains hot CPU caches for unlink workloads. A depth of 32 inodes was chosen to match the number of inodes in an inode cluster buffer. This hopefully allows sequential allocation/unlink behaviours to defer inactivation of all the inodes in a single cluster buffer at a time, further helping maintain hot CPU and buffer cache accesses while running inactivations. A hard per-cpu queue throttle of 256 inodes has been set to avoid runaway queuing when inodes that take a long time to inactivate are being processed, for example when unlinking inodes with large numbers of extents that can take a lot of processing to free. Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: tweak comments and tracepoints, convert opflags to state bits] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:22 -04:00
Rafael Aquini	e2e7fe38b6	mm: Add kvrealloc() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 This patch is a backport of the following upstream commit: commit de2860f4636256836450c6543be744a50118fc66 Author: Dave Chinner <dchinner@redhat.com> Date: Mon Aug 9 10:10:00 2021 -0700 mm: Add kvrealloc() During log recovery of an XFS filesystem with 64kB directory buffers, rebuilding a buffer split across two log records results in a memory allocation warning from krealloc like this: xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff) XFS (dm-0): Unmounting Filesystem XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) ------------[ cut here ]------------ WARNING: CPU: 5 PID: 3435170 at mm/page_alloc.c:3539 get_page_from_freelist+0xdee/0xe40 ..... RIP: 0010:get_page_from_freelist+0xdee/0xe40 Call Trace: ? complete+0x3f/0x50 __alloc_pages+0x16f/0x300 alloc_pages+0x87/0x110 kmalloc_order+0x2c/0x90 kmalloc_order_trace+0x1d/0x90 __kmalloc_track_caller+0x215/0x270 ? xlog_recover_add_to_cont_trans+0x63/0x1f0 krealloc+0x54/0xb0 xlog_recover_add_to_cont_trans+0x63/0x1f0 xlog_recovery_process_trans+0xc1/0xd0 xlog_recover_process_ophdr+0x86/0x130 xlog_recover_process_data+0x9f/0x160 xlog_recover_process+0xa2/0x120 xlog_do_recovery_pass+0x40b/0x7d0 ? __irq_work_queue_local+0x4f/0x60 ? irq_work_queue+0x3a/0x50 xlog_do_log_recovery+0x70/0x150 xlog_do_recover+0x38/0x1d0 xlog_recover+0xd8/0x170 xfs_log_mount+0x181/0x300 xfs_mountfs+0x4a1/0x9b0 xfs_fs_fill_super+0x3c0/0x7b0 get_tree_bdev+0x171/0x270 ? 
suffix_kstrtoint.constprop.0+0xf0/0xf0 xfs_fs_get_tree+0x15/0x20 vfs_get_tree+0x24/0xc0 path_mount+0x2f5/0xaf0 __x64_sys_mount+0x108/0x140 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Essentially, we are taking a multi-order allocation from kmem_alloc() (which has an open coded no fail, no warn loop) and then reallocating it out to 64kB using krealloc(__GFP_NOFAIL) and that is then triggering the above warning. This is a regression caused by converting this code from an open coded no fail/no warn reallocation loop to using __GFP_NOFAIL. What we actually need here is kvrealloc(), so that if contiguous page allocation fails we fall back to vmalloc() and we don't get nasty warnings happening in XFS. Fixes: `771915c4f6` ("xfs: remove kmem_realloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2021-11-29 11:40:25 -05:00
Darrick J. Wong	4e6b8270c8	xfs: force the log offline when log intent item recovery fails If any part of log intent item recovery fails, we should shut down the log immediately to stop the log from writing a clean unmount record to disk, because the metadata is not consistent. The inability to cancel a dirty transaction catches most of these cases, but there are a few things that have slipped through the cracks, such as ENOSPC from a transaction allocation, or runtime errors that result in cancellation of a non-dirty transaction. This solves some weird behaviors reported by customers where a system goes down, the first mount fails, the second succeeds, but then the fs goes down later because of inconsistent metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-06-21 10:14:24 -07:00
Dave Chinner	934933c3ee	xfs: convert raw ag walks to use for_each_perag Convert the raw walks to an iterator, pulling the current AG out of pag->pag_agno instead of the loop iterator variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	9bbafc7191	xfs: move xfs_perag_get/put to xfs_ag.[ch] They are AG functions, not superblock functions, so move them to the appropriate location. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Christoph Hellwig	9b3beb028f	xfs: remove the di_dmevmask and di_dmstate fields from struct xfs_icdinode The legacy DMAPI fields were never set by upstream Linux XFS, and have no way to be read using the kernel APIs. So instead of bloating the in-core inode for them just copy them from the on-disk inode into the log when logging the inode. The only caveat is that we need to make sure to zero the fields for newly read or deleted inodes, which is solved using a new flag in the inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-04-07 14:37:03 -07:00
Christoph Hellwig	af9dcddef6	xfs: split xfs_imap_to_bp Split looking up the dinode from xfs_imap_to_bp, which can be significantly simplified as a result. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-04-07 14:37:02 -07:00
Bhaskar Chowdhury	bd24a4f5f7	xfs: Rudimentary typo fixes s/filesytem/filesystem/ s/instrumention/instrumentation/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00

506 Commits