Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Bill O'Donnell	fb54bc42ca	xfs: force all buffers to be written during btree bulk load JIRA: https://issues.redhat.com/browse/RHEL-65728 commit 13ae04d8d45227c2ba51e188daf9fc13d08a1b12 Author: Darrick J. Wong <djwong@kernel.org> Date: Fri Dec 15 10:03:27 2023 -0800 xfs: force all buffers to be written during btree bulk load While stress-testing online repair of btrees, I noticed periodic assertion failures from the buffer cache about buffers with incorrect DELWRI_Q state. Looking further, I observed this race between the AIL trying to write out a btree block and repair zapping a btree block after the fact: AIL: Repair0: pin buffer X delwri_queue: set DELWRI_Q add to delwri list stale buf X: clear DELWRI_Q does not clear b_list free space X commit delwri_submit # oops Worse yet, I discovered that running the same repair over and over in a tight loop can result in a second race that cause data integrity problems with the repair: AIL: Repair0: Repair1: pin buffer X delwri_queue: set DELWRI_Q add to delwri list stale buf X: clear DELWRI_Q does not clear b_list free space X commit find free space X get buffer rewrite buffer delwri_queue: set DELWRI_Q already on a list, do not add commit BAD: committed tree root before all blocks written delwri_submit # too late now I traced this to my own misunderstanding of how the delwri lists work, particularly with regards to the AIL's buffer list. If a buffer is logged and committed, the buffer can end up on that AIL buffer list. If btree repairs are run twice in rapid succession, it's possible that the first repair will invalidate the buffer and free it before the next time the AIL wakes up. Marking the buffer stale clears DELWRI_Q from the buffer state without removing the buffer from its delwri list. The buffer doesn't know which list it's on, so it cannot know which lock to take to protect the list for a removal. If the second repair allocates the same block, it will then recycle the buffer to start writing the new btree block. Meanwhile, if the AIL wakes up and walks the buffer list, it will ignore the buffer because it can't lock it, and go back to sleep. When the second repair calls delwri_queue to put the buffer on the list of buffers to write before committing the new btree, it will set DELWRI_Q again, but since the buffer hasn't been removed from the AIL's buffer list, it won't add it to the bulkload buffer's list. This is incorrect, because the bulkload caller relies on delwri_submit to ensure that all the buffers have been sent to disk /before/ committing the new btree root pointer. This ordering requirement is required for data consistency. Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally drop it, so the next thread to walk through the btree will trip over a debug assertion on that flag. To fix this, create a new function that waits for the buffer to be removed from any other delwri lists before adding the buffer to the caller's delwri list. By waiting for the buffer to clear both the delwri list and any potential delwri wait list, we can be sure that repair will initiate writes of all buffers and report all write errors back to userspace instead of committing the new structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-20 11:26:00 -06:00
Bill O'Donnell	d0e7df7358	xfs: allow scanning ranges of the buffer cache for live buffers JIRA: https://issues.redhat.com/browse/RHEL-57114 commit 9ed851f695c71d325758f8c18e265da9316afd26 Author: Darrick J. Wong <djwong@kernel.org> Date: Thu Aug 10 07:48:03 2023 -0700 xfs: allow scanning ranges of the buffer cache for live buffers After an online repair, we need to invalidate buffers representing the blocks from the old metadata that we're replacing. It's possible that parts of a tree that were previously cached in memory are no longer accessible due to media failure or other corruption on interior nodes, so repair figures out the old blocks from the reverse mapping data and scans the buffer cache directly. In other words, online fsck needs to find all the live (i.e. non-stale) buffers for a range of fsblocks so that it can invalidate them. Unfortunately, the current buffer cache code triggers asserts if the rhashtable lookup finds a non-stale buffer of a different length than the key we searched for. For regular operation this is desirable, but for this repair procedure, we don't care since we're going to forcibly stale the buffer anyway. Add an internal lookup flag to avoid the assert. Skip buffers that are already XBF_STALE. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-10-15 10:46:22 -05:00
Ming Lei	0c712a8085	xfs: port block device access to files JIRA: https://issues.redhat.com/browse/RHEL-29564 commit 1b9e2d90141c5e25faefbb7891f0ed8606aa02cf Author: Christian Brauner <brauner@kernel.org> Date: Tue Jan 23 14:26:24 2024 +0100 xfs: port block device access to files Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-7-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 10:18:37 +08:00
Ming Lei	079721ca6f	xfs: Convert to bdev_open_by_path() JIRA: https://issues.redhat.com/browse/RHEL-29262 commit e340dd63f6a11402424b3d77e51149bce8fcba7d Author: Jan Kara <jack@suse.cz> Date: Wed Sep 27 11:34:34 2023 +0200 xfs: Convert to bdev_open_by_path() Convert xfs to use bdev_open_by_path() and pass the handle around. CC: "Darrick J. Wong" <djwong@kernel.org> CC: linux-xfs@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-28-jack@suse.cz Acked-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-03-19 10:07:46 +08:00
Bill O'Donnell	426777a415	xfs: xfs_buf cache destroy isn't RCU safe Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 231f91ab504ecebcb88e942341b3d7dd91de45f1 Author: Dave Chinner <dchinner@redhat.com> Date: Mon Jul 18 18:20:37 2022 -0700 xfs: xfs_buf cache destroy isn't RCU safe Darrick and Sachin Sant reported that xfs/435 and xfs/436 would report an non-empty xfs_buf slab on module remove. This isn't easily to reproduce, but is clearly a side effect of converting the buffer caceh to RUC freeing and lockless lookups. Sachin bisected and Darrick hit it when testing the patchset directly. Turns out that the xfs_buf slab is not destroyed when all the other XFS slab caches are destroyed. Instead, it's got it's own little wrapper function that gets called separately, and so it doesn't have an rcu_barrier() call in it that is needed to drain all the rcu callbacks before the slab is destroyed. Fix it by removing the xfs_buf_init/terminate wrappers that just allocate and destroy the xfs_buf slab, and move them to the same place that all the other slab caches are set up and destroyed. Reported-and-tested-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: 298f34224506 ("xfs: lockless buffer lookup") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:48 -05:00
Bill O'Donnell	2481f637d7	xfs: lockless buffer lookup Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 298f342245066309189d8637ca7339d56840c3e1 Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 14 12:05:07 2022 +1000 xfs: lockless buffer lookup Now that we have a standalone fast path for buffer lookup, we can easily convert it to use rcu lookups. When we continually hammer the buffer cache with trylock lookups, we end up with a huge amount of lock contention on the per-ag buffer hash locks: - 92.71% 0.05% [kernel] [k] xfs_inodegc_worker - 92.67% xfs_inodegc_worker - 92.13% xfs_inode_unlink - 91.52% xfs_inactive_ifree - 85.63% xfs_read_agi - 85.61% xfs_trans_read_buf_map - 85.59% xfs_buf_read_map - xfs_buf_get_map - 85.55% xfs_buf_find - 72.87% _raw_spin_lock - do_raw_spin_lock 71.86% __pv_queued_spin_lock_slowpath - 8.74% xfs_buf_rele - 7.88% _raw_spin_lock - 7.88% do_raw_spin_lock 7.63% __pv_queued_spin_lock_slowpath - 1.70% xfs_buf_trylock - 1.68% down_trylock - 1.41% _raw_spin_lock_irqsave - 1.39% do_raw_spin_lock __pv_queued_spin_lock_slowpath - 0.76% _raw_spin_unlock 0.75% do_raw_spin_unlock This is basically hammering the pag->pag_buf_lock from lots of CPUs doing trylocks at the same time. Most of the buffer trylock operations ultimately fail after we've done the lookup, so we're really hammering the buf hash lock whilst making no progress. We can also see significant spinlock traffic on the same lock just under normal operation when lots of tasks are accessing metadata from the same AG, so let's avoid all this by converting the lookup fast path to leverages the rhashtable's ability to do rcu protected lookups. We avoid races with the buffer release path by using atomic_inc_not_zero() on the buffer hold count. Any buffer that is in the LRU will have a non-zero count, thereby allowing the lockless fast path to be taken in most cache hit situations. If the buffer hold count is zero, then it is likely going through the release path so in that case we fall back to the existing lookup miss slow path. The slow path will then do an atomic lookup and insert under the buffer hash lock and hence serialise correctly against buffer release freeing the buffer. The use of rcu protected lookups means that buffer handles now need to be freed by RCU callbacks (same as inodes). We still free the buffer pages before the RCU callback - we won't be trying to access them at all on a buffer that has zero references - but we need the buffer handle itself to be present for the entire rcu protected read side to detect a zero hold count correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:47 -05:00
Bill O'Donnell	ac5ff39607	xfs: rework xfs_buf_incore() API Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit 85c73bf726e41be276bcad3325d9a8aef10be289 Author: Dave Chinner <dchinner@redhat.com> Date: Thu Jul 7 22:05:18 2022 +1000 xfs: rework xfs_buf_incore() API Make it consistent with the other buffer APIs to return a error and the buffer is placed in a parameter. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:41 -05:00
Bill O'Donnell	ca371f43d0	xfs: convert buffer flags to unsigned. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2167832 commit b9b3fe152e4966cf8562630de67aa49e2f9c9222 Author: Dave Chinner <david@fromorbit.com> Date: Thu Apr 21 08:44:59 2022 +1000 xfs: convert buffer flags to unsigned. 5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. This manifests as a compiler error such as: /kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk' TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d " ^ /kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags' __print_flags(__entry->flags, "\|", XFS_BUF_FLAGS), ^ /kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED' { XBF_UNMAPPED, "UNMAPPED" } ^ /kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS' __print_flags(__entry->flags, "\|", XFS_BUF_FLAGS), ^ /kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class': /kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant #define XBF_UNMAPPED (1 << 31)/* do not map the buffer */ as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an unsigned long for the flag. Since this results in the value of XBF_UNMAPPED causing a signed integer overflow, the result is technically undefined behavior, which gcc-5 does not accept as an integer constant. This is based on a patch from Arnd Bergman <arnd@arndb.de>. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2023-05-18 11:11:00 -05:00
Jeff Moyer	7f25e45b02	dax: return the partition offset from fs_dax_get_by_bdev Bugzilla: https://bugzilla.redhat.com/2162211 Conflicts: dropped ext2 and erofs hunks, as they're not supported in RHEL. Fixed up ext4 conflict due to RHEL differences. commit cd913c76f489def1a388e3a5b10df94948ede3f5 Author: Christoph Hellwig <hch@lst.de> Date: Mon Nov 29 11:21:59 2021 +0100 dax: return the partition offset from fs_dax_get_by_bdev Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-03-14 10:54:18 -04:00
Jeff Moyer	cf1518f312	xfs: move dax device handling into xfs_{alloc,free}_buftarg Bugzilla: https://bugzilla.redhat.com/2162211 commit 5b5abbefec1bea98abba8f1cffcf72c11c32a92d Author: Christoph Hellwig <hch@lst.de> Date: Mon Nov 29 11:21:55 2021 +0100 xfs: move dax device handling into xfs_{alloc,free}_buftarg Hide the DAX device lookup from the xfs_super.c code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-22-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-03-14 10:54:17 -04:00
Brian Foster	21a25a1300	xfs: rename buffer cache index variable b_bn Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 4c7f65aea7b7fe66c08f8f7304c1ea3f7a871d5a Author: Dave Chinner <dchinner@redhat.com> Date: Wed Aug 18 18:48:54 2021 -0700 xfs: rename buffer cache index variable b_bn To stop external users from using b_bn as the disk address of the buffer, rename it to b_rhash_key to indicate that it is the buffer cache index, not the block number of the buffer. Code that needs the disk address should use xfs_buf_daddr() to obtain it. Do the rename and clean up any of the remaining internal b_bn users. Also clean up any remaining b_bn cruft that is now unused. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:36 -04:00
Brian Foster	74f147b83b	xfs: introduce xfs_buf_daddr() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 04fcad80cd068731a779fb442f78234732683755 Author: Dave Chinner <dchinner@redhat.com> Date: Wed Aug 18 18:46:57 2021 -0700 xfs: introduce xfs_buf_daddr() Introduce a helper function xfs_buf_daddr() to extract the disk address of the buffer from the struct xfs_buf. This will replace direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as the XFS_BUF_ADDR() macro. This patch introduces the helper function and replaces all uses of XFS_BUF_ADDR() as this is just a simple sed replacement. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:36 -04:00
Brian Foster	f0f7eee07e	xfs: sb verifier doesn't handle uncached sb buffer Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 8cf07f3dd56195316be97758cb8b4e1d7183ea84 Author: Dave Chinner <dchinner@redhat.com> Date: Wed Aug 18 18:46:24 2021 -0700 xfs: sb verifier doesn't handle uncached sb buffer The verifier checks explicitly for bp->b_bn == XFS_SB_DADDR to match the primary superblock buffer, but the primary superblock is an uncached buffer and so bp->b_bn is always -1ULL. Hence this never matches and the CRC error reporting is wholly dependent on the mount superblock already being populated so CRC feature checks pass and allow CRC errors to be reported. Fix this so that the primary superblock CRC error reporting is not dependent on already having read the superblock into memory. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:33 -04:00
Brian Foster	7fe76aa101	xfs: remove kmem_alloc_io() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083143 Upstream Status: linux.git commit 98fe2c3cef21b784e2efd1d9d891430d95b4f073 Author: Dave Chinner <dchinner@redhat.com> Date: Mon Aug 9 10:10:01 2021 -0700 xfs: remove kmem_alloc_io() Since commit `59bb47985c` ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), the core slab code now guarantees slab alignment in all situations sufficient for IO purposes (i.e. minimum of 512 byte alignment of >= 512 byte sized heap allocations) we no longer need the workaround in the XFS code to provide this guarantee. Replace the use of kmem_alloc_io() with kmem_alloc() or kmem_alloc_large() appropriately, and remove the kmem_alloc_io() interface altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com>	2022-08-25 08:11:24 -04:00
Christoph Hellwig	54cd3aa6f8	xfs: remove ->b_offset handling for page backed buffers ->b_offset can only be non-zero for _XBF_KMEM backed buffers, so remove all code dealing with it for page backed buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> [dgc: modified to fit this patchset] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-07 11:49:50 +10:00
Brian Foster	8321ddb2fa	xfs: don't drain buffer lru on freeze and read-only remount xfs_buftarg_drain() is called from xfs_log_quiesce() to ensure the buffer cache is reclaimed during unmount. xfs_log_quiesce() is also called from xfs_quiesce_attr(), however, which means that cache state is completely drained for filesystem freeze and read-only remount. While technically harmless, this is unnecessarily heavyweight. Both freeze and read-only mounts allow reads and thus allow population of the buffer cache. Therefore, the transitional sequence in either case really only needs to quiesce outstanding writes to return the filesystem in a generally read-only state. Additionally, some users have reported that attempts to freeze a filesystem concurrent with a read-heavy workload causes the freeze process to stall for a significant amount of time. This occurs because, as mentioned above, the read workload repopulates the buffer LRU while the freeze task attempts to drain it. To improve this situation, replace the drain in xfs_log_quiesce() with a buffer I/O quiesce and lift the drain into the unmount path. This removes buffer LRU reclaim from freeze and read-only [re]mount, but ensures the LRU is still drained before the filesystem unmounts. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-01-22 16:54:50 -08:00
Brian Foster	10fb9ac125	xfs: rename xfs_wait_buftarg() to xfs_buftarg_drain() xfs_wait_buftarg() is vaguely named and somewhat overloaded. Its primary purpose is to reclaim all buffers from the provided buffer target LRU. In preparation to refactor xfs_wait_buftarg() into serialization and LRU draining components, rename the function and associated helpers to something more descriptive. This patch has no functional changes with the minor exception of renaming a tracepoint. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-01-22 16:54:50 -08:00
Dave Chinner	e82226138b	xfs: remove xfs_buf_t typedef Prepare for kernel xfs_buf alignment by getting rid of the xfs_buf_t typedef from userspace. [darrick: This patch is a port of a userspace patch removing the xfs_buf_t typedef in preparation to make the userspace xfs_buf code behave more like its kernel counterpart.] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-12-16 16:07:34 -08:00
Christoph Hellwig	26e328759b	xfs: reuse _xfs_buf_read for re-reading the superblock Instead of poking deeply into buffer cache internals when re-reading the superblock during log recovery just generalize _xfs_buf_read and use it there. Note that we don't have to explicitly set up the ops as they must be set from the initial read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-09-15 20:52:39 -07:00
Christoph Hellwig	6a7584b1d8	xfs: fold xfs_buf_ioend_finish into xfs_ioend No need to keep a separate helper for this logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-09-15 20:52:38 -07:00
Christoph Hellwig	76b2d32346	xfs: mark xfs_buf_ioend static Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-09-15 20:52:38 -07:00
Dave Chinner	b01d1461ae	xfs: call xfs_buf_iodone directly All unmarked dirty buffers should be in the AIL and have log items attached to them. Hence when they are written, we will run a callback to remove the item from the AIL if appropriate. Now that we've handled inode and dquot buffers, all remaining calls are to xfs_buf_iodone() and so we can hard code this rather than use an indirect call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:58 -07:00
Dave Chinner	9fe5c77cbe	xfs: mark log recovery buffers for completion Log recovery has it's own buffer write completion handler for buffers that it directly recovers. Convert these to direct calls by flagging these buffers as being log recovery buffers. The flag will get cleared by the log recovery IO completion routine, so it will never leak out of log recovery. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:58 -07:00
Dave Chinner	0c7e5afbea	xfs: mark dquot buffers in cache dquot buffers always have write IO callbacks, so by marking them directly we can avoid needing to attach ->b_iodone functions to them. This avoids an indirect call, and makes future modifications much simpler. This is largely a rearrangement of the code at this point - no IO completion functionality changes at this point, just how the code is run is modified. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:58 -07:00
Dave Chinner	f593bf144c	xfs: mark inode buffers in cache Inode buffers always have write IO callbacks, so by marking them directly we can avoid needing to attach ->b_iodone functions to them. This avoids an indirect call, and makes future modifications much simpler. While this is largely a refactor of existing functionality, we broaden the scope of the flag to beyond where inodes are explicitly attached because future changes need to know what type of log items are attached to the buffer. Adding this buffer flag may invoke the inode iodone callback in cases where it wouldn't have been previously, but this is not a functional change because the callback is identical to the normal buffer write iodone callback when inodes are not attached. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:58 -07:00
Brian Foster	f9bccfcc3b	xfs: refactor ratelimited buffer error messages into helper XFS has some inconsistent log message rate limiting with respect to buffer alerts. The metadata I/O error notification uses the generic ratelimited alert, the buffer push code uses a custom rate limit and the similar quiesce time failure checks are not rate limited at all (when they should be). The custom rate limit defined in the buf item code is specifically crafted for buffer alerts. It is more aggressive than generic rate limiting code because it must accommodate a high frequency of I/O error events in a relative short timeframe. Factor out the custom rate limit state from the buf item code into a per-buftarg rate limit so various alerts are limited based on the target. Define a buffer alert helper function and use it for the buffer alerts that are already ratelimited. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-07 08:27:46 -07:00
Brian Foster	54b3b1f619	xfs: factor out buffer I/O failure code We use the same buffer I/O failure code in a few different places. It's not much code, but it's not necessarily self-explanatory. Factor it into a helper and document it in one place. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-07 08:27:45 -07:00
Darrick J. Wong	8d57c21600	xfs: add a function to deal with corrupt buffers post-verifiers Add a helper function to get rid of buffers that we have decided are corrupt after the verifiers have run. This function is intended to handle metadata checks that can't happen in the verifiers, such as inter-block relationship checking. Note that we now mark the buffer stale so that it will not end up on any LRU and will be purged on release. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-03-12 07:58:12 -07:00
Darrick J. Wong	cdbcf82b86	xfs: fix xfs_buf_ioerror_alert location reporting Instead of passing __func__ to the error reporting function, let's use the return address builtins so that the messages actually tell you which higher level function called the buffer functions. This was previously true for the xfs_buf_read callers, but not for the xfs_trans_read_buf callers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2020-01-26 14:32:27 -08:00
Darrick J. Wong	0e3eccce5e	xfs: make xfs_buf_read return an error code Convert xfs_buf_read() to return numeric error codes like most everywhere else in xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-01-26 14:32:26 -08:00
Darrick J. Wong	2842b6db3d	xfs: make xfs_buf_get_uncached return an error code Convert xfs_buf_get_uncached() to return numeric error codes like most everywhere else in xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-01-26 14:32:26 -08:00
Darrick J. Wong	841263e933	xfs: make xfs_buf_get return an error code Convert xfs_buf_get() to return numeric error codes like most everywhere else in xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-01-26 14:32:26 -08:00
Darrick J. Wong	4ed8e27b4f	xfs: make xfs_buf_read_map return an error code Convert xfs_buf_read_map() to return numeric error codes like most everywhere else in xfs. This involves moving the open-coded logic that reports metadata IO read / corruption errors and stales the buffer into xfs_buf_read_map so that the logic is all in one place. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-01-26 14:32:26 -08:00
Darrick J. Wong	3848b5f670	xfs: make xfs_buf_get_map return an error code Convert xfs_buf_get_map() to return numeric error codes like most everywhere else in xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-01-26 14:32:25 -08:00
Christoph Hellwig	25a409572b	xfs: mark xfs_buf_free static Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-10-28 08:37:54 -07:00
Dave Chinner	d916275aa4	xfs: get allocation alignment from the buftarg Needed to feed into the allocation routine to guarantee the memory buffers we add to bios are correctly aligned to the underlying device. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-08-26 17:43:14 -07:00
Christoph Hellwig	dbd329f1e4	xfs: add struct xfs_mount pointer to struct xfs_buf We need to derive the mount pointer from a buffer in a lot of place. Add a direct pointer to short cut the pointer chasing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-28 19:27:29 -07:00
Christoph Hellwig	8124b9b601	xfs: remove the b_io_length field in struct xfs_buf This field is now always idential to b_length. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-28 19:27:28 -07:00
Christoph Hellwig	e99b4bd0cb	xfs: properly type the b_log_item field in struct xfs_buf Now that the log code doesn't abuse this field any more we can declare it as a struct xfs_buf_log_item pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-28 19:27:28 -07:00
Christoph Hellwig	0564501ff5	xfs: remove unused buffer cache APIs Now that the log code uses bios directly we can drop various special cases in the buffer cache code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-28 19:27:27 -07:00
Christoph Hellwig	ce89755cdf	xfs: renumber XBF_WRITE_FAIL Assining a numerical value that is not close to the flags defined near by is just asking for conflicts later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-28 19:27:18 -07:00
Christoph Hellwig	153fd7b57c	xfs: remove the never used _XBF_COMPOUND flag Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-28 19:27:18 -07:00
Eric Sandeen	f5b999c03f	xfs: remove unused flag arguments There are several functions which take a flag argument that is only ever passed as "0," so remove these arguments. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-12 09:00:00 -07:00
Christoph Hellwig	f9a196ee5a	xfs: merge xfs_buf_zero and xfs_buf_iomove xfs_buf_zero is the only caller of xfs_buf_iomove. Remove support for copying from or to the buffer in xfs_buf_iomove and merge the two functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-12 08:59:59 -07:00
Darrick J. Wong	15baadf72c	xfs: fix xfs_buf magic number endian checks Create a separate magic16 check function so that we don't run afoul of static checkers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2019-02-18 09:38:41 -08:00
Brian Foster	8473fee340	xfs: distinguish between inobt and finobt magic values The inode btree verifier code is shared between the inode btree and free inode btree because the underlying metadata formats are essentially equivalent. A side effect of this is that the verifier cannot determine whether a particular btree block should have an inobt or finobt magic value. This logic allows an unfortunate xfs_repair bug to escape detection where certain level > 0 nodes of the finobt are stamped with inobt magic by xfs_repair finobt reconstruction. This is fortunately not a severe problem since the inode btree magic values do not contribute to any changes in kernel behavior, but we do need a means to detect and prevent this problem in the future. Add a field to xfs_buf_ops to store the v4 and v5 superblock magic values expected by a particular verifier. Add a helper to check an on-disk magic value against the value expected by the verifier. Call the helper from the shared [f]inobt verifier code for magic value verification. This ensures that the inode btree blocks each have the appropriate magic value based on specific tree type and superblock version. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-02-11 16:07:01 -08:00
Brian Foster	75d0230314	xfs: clarify documentation for the function to reverify buffers Improve the documentation around xfs_buf_ensure_ops, which is the function that is responsible for cleaning up the b_ops state of buffers that go through xrep_findroot_block but don't match anything. Rename the function to xfs_buf_reverify. [darrick: this started off as bfoster mods of a previous patch of mine, but the renaming part is now this separate patch.] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Brian Foster <bfoster@redhat.com>	2019-02-11 16:07:01 -08:00
Darrick J. Wong	1aff5696f3	xfs: always assign buffer verifiers when one is provided If a caller supplies buffer ops when trying to read a buffer and the buffer doesn't already have buf ops assigned, ensure that the ops are assigned to the buffer and the verifier is run on that buffer. Note that current XFS code is careful to assign buffer ops after a xfs_{trans_,}buf_read call in which ops were not supplied. However, we should apply ops defensively in case there is ever a coding mistake; and an upcoming repair patch will need to be able to read a buffer without assigning buf ops. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2018-10-18 17:20:30 +11:00
Eric Sandeen	fa6c668d80	xfs: remove b_last_holder & associated macros The old lock tracking infrastructure in xfs using the b_last_holder field seems to only be useful if you can get into the system with a debugger; it seems that the existing tracepoints would be the way to go these days, and this old infrastructure can be removed. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-12 08:37:31 -07:00
Brian Foster	6af88cda00	xfs: combine [a]sync buffer submission apis The buffer I/O submission path consists of separate function calls per type. The buffer I/O type is already controlled via buffer state (XBF_ASYNC), however, so there is no real need for separate submission functions. Combine the buffer submission functions into a single function that processes the buffer appropriately based on XBF_ASYNC. Retain an internal helper with a conditional wait parameter to continue to support batched !XBF_ASYNC submission/completion required by delwri queues. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:35 -07:00

1 2 3

135 Commits