JIRA: https://issues.redhat.com/browse/RHEL-27745
This patch is a backport of the following upstream commit:
commit 91e79d22be75fec88ae58d274a7c9e49d6215099
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Wed Aug 23 00:13:14 2023 +0100
mm: convert DAX lock/unlock page to lock/unlock folio
The one caller of DAX lock/unlock page already calls compound_head(), so
use page_folio() instead, then use a folio throughout the DAX code to
remove uses of page->mapping and page->index.
[jane.chu@oracle.com: add comment to mf_generic_kill_procss(), simplify mf_generic_kill_procs:folio initialization]
Link: https://lkml.kernel.org/r/20230908222336.186313-1-jane.chu@oracle.com
Link: https://lkml.kernel.org/r/20230822231314.349200-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64959
commit 50793801fc7f6d08def48754fb0f0706b0cfc394
Author: Darrick J. Wong <djwong@kernel.org>
Date: Thu Oct 3 08:09:48 2024 -0700
fsdax: dax_unshare_iter needs to copy entire blocks
The code that copies data from srcmap to iomap in dax_unshare_iter is
very very broken, which bfoster's recent fsx changes have exposed.
If the pos and len passed to dax_file_unshare are not aligned to an
fsblock boundary, the iter pos and length in the _iter function will
reflect this unalignment.
dax_iomap_direct_access always returns a pointer to the start of the
kmapped fsdax page, even if its pos argument is in the middle of that
page. This is catastrophic for data integrity when iter->pos is not
aligned to a page, because daddr/saddr do not point to the same byte in
the file as iter->pos. Hence we corrupt user data by copying it to the
wrong place.
If iter->pos + iomap_length() in the _iter function not aligned to a
page, then we fail to copy a full block, and only partially populate the
destination block. This is catastrophic for data confidentiality
because we expose stale pmem contents.
Fix both of these issues by aligning copy_pos/copy_len to a page
boundary (remember, this is fsdax so 1 fsblock == 1 base page) so that
we always copy full blocks.
We're not done yet -- there's no call to invalidate_inode_pages2_range,
so programs that have the file range mmap'd will continue accessing the
old memory mapping after the file metadata updates have completed.
Be careful with the return value -- if the unshare succeeds, we still
need to return the number of bytes that the iomap iter thinks we're
operating on.
Cc: ruansy.fnst@fujitsu.com
Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/172796813328.1131942.16777025316348797355.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64959
commit 95472274b6fed8f2d30fbdda304e12174b3d4099
Author: Darrick J. Wong <djwong@kernel.org>
Date: Thu Oct 3 08:09:32 2024 -0700
fsdax: remove zeroing code from dax_unshare_iter
Remove the code in dax_unshare_iter that zeroes the destination memory
because it's not necessary.
If srcmap is unwritten, we don't have to do anything because that
unwritten extent came from the regular file mapping, and unwritten
extents cannot be shared. The same applies to holes.
Furthermore, zeroing to unshare a mapping is just plain wrong because
unsharing means copy on write, and we should be copying data.
This is effectively a revert of commit 13dd4e04625f ("fsdax: unshare:
zero destination if srcmap is HOLE or UNWRITTEN")
Cc: ruansy.fnst@fujitsu.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/172796813311.1131942.16033376284752798632.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64959
commit 6ef6a0e821d3dad6bf8a5d5508762dba9042c84b
Author: Darrick J. Wong <djwong@kernel.org>
Date: Thu Oct 3 08:09:16 2024 -0700
iomap: share iomap_unshare_iter predicate code with fsdax
The predicate code that iomap_unshare_iter uses to decide if it's really
needs to unshare a file range mapping should be shared with the fsdax
version, because right now they're opencoded and inconsistent.
Note that we simplify the predicate logic a bit -- we no longer allow
unsharing of inline data mappings, but there aren't any filesystems that
allow shared inline data currently.
This is a fix in the sense that it should have been ported to fsdax.
Fixes: b53fdb215d13 ("iomap: improve shared block detection in iomap_unshare_iter")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/172796813294.1131942.15762084021076932620.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64959
commit a311a08a4237241fb5b9d219d3e33346de6e83e0
Author: Darrick J. Wong <djwong@kernel.org>
Date: Wed Oct 2 08:02:13 2024 -0700
iomap: constrain the file range passed to iomap_file_unshare
File contents can only be shared (i.e. reflinked) below EOF, so it makes
no sense to try to unshare ranges beyond EOF. Constrain the file range
parameters here so that we don't have to do that in the callers.
Fixes: 5f4e5752a8 ("fs: add iomap_file_dirty")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
* fs/erofs/data.c: hunks dropped as RHEL is missing commit 06252e9ce05b
("erofs: dax support for non-tailpacking regular file")
* fs/ext2/file.c: minor contex difference as RHEL is missing commit
70f3bad8c315 ("ext2: Convert to using invalidate_lock")
* fs/fuse/dax.c: minor contex difference as RHEL is missing commit
8bcbbe9c7c8e ("fuse: Convert to using invalidate_lock")
This patch is a backport of the following upstream commit:
commit 1d024e7a8dabcc3c84d77532a88c774c32cf8245
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Fri Aug 18 21:23:35 2023 +0100
mm: remove enum page_entry_size
Remove the unnecessary encoding of page order into an enum and pass the
page order directly. That lets us get rid of pe_order().
The switch constructs have to be changed to if/else constructs to prevent
GCC from warning on builds with 3-level page tables where PMD_ORDER and
PUD_ORDER have the same value.
If you are looking at this commit because your driver stopped compiling,
look at the previous commit as well and audit your driver to be sure it
doesn't depend on mmap_lock being held in its ->huge_fault method.
[willy@infradead.org: use "order %u" to match the (non dev_t) style]
Link: https://lkml.kernel.org/r/ZOUYekbtTv+n8hYf@casper.infradead.org
Link: https://lkml.kernel.org/r/20230818202335.2739663-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27743
This patch is a backport of the following upstream commit:
commit 051ddcfeb1bdbae45e660c0db2468d29ca15c6c2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Fri Aug 18 21:23:33 2023 +0100
mm: move PMD_ORDER to pgtable.h
Patch series "Change calling convention for ->huge_fault", v2.
There are two unrelated changes to the calling convention for
->huge_fault. I've bundled them together to help people notice the
change. The first is to improve scalability of DAX page faults by
allowing them to be handled under the VMA lock. The second is to remove
enum page_entry_size since it's really unnecessary. The changelogs and
documentation updates hopefully work to that end.
This patch (of 3):
Allow this to be used in generic code. Also add PUD_ORDER.
Link: https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230818202335.2739663-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-23824
commit 297945d9bc13a10e2ce39f0a3aad38c6812435a5
Author: Abhinav Singh <singhabhinav9051571833@gmail.com>
Date: Wed Nov 8 10:15:50 2023 +0530
fs : Fix warning using plain integer as NULL
Sparse static analysis tools generate a warning with this message
"Using plain integer as NULL pointer". In this case this warning is
being shown because we are trying to initialize pointer to NULL using
integer value 0.
Signed-off-by: Abhinav Singh <singhabhinav9051571833@gmail.com>
Link: https://lore.kernel.org/r/20231108044550.1006555-1-singhabhinav9051571833@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-23824
commit 1ea7ca1b090145519aad998679222f0a14ab8fce
Author: Jane Chu <jane.chu@oracle.com>
Date: Thu Jun 15 12:13:25 2023 -0600
dax: enable dax fault handler to report VM_FAULT_HWPOISON
When multiple processes mmap() a dax file, then at some point,
a process issues a 'load' and consumes a hwpoison, the process
receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb
set for the poison scope. Soon after, any other process issues
a 'load' to the poisoned page (that is unmapped from the kernel
side by memory_failure), it receives a SIGBUS with
si_code = BUS_ADRERR and without valid si_lsb.
This is confusing to user, and is different from page fault due
to poison in RAM memory, also some helpful information is lost.
Channel dax backend driver's poison detection to the filesystem
such that instead of reporting VM_FAULT_SIGBUS, it could report
VM_FAULT_HWPOISON.
If user level block IO syscalls fail due to poison, the errno will
be converted to EIO to maintain block API consistency.
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-23824
commit dd0c64258a9d9e74b4896f05c7e77fa3365b5f12
Author: Colin Ian King <colin.i.king@gmail.com>
Date: Wed Jun 21 14:02:56 2023 +0100
fsdax: remove redundant variable 'error'
The variable 'error' is being assigned a value that is never read,
the assignment and the variable and redundant and can be removed.
Cleans up clang scan build warning:
fs/dax.c:1880:10: warning: Although the value stored to 'error' is
used in the enclosing expression, the value is never actually read
from 'error' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20230621130256.2676126-1-colin.i.king@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit e900ba10d15041a6236cc75778cc6e06c3590a58
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Wed Mar 22 07:25:58 2023 +0000
fsdax: dedupe should compare the min of two iters' length
In an dedupe comparison iter loop, the length of iomap_iter decreases
because it implies the remaining length after each iteration.
The dedupe command will fail with -EIO if the range is larger than one
page size and not aligned to the page size. Also report warning in dmesg:
[ 4338.498374] ------------[ cut here ]------------
[ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16
...
The compare function should use the min length of the current iters,
not the total length.
Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: 0e79e3736d54 ("fsdax: dedupe: iter two files at the same time")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 13dd4e04625f600e5affb1b3f0b6c35268ab839b
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Wed Mar 22 11:11:09 2023 +0000
fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN
unshare copies data from source to destination. But if the source is
HOLE or UNWRITTEN extents, we should zero the destination, otherwise
the HOLE or UNWRITTEN part will be user-visible old data of the new
allocated extent.
Found by running generic/649 while mounting with -o dax=always on pmem.
Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 388bc034d91d480efa88abc5c8d6e6c8a878b1ab
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Feb 2 12:33:47 2023 +0000
fsdax: dax_unshare_iter() should return a valid length
The copy_mc_to_kernel() will return 0 if it executed successfully. Then
the return value should be set to the length it copied.
[akpm@linux-foundation.org: don't mess up `ret', per Matthew]
Link: https://lkml.kernel.org/r/1675341227-14-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit d984648e428bf88cbd94ebe346c73632cb92fffb
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Dec 1 15:32:33 2022 +0000
fsdax,xfs: port unshare to fsdax
Implement unshare in fsdax mode: copy data from srcmap to iomap.
Link: https://lkml.kernel.org/r/1669908753-169-1-git-send-email-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 0e79e3736d54bb8efbc9fb29cc3b54a132783565
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Dec 1 15:31:41 2022 +0000
fsdax: dedupe: iter two files at the same time
The iomap_iter() on a range of one file may loop more than once. In this
case, the inner dst_iter can update its iomap but the outer src_iter
can't. This may cause the wrong remapping in filesystem. Let them called
at the same time.
Link: https://lkml.kernel.org/r/1669908701-93-1-git-send-email-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
Conflicts: line numbering in xfs_iomap.c due to previous out of order patch
commit c6f0b395b2110aa26a134a9a395875b1ec0a5aae
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Dec 1 15:28:54 2022 +0000
fsdax,xfs: set the shared flag when file extent is shared
If a dax page is shared, mapread at different offsets can also trigger
page fault on same dax page. So, change the flag from "cow" to "shared".
And get the shared flag from filesystem when read.
Link: https://lkml.kernel.org/r/1669908538-55-5-git-send-email-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 708dfad2eb4169324189782edd6d3763237e0489
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Dec 1 15:28:53 2022 +0000
fsdax: zero the edges if source is HOLE or UNWRITTEN
If srcmap contains invalid data, such as HOLE and UNWRITTEN, the dest page
should be zeroed. Otherwise, since it's a pmem, old data may remains on
the dest page, the result of CoW will be incorrect.
The function name is also not easy to understand, rename it to
"dax_iomap_copy_around()", which means it copies data around the range.
[akpm@linux-foundation.org: update dax_iomap_copy_around() kerneldoc, per Darrick]
Link: https://lkml.kernel.org/r/1669973145-318-1-git-send-email-ruansy.fnst@fujitsu.com
Link: https://lkml.kernel.org/r/1669908538-55-4-git-send-email-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit f80e1668888f34c0764822e74953c997daf2ccdb
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Dec 1 15:28:52 2022 +0000
fsdax: invalidate pages when CoW
CoW changes the share state of a dax page, but the share count of the page
isn't updated. The next time access this page, it should have been a
newly accessed, but old association exists. So, we need to clear the
share state when CoW happens, in both dax_iomap_rw() and dax_zero_iter().
Link: https://lkml.kernel.org/r/1669908538-55-3-git-send-email-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 169004265860327182ecf92297b25b6271e81e96
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Dec 1 15:28:51 2022 +0000
fsdax: introduce page->share for fsdax in reflink mode
Patch series "fsdax,xfs: fix warning messages", v2.
Many testcases failed in dax+reflink mode with warning message in dmesg.
Such as generic/051,075,127. The warning message is like this:
[ 775.509337] ------------[ cut here ]------------
[ 775.509636] WARNING: CPU: 1 PID: 16815 at fs/dax.c:386 dax_insert_entry.cold+0x2e/0x69
[ 775.510151] Modules linked in: auth_rpcgss oid_registry nfsv4 algif_hash af_alg af_packet nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables x_tables dax_pmem nd_pmem nd_btt sch_fq_codel configfs xfs libcrc32c fuse
[ 775.524288] CPU: 1 PID: 16815 Comm: fsx Kdump: loaded Tainted: G W 6.1.0-rc4+ #164 eb34e4ee4200c7cbbb47de2b1892c5a3e027fd6d
[ 775.524904] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.0-3-3 04/01/2014
[ 775.525460] RIP: 0010:dax_insert_entry.cold+0x2e/0x69
[ 775.525797] Code: c7 c7 18 eb e0 81 48 89 4c 24 20 48 89 54 24 10 e8 73 6d ff ff 48 83 7d 18 00 48 8b 54 24 10 48 8b 4c 24 20 0f 84 e3 e9 b9 ff <0f> 0b e9 dc e9 b9 ff 48 c7 c6 a0 20 c3 81 48 c7 c7 f0 ea e0 81 48
[ 775.526708] RSP: 0000:ffffc90001d57b30 EFLAGS: 00010082
[ 775.527042] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000042
[ 775.527396] RDX: ffffea000a0f6c80 RSI: ffffffff81dfab1b RDI: 00000000ffffffff
[ 775.527819] RBP: ffffea000a0f6c40 R08: 0000000000000000 R09: ffffffff820625e0
[ 775.528241] R10: ffffc90001d579d8 R11: ffffffff820d2628 R12: ffff88815fc98320
[ 775.528598] R13: ffffc90001d57c18 R14: 0000000000000000 R15: 0000000000000001
[ 775.528997] FS: 00007f39fc75d740(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
[ 775.529474] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 775.529800] CR2: 00007f39fc772040 CR3: 0000000107eb6001 CR4: 00000000003706e0
[ 775.530214] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 775.530592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 775.531002] Call Trace:
[ 775.531230] <TASK>
[ 775.531444] dax_fault_iter+0x267/0x6c0
[ 775.531719] dax_iomap_pte_fault+0x198/0x3d0
[ 775.532002] __xfs_filemap_fault+0x24a/0x2d0 [xfs aa8d25411432b306d9554da38096f4ebb86bdfe7]
[ 775.532603] __do_fault+0x30/0x1e0
[ 775.532903] do_fault+0x314/0x6c0
[ 775.533166] __handle_mm_fault+0x646/0x1250
[ 775.533480] handle_mm_fault+0xc1/0x230
[ 775.533810] do_user_addr_fault+0x1ac/0x610
[ 775.534110] exc_page_fault+0x63/0x140
[ 775.534389] asm_exc_page_fault+0x22/0x30
[ 775.534678] RIP: 0033:0x7f39fc55820a
[ 775.534950] Code: 00 01 00 00 00 74 99 83 f9 c0 0f 87 7b fe ff ff c5 fe 6f 4e 20 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 0f 1f 44 00 00
[ 775.535839] RSP: 002b:00007ffc66a08118 EFLAGS: 00010202
[ 775.536157] RAX: 00007f39fc772001 RBX: 0000000000042001 RCX: 00000000000063c1
[ 775.536537] RDX: 0000000000006400 RSI: 00007f39fac42050 RDI: 00007f39fc772040
[ 775.536919] RBP: 0000000000006400 R08: 00007f39fc772001 R09: 0000000000042000
[ 775.537304] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
[ 775.537694] R13: 00007f39fc772000 R14: 0000000000006401 R15: 0000000000000003
[ 775.538086] </TASK>
[ 775.538333] ---[ end trace 0000000000000000 ]---
This also affects dax+noreflink mode if we run the test after a
dax+reflink test. So, the most urgent thing is solving the warning
messages.
With these fixes, most warning messages in dax_associate_entry() are gone.
But honestly, generic/388 will randomly failed with the warning. The
case shutdown the xfs when fsstress is running, and do it for many times.
I think the reason is that dax pages in use are not able to be invalidated
in time when fs is shutdown. The next time dax page to be associated, it
still remains the mapping value set last time. I'll keep on solving it.
The warning message in dax_writeback_one() can also be fixed because of
the dax unshare.
This patch (of 8):
fsdax page is used not only when CoW, but also mapread. To make the it
easily understood, use 'share' to indicate that the dax page is shared by
more than one extent. And add helper functions to use it.
Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
[ruansy.fnst@fujitsu.com: rename several functions]
Link: https://lkml.kernel.org/r/1669972991-246-1-git-send-email-ruansy.fnst@fujitsu.com
[ruansy.fnst@fujitsu.com: v2.2]
Link: https://lkml.kernel.org/r/1670381359-53-1-git-send-email-ruansy.fnst@fujitsu.com
Link: https://lkml.kernel.org/r/1669908538-55-1-git-send-email-ruansy.fnst@fujitsu.com
Link: https://lkml.kernel.org/r/1669908538-55-2-git-send-email-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 6f7db3894ae23eb5d40af4efb404aa0c072a68d2
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Fri Jun 3 13:37:36 2022 +0800
fsdax: dedup file range to use a compare function
With dax we cannot deal with readpage() etc. So, we create a dax
comparison function which is similar with vfs_dedupe_file_range_compare().
And introduce dax_remap_file_range_prep() for filesystem use.
Link: https://lkml.kernel.org/r/20220603053738.1218681-13-ruansy.fnst@fujitsu.com
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 8dbfc76da30472cfa07218a27eaaa538f0a49551
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Fri Jun 3 13:37:35 2022 +0800
fsdax: add dax_iomap_cow_copy() for dax zero
Punch hole on a reflinked file needs dax_iomap_cow_copy() too. Otherwise,
data in not aligned area will be not correct. So, add the CoW operation
for not aligned case in dax_memzero().
Link: https://lkml.kernel.org/r/20220603053738.1218681-12-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit e5d6df73302c8d1e7ab2d3555f0faafd0d4b0027
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Fri Jun 3 13:37:34 2022 +0800
fsdax: replace mmap entry in case of CoW
Replace the existing entry to the newly allocated one in case of CoW.
Also, we mark the entry as PAGECACHE_TAG_TOWRITE so writeback marks this
entry as writeprotected. This helps us snapshots so new write pagefaults
after snapshots trigger a CoW.
Link: https://lkml.kernel.org/r/20220603053738.1218681-11-ruansy.fnst@fujitsu.com
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit ff17b8df224b98e282ec39a9949a3672fa3dbe93
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Fri Jun 3 13:37:33 2022 +0800
fsdax: introduce dax_iomap_cow_copy()
In the case where the iomap is a write operation and iomap is not equal to
srcmap after iomap_begin, we consider it is a CoW operation.
In this case, the destination (iomap->addr) points to a newly allocated
extent. It is needed to copy the data from srcmap to the extent. In
theory, it is better to copy the head and tail ranges which is outside of
the non-aligned area instead of copying the whole aligned range. But in
dax page fault, it will always be an aligned range. So copy the whole
range in this case.
Link: https://lkml.kernel.org/r/20220603053738.1218681-10-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit e28cd3e50f3041186ba7fe74a9c7443cd8afc2da
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Fri Jun 3 13:37:32 2022 +0800
fsdax: output address in dax_iomap_pfn() and rename it
Add address output in dax_iomap_pfn() in order to perform a memcpy() in
CoW case. Since this function both output address and pfn, rename it to
dax_iomap_direct_access().
[ruansy.fnst@fujitsu.com: initialize `rc', per Dan]
Link: https://lore.kernel.org/linux-fsdevel/Yp8FUZnO64Qvyx5G@kili/
Link: https://lkml.kernel.org/r/20220607143837.161174-1-ruansy.fnst@fujitsu.com
Link: https://lkml.kernel.org/r/20220603053738.1218681-9-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 6061b69b9a550a2ab84e805d0d2315ba6215f112
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Fri Jun 3 13:37:31 2022 +0800
fsdax: set a CoW flag when associate reflink mappings
Introduce a PAGE_MAPPING_DAX_COW flag to support association with CoW file
mappings. In this case, since the dax-rmap has already took the
responsibility to look up for shared files by given dax page, the
page->mapping is no longer to used for rmap but for marking that this dax
page is shared. And to make sure disassociation works fine, we use
page->index as refcount, and clear page->mapping to the initial state when
page->index is decreased to 0.
With the help of this new flag, it is able to distinguish normal case and
CoW case, and keep the warning in normal case.
Link: https://lkml.kernel.org/r/20220603053738.1218681-8-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730
commit 2f437effc689ef913fbe5e31110580b4e7cf04be
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Fri Jun 3 13:37:28 2022 +0800
fsdax: introduce dax_lock_mapping_entry()
The current dax_lock_page() locks dax entry by obtaining mapping and index
in page. To support 1-to-N RMAP in NVDIMM, we need a new function to lock
a specific dax entry corresponding to this file's mapping,index. And
output the page corresponding to the specific dax entry for caller use.
Link: https://lkml.kernel.org/r/20220603053738.1218681-5-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit f8189d5d5fbf082786fb91c549f5127f23daec09
Author: Kaixu Xia <kaixuxia@tencent.com>
Date: Thu Jun 30 10:04:18 2022 -0700
dax: set did_zero to true when zeroing successfully
It is unnecessary to check and set did_zero value in while() loop
in dax_zero_iter(), we can set did_zero to true only when zeroing
successfully at last.
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit 047218ec904da19c45c4a70274fc3f818a1fcba1
Author: Jane Chu <jane.chu@oracle.com>
Date: Fri Apr 22 16:45:06 2022 -0600
dax: add .recovery_write dax_operation
Introduce dax_recovery_write() operation. The function is used to
recover a dax range that contains poison. Typical use case is when
a user process receives a SIGBUS with si_code BUS_MCEERR_AR
indicating poison(s) in a dax range, in response, the user process
issues a pwrite() to the page-aligned dax range, thus clears the
poison and puts valid data in the range.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit e511c4a3d2a1f64aafc1f5df37a2ffcf7ef91b55
Author: Jane Chu <jane.chu@oracle.com>
Date: Fri May 13 15:10:58 2022 -0700
dax: introduce DAX_RECOVERY_WRITE dax access mode
Up till now, dax_direct_access() is used implicitly for normal
access, but for the purpose of recovery write, dax range with
poison is requested. To make the interface clear, introduce
enum dax_access_mode {
DAX_ACCESS,
DAX_RECOVERY_WRITE,
}
where DAX_ACCESS is used for normal dax access, and
DAX_RECOVERY_WRITE is used for dax recovery write.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit c2e8021a535d3e7cc4f5f1418c4acf97589f8eb5
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date: Thu Jan 27 20:40:53 2022 +0800
fsdax: fix function description
The function name has been changed, so the description should be updated
too.
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220127124058.1172422-5-ruansy.fnst@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit 7ac5360cd4d02cc7e0eaf10867f599e041822f12
Author: Christoph Hellwig <hch@lst.de>
Date: Wed Dec 15 09:45:08 2021 +0100
dax: remove the copy_from_iter and copy_to_iter methods
These methods indirect the actual DAX read/write path. In the end pmem
uses magic flush and mc safe variants and fuse and dcssblk use plain ones
while device mapper picks redirects to the underlying device.
Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
routines are used to remove indirect call from the read/write fast path
as well as a lot of boilerplate code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: dropped erofs changes.
commit de2051147771017a61b62c02fd4e883c9b07712d
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:22:00 2021 +0100
fsdax: shift partition offset handling into the file systems
Remove the last user of ->bdev in dax.c by requiring the file system to
pass in an address that already includes the DAX offset. As part of the
only set ->bdev or ->daxdev when actually required in the ->iomap_begin
methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-27-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: Upstream commit 304a68b9c63b ("xfs: use iomap_valid method
to detect stale cached iomaps") was backported before this, leading
to some minor conflicts.
commit 952da06375c8f3aa58474fff718d9ae8442531b9
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:21:58 2021 +0100
iomap: add a IOMAP_DAX flag
Add a flag so that the file system can easily detect DAX operations
based just on the iomap operation requested instead of looking at
inode state using IS_DAX. This will be needed to apply the to be
added partition offset only for operations that actually use DAX,
but not things like fiemap that are based on the block device.
In the long run it should also allow turning the bdev, dax_dev
and inline_data into a union.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-25-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: Differences in the iomap code due to patches backported out
of order. Specifically, commit d7b64041164c ("iomap: write iomap
validity checks").
commit c6f40468657d16e4010ef84bf32a761feb3469ea
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:21:52 2021 +0100
fsdax: decouple zeroing from the iomap buffered I/O code
Unshare the DAX and iomap buffered I/O page zeroing code. This code
previously did a IS_DAX check deep inside the iomap code, which in
fact was the only DAX check in the code. Instead move these checks
into the callers. Most callers already have DAX special casing anyway
and XFS will need it for reflink support as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-19-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit e5c71954ca11df04d258a663a8a15262be0e17f6
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:21:51 2021 +0100
fsdax: factor out a dax_memzero helper
Factor out a helper for the "manual" zeroing of a DAX range to clean
up dax_iomap_zero a lot.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-18-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit 4a2d7d5950507a27e3074e4a29dc20720235f811
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:21:50 2021 +0100
fsdax: simplify the offset check in dax_iomap_zero
The file relative offset must have the same alignment as the storage
offset, so use that and get rid of the call to iomap_sector.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-17-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit 60696eb26a37ab0199f7833ddbc1b75138c36d16
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:21:48 2021 +0100
fsdax: simplify the pgoff calculation
Replace the two steps of dax_iomap_sector and bdev_dax_pgoff with a
single dax_iomap_pgoff helper that avoids lots of cumbersome sector
conversions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-15-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit 429f8de70d9872c5ca9b3914b3c4db5659779331
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:21:47 2021 +0100
fsdax: use a saner calling convention for copy_cow_page_dax
Just pass the vm_fault and iomap_iter structures, and figure out the rest
locally. Note that this requires moving dax_iomap_sector up in the file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-14-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2162211
commit 9dc2f9cdc63e7db82b6b2ec17894ca1b254f5e5d
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Nov 29 11:21:46 2021 +0100
fsdax: remove a pointless __force cast in copy_cow_page_dax
Despite its name copy_user_page expected kernel addresses, which is what
we already have.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-13-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
commit 06083a0921fd5939223a8bf30e83d5f483d348dc
Author: Muchun Song <songmuchun@bytedance.com>
Date: Thu Apr 28 23:16:10 2022 -0700
dax: fix missing writeprotect the pte entry
Currently dax_mapping_entry_mkclean() fails to clean and write protect the
pte entry within a DAX PMD entry during an *sync operation. This can
result in data loss in the following sequence:
1) process A mmap write to DAX PMD, dirtying PMD radix tree entry and
making the pmd entry dirty and writeable.
2) process B mmap with the @offset (e.g. 4K) and @length (e.g. 4K)
write to the same file, dirtying PMD radix tree entry (already
done in 1)) and making the pte entry dirty and writeable.
3) fsync, flushing out PMD data and cleaning the radix tree entry. We
currently fail to mark the pte entry as clean and write protected
since the vma of process B is not covered in dax_entry_mkclean().
4) process B writes to the pte. These don't cause any page faults since
the pte entry is dirty and writeable. The radix tree entry remains
clean.
5) fsync, which fails to flush the dirty PMD data because the radix tree
entry was clean.
6) crash - dirty data that should have been fsync'd as part of 5) could
still have been in the processor cache, and is lost.
Just to use pfn_mkclean_range() to clean the pfns to fix this issue.
Link: https://lkml.kernel.org/r/20220403053957.10770-6-songmuchun@bytedance.com
Fixes: 4b4bb46d00 ("dax: clear dirty entry tags on cache flush")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>
commit e583b5c472bd23d450e06f148dc1f37be74f7666
Author: Muchun Song <songmuchun@bytedance.com>
Date: Thu Apr 28 23:16:09 2022 -0700
dax: fix cache flush on PMD-mapped pages
The flush_cache_page() only remove a PAGE_SIZE sized range from the cache.
However, it does not cover the full pages in a THP except a head page.
Replace it with flush_cache_range() to fix this issue. This is just a
documentation issue with the respect to properly documenting the expected
usage of cache flushing before modifying the pmd. However, in practice
this is not a problem due to the fact that DAX is not available on
architectures with virtually indexed caches per:
commit d92576f116 ("dax: does not work correctly with virtual aliasing caches")
Link: https://lkml.kernel.org/r/20220403053957.10770-3-songmuchun@bytedance.com
Fixes: f729c8c9b2 ("dax: wrprotect pmd_t in dax_mapping_entry_mkclean")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933
Avoid the open coded calls to ->iomap_begin and ->iomap_end and call
iomap_iter instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 65dd814a6187ff46e33718d8eb76244e027837a3)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933
The core logic in the two dax page fault functions is similar. So, move
the logic into a common helper function. Also, to facilitate the
addition of new features, such as CoW, switch-case is no longer used to
handle different iomap types.
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit c2436190e492b243235262fc080a2c3189021be9)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933
The dax page fault code is too long and a bit difficult to read. And it
is hard to understand when we trying to add new features. Some of the
PTE/PMD codes have similar logic. So, factor out helper functions to
simplify the code.
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[hch: minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 55f81639a7152848f204f9af3f9b1a14a5944be1)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933
I got an infinite loop and a WARNING report when executing a tail command
in virtiofs.
WARNING: CPU: 10 PID: 964 at fs/iomap/iter.c:34 iomap_iter+0x3a2/0x3d0
Modules linked in:
CPU: 10 PID: 964 Comm: tail Not tainted 5.19.0-rc7
Call Trace:
<TASK>
dax_iomap_rw+0xea/0x620
? __this_cpu_preempt_check+0x13/0x20
fuse_dax_read_iter+0x47/0x80
fuse_file_read_iter+0xae/0xd0
new_sync_read+0xfe/0x180
? 0xffffffff81000000
vfs_read+0x14d/0x1a0
ksys_read+0x6d/0xf0
__x64_sys_read+0x1a/0x20
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The tail command will call read() with a count of 0. In this case,
iomap_iter() will report this WARNING, and always return 1 which casuing
the infinite loop in dax_iomap_rw().
Fixing by checking count whether is 0 in dax_iomap_rw().
Fixes: ca289e0b95af ("fsdax: switch dax_iomap_rw to use iomap_iter")
Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220725032050.3873372-1-lijinlin3@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from commit 17d9c15c9b9e7fb285f7ac5367dfb5f00ff575e3)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933
Switch the dax_iomap_rw implementation to use iomap_iter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit ca289e0b95afa973d204c77a4ad5c37e06145fbf)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 7e4f4b2d689d959b03cb07dfbdb97b9696cb1076)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917
Conflicts: drop changes on fs/ksmbd/vfs.c which doesn't exist on cs9
commit 322cbb50de711814c42fb088f6d31901502c711a
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Jan 24 10:39:13 2022 +0100
block: remove genhd.h
There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it. So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>