Commit Graph

340 Commits

Author SHA1 Message Date
Rafael Aquini f91158fc54 mm: convert DAX lock/unlock page to lock/unlock folio
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 91e79d22be75fec88ae58d274a7c9e49d6215099
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Aug 23 00:13:14 2023 +0100

    mm: convert DAX lock/unlock page to lock/unlock folio

    The one caller of DAX lock/unlock page already calls compound_head(), so
    use page_folio() instead, then use a folio throughout the DAX code to
    remove uses of page->mapping and page->index.

    [jane.chu@oracle.com: add comment to mf_generic_kill_procs(), simplify mf_generic_kill_procs:folio initialization]
      Link: https://lkml.kernel.org/r/20230908222336.186313-1-jane.chu@oracle.com
    Link: https://lkml.kernel.org/r/20230822231314.349200-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Jane Chu <jane.chu@oracle.com>
    Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Jane Chu <jane.chu@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:22:12 -05:00
Brian Foster b1c13262f6 fsdax: dax_unshare_iter needs to copy entire blocks
JIRA: https://issues.redhat.com/browse/RHEL-64959

commit 50793801fc7f6d08def48754fb0f0706b0cfc394
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Oct 3 08:09:48 2024 -0700

    fsdax: dax_unshare_iter needs to copy entire blocks

    The code that copies data from srcmap to iomap in dax_unshare_iter is
    very very broken, which bfoster's recent fsx changes have exposed.

    If the pos and len passed to dax_file_unshare are not aligned to an
    fsblock boundary, the iter pos and length in the _iter function will
    reflect this unalignment.

    dax_iomap_direct_access always returns a pointer to the start of the
    kmapped fsdax page, even if its pos argument is in the middle of that
    page.  This is catastrophic for data integrity when iter->pos is not
    aligned to a page, because daddr/saddr do not point to the same byte in
    the file as iter->pos.  Hence we corrupt user data by copying it to the
    wrong place.

    If iter->pos + iomap_length() in the _iter function is not aligned to a
    page, then we fail to copy a full block, and only partially populate the
    destination block.  This is catastrophic for data confidentiality
    because we expose stale pmem contents.

    Fix both of these issues by aligning copy_pos/copy_len to a page
    boundary (remember, this is fsdax so 1 fsblock == 1 base page) so that
    we always copy full blocks.

    We're not done yet -- there's no call to invalidate_inode_pages2_range,
    so programs that have the file range mmap'd will continue accessing the
    old memory mapping after the file metadata updates have completed.

    Be careful with the return value -- if the unshare succeeds, we still
    need to return the number of bytes that the iomap iter thinks we're
    operating on.

    Cc: ruansy.fnst@fujitsu.com
    Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/172796813328.1131942.16777025316348797355.stgit@frogsfrogsfrogs
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2024-11-12 09:52:41 -05:00
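
The page-alignment step described in the dax_unshare_iter commit above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `SKETCH_PAGE_SIZE`, `struct sketch_copy_range`, and `sketch_align_copy_range()` are hypothetical names, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, and 1 fsblock == 1 base page as in fsdax. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical container for the widened copy range. */
struct sketch_copy_range {
	uint64_t pos;
	uint64_t len;
};

/*
 * Widen [pos, pos + len) so that it starts and ends on a page boundary,
 * which means whole blocks are always copied.
 */
struct sketch_copy_range sketch_align_copy_range(uint64_t pos, uint64_t len)
{
	struct sketch_copy_range r = { pos, len };
	uint64_t mod;

	assert(len > 0);

	/* Pull the start down to the beginning of its page. */
	mod = r.pos % SKETCH_PAGE_SIZE;
	if (mod) {
		r.len += mod;
		r.pos -= mod;
	}

	/* Push the end up to the end of its page. */
	mod = (r.pos + r.len) % SKETCH_PAGE_SIZE;
	if (mod)
		r.len += SKETCH_PAGE_SIZE - mod;

	return r;
}
```

With these assumptions, a request for 100 bytes at offset 5000 is widened to the full page starting at offset 4096, while an already aligned request is left unchanged.
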
Brian Foster c10a1b2474 fsdax: remove zeroing code from dax_unshare_iter
JIRA: https://issues.redhat.com/browse/RHEL-64959

commit 95472274b6fed8f2d30fbdda304e12174b3d4099
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Oct 3 08:09:32 2024 -0700

    fsdax: remove zeroing code from dax_unshare_iter

    Remove the code in dax_unshare_iter that zeroes the destination memory
    because it's not necessary.

    If srcmap is unwritten, we don't have to do anything because that
    unwritten extent came from the regular file mapping, and unwritten
    extents cannot be shared.  The same applies to holes.

    Furthermore, zeroing to unshare a mapping is just plain wrong because
    unsharing means copy on write, and we should be copying data.

    This is effectively a revert of commit 13dd4e04625f ("fsdax: unshare:
    zero destination if srcmap is HOLE or UNWRITTEN")

    Cc: ruansy.fnst@fujitsu.com
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/172796813311.1131942.16033376284752798632.stgit@frogsfrogsfrogs
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2024-11-12 09:52:41 -05:00
Brian Foster f377f1295a iomap: share iomap_unshare_iter predicate code with fsdax
JIRA: https://issues.redhat.com/browse/RHEL-64959

commit 6ef6a0e821d3dad6bf8a5d5508762dba9042c84b
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Oct 3 08:09:16 2024 -0700

    iomap: share iomap_unshare_iter predicate code with fsdax

    The predicate code that iomap_unshare_iter uses to decide if it really
    needs to unshare a file range mapping should be shared with the fsdax
    version, because right now they're opencoded and inconsistent.

    Note that we simplify the predicate logic a bit -- we no longer allow
    unsharing of inline data mappings, but there aren't any filesystems that
    allow shared inline data currently.

    This is a fix in the sense that it should have been ported to fsdax.

    Fixes: b53fdb215d13 ("iomap: improve shared block detection in iomap_unshare_iter")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/172796813294.1131942.15762084021076932620.stgit@frogsfrogsfrogs
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2024-11-12 09:52:40 -05:00
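
One plausible shape for a shared "should this mapping be unshared" predicate, as discussed in the iomap_unshare_iter commit above, is sketched below. All names (`SKETCH_F_SHARED`, `enum sketch_map_type`, `sketch_want_unshare()`) are hypothetical userspace stand-ins, and the exact condition is an assumption drawn from the commit text: only shared, mapped source extents qualify, so holes, unwritten extents, and inline data are skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit marking a destination mapping as shared. */
#define SKETCH_F_SHARED 0x4u

/* Hypothetical mapping types, loosely modelled on iomap extent types. */
enum sketch_map_type {
	SKETCH_HOLE,
	SKETCH_DELALLOC,
	SKETCH_MAPPED,
	SKETCH_UNWRITTEN,
	SKETCH_INLINE
};

/*
 * Return true only when the destination is flagged shared and the
 * source mapping holds real, mapped data that can be copied.
 */
bool sketch_want_unshare(unsigned int dst_flags, enum sketch_map_type src_type)
{
	assert(src_type <= SKETCH_INLINE);

	return (dst_flags & SKETCH_F_SHARED) && src_type == SKETCH_MAPPED;
}
```

Keeping the test in one helper means both callers agree on which extents are skipped, which is the consistency problem the commit describes.
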
Brian Foster 8a48af02c5 iomap: constrain the file range passed to iomap_file_unshare
JIRA: https://issues.redhat.com/browse/RHEL-64959

commit a311a08a4237241fb5b9d219d3e33346de6e83e0
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Wed Oct 2 08:02:13 2024 -0700

    iomap: constrain the file range passed to iomap_file_unshare

    File contents can only be shared (i.e. reflinked) below EOF, so it makes
    no sense to try to unshare ranges beyond EOF.  Constrain the file range
    parameters here so that we don't have to do that in the callers.

    Fixes: 5f4e5752a8 ("fs: add iomap_file_dirty")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2024-11-12 09:52:40 -05:00
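
The range constraint described in the iomap_file_unshare commit above amounts to clamping the request against EOF. The sketch below is a userspace illustration with a hypothetical function name, `sketch_clamp_unshare_len()`. It returns the number of bytes that remain eligible for unsharing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp an unshare request of len bytes at pos to a file of size bytes.
 * Nothing at or beyond EOF can be shared, so such requests shrink to 0.
 */
int64_t sketch_clamp_unshare_len(int64_t pos, int64_t len, int64_t size)
{
	assert(len >= 0 && size >= 0);

	if (pos < 0 || pos >= size)
		return 0;

	/* Keep only the part of the range that lies below EOF. */
	return len < size - pos ? len : size - pos;
}
```

Doing the clamp once in the common entry point means individual callers no longer need their own EOF checks.
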
Rafael Aquini 930d4bbabf mm: remove enum page_entry_size
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
  * fs/erofs/data.c: hunks dropped as RHEL is missing commit 06252e9ce05b
      ("erofs: dax support for non-tailpacking regular file")
  * fs/ext2/file.c: minor context difference as RHEL is missing commit
      70f3bad8c315 ("ext2: Convert to using invalidate_lock")
  * fs/fuse/dax.c: minor context difference as RHEL is missing commit
      8bcbbe9c7c8e ("fuse: Convert to using invalidate_lock")

This patch is a backport of the following upstream commit:
commit 1d024e7a8dabcc3c84d77532a88c774c32cf8245
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Aug 18 21:23:35 2023 +0100

    mm: remove enum page_entry_size

    Remove the unnecessary encoding of page order into an enum and pass the
    page order directly.  That lets us get rid of pe_order().

    The switch constructs have to be changed to if/else constructs to prevent
    GCC from warning on builds with 3-level page tables where PMD_ORDER and
    PUD_ORDER have the same value.

    If you are looking at this commit because your driver stopped compiling,
    look at the previous commit as well and audit your driver to be sure it
    doesn't depend on mmap_lock being held in its ->huge_fault method.

    [willy@infradead.org: use "order %u" to match the (non dev_t) style]
      Link: https://lkml.kernel.org/r/ZOUYekbtTv+n8hYf@casper.infradead.org
    Link: https://lkml.kernel.org/r/20230818202335.2739663-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:22:01 -04:00
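
The switch-to-if/else change mentioned in the page_entry_size commit above can be illustrated with a small sketch. The orders below are hypothetical values chosen to mimic a configuration where the PMD and PUD orders coincide. A `switch` with both constants as case labels would be diagnosed for duplicate case values, whereas an if/else chain compiles and takes the first matching branch.

```c
#include <assert.h>

/* Assumption: a folded page-table layout where both orders are equal. */
#define SKETCH_PMD_ORDER 9u
#define SKETCH_PUD_ORDER 9u

enum sketch_fault_level {
	SKETCH_FAULT_PTE,
	SKETCH_FAULT_PMD,
	SKETCH_FAULT_PUD,
	SKETCH_FAULT_FALLBACK
};

/* Map a fault order to a handler level without using case labels. */
enum sketch_fault_level sketch_classify_order(unsigned int order)
{
	assert(SKETCH_PMD_ORDER <= SKETCH_PUD_ORDER);

	if (order == 0)
		return SKETCH_FAULT_PTE;
	else if (order == SKETCH_PMD_ORDER)
		return SKETCH_FAULT_PMD;
	else if (order == SKETCH_PUD_ORDER)
		return SKETCH_FAULT_PUD;

	return SKETCH_FAULT_FALLBACK;
}
```

When the two orders differ, the same chain distinguishes all three levels, so one source form serves both configurations.
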
Rafael Aquini 7781b4d3e4 mm: move PMD_ORDER to pgtable.h
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 051ddcfeb1bdbae45e660c0db2468d29ca15c6c2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Aug 18 21:23:33 2023 +0100

    mm: move PMD_ORDER to pgtable.h

    Patch series "Change calling convention for ->huge_fault", v2.

    There are two unrelated changes to the calling convention for
    ->huge_fault.  I've bundled them together to help people notice the
    change.  The first is to improve scalability of DAX page faults by
    allowing them to be handled under the VMA lock.  The second is to remove
    enum page_entry_size since it's really unnecessary.  The changelogs and
    documentation updates hopefully work to that end.

    This patch (of 3):

    Allow this to be used in generic code.  Also add PUD_ORDER.

    Link: https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20230818202335.2739663-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:22:00 -04:00
Jeff Moyer 987ed932b6 fs : Fix warning using plain integer as NULL
JIRA: https://issues.redhat.com/browse/RHEL-23824

commit 297945d9bc13a10e2ce39f0a3aad38c6812435a5
Author: Abhinav Singh <singhabhinav9051571833@gmail.com>
Date:   Wed Nov 8 10:15:50 2023 +0530

    fs : Fix warning using plain integer as NULL
    
    Sparse static analysis tools generate a warning with this message
    "Using plain integer as NULL pointer". In this case this warning is
    being shown because we are trying to initialize a pointer to NULL using
    the integer value 0.
    
    Signed-off-by: Abhinav Singh <singhabhinav9051571833@gmail.com>
    Link: https://lore.kernel.org/r/20231108044550.1006555-1-singhabhinav9051571833@gmail.com
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-26 14:57:28 -04:00
Jeff Moyer 95e03aeb98 dax: enable dax fault handler to report VM_FAULT_HWPOISON
JIRA: https://issues.redhat.com/browse/RHEL-23824

commit 1ea7ca1b090145519aad998679222f0a14ab8fce
Author: Jane Chu <jane.chu@oracle.com>
Date:   Thu Jun 15 12:13:25 2023 -0600

    dax: enable dax fault handler to report VM_FAULT_HWPOISON
    
    When multiple processes mmap() a dax file, then at some point,
    a process issues a 'load' and consumes a hwpoison, the process
    receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb
    set for the poison scope. Soon after, any other process issues
    a 'load' to the poisoned page (that is unmapped from the kernel
    side by memory_failure), it receives a SIGBUS with
    si_code = BUS_ADRERR and without valid si_lsb.
    
    This is confusing to the user, and is different from a page fault due
    to poison in RAM memory; some helpful information is also lost.
    
    Channel dax backend driver's poison detection to the filesystem
    such that instead of reporting VM_FAULT_SIGBUS, it could report
    VM_FAULT_HWPOISON.
    
    If user level block IO syscalls fail due to poison, the errno will
    be converted to EIO to maintain block API consistency.
    
    Signed-off-by: Jane Chu <jane.chu@oracle.com>
    Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-26 14:52:28 -04:00
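
The errno conversion mentioned in the VM_FAULT_HWPOISON commit above (poison reported as EIO on the block IO path) could look roughly like the sketch below. The helper name and the `SKETCH_EHWPOISON` constant are assumptions for illustration; the value 133 mirrors the Linux `EHWPOISON` definition but is hard-coded here so the example does not depend on the host's errno headers.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: stand-in for the Linux EHWPOISON errno value. */
#define SKETCH_EHWPOISON 133

/*
 * Translate a memory-side error into what block IO callers expect:
 * poison becomes -EIO, every other value passes through unchanged.
 */
int sketch_mem_to_blk_err(int err)
{
	assert(err <= 0);

	return err == -SKETCH_EHWPOISON ? -EIO : err;
}
```

The fault path can keep the poison-specific error and report it as a hardware-poison fault, while read and write syscalls continue to see EIO.
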
Jeff Moyer 025b4a54f8 fsdax: remove redundant variable 'error'
JIRA: https://issues.redhat.com/browse/RHEL-23824

commit dd0c64258a9d9e74b4896f05c7e77fa3365b5f12
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Wed Jun 21 14:02:56 2023 +0100

    fsdax: remove redundant variable 'error'
    
    The variable 'error' is being assigned a value that is never read,
    the assignment and the variable are redundant and can be removed.
    Cleans up clang scan build warning:
    
    fs/dax.c:1880:10: warning: Although the value stored to 'error' is
    used in the enclosing expression, the value is never actually read
    from 'error' [deadcode.DeadStores]
    
    Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
    Link: https://lore.kernel.org/r/20230621130256.2676126-1-colin.i.king@gmail.com
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-26 14:48:28 -04:00
Bill O'Donnell e035b10ddb fsdax: force clear dirty mark if CoW
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit f76b3a32879de215ced3f8c754c4077b0c2f79e3
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Mar 24 10:28:00 2023 +0000

    fsdax: force clear dirty mark if CoW

    XFS allows CoW on non-shared extents to combat fragmentation[1].  The old
    non-shared extent may have been written via mmap before, so its dax entry
    is still marked dirty.

    This results in a WARNing:

    [   28.512349] ------------[ cut here ]------------
    [   28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390
    [   28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
    [   28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117
    [   28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
    [   28.516307] RIP: 0010:dax_insert_entry+0x342/0x390
    [   28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48
    [   28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086
    [   28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b
    [   28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff
    [   28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8
    [   28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155
    [   28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8
    [   28.519703] FS:  00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
    [   28.520148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0
    [   28.520863] PKRU: 55555554
    [   28.521043] Call Trace:
    [   28.521219]  <TASK>
    [   28.521368]  dax_fault_iter+0x196/0x390
    [   28.521595]  dax_iomap_pte_fault+0x19b/0x3d0
    [   28.521852]  __xfs_filemap_fault+0x234/0x2b0
    [   28.522116]  __do_fault+0x30/0x130
    [   28.522334]  do_fault+0x193/0x340
    [   28.522586]  __handle_mm_fault+0x2d3/0x690
    [   28.522975]  handle_mm_fault+0xe6/0x2c0
    [   28.523259]  do_user_addr_fault+0x1bc/0x6f0
    [   28.523521]  exc_page_fault+0x60/0x140
    [   28.523763]  asm_exc_page_fault+0x22/0x30
    [   28.524001] RIP: 0033:0x7f14a0b589ca
    [   28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
    [   28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202
    [   28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046
    [   28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000
    [   28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000
    [   28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c
    [   28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0
    [   28.527449]  </TASK>
    [   28.527600] ---[ end trace 0000000000000000 ]---

    To be able to delete this entry, clear its dirty mark before
    invalidate_inode_pages2_range().

    [1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/

    Link: https://lkml.kernel.org/r/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com
    Fixes: f80e1668888f3 ("fsdax: invalidate pages when CoW")
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:49 -05:00
Bill O'Donnell 4b03d4970c fsdax: dedupe should compare the min of two iters' length
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit e900ba10d15041a6236cc75778cc6e06c3590a58
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Wed Mar 22 07:25:58 2023 +0000

    fsdax: dedupe should compare the min of two iters' length

    In a dedupe comparison iter loop, the length of iomap_iter decreases
    because it represents the remaining length after each iteration.

    The dedupe command will fail with -EIO if the range is larger than one
    page size and not aligned to the page size.  A warning is also reported
    in dmesg:

    [ 4338.498374] ------------[ cut here ]------------
    [ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16
    ...

    The compare function should use the min length of the current iters,
    not the total length.

    Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com
    Fixes: 0e79e3736d54 ("fsdax: dedupe: iter two files at the same time")
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:49 -05:00
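
The "compare the min length" rule from the dedupe commit above can be sketched as a chunked comparison loop. This is a userspace illustration under stated assumptions: the two files are plain buffers, each side's mappings are modelled as fixed-size chunks (`src_chunk`, `dst_chunk`), and all names are hypothetical. Each step compares only as many bytes as both sides can supply, never the total requested length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static size_t sketch_min(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Compare len bytes of src and dst. Each side is mapped in chunks of a
 * different size, so every step is limited to the smallest of: the bytes
 * remaining overall, the bytes left in the current source chunk, and the
 * bytes left in the current destination chunk.
 */
bool sketch_ranges_same(const char *src, const char *dst, size_t len,
			size_t src_chunk, size_t dst_chunk)
{
	size_t off = 0;

	assert(src_chunk > 0 && dst_chunk > 0);

	while (off < len) {
		size_t remaining = len - off;
		size_t src_left = src_chunk - off % src_chunk;
		size_t dst_left = dst_chunk - off % dst_chunk;
		size_t step = sketch_min(remaining,
					 sketch_min(src_left, dst_left));

		if (memcmp(src + off, dst + off, step) != 0)
			return false;
		off += step;
	}
	return true;
}
```

Using the total length at every step would read past the current chunk as soon as the loop has advanced, which is the failure mode the commit describes for unaligned ranges larger than a page.
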
Bill O'Donnell 98a39c547d fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 13dd4e04625f600e5affb1b3f0b6c35268ab839b
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Wed Mar 22 11:11:09 2023 +0000

    fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN

    unshare copies data from source to destination.  But if the source is a
    HOLE or UNWRITTEN extent, we should zero the destination, otherwise
    the HOLE or UNWRITTEN part will expose old data of the newly
    allocated extent to the user.

    Found by running generic/649 while mounting with -o dax=always on pmem.

    Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com
    Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:49 -05:00
Bill O'Donnell 82896d0bcd fsdax: dax_unshare_iter() should return a valid length
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 388bc034d91d480efa88abc5c8d6e6c8a878b1ab
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Feb 2 12:33:47 2023 +0000

    fsdax: dax_unshare_iter() should return a valid length

    copy_mc_to_kernel() returns 0 if it executed successfully.  In that case
    the return value should be set to the length it copied.

    [akpm@linux-foundation.org: don't mess up `ret', per Matthew]
    Link: https://lkml.kernel.org/r/1675341227-14-1-git-send-email-ruansy.fnst@fujitsu.com
    Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:49 -05:00
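
The return-value convention addressed by the dax_unshare_iter() commit above can be sketched as follows. `sketch_copy_mc()` is a hypothetical userspace stand-in for a machine-check-safe copy that returns the number of bytes it could not copy (0 on success), and its `fail_after` parameter exists only to simulate a fault. The wrapper turns that convention into "length on success, negative errno on failure".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a machine-check-safe copy: copies at most
 * fail_after bytes and returns how many bytes were NOT copied.
 */
size_t sketch_copy_mc(void *dst, const void *src, size_t len,
		      size_t fail_after)
{
	size_t n = len < fail_after ? len : fail_after;

	memcpy(dst, src, n);
	return len - n;
}

/*
 * Return the number of bytes processed on success, or -EIO if the copy
 * stopped early. A bare 0 from the copy must not be passed through,
 * because the caller would read it as "no progress".
 */
long sketch_unshare_copy(void *dst, const void *src, size_t len,
			 size_t fail_after)
{
	assert(dst != NULL && src != NULL);

	if (sketch_copy_mc(dst, src, len, fail_after) != 0)
		return -EIO;

	return (long)len;
}
```

Keeping the "bytes not copied" result separate from the function's own return value avoids mixing the two conventions in one variable.
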
Bill O'Donnell 49610cb9cc fsdax,xfs: port unshare to fsdax
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit d984648e428bf88cbd94ebe346c73632cb92fffb
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Dec 1 15:32:33 2022 +0000

    fsdax,xfs: port unshare to fsdax

    Implement unshare in fsdax mode: copy data from srcmap to iomap.

    Link: https://lkml.kernel.org/r/1669908753-169-1-git-send-email-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:49 -05:00
Bill O'Donnell 6c05339c4a fsdax: dedupe: iter two files at the same time
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 0e79e3736d54bb8efbc9fb29cc3b54a132783565
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Dec 1 15:31:41 2022 +0000

    fsdax: dedupe: iter two files at the same time

    The iomap_iter() on a range of one file may loop more than once.  In this
    case, the inner dst_iter can update its iomap but the outer src_iter
    can't.  This may cause wrong remapping in the filesystem.  Iterate both
    files at the same time instead.

    Link: https://lkml.kernel.org/r/1669908701-93-1-git-send-email-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:48 -05:00
Bill O'Donnell 9e619fc946 fsdax,xfs: set the shared flag when file extent is shared
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

Conflicts: line numbering in xfs_iomap.c due to previous out of order patch

commit c6f0b395b2110aa26a134a9a395875b1ec0a5aae
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Dec 1 15:28:54 2022 +0000

    fsdax,xfs: set the shared flag when file extent is shared

    If a dax page is shared, mapread at different offsets can also trigger
    a page fault on the same dax page.  So, change the flag from "cow" to
    "shared", and get the shared flag from the filesystem on read.

    Link: https://lkml.kernel.org/r/1669908538-55-5-git-send-email-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:48 -05:00
Bill O'Donnell aa9f2f2873 fsdax: zero the edges if source is HOLE or UNWRITTEN
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 708dfad2eb4169324189782edd6d3763237e0489
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Dec 1 15:28:53 2022 +0000

    fsdax: zero the edges if source is HOLE or UNWRITTEN

    If srcmap contains invalid data, such as HOLE and UNWRITTEN, the dest page
    should be zeroed.  Otherwise, since it's pmem, old data may remain on
    the dest page and the result of CoW will be incorrect.

    The function name is also not easy to understand, rename it to
    "dax_iomap_copy_around()", which means it copies data around the range.

    [akpm@linux-foundation.org: update dax_iomap_copy_around() kerneldoc, per Darrick]
    Link: https://lkml.kernel.org/r/1669973145-318-1-git-send-email-ruansy.fnst@fujitsu.com
    Link: https://lkml.kernel.org/r/1669908538-55-4-git-send-email-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:48 -05:00
Bill O'Donnell 9c02e20c66 fsdax: invalidate pages when CoW
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit f80e1668888f34c0764822e74953c997daf2ccdb
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Dec 1 15:28:52 2022 +0000

    fsdax: invalidate pages when CoW

    CoW changes the share state of a dax page, but the share count of the page
    isn't updated.  The next time this page is accessed, it should be treated
    as newly accessed, but the old association still exists.  So, we need to
    clear the share state when CoW happens, in both dax_iomap_rw() and
    dax_zero_iter().

    Link: https://lkml.kernel.org/r/1669908538-55-3-git-send-email-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:48 -05:00
Bill O'Donnell 56b6ec502a fsdax: introduce page->share for fsdax in reflink mode
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 169004265860327182ecf92297b25b6271e81e96
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Dec 1 15:28:51 2022 +0000

    fsdax: introduce page->share for fsdax in reflink mode

    Patch series "fsdax,xfs: fix warning messages", v2.

    Many testcases, such as generic/051,075,127, failed in dax+reflink mode
    with a warning message in dmesg.  The warning message is like this:
    [  775.509337] ------------[ cut here ]------------
    [  775.509636] WARNING: CPU: 1 PID: 16815 at fs/dax.c:386 dax_insert_entry.cold+0x2e/0x69
    [  775.510151] Modules linked in: auth_rpcgss oid_registry nfsv4 algif_hash af_alg af_packet nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables x_tables dax_pmem nd_pmem nd_btt sch_fq_codel configfs xfs libcrc32c fuse
    [  775.524288] CPU: 1 PID: 16815 Comm: fsx Kdump: loaded Tainted: G        W          6.1.0-rc4+ #164 eb34e4ee4200c7cbbb47de2b1892c5a3e027fd6d
    [  775.524904] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.0-3-3 04/01/2014
    [  775.525460] RIP: 0010:dax_insert_entry.cold+0x2e/0x69
    [  775.525797] Code: c7 c7 18 eb e0 81 48 89 4c 24 20 48 89 54 24 10 e8 73 6d ff ff 48 83 7d 18 00 48 8b 54 24 10 48 8b 4c 24 20 0f 84 e3 e9 b9 ff <0f> 0b e9 dc e9 b9 ff 48 c7 c6 a0 20 c3 81 48 c7 c7 f0 ea e0 81 48
    [  775.526708] RSP: 0000:ffffc90001d57b30 EFLAGS: 00010082
    [  775.527042] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000042
    [  775.527396] RDX: ffffea000a0f6c80 RSI: ffffffff81dfab1b RDI: 00000000ffffffff
    [  775.527819] RBP: ffffea000a0f6c40 R08: 0000000000000000 R09: ffffffff820625e0
    [  775.528241] R10: ffffc90001d579d8 R11: ffffffff820d2628 R12: ffff88815fc98320
    [  775.528598] R13: ffffc90001d57c18 R14: 0000000000000000 R15: 0000000000000001
    [  775.528997] FS:  00007f39fc75d740(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
    [  775.529474] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  775.529800] CR2: 00007f39fc772040 CR3: 0000000107eb6001 CR4: 00000000003706e0
    [  775.530214] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  775.530592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  775.531002] Call Trace:
    [  775.531230]  <TASK>
    [  775.531444]  dax_fault_iter+0x267/0x6c0
    [  775.531719]  dax_iomap_pte_fault+0x198/0x3d0
    [  775.532002]  __xfs_filemap_fault+0x24a/0x2d0 [xfs aa8d25411432b306d9554da38096f4ebb86bdfe7]
    [  775.532603]  __do_fault+0x30/0x1e0
    [  775.532903]  do_fault+0x314/0x6c0
    [  775.533166]  __handle_mm_fault+0x646/0x1250
    [  775.533480]  handle_mm_fault+0xc1/0x230
    [  775.533810]  do_user_addr_fault+0x1ac/0x610
    [  775.534110]  exc_page_fault+0x63/0x140
    [  775.534389]  asm_exc_page_fault+0x22/0x30
    [  775.534678] RIP: 0033:0x7f39fc55820a
    [  775.534950] Code: 00 01 00 00 00 74 99 83 f9 c0 0f 87 7b fe ff ff c5 fe 6f 4e 20 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 0f 1f 44 00 00
    [  775.535839] RSP: 002b:00007ffc66a08118 EFLAGS: 00010202
    [  775.536157] RAX: 00007f39fc772001 RBX: 0000000000042001 RCX: 00000000000063c1
    [  775.536537] RDX: 0000000000006400 RSI: 00007f39fac42050 RDI: 00007f39fc772040
    [  775.536919] RBP: 0000000000006400 R08: 00007f39fc772001 R09: 0000000000042000
    [  775.537304] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
    [  775.537694] R13: 00007f39fc772000 R14: 0000000000006401 R15: 0000000000000003
    [  775.538086]  </TASK>
    [  775.538333] ---[ end trace 0000000000000000 ]---

    This also affects dax+noreflink mode if we run the test after a
    dax+reflink test.  So, the most urgent thing is solving the warning
    messages.

    With these fixes, most warning messages in dax_associate_entry() are gone.
    But honestly, generic/388 will still randomly fail with the warning.  The
    case shuts down the xfs while fsstress is running, and does so many times.
    I think the reason is that dax pages in use cannot be invalidated in time
    when the fs is shut down.  The next time the dax page is associated, it
    still holds the mapping value set last time.  I'll keep on solving it.

    The warning message in dax_writeback_one() can also be fixed because of
    the dax unshare.

    This patch (of 8):

    An fsdax page is used not only for CoW, but also for mapread.  To make it
    easily understood, use 'share' to indicate that the dax page is shared by
    more than one extent.  And add helper functions to use it.

    Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.

    [ruansy.fnst@fujitsu.com: rename several functions]
      Link: https://lkml.kernel.org/r/1669972991-246-1-git-send-email-ruansy.fnst@fujitsu.com
    [ruansy.fnst@fujitsu.com: v2.2]
      Link: https://lkml.kernel.org/r/1670381359-53-1-git-send-email-ruansy.fnst@fujitsu.com
    Link: https://lkml.kernel.org/r/1669908538-55-1-git-send-email-ruansy.fnst@fujitsu.com
    Link: https://lkml.kernel.org/r/1669908538-55-2-git-send-email-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:48 -05:00
Bill O'Donnell 7554f41e28 fsdax: dedup file range to use a compare function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 6f7db3894ae23eb5d40af4efb404aa0c072a68d2
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:36 2022 +0800

    fsdax: dedup file range to use a compare function

    With dax we cannot deal with readpage() etc.  So, we create a dax
    comparison function which is similar to vfs_dedupe_file_range_compare(),
    and introduce dax_remap_file_range_prep() for filesystem use.

    Link: https://lkml.kernel.org/r/20220603053738.1218681-13-ruansy.fnst@fujitsu.com
    Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
Bill O'Donnell 7b46022228 fsdax: add dax_iomap_cow_copy() for dax zero
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 8dbfc76da30472cfa07218a27eaaa538f0a49551
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:35 2022 +0800

    fsdax: add dax_iomap_cow_copy() for dax zero

    Punching a hole in a reflinked file needs dax_iomap_cow_copy() too.
    Otherwise, the data in the unaligned area will be incorrect.  So, add the
    CoW operation for the unaligned case in dax_memzero().

    Link: https://lkml.kernel.org/r/20220603053738.1218681-12-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
Bill O'Donnell e0f1c7fcf7 fsdax: replace mmap entry in case of CoW
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit e5d6df73302c8d1e7ab2d3555f0faafd0d4b0027
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:34 2022 +0800

    fsdax: replace mmap entry in case of CoW

    Replace the existing entry with the newly allocated one in case of CoW.
    Also, we mark the entry as PAGECACHE_TAG_TOWRITE so writeback marks this
    entry as write-protected.  This helps with snapshots, so that new write
    page faults after a snapshot trigger a CoW.

    Link: https://lkml.kernel.org/r/20220603053738.1218681-11-ruansy.fnst@fujitsu.com
    Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
Bill O'Donnell 9294943708 fsdax: introduce dax_iomap_cow_copy()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit ff17b8df224b98e282ec39a9949a3672fa3dbe93
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:33 2022 +0800

    fsdax: introduce dax_iomap_cow_copy()

    In the case where the iomap is a write operation and iomap is not equal to
    srcmap after iomap_begin, we consider it to be a CoW operation.

    In this case, the destination (iomap->addr) points to a newly allocated
    extent, and the data needs to be copied from srcmap to that extent.  In
    theory, it is better to copy only the head and tail ranges which lie
    outside of the non-aligned area instead of copying the whole aligned
    range.  But in a dax page fault, it will always be an aligned range.  So
    copy the whole range in this case.
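
The head/tail copy described above can be sketched in plain userspace C. This is only an illustrative model under stated assumptions, not the kernel implementation: the function name cow_copy_around(), its parameters, and the use of ordinary memory buffers in place of DAX addresses are all hypothetical. src and dst are assumed to point at the start of the same aligned block, and align_size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of "copy around" for CoW.
 * [head_off, head_off + length) is the range the caller will write itself,
 * so only the bytes of the aligned block outside that range are copied.
 * If the range is already aligned (the page fault case), nothing lies
 * outside it and the whole range is copied instead.
 */
static void cow_copy_around(unsigned char *dst, const unsigned char *src,
                            size_t pos, size_t length, size_t align_size)
{
    /* align_size must be a non-zero power of two for the masks below */
    assert(align_size != 0 && (align_size & (align_size - 1)) == 0);

    size_t head_off = pos & (align_size - 1);   /* offset inside the block */
    size_t end = head_off + length;             /* end of the written range */
    size_t blk_end = (end + align_size - 1) & ~(align_size - 1);
    bool copy_all = head_off == 0 && end == blk_end;

    if (copy_all) {
        /* Aligned range: copy everything */
        memcpy(dst, src, length);
        return;
    }
    if (head_off)
        /* Head: bytes before the written range */
        memcpy(dst, src, head_off);
    if (end < blk_end)
        /* Tail: bytes after the written range up to the block end */
        memcpy(dst + end, src + end, blk_end - end);
}
```

For example, with align_size 8, pos 2 and length 3, the model copies bytes 0-1 (head) and 5-7 (tail) and leaves bytes 2-4 for the caller to fill in.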

    Link: https://lkml.kernel.org/r/20220603053738.1218681-10-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
Bill O'Donnell 67b45cf94a fsdax: output address in dax_iomap_pfn() and rename it
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit e28cd3e50f3041186ba7fe74a9c7443cd8afc2da
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:32 2022 +0800

    fsdax: output address in dax_iomap_pfn() and rename it

    Add address output in dax_iomap_pfn() in order to perform a memcpy() in
    the CoW case.  Since this function outputs both the address and the pfn,
    rename it to dax_iomap_direct_access().

    [ruansy.fnst@fujitsu.com: initialize `rc', per Dan]
      Link: https://lore.kernel.org/linux-fsdevel/Yp8FUZnO64Qvyx5G@kili/
      Link: https://lkml.kernel.org/r/20220607143837.161174-1-ruansy.fnst@fujitsu.com
    Link: https://lkml.kernel.org/r/20220603053738.1218681-9-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
Bill O'Donnell 038e2dfe8c fsdax: set a CoW flag when associate reflink mappings
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 6061b69b9a550a2ab84e805d0d2315ba6215f112
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:31 2022 +0800

    fsdax: set a CoW flag when associate reflink mappings

    Introduce a PAGE_MAPPING_DAX_COW flag to support association with CoW file
    mappings.  In this case, since the dax-rmap has already taken the
    responsibility of looking up shared files for a given dax page, the
    page->mapping is no longer used for rmap but for marking that this dax
    page is shared.  And to make sure disassociation works fine, we use
    page->index as a refcount, and clear page->mapping to the initial state
    when page->index is decreased to 0.

    With the help of this new flag, it is possible to distinguish the normal
    case from the CoW case, and keep the warning in the normal case.
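
The refcount scheme described above can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the kernel code: struct model_page, MODEL_DAX_SHARED, model_associate() and model_disassociate() are hypothetical names, and the flag value is an arbitrary stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag stored in page->mapping */
#define MODEL_DAX_SHARED ((void *)0x1)

/* Hypothetical model of the two struct page fields involved */
struct model_page {
    void *mapping;        /* owning mapping, or the shared flag */
    unsigned long index;  /* file offset normally; share count when shared */
};

static void model_associate(struct model_page *page, void *mapping,
                            unsigned long index, bool shared)
{
    if (shared) {
        if (page->mapping != MODEL_DAX_SHARED) {
            /* First share: an existing regular owner counts as one user */
            page->index = page->mapping ? 1 : 0;
            page->mapping = MODEL_DAX_SHARED;
        }
        page->index++;
    } else {
        /* Normal case keeps the sanity check: the page must be unowned */
        assert(page->mapping == NULL);
        page->mapping = mapping;
        page->index = index;
    }
}

static void model_disassociate(struct model_page *page, void *mapping)
{
    if (page->mapping == MODEL_DAX_SHARED) {
        /* Shared case: drop one reference, keep the flag while users remain */
        if (--page->index > 0)
            return;
    } else {
        assert(page->mapping == mapping);
    }
    /* Last user gone: return to the initial state */
    page->mapping = NULL;
    page->index = 0;
}
```

In this model a page that was first mapped normally and then shared ends up with a count of 2, and it only returns to the initial state after both users have been disassociated.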

    Link: https://lkml.kernel.org/r/20220603053738.1218681-8-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-16 10:35:47 -05:00
Bill O'Donnell 7b908ada76 fsdax: introduce dax_lock_mapping_entry()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2192730

commit 2f437effc689ef913fbe5e31110580b4e7cf04be
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Fri Jun 3 13:37:28 2022 +0800

    fsdax: introduce dax_lock_mapping_entry()

    The current dax_lock_page() locks a dax entry by obtaining the mapping and
    index from the page.  To support 1-to-N RMAP in NVDIMM, we need a new
    function to lock the specific dax entry corresponding to this file's
    (mapping, index) pair, and to output the page corresponding to that dax
    entry for the caller's use.

    Link: https://lkml.kernel.org/r/20220603053738.1218681-5-ruansy.fnst@fujitsu.com
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.wiliams@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ritesh Harjani <riteshh@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2023-06-06 13:41:03 -05:00
Jeff Moyer 17d00f23e3 dax: set did_zero to true when zeroing successfully
Bugzilla: https://bugzilla.redhat.com/2162211

commit f8189d5d5fbf082786fb91c549f5127f23daec09
Author: Kaixu Xia <kaixuxia@tencent.com>
Date:   Thu Jun 30 10:04:18 2022 -0700

    dax: set did_zero to true when zeroing successfully
    
    It is unnecessary to check and set the did_zero value inside the while()
    loop in dax_zero_iter().  We can set did_zero to true just once, at the
    end, when zeroing has succeeded.
    
    Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:55:15 -04:00
Jeff Moyer 5dea7d5ad5 dax: add .recovery_write dax_operation
Bugzilla: https://bugzilla.redhat.com/2162211

commit 047218ec904da19c45c4a70274fc3f818a1fcba1
Author: Jane Chu <jane.chu@oracle.com>
Date:   Fri Apr 22 16:45:06 2022 -0600

    dax: add .recovery_write dax_operation
    
    Introduce dax_recovery_write() operation. The function is used to
    recover a dax range that contains poison. Typical use case is when
    a user process receives a SIGBUS with si_code BUS_MCEERR_AR
    indicating poison(s) in a dax range, in response, the user process
    issues a pwrite() to the page-aligned dax range, thus clears the
    poison and puts valid data in the range.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jane Chu <jane.chu@oracle.com>
    Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:55:14 -04:00
Jeff Moyer dd24bc140d dax: introduce DAX_RECOVERY_WRITE dax access mode
Bugzilla: https://bugzilla.redhat.com/2162211

commit e511c4a3d2a1f64aafc1f5df37a2ffcf7ef91b55
Author: Jane Chu <jane.chu@oracle.com>
Date:   Fri May 13 15:10:58 2022 -0700

    dax: introduce DAX_RECOVERY_WRITE dax access mode
    
    Up till now, dax_direct_access() is used implicitly for normal
    access, but for the purpose of recovery write, dax range with
    poison is requested.  To make the interface clear, introduce
            enum dax_access_mode {
                    DAX_ACCESS,
                    DAX_RECOVERY_WRITE,
            };
    where DAX_ACCESS is used for normal dax access, and
    DAX_RECOVERY_WRITE is used for dax recovery write.
    
    Suggested-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Jane Chu <jane.chu@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Mike Snitzer <snitzer@redhat.com>
    Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
    Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:55:13 -04:00
Jeff Moyer b3c9399664 fsdax: fix function description
Bugzilla: https://bugzilla.redhat.com/2162211

commit c2e8021a535d3e7cc4f5f1418c4acf97589f8eb5
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Thu Jan 27 20:40:53 2022 +0800

    fsdax: fix function description
    
    The function name has been changed, so the description should be updated
    too.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220127124058.1172422-5-ruansy.fnst@fujitsu.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:55:05 -04:00
Jeff Moyer 81e85c3d95 dax: remove the copy_from_iter and copy_to_iter methods
Bugzilla: https://bugzilla.redhat.com/2162211

commit 7ac5360cd4d02cc7e0eaf10867f599e041822f12
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Dec 15 09:45:08 2021 +0100

    dax: remove the copy_from_iter and copy_to_iter methods
    
    These methods indirect the actual DAX read/write path.  In the end pmem
    uses magic flush and mc safe variants and fuse and dcssblk use plain ones
    while device mapper redirects to the underlying device.
    
    Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
    routines are used to remove indirect call from the read/write fast path
    as well as a lot of boilerplate code.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
    Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:21 -04:00
Jeff Moyer 20ce3bbee1 fsdax: shift partition offset handling into the file systems
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: dropped erofs changes.

commit de2051147771017a61b62c02fd4e883c9b07712d
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:22:00 2021 +0100

    fsdax: shift partition offset handling into the file systems
    
    Remove the last user of ->bdev in dax.c by requiring the file system to
    pass in an address that already includes the DAX offset.  As part of
    that, only set ->bdev or ->daxdev when actually required in the
    ->iomap_begin methods.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs]
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-27-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:19 -04:00
Jeff Moyer 6be36f8a2f iomap: add a IOMAP_DAX flag
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: Upstream commit 304a68b9c63b ("xfs: use iomap_valid method
  to detect stale cached iomaps") was backported before this, leading
  to some minor conflicts.

commit 952da06375c8f3aa58474fff718d9ae8442531b9
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:58 2021 +0100

    iomap: add a IOMAP_DAX flag
    
    Add a flag so that the file system can easily detect DAX operations
    based just on the iomap operation requested instead of looking at
    inode state using IS_DAX.  This will be needed to apply the to be
    added partition offset only for operations that actually use DAX,
    but not things like fiemap that are based on the block device.
    In the long run it should also allow turning the bdev, dax_dev
    and inline_data into a union.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-25-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:18 -04:00
Jeff Moyer 26dec3a599 fsdax: decouple zeroing from the iomap buffered I/O code
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: Differences in the iomap code due to patches backported out
  of order.  Specifically, commit d7b64041164c ("iomap: write iomap
  validity checks").

commit c6f40468657d16e4010ef84bf32a761feb3469ea
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:52 2021 +0100

    fsdax: decouple zeroing from the iomap buffered I/O code
    
    Unshare the DAX and iomap buffered I/O page zeroing code.  This code
    previously did a IS_DAX check deep inside the iomap code, which in
    fact was the only DAX check in the code.  Instead move these checks
    into the callers.  Most callers already have DAX special casing anyway
    and XFS will need it for reflink support as well.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-19-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:52:52 -04:00
Jeff Moyer dad80580c3 fsdax: factor out a dax_memzero helper
Bugzilla: https://bugzilla.redhat.com/2162211

commit e5c71954ca11df04d258a663a8a15262be0e17f6
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:51 2021 +0100

    fsdax: factor out a dax_memzero helper
    
    Factor out a helper for the "manual" zeroing of a DAX range to clean
    up dax_iomap_zero a lot.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-18-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:59:06 -05:00
Jeff Moyer 4dd7deb163 fsdax: simplify the offset check in dax_iomap_zero
Bugzilla: https://bugzilla.redhat.com/2162211

commit 4a2d7d5950507a27e3074e4a29dc20720235f811
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:50 2021 +0100

    fsdax: simplify the offset check in dax_iomap_zero
    
    The file relative offset must have the same alignment as the storage
    offset, so use that and get rid of the call to iomap_sector.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-17-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:58:06 -05:00
Jeff Moyer 7492ba9a6b fsdax: simplify the pgoff calculation
Bugzilla: https://bugzilla.redhat.com/2162211

commit 60696eb26a37ab0199f7833ddbc1b75138c36d16
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:48 2021 +0100

    fsdax: simplify the pgoff calculation
    
    Replace the two steps of dax_iomap_sector and bdev_dax_pgoff with a
    single dax_iomap_pgoff helper that avoids lots of cumbersome sector
    conversions.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-15-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:56:06 -05:00
Jeff Moyer aa02d6e757 fsdax: use a saner calling convention for copy_cow_page_dax
Bugzilla: https://bugzilla.redhat.com/2162211

commit 429f8de70d9872c5ca9b3914b3c4db5659779331
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:47 2021 +0100

    fsdax: use a saner calling convention for copy_cow_page_dax
    
    Just pass the vm_fault and iomap_iter structures, and figure out the rest
    locally.  Note that this requires moving dax_iomap_sector up in the file.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-14-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:55:06 -05:00
Jeff Moyer eb69e517be fsdax: remove a pointless __force cast in copy_cow_page_dax
Bugzilla: https://bugzilla.redhat.com/2162211

commit 9dc2f9cdc63e7db82b6b2ec17894ca1b254f5e5d
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:46 2021 +0100

    fsdax: remove a pointless __force cast in copy_cow_page_dax
    
    Despite its name, copy_user_page expects kernel addresses, which is what
    we already have.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-13-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:54:06 -05:00
Nico Pache 33a038780e dax: fix missing writeprotect the pte entry
commit 06083a0921fd5939223a8bf30e83d5f483d348dc
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Thu Apr 28 23:16:10 2022 -0700

    dax: fix missing writeprotect the pte entry

    Currently dax_mapping_entry_mkclean() fails to clean and write protect the
    pte entry within a DAX PMD entry during an *sync operation.  This can
    result in data loss in the following sequence:

      1) process A mmap write to DAX PMD, dirtying PMD radix tree entry and
         making the pmd entry dirty and writeable.
      2) process B mmap with the @offset (e.g. 4K) and @length (e.g. 4K)
         write to the same file, dirtying PMD radix tree entry (already
         done in 1)) and making the pte entry dirty and writeable.
      3) fsync, flushing out PMD data and cleaning the radix tree entry. We
         currently fail to mark the pte entry as clean and write protected
         since the vma of process B is not covered in dax_entry_mkclean().
      4) process B writes to the pte. These don't cause any page faults since
         the pte entry is dirty and writeable. The radix tree entry remains
         clean.
      5) fsync, which fails to flush the dirty PMD data because the radix tree
         entry was clean.
      6) crash - dirty data that should have been fsync'd as part of 5) could
         still have been in the processor cache, and is lost.

    Just use pfn_mkclean_range() to clean the pfns to fix this issue.

    Link: https://lkml.kernel.org/r/20220403053957.10770-6-songmuchun@bytedance.com
    Fixes: 4b4bb46d00 ("dax: clear dirty entry tags on cache flush")
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Ross Zwisler <zwisler@kernel.org>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>
2022-11-08 10:11:36 -07:00
Nico Pache d05a1e8b52 dax: fix cache flush on PMD-mapped pages
commit e583b5c472bd23d450e06f148dc1f37be74f7666
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Thu Apr 28 23:16:09 2022 -0700

    dax: fix cache flush on PMD-mapped pages

    flush_cache_page() only removes a PAGE_SIZE sized range from the cache.
    However, it does not cover the full pages in a THP except a head page.
    Replace it with flush_cache_range() to fix this issue.  This is just a
    documentation issue with respect to properly documenting the expected
    usage of cache flushing before modifying the pmd.  However, in practice
    this is not a problem due to the fact that DAX is not available on
    architectures with virtually indexed caches per:

      commit d92576f116 ("dax: does not work correctly with virtual aliasing caches")

    Link: https://lkml.kernel.org/r/20220403053957.10770-3-songmuchun@bytedance.com
    Fixes: f729c8c9b2 ("dax: wrprotect pmd_t in dax_mapping_entry_mkclean")
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Ross Zwisler <zwisler@kernel.org>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>
2022-11-08 10:11:36 -07:00
Carlos Maiolino 4fa7e1e594 fsdax: switch the fault handlers to use iomap_iter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933

Avoid the open coded calls to ->iomap_begin and ->iomap_end and call
iomap_iter instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 65dd814a6187ff46e33718d8eb76244e027837a3)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2022-10-21 15:46:22 +02:00
Carlos Maiolino 59f1326acf fsdax: factor out a dax_fault_actor() helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933

The core logic in the two dax page fault functions is similar. So, move
the logic into a common helper function. Also, to facilitate the
addition of new features, such as CoW, switch-case is no longer used to
handle different iomap types.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit c2436190e492b243235262fc080a2c3189021be9)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2022-10-21 15:46:22 +02:00
Carlos Maiolino c65e1912b2 fsdax: factor out helpers to simplify the dax fault code
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933

The dax page fault code is too long and a bit difficult to read, and it
is hard to understand when we try to add new features. Some of the
PTE/PMD code has similar logic. So, factor out helper functions to
simplify the code.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[hch: minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 55f81639a7152848f204f9af3f9b1a14a5944be1)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2022-10-21 15:46:22 +02:00
Carlos Maiolino 560b8b6916 fsdax: Fix infinite loop in dax_iomap_rw()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933

I got an infinite loop and a WARNING report when executing a tail command
in virtiofs.

  WARNING: CPU: 10 PID: 964 at fs/iomap/iter.c:34 iomap_iter+0x3a2/0x3d0
  Modules linked in:
  CPU: 10 PID: 964 Comm: tail Not tainted 5.19.0-rc7
  Call Trace:
  <TASK>
  dax_iomap_rw+0xea/0x620
  ? __this_cpu_preempt_check+0x13/0x20
  fuse_dax_read_iter+0x47/0x80
  fuse_file_read_iter+0xae/0xd0
  new_sync_read+0xfe/0x180
  ? 0xffffffff81000000
  vfs_read+0x14d/0x1a0
  ksys_read+0x6d/0xf0
  __x64_sys_read+0x1a/0x20
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

The tail command will call read() with a count of 0. In this case,
iomap_iter() will report this WARNING and always return 1, which causes
the infinite loop in dax_iomap_rw().

Fix this by checking whether count is 0 in dax_iomap_rw().
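
As an illustration only, the failure mode and the guard can be modelled in
plain userspace C. This is a toy sketch, not the kernel code: struct
toy_iter, toy_iter_step() and toy_rw() are hypothetical names, the 4096-byte
chunk size is arbitrary, and max_steps exists only so the unguarded case
terminates instead of spinning forever.

```c
#include <stddef.h>

/* Hypothetical stand-in for the iterator state; not struct iomap_iter. */
struct toy_iter {
	size_t len;	/* bytes still to transfer */
	size_t mapped;	/* length of the current mapping, 0 before the first step */
};

/*
 * Toy stepping function: returns 1 while the caller should keep looping
 * and 0 once the request is finished. A zero-length request never reaches
 * the "finished" branch, because no mapping with a nonzero length is ever
 * produced for it.
 */
static int toy_iter_step(struct toy_iter *it, size_t chunk)
{
	if (it->mapped) {
		it->len -= it->mapped;	/* account for the previous step */
		if (it->len == 0)
			return 0;
	}
	it->mapped = it->len < chunk ? it->len : chunk;	/* 0 when len == 0 */
	return 1;
}

/*
 * Toy read/write loop. With guard set, a zero count returns early, which is
 * the idea of the fix. Returns the number of loop iterations, or -1 if
 * max_steps was reached (the sketch's stand-in for "infinite loop").
 */
static long toy_rw(size_t count, int guard, long max_steps)
{
	struct toy_iter it = { .len = count, .mapped = 0 };
	long steps = 0;

	if (guard && count == 0)
		return 0;	/* nothing to do, never enter the loop */

	while (toy_iter_step(&it, 4096)) {
		if (++steps >= max_steps)
			return -1;
	}
	return steps;
}
```

In this model, toy_rw(0, 0, 1000) hits the step bound, while toy_rw(0, 1, 1000)
returns immediately; non-zero counts behave the same with or without the guard.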

Fixes: ca289e0b95af ("fsdax: switch dax_iomap_rw to use iomap_iter")
Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220725032050.3873372-1-lijinlin3@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from commit 17d9c15c9b9e7fb285f7ac5367dfb5f00ff575e3)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2022-10-21 15:46:22 +02:00
Carlos Maiolino a82b6ef911 fsdax: switch dax_iomap_rw to use iomap_iter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933

Switch the dax_iomap_rw implementation to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit ca289e0b95afa973d204c77a4ad5c37e06145fbf)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2022-10-21 15:46:21 +02:00
Carlos Maiolino 242434f5be fsdax: mark the iomap argument to dax_iomap_sector as const
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
(cherry picked from commit 7e4f4b2d689d959b03cb07dfbdb97b9696cb1076)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
2022-10-03 15:41:31 +02:00
Ming Lei dd69c0b377 block: remove genhd.h
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917
Conflicts: drop changes to fs/ksmbd/vfs.c, which doesn't exist on cs9

commit 322cbb50de711814c42fb088f6d31901502c711a
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jan 24 10:39:13 2022 +0100

    block: remove genhd.h

    There is no good reason to keep genhd.h separate from the main blkdev.h
    header that includes it.  So fold the contents of genhd.h into blkdev.h
    and remove genhd.h entirely.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:53:32 +08:00
Dan Williams 96dcb97d0a Merge branch 'for-5.14/dax' into libnvdimm-fixes
Pick up some small dax cleanups that make some of Ira's follow-on work
easier.
2021-08-11 12:04:43 -07:00