Commit Graph

174 Commits

Author SHA1 Message Date
Benjamin Coddington 29b928da8e pnfs/flexfiles: retry getting layout segment for reads
JIRA: https://issues.redhat.com/browse/RHEL-87767

commit eb3fabde15bccdf34f1c9b35a83aa4c0dacbb4ca
Author: Mike Snitzer <snitzer@kernel.org>
Date:   Thu Jan 16 20:05:39 2025 -0500

    pnfs/flexfiles: retry getting layout segment for reads

    If ff_layout_pg_get_read()'s attempt to get a layout segment results
    in -EAGAIN have ff_layout_pg_init_read() retry it after sleeping.

    If "softerr" mount is used, use 'io_maxretrans' to limit the number of
    attempts to get a layout segment.

    This fixes a long-standing issue of O_DIRECT reads failing with
    -EAGAIN (11) when using flexfiles Client Side Mirroring (CSM).

    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2025-04-17 14:48:58 -04:00
Scott Mayhew 92654b1545 NFSv4/pNFS: Do layout state recovery upon reboot
JIRA: https://issues.redhat.com/browse/RHEL-59704

commit 5468fc8298a97ca504aca8ac0c7b9e6fd7c1b588
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Jun 13 01:00:55 2024 -0400

    NFSv4/pNFS: Do layout state recovery upon reboot

    Some pNFS implementations, such as flexible files, want the client to
    send the layout stats and layout errors that may have incurred while the
    metadata server was booting. To do so, the client sends a layoutreturn
    with an all-zero stateid while the server is in grace during reboot
    recovery.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2024-11-05 07:34:01 -05:00
Scott Mayhew 0eac390913 pNFS: rework pnfs_generic_pg_check_layout to check IO range
JIRA: https://issues.redhat.com/browse/RHEL-59704

commit a01b077a8743b2ed329c587cf4bbe49682da243f
Author: Olga Kornievskaia <kolga@netapp.com>
Date:   Fri May 10 15:07:43 2024 -0400

    pNFS: rework pnfs_generic_pg_check_layout to check IO range

    All callers of pnfs_generic_pg_check_layout() also want to do a call to
    check that the layout's range covers the IO range. Merge the functionality
    of the pnfs_generic_pg_check_range() into that of
    pnfs_generic_pg_check_layout().

    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2024-11-05 07:33:57 -05:00
Benjamin Coddington b4c1597bfd nfs: fix panic when nfs4_ff_layout_prepare_ds() fails
JIRA: https://issues.redhat.com/browse/RHEL-34875

commit 719fcafe07c12646691bd62d7f8d94d657fa0766
Author: Josef Bacik <josef@toxicpanda.com>
Date:   Mon Mar 11 11:11:53 2024 -0400

    nfs: fix panic when nfs4_ff_layout_prepare_ds() fails

    We've been seeing the following panic in production

    BUG: kernel NULL pointer dereference, address: 0000000000000065
    PGD 2f485f067 P4D 2f485f067 PUD 2cc5d8067 PMD 0
    RIP: 0010:ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
    Call Trace:
     <TASK>
     ? __die+0x78/0xc0
     ? page_fault_oops+0x286/0x380
     ? __rpc_execute+0x2c3/0x470 [sunrpc]
     ? rpc_new_task+0x42/0x1c0 [sunrpc]
     ? exc_page_fault+0x5d/0x110
     ? asm_exc_page_fault+0x22/0x30
     ? ff_layout_free_layoutreturn+0x110/0x110 [nfs_layout_flexfiles]
     ? ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
     ? ff_layout_cancel_io+0x6f/0x90 [nfs_layout_flexfiles]
     pnfs_mark_matching_lsegs_return+0x1b0/0x360 [nfsv4]
     pnfs_error_mark_layout_for_return+0x9e/0x110 [nfsv4]
     ? ff_layout_send_layouterror+0x50/0x160 [nfs_layout_flexfiles]
     nfs4_ff_layout_prepare_ds+0x11f/0x290 [nfs_layout_flexfiles]
     ff_layout_pg_init_write+0xf0/0x1f0 [nfs_layout_flexfiles]
     __nfs_pageio_add_request+0x154/0x6c0 [nfs]
     nfs_pageio_add_request+0x26b/0x380 [nfs]
     nfs_do_writepage+0x111/0x1e0 [nfs]
     nfs_writepages_callback+0xf/0x30 [nfs]
     write_cache_pages+0x17f/0x380
     ? nfs_pageio_init_write+0x50/0x50 [nfs]
     ? nfs_writepages+0x6d/0x210 [nfs]
     ? nfs_writepages+0x6d/0x210 [nfs]
     nfs_writepages+0x125/0x210 [nfs]
     do_writepages+0x67/0x220
     ? generic_perform_write+0x14b/0x210
     filemap_fdatawrite_wbc+0x5b/0x80
     file_write_and_wait_range+0x6d/0xc0
     nfs_file_fsync+0x81/0x170 [nfs]
     ? nfs_file_mmap+0x60/0x60 [nfs]
     __x64_sys_fsync+0x53/0x90
     do_syscall_64+0x3d/0x90
     entry_SYSCALL_64_after_hwframe+0x46/0xb0

    Inspecting the core with drgn I was able to pull this

      >>> prog.crashed_thread().stack_trace()[0]
      #0 at 0xffffffffa079657a (ff_layout_cancel_io+0x3a/0x84) in ff_layout_cancel_io at fs/nfs/flexfilelayout/flexfilelayout.c:2021:27
      >>> prog.crashed_thread().stack_trace()[0]['idx']
      (u32)1
      >>> prog.crashed_thread().stack_trace()[0]['flseg'].mirror_array[1].mirror_ds
      (struct nfs4_ff_layout_ds *)0xffffffffffffffed

    This is clear from the stack trace, we call nfs4_ff_layout_prepare_ds()
    which could error out initializing the mirror_ds, and then we go to
    clean it all up and our check is only for if (!mirror->mirror_ds).  This
    is inconsistent with the rest of the users of mirror_ds, which have

      if (IS_ERR_OR_NULL(mirror_ds))

    to keep from tripping over this exact scenario.  Fix this up in
    ff_layout_cancel_io() to make sure we don't panic when we get an error.
    I also spot checked all the other instances of checking mirror_ds and we
    appear to be doing the correct checks everywhere, only unconditionally
    dereferencing mirror_ds when we know it would be valid.

    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Fixes: b739a5bd9d9f ("NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked")
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2024-06-27 08:14:47 -04:00
Jeffrey Layton 792ff3ae42 pNFS/flexfiles: Check the layout validity in ff_layout_mirror_prepare_stats
JIRA: https://issues.redhat.com/browse/RHEL-7936

commit e1c6cfbb3bd1377e2ddcbe06cf8fb1ec323ea7d3
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Sun Oct 8 14:28:46 2023 -0400

    pNFS/flexfiles: Check the layout validity in ff_layout_mirror_prepare_stats

    Ensure that we check the layout pointer and validity after dereferencing
    it in ff_layout_mirror_prepare_stats.

    Fixes: 08e2e5bc6c ("pNFS/flexfiles: Clean up layoutstats")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2023-12-02 05:12:43 -05:00
Jeffrey Layton 29cf1f1ec6 NFS/pNFS: Report EINVAL errors from connect() to the server
JIRA: https://issues.redhat.com/browse/RHEL-7936

commit dd7d7ee3ba2a70d12d02defb478790cf57d5b87b
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Mon Sep 4 12:43:58 2023 -0400

    NFS/pNFS: Report EINVAL errors from connect() to the server

    With IPv6, connect() can occasionally return EINVAL if a route is
    unavailable. If this happens during I/O to a data server, we want to
    report it using LAYOUTERROR as an inability to connect.

    Fixes: dd52128afd ("NFSv4.1/pnfs Ensure flexfiles reports all connection related errors")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2023-12-02 05:12:42 -05:00
Scott Mayhew 7be952d943 NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621

commit b739a5bd9d9f18cc69dced8db128ef7206e000cd
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Wed Oct 5 15:57:38 2022 -0400

    NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked

    If the layout is recalled or revoked, we want to cancel I/O as quickly
    as possible so that we can return the layout.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2023-05-08 10:41:16 -04:00
Benjamin Coddington 9a6cb7b207 NFSv4/pNFS: Always return layout stats on layout return for flexfiles
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2154879

commit 90377158bd2d2acd20e6131e84c234d715b7aa42
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Wed Aug 24 16:56:48 2022 -0400

    NFSv4/pNFS: Always return layout stats on layout return for flexfiles

    We want to ensure that the server never misses the layout stats when
    we're closing the file, so that it knows whether or not to update its
    internal state. Otherwise, if we were racing with a layout stat, we
    might cause the server to invalidate its layout before the layout stat
    got processed.

    Fixes: 06946c6a3d ("pNFS/flexfiles: Only send layoutstats updates for mirrors that were updated")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2023-03-02 13:55:26 -05:00
Benjamin Coddington 1ee7fa7843 pNFS/flexfiles: Report RDMA connection errors to the server
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 7836d75467e9d214bdf5c693b32721de729a6e38
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Wed May 18 16:09:06 2022 -0400

    pNFS/flexfiles: Report RDMA connection errors to the server

    The RPC/RDMA driver will return -EPROTO and -ENODEV as connection errors
    under certain circumstances. Make sure that we handle them and report
    them to the server. If not, we can end up cycling forever in a
    LAYOUTGET/LAYOUTRETURN loop.

    Fixes: a12f996d34 ("NFSv4/pNFS: Use connections to a DS that are all of the same protocol family")
    Cc: stable@vger.kernel.org # 5.11.x
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:27 -04:00
Benjamin Coddington 41f655bec4 pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 3e5f151e94c190c31a240d9458677caab4f6c44e
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Mon Mar 21 15:34:22 2022 -0400

    pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod

    Ensure that pNFS flexfile allocations in rpciod/nfsiod callbacks can
    fail in low memory mode, so that the threads don't block and loop
    forever.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:21 -04:00
Benjamin Coddington 12bd55d839 NFSv4/flexfiles: Convert GFP_NOFS to GFP_KERNEL
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 61345a42a2ff8b0539faf545a5ce17b6c339db38
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Sat Jan 29 14:16:55 2022 -0500

    NFSv4/flexfiles: Convert GFP_NOFS to GFP_KERNEL

    Assume that the higher layers will have set memalloc_nofs_save/restore
    as appropriate.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:11 -04:00
Scott Mayhew 8a51e31712 SUNRPC: Trace calls to .rpc_call_done
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200

commit b40887e10dcacc5e8ae3c1a99dcba20877c4831b
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Sat Oct 16 18:02:57 2021 -0400

    SUNRPC: Trace calls to .rpc_call_done

    Introduce a single tracepoint that can replace simple dprintk call
    sites in upper layer "rpc_call_done" callbacks. Example:

       kworker/u24:2-1254  [001]   771.026677: rpc_stats_latency:    task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555
       kworker/u24:2-1254  [001]   771.026677: rpc_task_call_done:   task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done
       kworker/u24:2-1254  [001]   771.026678: rpcb_setport:         task:00000001@00000002 status=0 port=20048

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2022-02-03 11:44:01 -05:00
Nikola Livic ed34695e15 pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
We (adam zabrocki, alexander matrosov, alexander tereshkin, maksym
bazalii) observed the check:

	if (fh->size > sizeof(struct nfs_fh))

should not use the size of the nfs_fh struct which includes an extra two
bytes from the size field.

struct nfs_fh {
	unsigned short         size;
	unsigned char          data[NFS_MAXFHSIZE];
}

but should determine the size from data[NFS_MAXFHSIZE] so the memcpy
will not write 2 bytes beyond destination.  The proposed fix is to
compare against the NFS_MAXFHSIZE directly, as is done elsewhere in fs
code base.

Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Nikola Livic <nlivic@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Linus Torvalds 74f602dc96 NFS client updates for Linux 5.11
Highlights include:
 
 Features:
 - NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
   support.
 - A series of patches to improve readdir performance, particularly with
   large directories.
 - Basic support for using NFS/RDMA with the pNFS files and flexfiles
   drivers.
 - Micro-optimisations for RDMA.
 - RDMA tracing improvements.
 
 Bugfixes:
 - Fix a long standing bug with xs_read_xdr_buf() when receiving partial
   pages (Dan Aloni).
 - Various fixes for getxattr and listxattr, when used over non-TCP
   transports.
 - Fixes for containerised NFS from Sargun Dhillon.
 - switch nfsiod to be an UNBOUND workqueue (Neil Brown).
 - READDIR should not ask for security label information if there is no
   LSM policy. (Olga Kornievskaia)
 - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay).
 - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code.
 - A couple of fixes for pnfs/flexfiles read failover
 
 Cleanups:
 - Various cleanups for the SUNRPC xdr code in conjunction with the
   READ_PLUS fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl/aiaIACgkQZwvnipYK
 APIOihAAvONscxrFSaGRh2ICNv9I/zXW/A5+R3qnkESPVLTqTPJVphoN7FlINAr1
 B74pg6n4T4viycbvsogU2+kHrlJZO7B8lTkJL7ynm9Wgyw8+2Ga4QEn1bsAoqmuY
 b91p/+LfOLKrYeeojoH31PC73uOYYG1WHXJhjq0l9b5CTgThWpj6O3gDaFEbFvmz
 A7V3yqSp04sV70YxUhwelBHZ5BXdiXIKsPnIwvXXHuY7IcamrE4EA3wGCwtxkBnu
 4dwbOtRXURNSev0r3n6FsH4wZl+/nvp9UpnGdPtVv94F1zm2JKLwkhoJejS/vpjq
 eyKc7ZXBQ0uHbTWI2Yj1YjA61VIUO0R0EDuyTAnRKDeaarID42n5kMG7J8cIglZR
 jQfyx99xm0eSrdwxC09tcRL/lBzYcOfc6pJo5P9BtaFtRvbp9iFIHuFKlrXbULd4
 WrZzDMhiKVYGSTcTpfQyVoK2rCvn6W1Ida4iYeI0gkJ1v9X90UhbtJOyggn/bxyL
 DV/Qy40+l48n7CZfPU2eDv4WXqjKGRibpDoWMBLwUH20dDEX6kKYv3BfApFYGqyO
 /GTPAFUZarCy8BENvzZv/Jb9mt5pDQM5p9ZXpdUOhydLMMA+pauaT/Gr+pAHPIPx
 MPj546Gh2cEaT883xvRrJmQTG0nw/WscPNcHaJcgL5oYltmuwck=
 =IKWG
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
     support

   - A series of patches to improve readdir performance, particularly
     with large directories

   - Basic support for using NFS/RDMA with the pNFS files and flexfiles
     drivers

   - Micro-optimisations for RDMA

   - RDMA tracing improvements

  Bugfixes:

   - Fix a long standing bug with xs_read_xdr_buf() when receiving
     partial pages (Dan Aloni)

   - Various fixes for getxattr and listxattr, when used over non-TCP
     transports

   - Fixes for containerised NFS from Sargun Dhillon

   - switch nfsiod to be an UNBOUND workqueue (Neil Brown)

   - READDIR should not ask for security label information if there is
     no LSM policy (Olga Kornievskaia)

   - Avoid using interval-based rebinding with TCP in lockd (Calum
     Mackay)

   - A series of RPC and NFS layer fixes to support the NFSv4.2
     READ_PLUS code

   - A couple of fixes for pnfs/flexfiles read failover

  Cleanups:

   - Various cleanups for the SUNRPC xdr code in conjunction with the
     READ_PLUS fixes"

* tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
  NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
  pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read
  NFSv4/pnfs: Add tracing for the deviceid cache
  fs/lockd: convert comma to semicolon
  NFSv4.2: fix error return on memory allocation failure
  NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet
  NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow
  NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
  NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer
  NFSv4.2: decode_read_plus_hole() needs to check the extent offset
  NFSv4.2: decode_read_plus_data() must skip padding after data segment
  NFSv4.2: Ensure we always reset the result->count in decode_read_plus()
  SUNRPC: When expanding the buffer, we may need grow the sparse pages
  SUNRPC: Cleanup - constify a number of xdr_buf helpers
  SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field
  SUNRPC: _copy_to/from_pages() now check for zero length
  SUNRPC: Cleanup xdr_shrink_bufhead()
  SUNRPC: Fix xdr_expand_hole()
  SUNRPC: Fixes for xdr_align_data()
  SUNRPC: _shift_data_left/right_pages should check the shift length
  ...
2020-12-17 12:15:03 -08:00
Trond Myklebust 52104f274e NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
Don't bump the index twice.

Fixes: 563c53e73b ("NFS: Fix flexfiles read failover")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-16 17:25:24 -05:00
Trond Myklebust 9bfffea352 pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read
The callers of ff_layout_choose_ds_for_read() should decide whether or
not they want to return the layout on error. Sometimes, we may just want
to retry from the beginning.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-16 17:25:24 -05:00
Linus Torvalds 1a50ede2b3 Highlights:
- Improve support for re-exporting NFS mounts
 - Replace NFSv4 XDR decoding C macros with xdr_stream helpers
 - Support for multiple RPC/RDMA chunks per RPC transaction
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl/Q4dIACgkQM2qzM29m
 f5fInw//eDrmXBEhxbzcgeqNilGU5Qkn4INJtAcOGwPcw5Kjp4UVNGFpZNPqIDSf
 FP0Yw0d/rW7UggwCviPcs/adLTasU9skq1jgAv8d0ig4DtPbeqFo6BvbY+G2JxVF
 EfTeHzr6w6er8HRqyuLN4hjm1rQIpQlDHaYU4QcMs4fjPVv88eYLiwnYGYf3X46i
 vBYstu1IRxHhg2x4O833xmiL6VbkZDQoWwDjGICylxUBcNUtAmq/sETjTa4JVEJj
 4vgXdcJmAFjNgAOrmoR3DISsr9mvCvKN9g3C0+hHiRERTGEon//HzvscWH74wT48
 o0LUW0ZWgpmunTcmiSNeeiHNsUXJyy3A/xyEdteqqnvSxulxlqkQzb15Eb+92+6n
 BHGT/sOz1zz+/l9NCpdeEl5AkSA9plV8Iqd/kzwFwe1KwHMjldeMw/mhMut8EM2j
 b54EMsp40ipITAwBHvcygCXiWAn/mPex6bCr17Dijo6MsNLsyd+cDsazntbNzwz3
 RMGMf2TPOi8tWswrTUS9J5xKk5LAEWX/6Z/hTA1YlsB3PfrhXO97ztrytxvoO/bp
 M0NREA+NNMn/JyyL8FT3ID5peaLVHhA1GHw9CcUw3C7OVzmsEg29D4zNo02dF1TC
 LIyekp0kbSGGY1jLOeMLsa6Jr+2+40CcctsooVkRA+3rN0tJQvw=
 =1uP3
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6

Pull nfsd updates from Chuck Lever:
 "Several substantial changes this time around:

   - Previously, exporting an NFS mount via NFSD was considered to be an
     unsupported feature. With v5.11, the community has attempted to
     make re-exporting a first-class feature of NFSD.

     This would enable the Linux in-kernel NFS server to be used as an
     intermediate cache for a remotely-located primary NFS server, for
     example, even with other NFS server implementations, like a NetApp
     filer, as the primary.

   - A short series of patches brings support for multiple RPC/RDMA data
     chunks per RPC transaction to the Linux NFS server's RPC/RDMA
     transport implementation.

     This is a part of the RPC/RDMA spec that the other premiere
     NFS/RDMA implementation (Solaris) has had for a very long time, and
     completes the implementation of RPC/RDMA version 1 in the Linux
     kernel's NFS server.

   - Long ago, NFSv4 support was introduced to NFSD using a series of C
     macros that hid dprintk's and goto's. Over time, the kernel's XDR
     implementation has been greatly improved, but these C macros have
     remained and become fallow. A series of patches in this pull
     request completely replaces those macros with the use of current
     kernel XDR infrastructure. Benefits include:

       - More robust input sanitization in NFSD's NFSv4 XDR decoders.

       - Make it easier to use common kernel library functions that use
         XDR stream APIs (for example, GSS-API).

       - Align the structure of the source code with the RFCs so it is
         easier to learn, verify, and maintain our XDR implementation.

       - Removal of more than a hundred hidden dprintk() call sites.

       - Removal of some explicit manipulation of pages to help make the
         eventual transition to xdr->bvec smoother.

   - On top of several related fixes in 5.10-rc, there are a few more
     fixes to get the Linux NFSD implementation of NFSv4.2 inter-server
     copy up to speed.

  And as usual, there is a pinch of seasoning in the form of a
  collection of unrelated minor bug fixes and clean-ups.

  Many thanks to all who contributed this time around!"

* tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits)
  nfsd: Record NFSv4 pre/post-op attributes as non-atomic
  nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
  nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
  exportfs: Add a function to return the raw output from fh_to_dentry()
  nfsd: close cached files prior to a REMOVE or RENAME that would replace target
  nfsd: allow filesystems to opt out of subtree checking
  nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
  Revert "nfsd4: support change_attr_type attribute"
  nfsd4: don't query change attribute in v2/v3 case
  nfsd: minor nfsd4_change_attribute cleanup
  nfsd: simplify nfsd4_change_info
  nfsd: only call inode_query_iversion in the I_VERSION case
  nfs_common: need lock during iterate through the list
  NFSD: Fix 5 seconds delay when doing inter server copy
  NFSD: Fix sparse warning in nfs4proc.c
  SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall
  sunrpc: clean-up cache downcall
  nfsd: Fix message level for normal termination
  NFSD: Remove macros that are no longer used
  NFSD: Replace READ* macros in nfsd4_decode_compound()
  ...
2020-12-15 18:52:30 -08:00
Trond Myklebust 9a7016319e pNFS/flexfiles: Fix up layoutstats reporting for non-TCP transports
Ensure that we report the correct netid when using UDP or RDMA
transports to the DSes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:53 -05:00
Chuck Lever 0ae4c3e8a6 SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer()
Clean up: De-duplicate some frequently-used code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:35 -05:00
Trond Myklebust 63e2fffa59 pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled
If the flexfiles mirroring is enabled, then the read code expects to be
able to set pgio->pg_mirror_idx to point to the data server that is
being used for this particular read. However it does not change the
pg_mirror_count because we only need to send a single read.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-30 10:52:22 -05:00
Trond Myklebust b9df46d08a pNFS/flexfiles: Be consistent about mirror index types
A mirror index is always of type u32.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-18 09:25:33 -04:00
Trond Myklebust ee15c7b53e pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read
While it is true that reading from an unmirrored source always uses
index 0, that is no longer true for mirrored sources when we fail over.

Fixes: 563c53e73b ("NFS: Fix flexfiles read failover")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-18 09:21:10 -04:00
Gustavo A. R. Silva df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Trond Myklebust 563c53e73b NFS: Fix flexfiles read failover
The current mirrored read failover code is correctly resetting the mirror
index between failed reads, however it is not able to actually flip the
RPC call over to the next RPC client.
The end result is that we keep resending the RPC call to the same client
over and over.

The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a
new RPC call, but we need to add the ability to pass in a mirror
index so that we always retry the next mirror in the list.

Fixes: 166bd5b889 ("pNFS/flexfiles: Fix layoutstats handling during read failovers")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 11:20:29 -04:00
Trond Myklebust 18eb87f444 pNFS/flexfiles: The mirror count could depend on the layout segment range
Make sure we specify the layout segment range when calculating the
mirror count. In theory, that number could depend on the range to
which we're writing.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-12 23:49:55 -04:00
Trond Myklebust f97ff92bd1 pNFS/flexfiles: Clean up redundant calls to pnfs_put_lseg()
Both nfs_pageio_reset_read_mds() and nfs_pageio_reset_write_mds()
do call pnfs_generic_pg_cleanup() for us.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-12 23:49:55 -04:00
Trond Myklebust 8b04013737 pNFS/flexfiles: Fix list corruption if the mirror count changes
If the mirror count changes in the new layout we pick up inside
ff_layout_pg_init_write(), then we can end up adding the
request to the wrong mirror and corrupting the mirror->pg_list.

Fixes: d600ad1f2b ("NFS41: pop some layoutget errors to application")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-26 08:43:14 -04:00
Trond Myklebust cbd7be43c4 pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
Move from requesting only full file layout segments, to requesting
layout segments that match our I/O size. This means the server is
still free to return a full file layout, but we will no longer
error out if it does not.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust e70430d939 pNFS/flexfiles: remove requirement for whole file layouts
Remove the requirement that the server always sends whole file
layouts.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust e1e54ab710 pNFS/flexfiles: Check the layout segment range before doing I/O
When starting to read or write with a layout segment, check that the
range matches our request.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust 660d1eb223 pNFS/flexfile: Don't merge layout segments if the mirrors don't match
Check that the number of mirrors, and the mirror information matches
before deciding to merge layout segments in pNFS/flexfiles.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust 9c455a8c1e NFS/pNFS: Clean up pNFS commit operations
Move the pNFS commit related operations into a separate structure
that can be carried by the pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust 0aa647b736 NFS: Remove bucket array from struct pnfs_ds_commit_info
Remove the unused bucket array in struct pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust ba827c9abb pNFS: Enable per-layout segment commit structures
Enable adding and lookup of per-layout segment commits in filelayout
and flexfilelayout.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust a9901899b6 pNFS: Add infrastructure for cleaning up per-layout commit structures
Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust c21e716884 NFSv4/pnfs: Support a list of commit arrays in struct pnfs_ds_commit_info
When we have multiple layout segments with different lists of mirrored
data, we need to track the commits on a per layout segment basis.
This patch adds a list to support this tracking in struct
pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust 329651b1f1 pNFS/flexfiles: Simplify allocation of the mirror array
Just allocate the array at the end of the layout segment structure,
instead of allocating it as a separate array of pointers.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-26 10:52:04 -04:00
Trond Myklebust cf6605d194 NFSv4: Ensure layout headers are RCU safe
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust 194a0dc8e2 pNFS/flexfiles: Report DELAY and GRACE errors from the DS to the server
Ensure that if the DS is returning too many DELAY and GRACE errors, we
also report that to the MDS through the layouterror mechanism.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust 088f3e68d8 pNFS/flexfiles: Add tracing for layout errors
Trace layout errors for pNFS/flexfiles on read/write/commit operations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust 0722dc9fea pNFS/flexfiles: Record resend attempts on I/O failure
If the attempt to do pNFS fails, then record what action we
take to recover (resend, reset to pnfs or reset to mds).

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust 7af46292da pNFS/flexfiles: Don't time out requests on hard mounts
If the mount is hard, we should ignore the 'io_maxretrans' module
parameter so that we always keep retrying.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust d5711920ec Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
This reverts commit a79f194aa4.
The mechanism for aborting I/O is racy, since we are not guaranteed that
the request is asleep while we're changing both task->tk_status and
task->tk_action.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1
2019-08-26 15:31:29 -04:00
Trond Myklebust d5b9216fd5 pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS
Add tracepoints to allow debugging of the event chain leading to
a pnfs fallback to doing I/O through the MDS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18 15:50:28 -04:00
Thomas Gleixner 09c434b8a0 treewide: Add SPDX license identifier for more missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have MODULE_LICENCE("GPL*") inside which was used in the initial
   scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Trond Myklebust 9fcd5960e8 NFS: Add a helper to return a pointer to the open context of a struct nfs_page
Add a helper for when we remove the explicit pointer to the open
context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25 14:18:15 -04:00
Trond Myklebust 33344e0f7e pNFS: Add tracking to limit the number of pNFS retries
When the client is reading or writing using pNFS, and hits an error
on the DS, then it typically sends a LAYOUTERROR and/or LAYOUTRETURN
to the MDS, before redirtying the failed pages, and going for a new
round of reads/writebacks. The problem is that if the server has no
way to fix the DS, then we may need a way to interrupt this loop
after a set number of attempts have been made.
This patch adds an optional module parameter that allows the admin
to specify how many times to retry the read/writeback process before
failing with a fatal error.
The default behaviour is to retry forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25 14:18:14 -04:00
Trond Myklebust 166bd5b889 pNFS/flexfiles: Fix layoutstats handling during read failovers
During a read failover, we may end up changing the value of
the pgio_mirror_idx, so make sure that we record the layout
stats before that update.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-23 12:03:58 -04:00
Trond Myklebust 4cbc8a571c NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid()
Pass in a pointer to the mirror rather than forcing another
array access.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust 626d48b12c NFS/flexfile: Simplify nfs4_ff_layout_ds_version()
Pass in a pointer to the mirror rather than forcing another
array access.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00