410 Commits

Author SHA1 Message Date
Scott Mayhew c58edd58a0 nfs: remove nfs_page_length
JIRA: https://issues.redhat.com/browse/RHEL-59704

commit 7f296b25f2a6453bf052b03ed0676a18bee312a5
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jul 5 07:38:38 2024 +0200

    nfs: remove nfs_page_length

    The nfs_page_length is not used anywhere, remove it.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2024-11-05 07:34:04 -05:00
Scott Mayhew 852fd33f32 nfs: remove dead code for the old swap over NFS implementation
JIRA: https://issues.redhat.com/browse/RHEL-59704
Conflicts:
	fs/nfs/write.c: Context difference due to RHEL not having
	600f111ef51d ("fs: Rename mapping private members")

commit 7e8e78a0ba00c88f0ded86de64bdddc82e06b196
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jul 1 07:26:48 2024 +0200

    nfs: remove dead code for the old swap over NFS implementation

    Remove the code testing folio_test_swapcache either explicitly or
    implicitly in pagemap.h headers, as swap over NFS is now handled using the direct I/O
    path and not the buffered I/O path that these helpers are located in.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2024-11-05 07:34:02 -05:00
Scott Mayhew 7718bf0773 NFS: make sure lock/nolock overriding local_lock mount option
JIRA: https://issues.redhat.com/browse/RHEL-59704

commit bf95f82e6a569f41bae1e37204b219a5e1e8b971
Author: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Date:   Tue Apr 2 18:33:55 2024 +0800

    NFS: make sure lock/nolock overriding local_lock mount option

    Currently, mount option lock/nolock and local_lock option
    may override NFS_MOUNT_LOCAL_FLOCK NFS_MOUNT_LOCAL_FCNTL flags
    when passing in different order:

    mount -o vers=3,local_lock=all,lock:
            local_lock=none

    mount -o vers=3,lock,local_lock=all:
            local_lock=all

    This patch will let lock/nolock override local_lock option
    as nfs(5) suggested.

    Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2024-11-05 07:33:56 -05:00
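
The ordering problem described in the commit above can be illustrated with a small user-space sketch. It is not the kernel's parser: the `sk_` names, the flag values, and the `lock_status` field are assumptions made for illustration. The sketch only shows one way to make an explicit `lock`/`nolock` win over `local_lock=` regardless of the order in which the options arrive.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
#define SK_LOCAL_FLOCK 0x1u
#define SK_LOCAL_FCNTL 0x2u
#define SK_LOCAL_BOTH  (SK_LOCAL_FLOCK | SK_LOCAL_FCNTL)

/* Remembers whether lock/nolock was given explicitly (assumed design). */
enum sk_lock_status { SK_LOCK_NOT_SET, SK_LOCK_LOCK, SK_LOCK_NOLOCK };

struct sk_mount_ctx {
	unsigned int flags;
	enum sk_lock_status lock_status;
};

/* Apply one mount option; an explicit lock/nolock always wins. */
static void sk_parse_option(struct sk_mount_ctx *ctx, const char *opt)
{
	static const char prefix[] = "local_lock=";
	const size_t plen = sizeof(prefix) - 1;

	assert(ctx && opt);

	if (strcmp(opt, "lock") == 0) {
		ctx->lock_status = SK_LOCK_LOCK;
		ctx->flags &= ~SK_LOCAL_BOTH;
	} else if (strcmp(opt, "nolock") == 0) {
		ctx->lock_status = SK_LOCK_NOLOCK;
		ctx->flags |= SK_LOCAL_BOTH;
	} else if (strncmp(opt, prefix, plen) == 0) {
		const char *val = opt + plen;

		/* lock/nolock was already given: ignore local_lock. */
		if (ctx->lock_status != SK_LOCK_NOT_SET)
			return;
		ctx->flags &= ~SK_LOCAL_BOTH;
		if (strcmp(val, "all") == 0)
			ctx->flags |= SK_LOCAL_BOTH;
		else if (strcmp(val, "flock") == 0)
			ctx->flags |= SK_LOCAL_FLOCK;
		else if (strcmp(val, "posix") == 0)
			ctx->flags |= SK_LOCAL_FCNTL;
		/* "none" leaves both bits cleared */
	}
}
```

With this sketch, feeding `local_lock=all` then `lock`, or `lock` then `local_lock=all`, leaves both local-lock bits cleared in either order.
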
Scott Mayhew 52e8be769f nfs: make the rpc_stat per net namespace
JIRA: https://issues.redhat.com/browse/RHEL-59704

commit 1548036ef1204df65ca5a16e8b199c858cb80075
Author: Josef Bacik <josef@toxicpanda.com>
Date:   Thu Feb 15 14:57:32 2024 -0500

    nfs: make the rpc_stat per net namespace

    Now that we're exposing the rpc stats on a per-network namespace basis,
    move this struct into struct nfs_net and use that to make sure only the
    per-network namespace stats are exposed.

    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2024-10-25 12:36:07 -04:00
Ian Kent 956e3ad810 fs: port ->mknod() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/dir.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, which causes hunks #2-#4 to be
	rejected, manually apply the hunks.
	CentOS Stream commit f0f830cd7e ("ceph: create symlinks with
	encrypted and base64-encoded targets") is present and resulted
	in fuzz against fs/ceph/dir.c hunk #2.
	Upstream commit 863f144f12add ("vfs: open inside ->tmpfile()")
	is missing causing fuzz against fs/ext2/namei.c.
	Upstream commit 7d37539037c2f ("fuse: implement ->tmpfile()")
	is missing causing fuzz in hunk #4 against fs/fuse/dir.c.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, so a patch reorder was needed
	with appropriate adjustments.

commit 5ebb29bee8d5fc173b774e0755be8cb335503ee3
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:16 2023 +0100

    fs: port ->mknod() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:08 +08:00
Ian Kent 19f3b4f1ba fs: port ->rename() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/inode.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	Upstream commit cc14d24026704 ("hpfs: Convert symlinks to
	read_folio") is not present which causes fuzz 1 for hunk #1.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, so a patch reorder was needed
	with appropriate adjustments.

commit e18275ae55e07a2937e48134589c2f4c1d99a369
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:17 2023 +0100

    fs: port ->rename() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:07 +08:00
Ian Kent a7750be4f4 fs: port ->mkdir() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/inode.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.

commit c54bd91e9eaba43f09aadc25b52ea869ff3b5587
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:15 2023 +0100

    fs: port ->mkdir() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:00 +08:00
Ian Kent 5744ba0ee3 fs: port ->symlink() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/link.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	CentOS Stream commit f0f830cd7e ("ceph: create symlinks with
	encrypted and base64-encoded targets") is present and resulted
	in fuzz against fs/ceph/dir.c.

commit 7a77db95511c39be4b2db2ceca152ef589adc2dc
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:14 2023 +0100

    fs: port ->symlink() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:00 +08:00
Ian Kent a56d1daadf fs: port ->create() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/dir.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, which causes fuzz in mm/shmem.c.

commit 6c960e68aaed335a0040f16654f3c5e5bfcf9249
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:13 2023 +0100

    fs: port ->create() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:44:53 +08:00
Benjamin Coddington b64aae3b7c nfs: fix undefined behavior in nfs_block_bits()
JIRA: https://issues.redhat.com/browse/RHEL-53004

commit 3c0a2e0b0ae661457c8505fecc7be5501aa7a715
Author: Sergey Shtylyov <s.shtylyov@omp.ru>
Date:   Fri May 10 23:24:04 2024 +0300

    nfs: fix undefined behavior in nfs_block_bits()

    Shifting *signed int* typed constant 1 left by 31 bits causes undefined
    behavior. Specify the correct *unsigned long* type by using 1UL instead.

    Found by Linux Verification Center (linuxtesting.org) with the Svace static
    analysis tool.

    Cc: stable@vger.kernel.org
    Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2024-08-06 09:32:38 -04:00
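
The undefined behavior fixed by the commit above can be shown with a user-space sketch. The function below is modeled on the description in the commit message and is not copied from the kernel tree; the name `sk_block_bits` and the exact loop shape are assumptions. The point it illustrates is that `1 << 31` shifts into the sign bit of a signed `int`, whereas `1UL << 31` is well defined because `unsigned long` is at least 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round bsize down to a power of two and optionally report the bit
 * position. The constant is 1UL, not 1: shifting a signed int 1 left
 * by 31 bits is undefined behavior in C.
 */
static unsigned long sk_block_bits(unsigned long bsize, unsigned char *nrbitsp)
{
	if ((bsize & (bsize - 1)) || nrbitsp) {
		unsigned char nrbits;

		/* Find the highest set bit among bits 31..1. */
		for (nrbits = 31; nrbits && !(bsize & (1UL << nrbits)); nrbits--)
			;
		bsize = 1UL << nrbits;
		if (nrbitsp)
			*nrbitsp = nrbits;
	}
	return bsize;
}

/* Small usage check for the sketch. */
static void sk_block_bits_examples(void)
{
	unsigned char bits = 0;

	assert(sk_block_bits(5000, &bits) == 4096 && bits == 12);
	/* Bit 31 is the case that needs the unsigned constant. */
	assert(sk_block_bits(0x80000005UL, &bits) == 0x80000000UL && bits == 31);
}
```
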
Benjamin Coddington b67a44c623 NFS: drop unused nfs_direct_req bytes_left
JIRA: https://issues.redhat.com/browse/RHEL-34875

commit 1fd5394e6ab8b11465a5d0867f188fad1835a762
Author: Benjamin Coddington <bcodding@redhat.com>
Date:   Fri Nov 17 06:25:14 2023 -0500

    NFS: drop unused nfs_direct_req bytes_left

    Now that we're calculating how large a remaining IO should be based
    on the current request's offset, we no longer need to track bytes_left on
    each struct nfs_direct_req.  Drop the field, and clean up the direct
    request tracepoints.

    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2024-06-27 08:14:31 -04:00
Benjamin Coddington 44ce03ba3e pNFS: Fix the pnfs block driver's calculation of layoutget size
JIRA: https://issues.redhat.com/browse/RHEL-34875

commit 8a6291bf3b0eae1bf26621e6419a91682f2d6227
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Fri Nov 17 06:25:13 2023 -0500

    pNFS: Fix the pnfs block driver's calculation of layoutget size

    Instead of relying on the value of the 'bytes_left' field, we should
    calculate the layout size based on the offset of the request that is
    being written out.

    Reported-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling")
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Tested-by: Benjamin Coddington <bcodding@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2024-06-27 08:14:30 -04:00
Jeffrey Layton 42f9132c4b NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
JIRA: https://issues.redhat.com/browse/RHEL-7936

commit 537935f72eb28a3dd0097386f06402e25e66359a
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Sat Aug 19 17:32:25 2023 -0400

    NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver

    Ensure that the connect timeout for the pNFS flexfiles driver is of the
    same order as the I/O timeout, so that we can fail over quickly when
    trying to read from a data server that is down.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2023-12-02 05:12:38 -05:00
Jeffrey Layton b2e0fd3d7a NFSv4.2: Rework scratch handling for READ_PLUS (again)
JIRA: https://issues.redhat.com/browse/RHEL-7936

commit 303a78052091c81e9003915c521fdca1c7e117af
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Fri Jun 9 15:26:25 2023 -0400

    NFSv4.2: Rework scratch handling for READ_PLUS (again)

    I found that the read code might send multiple requests using the same
    nfs_pgio_header, but nfs4_proc_read_setup() is only called once. This is
    how we ended up occasionally double-freeing the scratch buffer, but also
    means we set a NULL pointer but non-zero length to the xdr scratch
    buffer. This results in an oops the first time decoding needs to copy
    something to scratch, which frequently happens when decoding READ_PLUS
    hole segments.

    I fix this by moving scratch handling into the pageio read code. I
    provide a function to allocate scratch space for decoding read replies,
    and free the scratch buffer when the nfs_pgio_header is freed.

    Fixes: fbd2a05f29a9 (NFSv4.2: Rework scratch handling for READ_PLUS)
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2023-12-02 05:12:37 -05:00
Jeffrey Layton 1cd44bd5eb NFS: Add an "xprtsec=" NFS mount option
JIRA: https://issues.redhat.com/browse/RHEL-7936

commit c8407f2e560c53c4c73e77cb5604c8a408dbe7f7
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Wed Jun 7 10:00:09 2023 -0400

    NFS: Add an "xprtsec=" NFS mount option

    After some discussion, we decided that controlling transport layer
    security policy should be separate from the setting for the user
    authentication flavor. To accomplish this, add a new NFS mount
    option to select a transport layer security policy for RPC
    operations associated with the mount point.

      xprtsec=none     - Transport layer security is forced off.

      xprtsec=tls      - Establish an encryption-only TLS session. If
                         the initial handshake fails, the mount fails.
                         If TLS is not available on a reconnect, drop
                         the connection and try again.

      xprtsec=mtls     - Both sides authenticate and an encrypted
                         session is created. If the initial handshake
                         fails, the mount fails. If TLS is not available
                         on a reconnect, drop the connection and try
                         again.

    To support client peer authentication (mtls), the handshake daemon
    will have configurable default authentication material (certificate
    or pre-shared key). In the future, mount options can be added that
    can provide this material on a per-mount basis.

    Updates to mount.nfs (to support xprtsec=auto) and nfs(5) will be
    sent under separate cover.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2023-12-02 05:12:34 -05:00
Jeffrey Layton 905b73d2e4 NFS: Have struct nfs_client carry a TLS policy field
JIRA: https://issues.redhat.com/browse/RHEL-7936

commit 6c0a8c5fcf7158e889dbdd077f67c81984704710
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Wed Jun 7 09:59:42 2023 -0400

    NFS: Have struct nfs_client carry a TLS policy field

    The new field is used to match struct nfs_clients that have the same
    TLS policy setting.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2023-12-02 05:12:34 -05:00
Jeffrey Layton 8a46dc94a7 nfs: move nfs_fhandle_hash to common include file
Bugzilla: https://bugzilla.redhat.com/2063818

commit e59fb6749ed833deee5b3cfd7e89925296d41f49
Author: Jeff Layton <jlayton@kernel.org>
Date:   Fri Mar 3 07:16:02 2023 -0500

    nfs: move nfs_fhandle_hash to common include file

    lockd needs to be able to hash filehandles for tracepoints. Move the
    nfs_fhandle_hash() helper to a common nfs include file.

    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2023-05-24 09:48:57 -04:00
Dave Wysochanski 2b171d71a1 NFS: Convert buffered read paths to use netfs when fscache is enabled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2129854

commit 000dbe0bec058cbf2ca9e156e4a5584f5158b0f9
Author: Dave Wysochanski <dwysocha@redhat.com>
Date:   Mon Feb 20 08:43:06 2023 -0500

    NFS: Convert buffered read paths to use netfs when fscache is enabled

    Convert the NFS buffered read code paths to corresponding netfs APIs,
    but only when fscache is configured and enabled.

    The netfs API defines struct netfs_request_ops which must be filled
    in by the network filesystem.  For NFS, we only need to define 5 of
    the functions, the main one being the issue_read() function.
    The issue_read() function is called by the netfs layer when a read
    cannot be fulfilled locally, and must be sent to the server (either
    the cache is not active, or it is active but the data is not available).
    Once the read from the server is complete, netfs requires a call to
    netfs_subreq_terminated() which conveys either how many bytes were read
    successfully, or an error.  Note that issue_read() is called with a
    structure, netfs_io_subrequest, which defines the IO requested, and
    contains a start and a length (both in bytes), and assumes the underlying
    netfs will return either an error on the whole region, or the number
    of bytes successfully read.

    The NFS IO path is page based and the main APIs are the pgio APIs defined
    in pagelist.c.  For the pgio APIs, there is no way for the caller to
    know how many RPCs will be sent and how the pages will be broken up
    into underlying RPCs, each of which will have their own completion and
    return code.  In contrast, netfs is subrequest based, a single
    subrequest may contain multiple pages, and a single subrequest is
    initiated with issue_read() and terminated with netfs_subreq_terminated().
    Thus, to utilize the netfs APIs, NFS needs some way to accommodate
    the netfs API requirement on the single response to the whole
    subrequest, while also minimizing disruptive changes to the NFS
    pgio layer.

    The approach taken with this patch is to allocate a small structure
    for each nfs_netfs_issue_read() call, store the final error and number
    of bytes successfully transferred in the structure, and update these values
    as each RPC completes.  The refcount on the structure is used as a marker
    for the last RPC completion, is incremented in nfs_netfs_read_initiate(),
    and decremented inside nfs_netfs_read_completion(), when a nfs_pgio_header
    contains a valid pointer to the data.  On the final put (which signals
    the final outstanding RPC is complete) in nfs_netfs_read_completion(),
    call netfs_subreq_terminated() with either the final error value (if
    one or more READs complete with an error) or the number of bytes
    successfully transferred (if all RPCs complete successfully).  Note
    that when all RPCs complete successfully, the number of bytes transferred
    is capped to the length of the subrequest.  Capping the transferred length
    to the subrequest length prevents "Subreq overread" warnings from netfs.
    This is due to the "aligned_len" in nfs_pageio_add_page(), and the
    corner case where NFS requests a full page at the end of the file,
    even when i_size reflects only a partial page (NFS overread).

    Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
    Tested-by: Daire Byrne <daire@dneg.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>

Conflicts:
There were two upstream commits that changed the contents of fscache.c,
but then the changes were removed by this commit, so both of these
commits were omitted from the backport:
  de4eda9de2d9 use less confusing names for iov_iter direction initializers
  8bb7cd842c44 nfs: use bvec_set_page to initialize bvecs
2023-05-11 16:58:39 -04:00
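
The completion-tracking approach described in the commit above (one small refcounted structure per subrequest, updated as each RPC finishes) can be sketched in user-space C. Everything here is hypothetical: the `sk_` types and functions stand in for the netfs subrequest and the NFS helpers, C11 atomics stand in for the kernel's refcount and atomic types, and the initial reference held by the issuer is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the netfs subrequest. */
struct sk_subreq {
	size_t len;      /* bytes requested */
	int terminated;  /* set once the single completion fires */
	long result;     /* bytes read, or a negative error */
};

/* Per-subrequest tracker shared by all RPCs issued for it. */
struct sk_read_tracker {
	atomic_int refcount;
	atomic_long transferred;
	atomic_int error;
	struct sk_subreq *sreq;
};

/* Stand-in for the one completion call the upper layer expects. */
static void sk_subreq_terminated(struct sk_subreq *sreq, long result)
{
	assert(!sreq->terminated);
	sreq->terminated = 1;
	sreq->result = result;
}

static void sk_tracker_init(struct sk_read_tracker *t, struct sk_subreq *sreq)
{
	atomic_init(&t->refcount, 1); /* reference held by the issuer */
	atomic_init(&t->transferred, 0);
	atomic_init(&t->error, 0);
	t->sreq = sreq;
}

/* Taken once for every RPC that is sent. */
static void sk_tracker_get(struct sk_read_tracker *t)
{
	atomic_fetch_add(&t->refcount, 1);
}

/* The final put reports an error if any, else the capped byte count. */
static void sk_tracker_put(struct sk_read_tracker *t)
{
	long result;

	if (atomic_fetch_sub(&t->refcount, 1) != 1)
		return;
	result = atomic_load(&t->error);
	if (result == 0) {
		result = atomic_load(&t->transferred);
		/* Cap to the subrequest length to avoid an overread. */
		if ((size_t)result > t->sreq->len)
			result = (long)t->sreq->len;
	}
	sk_subreq_terminated(t->sreq, result);
}

/* Called as each RPC completes. */
static void sk_rpc_done(struct sk_read_tracker *t, long bytes, int err)
{
	if (err)
		atomic_store(&t->error, err);
	else
		atomic_fetch_add(&t->transferred, bytes);
	sk_tracker_put(t);
}
```

In this sketch the issuer calls `sk_tracker_get()` once per RPC and drops its own reference with `sk_tracker_put()` after the last RPC is queued, so the single completion fires only when the final outstanding RPC has finished.
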
Scott Mayhew 0c2c4eaad2 NFS: Convert buffered writes to use folios
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621

commit 0c493b5cf16e28d761b6e77c7c32aa0e7af70813
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Jan 19 16:33:43 2023 -0500

    NFS: Convert buffered writes to use folios

    Mostly mechanical conversion of struct page and functions into struct
    folio equivalents.
    The lack of support for folios in write_cache_pages(), means we still
    only support order 0 folio allocations. However the rest of the
    writeback code should now be ready for order n > 0.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2023-05-08 10:41:30 -04:00
Scott Mayhew 5f1b05441c NFS: Convert buffered reads to use folios
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621

commit ab75bff1140733f1b43e81f055acd7d27af7ac05
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Jan 19 16:33:41 2023 -0500

    NFS: Convert buffered reads to use folios

    Perform a largely mechanical conversion of references to struct page and
    page-specific functions to use the folio equivalents.

    Note that the fscache functionality remains untouched. Instead we just
    pass in the folio page. This should be OK, as long as we use order 0
    folios together with fscache.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2023-05-08 10:41:29 -04:00
Scott Mayhew cec4a4f7e1 NFS: Support folios in nfs_generic_pgio()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621

commit eb9f2a5a5e85fd24949480d1d02c2a497f26e154
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Jan 19 16:33:36 2023 -0500

    NFS: Support folios in nfs_generic_pgio()

    Add support for multi-page folios in the generic NFS i/o engine.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2023-05-08 10:41:28 -04:00
Scott Mayhew dc6fba107d NFS: Avoid memcpy() run-time warning for struct sockaddr overflows
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621

commit cf0d7e7f4520814f45e1313872ad5777ed504004
Author: Kees Cook <keescook@chromium.org>
Date:   Sun Oct 16 21:36:50 2022 -0700

    NFS: Avoid memcpy() run-time warning for struct sockaddr overflows

    The 'nfs_server' and 'mount_server' structures include a union of
    'struct sockaddr' (with the older 16 bytes max address size) and
    'struct sockaddr_storage' which is large enough to hold all the
    supported sa_family types (128 bytes max size). The runtime memcpy()
    buffer overflow checker is seeing attempts to write beyond the 16
    bytes as an overflow, but the actual expected size is that of 'struct
    sockaddr_storage'. Plumb the use of 'struct sockaddr_storage' more
    completely through-out NFS, which results in adjusting the memcpy()
    buffers to the correct union members. Avoids this false positive run-time
    warning under CONFIG_FORTIFY_SOURCE:

      memcpy: detected field-spanning write (size 28) of single field "&ctx->nfs_server.address" at fs/nfs/namespace.c:178 (size 16)

    Reported-by: kernel test robot <yujie.liu@intel.com>
    Link: https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Anna Schumaker <anna@kernel.org>
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2023-05-08 10:41:17 -04:00
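
The false positive described in the commit above comes from copying an address larger than `struct sockaddr` through the small member of a union. A minimal user-space sketch of the idea follows; the union and function names are hypothetical and the layout is simplified compared with the NFS structures. It shows the copy being directed at the `struct sockaddr_storage` member, which is large enough for every supported address family.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical analogue of the address union in the NFS structures. */
union sk_address {
	struct sockaddr sa;          /* small, legacy-sized view */
	struct sockaddr_storage ss;  /* large enough for any family */
};

/*
 * Copy an address of any family into the union. Targeting the
 * sockaddr_storage member keeps the destination size honest for a
 * bounds-checking memcpy(); targeting "sa" would look like an
 * overflow for anything larger than struct sockaddr.
 */
static void sk_set_address(union sk_address *dst, const void *src, size_t len)
{
	assert(len <= sizeof(dst->ss));
	memcpy(&dst->ss, src, len);
}
```

An IPv6 address (`struct sockaddr_in6`) is the typical case: it is larger than `struct sockaddr` but fits comfortably in `struct sockaddr_storage`.
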
Scott Mayhew 5c4480c2fb nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621

commit a035618caf8718a1d4e840ec39dfc5fce0dcdee1
Author: Gaosheng Cui <cuigaosheng1@huawei.com>
Date:   Fri Sep 9 14:24:11 2022 +0800

    nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration

    nfs_write_prepare() has been removed since
    commit a4cdda5911 ("NFS: Create a common pgio_rpc_prepare
    function"), so remove it.

    nfs_wait_atomic_killable() has been removed since
    commit 723c921e7d ("sched/wait, fs/nfs: Convert wait_on_atomic_t()
    usage to the new wait_var_event() API"), so remove it.

    Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2023-05-08 10:41:12 -04:00
Chris von Recklinghausen 3a51424db2 nfs: Convert to migrate_folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 4ae84a80475144f739f77ed8bc789bc7feaa08ce
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Jun 6 09:22:19 2022 -0400

    nfs: Convert to migrate_folio

    Use a folio throughout this function.  migrate_page() will be converted
    later.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:29 -04:00
Benjamin Coddington 8cfc686839 NFS: Allow very small rsize & wsize again
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107347

commit a60214c2465493aac0b014d87ee19327b6204c42
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Wed Nov 30 15:30:47 2022 -0500

    NFS: Allow very small rsize & wsize again

    940261a19508 introduced nfs_io_size() to clamp the iosize to a multiple
    of PAGE_SIZE. This had the unintended side effect of no longer allowing
    iosizes less than a page, which could be useful in some situations.

    UDP already has an exception that causes it to fall back on the
    power-of-two style sizes instead. This patch adds an additional
    exception for very small iosizes.

    Reported-by: Jeff Layton <jlayton@kernel.org>
    Fixes: 940261a19508 ("NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE")
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-12-21 09:50:14 -05:00
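
The sizing rule described in the commit above can be sketched as follows. This is an illustration, not the kernel's `nfs_io_size()`: the `sk_` names and the fixed 4096-byte page size are assumptions, and any minimum or maximum clamping the real helper may apply is left out. UDP and sub-page sizes fall back to a power of two; everything else is rounded down to a multiple of the page size.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_PAGE_SIZE 4096UL /* assumed page size for this sketch */

/* Largest power of two that is <= v (v must be non-zero). */
static unsigned long sk_round_down_pow2(unsigned long v)
{
	unsigned long p = 1;

	assert(v != 0);
	while (p <= v / 2)
		p *= 2;
	return p;
}

/* Pick the effective rsize/wsize for a requested iosize. */
static unsigned long sk_io_size(unsigned long iosize, bool is_udp)
{
	/* UDP, and very small sizes, use power-of-two sizing. */
	if (is_udp || iosize < SK_PAGE_SIZE)
		return sk_round_down_pow2(iosize);
	/* Otherwise any multiple of the page size is allowed. */
	return iosize & ~(SK_PAGE_SIZE - 1);
}
```

Under these assumptions a 12 KiB request stays at 12288 bytes over TCP but becomes 8192 over UDP, and a 1024-byte request is kept at 1024 instead of being rounded down to zero pages.
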
Benjamin Coddington 1ac84d4498 NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107347

commit 940261a195080cf1cdcd56948d363fe363b69da1
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Fri Jun 17 16:23:36 2022 -0400

    NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE

    Previously, we required this value to be a power of 2 for UDP-related
    reasons. This patch keeps the power of 2 rule for UDP but allows more
    flexibility for TCP and RDMA.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-12-21 09:49:09 -05:00
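The two rsize / wsize commits above describe a clamping rule: UDP and sub-page sizes fall back to power-of-two sizes, while TCP and RDMA may use any multiple of PAGE_SIZE. A minimal userspace sketch of that rule follows. All names here (sketch_io_size(), SKETCH_PAGE_SIZE, enum sketch_proto) are hypothetical, a 4096-byte page is assumed, rounding down is an assumption, and the kernel's minimum/maximum clamps are omitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for PAGE_SIZE; real page sizes vary by platform. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical transport identifiers, for illustration only. */
enum sketch_proto { SKETCH_UDP, SKETCH_TCP, SKETCH_RDMA };

/* Largest power of two that is <= size. Assumes size > 0. */
static unsigned long sketch_round_down_pow2(unsigned long size)
{
	unsigned long p = 1;

	while (p <= size / 2)
		p *= 2;
	return p;
}

/*
 * UDP, and any size smaller than a page, use a power-of-two size.
 * Everything else is rounded down to a multiple of the page size.
 */
static unsigned long sketch_io_size(unsigned long iosize, enum sketch_proto proto)
{
	if (proto == SKETCH_UDP || iosize < SKETCH_PAGE_SIZE)
		return sketch_round_down_pow2(iosize);
	return iosize - (iosize % SKETCH_PAGE_SIZE);
}
```

Under these assumptions, a request of 12388 bytes becomes 12288 over TCP (three pages) but 8192 over UDP, and a 1000-byte request becomes 512 on any transport instead of being rejected or rounded to zero.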
Benjamin Coddington 6e5f86704e NFSv4.2: Update mode bits after ALLOCATE and DEALLOCATE
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit d7a5118635e725d195843bda80cc5c964d93ef31
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Wed Sep 7 16:34:21 2022 -0400

    NFSv4.2: Update mode bits after ALLOCATE and DEALLOCATE

    The fallocate call invalidates suid and sgid bits as part of normal
    operation. We need to mark the mode bits as invalid when using fallocate
    with an suid so these will be updated the next time the user looks at them.

    This fixes xfstests generic/683 and generic/684.

    Reported-by: Yue Cui <cuiyue-fnst@fujitsu.com>
    Fixes: 913eca1aea ("NFS: Fallocate should use the nfs4_fattr_bitmap")
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:29 -04:00
Benjamin Coddington a0ccfb0b28 NFS: Memory allocation failures are not server fatal errors
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 452284407c18d8a522c3039339b1860afa0025a8
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Sat May 14 10:08:10 2022 -0400

    NFS: Memory allocation failures are not server fatal errors

    We need to filter out ENOMEM in nfs_error_is_fatal_on_server(), because
    running out of memory on our client is not a server error.

    Reported-by: Olga Kornievskaia <aglo@umich.edu>
    Fixes: 2dc23afffb ("NFS: ENOMEM should also be a fatal error.")
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:26 -04:00
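The ENOMEM commit above separates "fatal for this request" from "fatal because the server said so". A simplified sketch of that split follows. The function names and the short error list are hypothetical, and the kernel's real list contains more codes, including kernel-only ones that do not exist in userspace.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified, hypothetical list of errors that abort the current request. */
static bool sketch_error_is_fatal(int err)
{
	switch (err) {
	case -EINTR:
	case -EIO:
	case -ENOSPC:
	case -EROFS:
	case -ESTALE:
	case -ENOMEM:
		return true;
	default:
		return false;
	}
}

/*
 * Errors that originate on the client (interrupted call, out of memory)
 * are filtered out before asking whether the server reported a fatal error.
 */
static bool sketch_error_is_fatal_on_server(int err)
{
	switch (err) {
	case 0:
	case -EINTR:
	case -ENOMEM:
		return false;
	default:
		return sketch_error_is_fatal(err);
	}
}
```

With this split, -ENOMEM still aborts the request locally but is never attributed to the server, whereas -EIO is fatal in both senses.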
Benjamin Coddington 6231544861 NFSv4: fix open failure with O_ACCMODE flag
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit b243874f6f9568b2daf1a00e9222cacdc15e159c
Author: ChenXiaoSong <chenxiaosong2@huawei.com>
Date:   Tue Mar 29 19:32:08 2022 +0800

    NFSv4: fix open failure with O_ACCMODE flag

    A second open() with O_ACCMODE|O_DIRECT flags will fail.

    Reproducer:
      1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
      2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
      3. close(fd)
      4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)

    The server's nfsd4_decode_share_access() will fail with error
    nfserr_bad_xdr when the client uses an incorrect share access mode of 0.

    Fix this by using the NFS4_SHARE_ACCESS_BOTH share access mode in the
    client, just as for the first open.

    Fixes: ce4ef7c0a8 ("NFS: Split out NFS v4 file operations")
    Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:22 -04:00
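The O_ACCMODE fix above comes down to never sending a share access of 0. The sketch below maps open(2) access modes to NFSv4 share-access values, treating the special access mode 3 like read/write. The function and macro names are hypothetical and this is not the kernel's actual code path. The numeric values 1, 2 and 3 are the protocol's READ, WRITE and BOTH values.

```c
#include <fcntl.h>

/* Share-access values as defined by the NFSv4 protocol. */
#define SKETCH_SHARE_ACCESS_READ  1u
#define SKETCH_SHARE_ACCESS_WRITE 2u
#define SKETCH_SHARE_ACCESS_BOTH  3u

/* Map open(2) flags to a share-access value that is never 0. */
static unsigned int sketch_share_access(int open_flags)
{
	switch (open_flags & O_ACCMODE) {
	case O_RDONLY:
		return SKETCH_SHARE_ACCESS_READ;
	case O_WRONLY:
		return SKETCH_SHARE_ACCESS_WRITE;
	default:
		/* O_RDWR, or the special access mode 3 (O_ACCMODE itself). */
		return SKETCH_SHARE_ACCESS_BOTH;
	}
}
```

Handling the special mode in the default branch means both the first and the second open in the reproducer request the same share access.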
Benjamin Coddington 8038eeb715 NFS: nfsiod should not block forever in mempool_alloc()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 515dcdcd48736576c6f5c197814da6f81c60a21e
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Mon Mar 21 12:34:19 2022 -0400

    NFS: nfsiod should not block forever in mempool_alloc()

    The concern is that since nfsiod is sometimes required to kick off a
    commit, it can get locked up waiting forever in mempool_alloc() instead
    of failing gracefully and leaving the commit until later.

    Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
    then fall back to a non-blocking attempt to allocate from the memory
    pool.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:21 -04:00
Benjamin Coddington 6d230ac05b NFS: Improve heuristic for readdirplus
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 230bc98f7a2a49eb472d184bdec91fd3096384b3
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Feb 17 11:08:24 2022 -0500

    NFS: Improve heuristic for readdirplus

    The heuristic for readdirplus is designed to try to detect 'ls -l' and
    similar patterns. It does so by looking for cache hit/miss patterns in
    both the attribute cache and in the dcache of the files in a given
    directory, and then sets a flag for the readdirplus code to interpret.

    The problem with this approach is that a single attribute or dcache miss
    can cause the NFS code to force a refresh of the attributes for the
    entire set of files contained in the directory.

    To be able to make a more nuanced decision, let's sample the number of
    hits and misses in the set of open directory descriptors. That allows us
    to set thresholds at which we start preferring READDIRPLUS over regular
    READDIR, or at which we start to force a re-read of the remaining
    readdir cache using READDIRPLUS.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:16 -04:00
Benjamin Coddington 134e51caac NFS: Clean up NFSv4.2 xattrs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 84631f84ac95b6ff6f08a41ffba1f93eaab4e9c7
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Wed Feb 23 15:43:26 2022 -0500

    NFS: Clean up NFSv4.2 xattrs

    Add a helper for the xattr mask so that we can get rid of the inlined
    ifdefs.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:12 -04:00
Benjamin Coddington 0bd454fc3a NFS: Add a helper to remove case-insensitive aliases
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072

commit 00bdadc7accfce944dc30fbc205cd28a7eed657b
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Fri Dec 17 15:36:57 2021 -0500

    NFS: Add a helper to remove case-insensitive aliases

    When dealing with case insensitive names, the client has no idea how the
    server performs the mapping, so cannot collapse the dentries into a
    single representative. So both rename and unlink need to deal with the
    fact that there could be several dentries representing the file, and
    have to somehow force them to be revalidated. Use d_prune_aliases() as a
    big hammer approach.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2022-09-26 09:34:05 -04:00
Jeffrey Layton bd50d46325 nfs: add new nfs_direct_req tracepoint events
Bugzilla: http://bugzilla.redhat.com/2028370
Conflicts: minor contextual conflicts

commit 8efc4bbe84a8bdd26e848ed93a8900fad1b44ca2
Author: Jeff Layton <jlayton@kernel.org>
Date:   Fri Jul 22 14:12:18 2022 -0400

    nfs: add new nfs_direct_req tracepoint events

    Add some new tracepoints to the DIO write code.

    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
2022-08-11 07:35:57 -04:00
Scott Mayhew d250f91fd6 NFS: Create a new nfs_alloc_fattr_with_label() function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200

commit d755ad8dc752d44545613ea04d660aed674e540d
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Fri Oct 22 13:11:00 2021 -0400

    NFS: Create a new nfs_alloc_fattr_with_label() function

    For creating fattrs with the label field already allocated for us. I
    also update nfs_free_fattr() to free the label in the end.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2022-02-03 11:44:04 -05:00
Scott Mayhew 45ddbe1ede NFS: Unexport nfs_probe_fsinfo()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200

commit 5fe1210d259542f966bab130830ece08e97f68f5
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Thu Oct 14 13:55:08 2021 -0400

    NFS: Unexport nfs_probe_fsinfo()

    All the callers are now in client.c so we can remove the
    EXPORT_SYMBOL_GPL() and make it static.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2022-02-03 11:44:02 -05:00
Scott Mayhew 6b2ba6e244 NFS: Move nfs_probe_destination() into the generic client
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200

commit e5731131fb6fefaa69064ca511b7c4971d6cf54f
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Thu Oct 14 13:55:05 2021 -0400

    NFS: Move nfs_probe_destination() into the generic client

    And rename it to nfs_probe_server(). I also change it to take the nfs_fh
    as an argument so callers can choose what filehandle to probe.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2022-02-03 11:44:01 -05:00
Scott Mayhew 5afae24456 NFS: Create an nfs4_server_set_init_caps() function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200

commit 01dde76e471229e3437a2686c572f4980b2c483e
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Thu Oct 14 13:55:04 2021 -0400

    NFS: Create an nfs4_server_set_init_caps() function

    And call it before doing an FSINFO probe to reset to the baseline
    capabilities before probing.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2022-02-03 11:44:01 -05:00
Scott Mayhew e867c58e87 NFSv4 introduce max_connect mount options
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200

commit 7e134205f62955369619021a695cd78fefd32451
Author: Olga Kornievskaia <kolga@netapp.com>
Date:   Fri Aug 27 14:37:17 2021 -0400

    NFSv4 introduce max_connect mount options

    This option controls how many xprts the client can establish to
    the server over distinct addresses (nconnect connections are not
    counted towards this new limit). This patch sets up the nfs
    structures to keep track of the max_connect limit (it does not
    enforce it).

    The default value is kept at 1 so that current mounts that do not
    want any additional connections are not affected. The maximum
    value is set at 16.

    Mounts to DS are not limited to the default value of 1 but are
    instead set to the maximum value of 16 (NFS_MAX_TRANSPORTS).

    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
2022-02-03 11:39:49 -05:00
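The limits described in the max_connect commit above (default 1, maximum 16, data-server mounts starting at the maximum) can be sketched as a small validation helper. The names below are hypothetical and the real option parser is structured differently. Only the numeric bounds come from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_TRANSPORTS      16u /* maximum stated in the commit message */
#define SKETCH_DEFAULT_MAX_CONNECT 1u  /* default stated in the commit message */

/* Return 0 if the requested value is within [1, 16], -EINVAL otherwise. */
static int sketch_check_max_connect(unsigned int requested)
{
	if (requested < 1 || requested > SKETCH_MAX_TRANSPORTS)
		return -EINVAL;
	return 0;
}

/* Data-server (DS) mounts start at the maximum, other mounts at the default. */
static unsigned int sketch_initial_max_connect(bool is_data_server)
{
	return is_data_server ? SKETCH_MAX_TRANSPORTS : SKETCH_DEFAULT_MAX_CONNECT;
}
```

As in the commit, this only records and validates the limit. Enforcing it when new transports are added is a separate step.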
Linus Torvalds a647034fe2 NFS client updates for Linux 5.13
Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Add validation of the UDP retrans parameter to prevent shift
     out-of-bounds

   - Don't discard pNFS layout segments that are marked for return

  Bugfixes:

   - Fix a NULL dereference crash in xprt_complete_bc_request() when the
     NFSv4.1 server misbehaves.

   - Fix the handling of NFS READDIR cookie verifiers

   - Sundry fixes to ensure attribute revalidation works correctly when
     the server does not return post-op attributes.

   - nfs4_bitmask_adjust() must not change the server global bitmasks

   - Fix major timeout handling in the RPC code.

   - NFSv4.2 fallocate() fixes.

   - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling

   - Copy offload attribute revalidation fixes

   - Fix an incorrect filehandle size check in the pNFS flexfiles driver

   - Fix several RDMA transport setup/teardown races

   - Fix several RDMA queue wrapping issues

   - Fix a misplaced memory read barrier in sunrpc's call_decode()

  Features:

   - Micro optimisation of the TCP transmission queue using TCP_CORK

   - statx() performance improvements by further splitting up the
     tracking of invalid cached file metadata.

   - Support the NFSv4.2 'change_attr_type' attribute and use it to
     optimise handling of change attribute updates"

* tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits)
  xprtrdma: Fix a NULL dereference in frwr_unmap_sync()
  sunrpc: Fix misplaced barrier in call_decode
  NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
  xprtrdma: Move fr_mr field to struct rpcrdma_mr
  xprtrdma: Move the Work Request union to struct rpcrdma_mr
  xprtrdma: Move fr_linv_done field to struct rpcrdma_mr
  xprtrdma: Move cqe to struct rpcrdma_mr
  xprtrdma: Move fr_cid to struct rpcrdma_mr
  xprtrdma: Remove the RPC/RDMA QP event handler
  xprtrdma: Don't display r_xprt memory addresses in tracepoints
  xprtrdma: Add an rpcrdma_mr_completion_class
  xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation
  xprtrdma: Avoid Send Queue wrapping
  xprtrdma: Do not wake RPC consumer on a failed LocalInv
  xprtrdma: Do not recycle MR after FastReg/LocalInv flushes
  xprtrdma: Clarify use of barrier in frwr_wc_localinv_done()
  xprtrdma: Rename frwr_release_mr()
  xprtrdma: rpcrdma_mr_pop() already does list_del_init()
  xprtrdma: Delete rpcrdma_recv_buffer_put()
  xprtrdma: Fix cwnd update ordering
  ...
2021-05-07 11:23:41 -07:00
Linus Torvalds f1c921fb70 selinux/stable-5.13 PR 20210426
Merge tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Add support for measuring the SELinux state and policy capabilities
   using IMA.

 - A handful of SELinux/NFS patches to compare the SELinux state of one
   mount with a set of mount options. Olga goes into more detail in the
   patch descriptions, but this is important as it allows more
   flexibility when using NFS and SELinux context mounts.

 - Properly differentiate between the subjective and objective LSM
   credentials, including support for SELinux and Smack. My clumsy
   attempt at a proper fix for AppArmor didn't quite pass muster so John
   is working on a proper AppArmor patch, in the meantime this set of
   patches shouldn't change the behavior of AppArmor in any way. This
   change explains the bulk of the diffstat beyond security/.

 - Fix a problem where we were not properly terminating the permission
   list for two SELinux object classes.

* tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: add proper NULL termination to the secclass_map permissions
  smack: differentiate between subjective and objective task credentials
  selinux: clarify task subjective and objective credentials
  lsm: separate security_task_getsecid() into subjective and objective variants
  nfs: account for selinux security context when deciding to share superblock
  nfs: remove unneeded null check in nfs_fill_super()
  lsm,selinux: add new hook to compare new mount to an existing mount
  selinux: fix misspellings using codespell tool
  selinux: fix misspellings using codespell tool
  selinux: measure state and policy capabilities
  selinux: Allow context mounts for unpriviliged overlayfs
2021-04-27 13:42:11 -07:00
Eryu Guan c9301cb35b nfs: honor timeo and retrans option when mounting NFSv3
Mounting NFSv3 uses default timeout parameters specified by underlying
sunrpc transport, and mount options like 'timeo' and 'retrans', unlike
NFSv4, are not honored.

But sometimes we want to set non-default timeout value when mounting
NFSv3, so pass 'timeo' and 'retrans' to nfs_mount() and fill the
'timeout' field of struct rpc_create_args before creating RPC
connection. This is also consistent with NFSv4 behavior.

Note that this only sets the timeout value of rpc connection to mountd,
but the timeout of rpcbind connection should be set as well. A later
patch will fix the rpcbind part.

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-05 09:04:21 -04:00
Olga Kornievskaia ec1ade6a04 nfs: account for selinux security context when deciding to share superblock
Keep track of whether or not there were LSM security context
options passed during mount (i.e. creation of the superblock).
Then, while deciding if the superblock can be shared for the new
mount, check if the newly passed in LSM security context options
are compatible with the existing superblock's ones by calling
security_sb_mnt_opts_compat().

Previously, with SELinux enabled, NFS wasn't able to do the
following two mounts:
mount -o vers=4.2,sec=sys,context=system_u:object_r:root_t:s0
<serverip>:/ /mnt
mount -o vers=4.2,sec=sys,context=system_u:object_r:swapfile_t:s0
<serverip>:/scratch /scratch

The second mount would fail with "mount.nfs: an incorrect mount option was
specified" and /var/log/messages would have:
"SELinux: mount invalid. Same superblock, different security
settings for.."

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
[PM: tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-22 15:01:45 -04:00
Trond Myklebust fd6d3feed0 NFS: Clean up function nfs_mark_dir_for_revalidate()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 16:01:02 -05:00
Christian Brauner 549c729771 fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the existing inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesn't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:20 +01:00
Trond Myklebust 896567ee7f NFS: nfs_igrab_and_active must first reference the superblock
Before referencing the inode, we must ensure that the superblock can be
referenced. Otherwise, we can end up with iput() calling superblock
operations that are no longer valid or accessible.

Fixes: ea7c38fef0 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 16:29:28 -05:00
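The ordering rule in the nfs_igrab_and_active commit above (pin the superblock first, then the inode, and undo the first step if the second fails) can be sketched with plain counters. The structures and helpers below are hypothetical userspace stand-ins, not kernel types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a superblock and an inode. */
struct sketch_sb {
	int active;          /* number of active references */
	bool shutting_down;  /* no new active references allowed */
};

struct sketch_inode {
	struct sketch_sb *sb;
	int refcount;
	bool freeing;        /* inode is being torn down */
};

static bool sketch_sb_get_active(struct sketch_sb *sb)
{
	if (sb->shutting_down)
		return false;
	sb->active++;
	return true;
}

static void sketch_sb_put_active(struct sketch_sb *sb)
{
	sb->active--;
}

static bool sketch_igrab(struct sketch_inode *inode)
{
	if (inode->freeing)
		return false;
	inode->refcount++;
	return true;
}

/* Take the superblock reference first, and roll it back if igrab fails. */
static struct sketch_inode *sketch_igrab_and_active(struct sketch_inode *inode)
{
	struct sketch_sb *sb = inode->sb;

	if (sb == NULL || !sketch_sb_get_active(sb))
		return NULL;
	if (sketch_igrab(inode))
		return inode;
	sketch_sb_put_active(sb);
	return NULL;
}
```

Because the inode reference is only taken while the superblock is pinned, a later release of the inode cannot run against a superblock that has already gone away.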
Scott Mayhew c98e9daa59 NFS: Adjust fs_context error logging
Several existing dprintk()/dfprintk() calls were converted to use the new
mount API logging macros by commit ce8866f091 ("NFS: Attach
supplementary error information to fs_context").  If the fs_context was
not created using fsopen() then it will not have had a log buffer
allocated for it, and the new mount API logging macros will wind up
calling printk().

This can result in syslog messages being logged where previously there
were none... most notably "NFS4: Couldn't follow remote path", which can
happen if the client is auto-negotiating a protocol version with an NFS
server that doesn't support the higher v4.x versions.

Convert the nfs_errorf(), nfs_invalf(), and nfs_warnf() macros to check
for the existence of the fs_context's log buffer and call dprintk() if
it doesn't exist.  Add nfs_ferrorf(), nfs_finvalf(), and nfs_fwarnf(),
which do the same thing but take an NFS debug flag as an argument and
call dfprintk().  Finally, modify the "NFS4: Couldn't follow remote
path" message to use nfs_ferrorf().

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207385
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: ce8866f091 ("NFS: Attach supplementary error information to fs_context.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:39 -05:00
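The logging change above selects a sink based on whether the fs_context has a log buffer. A rough userspace sketch of that decision follows. The types, the function name and the sink enum are hypothetical. In the kernel this is done with macros around the mount API logging helpers and dprintk()/dfprintk().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the fs_context and its optional log buffer. */
struct sketch_log {
	char buf[128];
};

struct sketch_fs_context {
	struct sketch_log *log; /* NULL unless the context was created with a log */
};

enum sketch_sink { SKETCH_SINK_LOG_BUFFER, SKETCH_SINK_DEBUG, SKETCH_SINK_NONE };

/*
 * Write to the context's log buffer if there is one. Otherwise emit only a
 * debug message, and only when debugging is enabled, so nothing reaches the
 * system log by default.
 */
static enum sketch_sink sketch_errorf(struct sketch_fs_context *fc,
				      bool debug_enabled, const char *msg)
{
	if (fc != NULL && fc->log != NULL) {
		snprintf(fc->log->buf, sizeof(fc->log->buf), "%s", msg);
		return SKETCH_SINK_LOG_BUFFER;
	}
	if (debug_enabled) {
		fprintf(stderr, "debug: %s\n", msg);
		return SKETCH_SINK_DEBUG;
	}
	return SKETCH_SINK_NONE;
}
```

In this sketch a context without a log buffer and with debugging off produces no output at all, which matches the behaviour the commit aims to restore.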
Trond Myklebust 1a34c8c9a4 NFS: Support larger readdir buffers
Support readdir buffers of up to 1MB in size so that we can read
large directories using few RPC calls.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
2020-12-02 14:05:52 -05:00
NeilBrown 8d92890bd6 mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead
After an NFS page has been written it is considered "unstable" until a
COMMIT request succeeds.  If the COMMIT fails, the page will be
re-written.

These "unstable" pages are currently accounted as "reclaimable", either
in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a
'reclaimable' count.  This might have made sense when sending the COMMIT
required a separate action by the VFS/MM (e.g.  releasepage() used to
send a COMMIT).  However now that all writes generated by ->writepages()
will automatically be followed by a COMMIT (since commit 919e3bd9a8
("NFS: Ensure we commit after writeback is complete")) it makes more
sense to treat them as writeback pages.

So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in
NR_WRITEBACK and WB_WRITEBACK.

A particular effect of this change is that when
wb_check_background_flush() calls wb_over_bg_threshold(), the latter
will report 'true' a lot less often as the 'unstable' pages are no
longer considered 'dirty' (as there is nothing that writeback can do
about them anyway).

Currently wb_check_background_flush() will trigger writeback to NFS even
when there are relatively few dirty pages (if there are lots of unstable
pages). This can result in small writes going to the server (10s of
Kilobytes rather than a Megabyte), which hurts throughput.  With this
patch, there are fewer writes which are each larger on average.

Where the NR_UNSTABLE_NFS count was included in statistics
virtual-files, the entry is retained, but the value is hard-coded as
zero.  Static trace points and warning printks which mentioned this
counter no longer report it.

[akpm@linux-foundation.org: re-layout comment]
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Acked-by: Michal Hocko <mhocko@suse.com>	[mm]
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:08 -07:00
Trond Myklebust 377840ee48 NFS: Remove the redundant function nfs_pgio_has_mirroring()
We need to trust that desc->pg_mirror_idx is set correctly, whether
or not mirroring is enabled.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00