JIRA: https://issues.redhat.com/browse/RHEL-59704
commit b326df4a8ec6ef53e2e2f1c2cbf14f8a20e85baa
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Tue Feb 13 13:31:47 2024 -0500
NFS: enable nconnect for RDMA
It appears that in certain cases, RDMA capable transports can benefit
from the ability to establish multiple connections to increase their
throughput. This patch therefore enables the use of the "nconnect" mount
option for those use cases.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-53004
commit 4840c00003a2275668a13b82c9f5b1aed80183aa
Author: Olga Kornievskaia <kolga@netapp.com>
Date: Mon Jun 24 09:28:27 2024 -0400
NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
Previously in order to mark the communication with the DS server,
we tried to use NFS_CS_DS in cl_flags. However, this flag would
only be saved for the DS server and in case where DS equals MDS,
the client would not find a matching nfs_client in nfs_match_client
that represents the MDS (but is also a DS).
Instead, don't rely on the NFS_CS_DS but instead use NFS_CS_PNFS.
Fixes: 379e4adfddd6 ("NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS server")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7936
commit 806a3bc421a115fbb287c1efce63a48c54ee804b
Author: Olga Kornievskaia <kolga@netapp.com>
Date: Wed Aug 30 15:29:34 2023 -0400
NFSv4.1: fix pnfs MDS=DS session trunking
Currently, when GETDEVICEINFO returns multiple locations where each
is a different IP but the server's identity is same as MDS, then
nfs4_set_ds_client() finds the existing nfs_client structure which
has the MDS's max_connect value (and if it's 1), then the 1st IP
on the DS's list will get dropped due to MDS trunking rules. Other
IPs would be added as they fall under the pnfs trunking rules.
For the list of IPs the 1st goes thru calling nfs4_set_ds_client()
which will eventually call nfs4_add_trunk() and call into
rpc_clnt_test_and_add_xprt() which has the check for MDS trunking.
The other IPs (after the 1st one), would call rpc_clnt_add_xprt()
which doesn't go thru that check.
nfs4_add_trunk() is called when MDS trunking is happening and it
needs to enforce the usage of max_connect mount option of the
1st mount. However, this shouldn't be applied to pnfs flow.
Instead, this patch proposed to treat MDS=DS as DS trunking and
make sure that MDS's max_connect limit does not apply to the
1st IP returned in the GETDEVICEINFO list. It does so by
marking the newly created client with a new flag NFS_CS_PNFS
which then used to pass max_connect value to use into the
rpc_clnt_test_and_add_xprt() instead of the existing rpc
client's max_connect value set by the MDS connection.
For example, mount was done without max_connect value set
so MDS's rpc client has cl_max_connect=1. Upon calling into
rpc_clnt_test_and_add_xprt() and using rpc client's value,
the caller passes in max_connect value which is previously
been set in the pnfs path (as a part of handling
GETDEVICEINFO list of IPs) in nfs4_set_ds_client().
However, when NFS_CS_PNFS flag is not set and we know we
are doing MDS trunking, comparing a new IP of the same
server, we then set the max_connect value to the
existing MDS's value and pass that into
rpc_clnt_test_and_add_xprt().
Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7936
commit 51d674a5e4889f1c8e223ac131cf218e1631e423
Author: Olga Kornievskaia <kolga@netapp.com>
Date: Thu Jul 13 13:02:38 2023 -0400
NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server
After receiving the location(s) of the DS server(s) in the
GETDEVINCEINFO, create the request for the clientid to such
server and indicate that the client is connecting to a DS.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7936
commit e13b549319a684dd80c4cc25e9567a5c84007e32
Author: Benjamin Coddington <bcodding@redhat.com>
Date: Thu Jun 15 14:07:27 2023 -0400
NFS: Add sysfs links to sunrpc clients for nfs_clients
For the general and state management nfs_client under each mount, create
symlinks to their respective rpc_client sysfs entries.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7936
commit 1c7251187dc067a6d460cf33ca67da9c1dd87807
Author: Benjamin Coddington <bcodding@redhat.com>
Date: Thu Jun 15 14:07:26 2023 -0400
NFS: add superblock sysfs entries
Create a sysfs directory for each mount that corresponds to the mount's
nfs_server struct. As the mount is being constructed, use the name
"server-n", but rename it to the "MAJOR:MINOR" of the mount after assigning
a device_id. The rename approach allows us to populate the mount's directory
with links to the various rpc_client objects during the mount's
construction. The naming convention (MAJOR:MINOR) can be used to reference
a particular NFS mount's sysfs tree.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7936
commit c8407f2e560c53c4c73e77cb5604c8a408dbe7f7
Author: Chuck Lever <chuck.lever@oracle.com>
Date: Wed Jun 7 10:00:09 2023 -0400
NFS: Add an "xprtsec=" NFS mount option
After some discussion, we decided that controlling transport layer
security policy should be separate from the setting for the user
authentication flavor. To accomplish this, add a new NFS mount
option to select a transport layer security policy for RPC
operations associated with the mount point.
xprtsec=none - Transport layer security is forced off.
xprtsec=tls - Establish an encryption-only TLS session. If
the initial handshake fails, the mount fails.
If TLS is not available on a reconnect, drop
the connection and try again.
xprtsec=mtls - Both sides authenticate and an encrypted
session is created. If the initial handshake
fails, the mount fails. If TLS is not available
on a reconnect, drop the connection and try
again.
To support client peer authentication (mtls), the handshake daemon
will have configurable default authentication material (certificate
or pre-shared key). In the future, mount options can be added that
can provide this material on a per-mount basis.
Updates to mount.nfs (to support xprtsec=auto) and nfs(5) will be
sent under separate cover.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7936
commit 6c0a8c5fcf7158e889dbdd077f67c81984704710
Author: Chuck Lever <chuck.lever@oracle.com>
Date: Wed Jun 7 09:59:42 2023 -0400
NFS: Have struct nfs_client carry a TLS policy field
The new field is used to match struct nfs_clients that have the same
TLS policy setting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621
commit cf0d7e7f4520814f45e1313872ad5777ed504004
Author: Kees Cook <keescook@chromium.org>
Date: Sun Oct 16 21:36:50 2022 -0700
NFS: Avoid memcpy() run-time warning for struct sockaddr overflows
The 'nfs_server' and 'mount_server' structures include a union of
'struct sockaddr' (with the older 16 bytes max address size) and
'struct sockaddr_storage' which is large enough to hold all the
supported sa_family types (128 bytes max size). The runtime memcpy()
buffer overflow checker is seeing attempts to write beyond the 16
bytes as an overflow, but the actual expected size is that of 'struct
sockaddr_storage'. Plumb the use of 'struct sockaddr_storage' more
completely through-out NFS, which results in adjusting the memcpy()
buffers to the correct union members. Avoids this false positive run-time
warning under CONFIG_FORTIFY_SOURCE:
memcpy: detected field-spanning write (size 28) of single field "&ctx->nfs_server.address" at fs/nfs/namespace.c:178 (size 16)
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183621
commit 0dd7439f382518e9997cfa7ca9d06799dbeb33fa
Author: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Thu Aug 18 23:01:15 2022 +0200
NFS: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107347
commit 940261a195080cf1cdcd56948d363fe363b69da1
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date: Fri Jun 17 16:23:36 2022 -0400
NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE
Previously, we required this to value to be a power of 2 for UDP related
reasons. This patch keeps the power of 2 rule for UDP but allows more
flexibility for TCP and RDMA.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094072
commit fbd2057e5329d3502a27491190237b6be52a1cb6
Author: Xiaoke Wang <xkernel.wang@foxmail.com>
Date: Fri Dec 17 01:01:33 2021 +0800
nfs: nfs4clinet: check the return value of kstrdup()
kstrdup() returns NULL when some internal memory errors happen, it is
better to check the return value of it so to catch the memory error in
time.
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200
commit 4d4cf8d2d6ccb43c68bc5925dc83500b81b50f9e
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date: Thu Oct 14 13:55:06 2021 -0400
NFS: Replace calls to nfs_probe_fsinfo() with nfs_probe_server()
Clean up. There are a few places where we want to probe the server, but
don't actually care about the fsinfo result. Change these to use
nfs_probe_server(), which handles the fattr allocation for us.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200
commit e5731131fb6fefaa69064ca511b7c4971d6cf54f
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date: Thu Oct 14 13:55:05 2021 -0400
NFS: Move nfs_probe_destination() into the generic client
And rename it to nfs_probe_server(). I also change it to take the nfs_fh
as an argument so callers can choose what filehandle to probe.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200
commit 01dde76e471229e3437a2686c572f4980b2c483e
Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date: Thu Oct 14 13:55:04 2021 -0400
NFS: Create an nfs4_server_set_init_caps() function
And call it before doing an FSINFO probe to reset to the baseline
capabilities before probing.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200
commit 2a7a451a9084877a0b9d335c77d57e4cda1e5882
Author: Olga Kornievskaia <kolga@netapp.com>
Date: Fri Aug 27 14:37:19 2021 -0400
NFSv4.1 add network transport when session trunking is detected
After trunking is discovered in nfs4_discover_server_trunking(),
add the transport to the old client structure if the allowed limit
of transports has not been reached.
An example: there exists a multi-homed server and client mounts
one server address and some volume and then doest another mount to
a different address of the same server and perhaps a different
volume. Previously, the client checks that this is a session
trunkable servers (same server), and removes the newly created
client structure along with its transport. Now, the client
adds the connection from the 2nd mount into the xprt switch of
the existing client (it leads to having 2 available connections).
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2049200
commit 7e134205f62955369619021a695cd78fefd32451
Author: Olga Kornievskaia <kolga@netapp.com>
Date: Fri Aug 27 14:37:17 2021 -0400
NFSv4 introduce max_connect mount options
This option will control up to how many xprts can the client
establish to the server with a distinct address (that means
nconnect connections are not counted towards this new limit).
This patch is setting up nfs structures to keeep track of the
max_connect limit (does not enforce it).
The default value is kept at 1 so that no current mounts that
don't want any additional connections would be effected. The
maximum value is set at 16.
Mounts to DS are not limited to default value of 1 but instead
set to the maximum default value of 16 (NFS_MAX_TRANSPORTS).
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Set up the connection to the NFSv4 server in nfs4_alloc_client(), before
we've added the struct nfs_client to the net-namespace's nfs_client_list
so that a downed server won't cause other mounts to hang in the trunking
detection code.
Reported-by: Michael Wakabayashi <mwakabayashi@vmware.com>
Fixes: 5c6e5b60aa ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
KASAN reports a use-after-free when attempting to mount two different
exports through two different NICs that belong to the same server.
Olga was able to hit this with kernels starting somewhere between 5.7
and 5.10, but I traced the patch that introduced the clear_bit() call to
4.13. So something must have changed in the refcounting of the clp
pointer to make this call to nfs_put_client() the very last one.
Fixes: 8dcbec6d20 ("NFSv41: Handle EXCHID4_FLAG_CONFIRMED_R during NFSv4.1 migration")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly add multiple break/goto/return/fallthrough
statements instead of just letting the code fall through to the next
case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In several patches work has been done to enable NFSv4 to use user
namespaces:
58002399da65: NFSv4: Convert the NFS client idmapper to use the container user namespace
3b7eb5e35d0f: NFS: When mounting, don't share filesystems between different user namespaces
Unfortunately, the userspace APIs were only such that the userspace facing
side of the filesystem (superblock s_user_ns) could be set to a non init
user namespace. This furthers the fs_context related refactoring, and
piggybacks on top of that logic, so the superblock user namespace, and the
NFS user namespace are the same.
Users can still use rpc.idmapd if they choose to, but there are complexities
with user namespaces and request-key that have yet to be addresssed.
Eventually, we will need to at least:
* Come up with an upcall mechanism that can be triggered inside of the container,
or safely triggered outside, with the requisite context to do the right
mapping. * Handle whatever refactoring needs to be done in net/sunrpc.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Tested-by: Alban Crequy <alban.crequy@gmail.com>
Fixes: 62a55d088c ("NFS: Additional refactoring for fs_context conversion")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This patch adds client support for decoding a single NFS4_CONTENT_DATA
segment returned by the server. This is the simplest implementation
possible, since it does not account for any hole segments in the reply.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
It looks like this "else" is just a typo. It turns off nconnect for
NFSv4.0 even though it works for every other version.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Set limits for extended attributes (attribute value size and listxattr
buffer size), based on the fs-independent limits (XATTR_*_MAX).
Define the maximum XDR sizes for the RFC 8276 XATTR operations.
In the case of operations that carry a larger payload (SETXATTR,
GETXATTR, LISTXATTR), these exclude that payload, which is added
as separate pages, like other operations do.
Define, much like for read and write operations, the maximum overhead
sizes for get/set/listxattr, and use them to limit the maximum payload
size for those operations, in combination with the channel attributes.
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
An NFS client that mounts multiple exports from the same NFS
server with higher NFSv4 versions disabled (i.e. 4.2) and without
forcing a specific NFS version results in fscache index cookie
collisions and the following messages:
[ 570.004348] FS-Cache: Duplicate cookie detected
Each nfs_client structure should have its own fscache index cookie,
so add the minorversion to nfs_server_key.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Split out from commit "NFS: Add fs_context support."
This patch adds additional refactoring for the conversion of NFS to use
fs_context, namely:
(*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
nfs_clone_mount has had several fields removed, and nfs_mount_info
has been removed altogether.
(*) Various functions now take an fs_context as an argument instead
of nfs_mount_info, nfs_fs_context, etc.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Rename struct nfs_parsed_mount_data to struct nfs_fs_context and rename
pointers to it to "ctx". At some point this will be pointed to by an
fs_context struct's fs_private pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
pick it from mount_info
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.
NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Try using the delegation stateid, then the open stateid.
Only NL4_NETATTR, No support for NL4_NAME and NL4_URL.
Allow only one source server address to be returned for now.
To distinguish between same server copy offload ("intra") and
a copy between different server ("inter"), do a check of server
owner identity and also make sure server is capable of doing
a copy offload.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
John Hubbard reports seeing the following stack trace:
nfs4_do_reclaim
rcu_read_lock /* we are now in_atomic() and must not sleep */
nfs4_purge_state_owners
nfs4_free_state_owner
nfs4_destroy_seqid_counter
rpc_destroy_wait_queue
cancel_delayed_work_sync
__cancel_work_timer
__flush_work
start_flush_work
might_sleep:
(kernel/workqueue.c:2975: BUG)
The solution is to separate out the freeing of the state owners
from nfs4_purge_state_owners(), and perform that outside the atomic
context.
Reported-by: John Hubbard <jhubbard@nvidia.com>
Fixes: 0aaaf5c424 ("NFS: Cache state owners after files are closed")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the user specifies -onconnect=<number> mount option, and the transport
protocol is TCP, then set up <number> connections to the pNFS data server
as well. The connections will all go to the same IP address.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the user specifies the -onconn=<number> mount option, and the transport
protocol is TCP, then set up <number> connections to the server. The
connections will all go to the same IP address.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Store the credential of the mount process so that we can determine
information such as the user namespace.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Drop LIST_HEAD where the variable it declares has never
been used.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>
Fixes: 0e20162ed1 ("NFSv4.1 Use MDS auth flavor for data server connection")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fix up some compiler warnings about function parameters, etc not being
correctly described or formatted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.
This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.
For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning. A look-up of a low-level cred will
map this to a machine credential.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The intention of nfs4_session_set_rwsize() was to cap the r/wsize to the
buffer sizes negotiated by the CREATE_SESSION. The initial code had a
bug whereby we would not check the values negotiated by nfs_probe_fsinfo()
(the assumption being that CREATE_SESSION will always negotiate buffer values
that are sane w.r.t. the server's preferred r/wsizes) but would only check
values set by the user in the 'mount' command.
The code was changed in 4.11 to _always_ set the r/wsize, meaning that we
now never use the server preferred r/wsizes. This is the regression that
this patch fixes.
Also rename the function to nfs4_session_limit_rwsize() in order to avoid
future confusion.
Fixes: 033853325f (NFSv4.1 respect server's max size in CREATE_SESSION")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
It's possible that server replies back with CB_OFFLOAD call and
COPY reply at the same time such that client will process
CB_OFFLOAD before reply to COPY. For that keep a list of pending
callback stateids received and then before waiting on completion
check the pending list.
Cleanup any pending copies on the client shutdown.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit 530ea42192 ("nfs: Referrals should use the same proto setting
as their parent") encloses the fix with #ifdef CONFIG_SUNRPC_XPRT_RDMA.
CONFIG_SUNRPC_XPRT_RDMA is a tristate option, so it should be tested
with #if IS_ENABLED().
Fixes: 530ea42192 ("nfs: Referrals should use the same proto setting as their parent")
Reported-by: Helen Chao <helen.chao@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bill Baker <bill.baker@oracle.com>
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
nfs4_update_server unconditionally releases the nfs_client for the
source server. If migration fails, this can cause the source server's
nfs_client struct to be left with a low reference count, resulting in
use-after-free. Also, adjust reference count handling for ELOOP.
NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
nfs_put_client+0xfa/0x110 [nfs]
nfs4_run_state_manager+0x30/0x40 [nfsv4]
kthread+0xd8/0xf0
BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
nfs4_xdr_enc_write+0x6b/0x160 [nfsv4]
rpcauth_wrap_req+0xac/0xf0 [sunrpc]
call_transmit+0x18c/0x2c0 [sunrpc]
__rpc_execute+0xa6/0x490 [sunrpc]
rpc_async_schedule+0x15/0x20 [sunrpc]
process_one_work+0x160/0x470
worker_thread+0x112/0x540
? rescuer_thread+0x3f0/0x3f0
kthread+0xd8/0xf0
This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"),
but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops")
Reported-by: Helen Chao <helen.chao@oracle.com>
Fixes: 52442f9b11 ("NFS4: Avoid migration loops")
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
After traversing a referral or recovering from a migration event,
ensure that the server port reported in /proc/mounts is updated
to the correct port setting for the new submount.
Reported-by: Helen Chao <helen.chao@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Helen Chao <helen.chao@oracle.com> noticed that when a user
traverses a referral on an NFS/RDMA mount, the resulting submount
always uses TCP.
This behavior does not match the vers= setting when traversing
a referral (vers=4.1 is preserved). It also does not match the
behavior of crossing from the pseudofs into a real filesystem
(proto=rdma is preserved in that case).
The Linux NFS client does not currently support the
fs_locations_info attribute. The situation is similar for all
NFSv4 servers I know of. Therefore until the community has broad
support for fs_locations_info, when following a referral:
- First try to connect with RPC-over-RDMA. This will fail quickly
if the client has no RDMA-capable interfaces.
- If connecting with RPC-over-RDMA fails, or the RPC-over-RDMA
transport is not available, use TCP.
Reported-by: Helen Chao <helen.chao@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The following deadlock can occur between a process waiting for a client
to initialize in while walking the client list during nfsv4 server trunking
detection and another process waiting for the nfs_clid_init_mutex so it
can initialize that client:
Process 1 Process 2
--------- ---------
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTA->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTB->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
mutex_lock(&nfs_clid_init_mutex);
nfs41_walk_client_list(clp, result, cred);
nfs_wait_client_init_complete(CLIENTA);
(waiting for nfs_clid_init_mutex)
Make sure nfs_match_client() only evaluates clients that have completed
initialization in order to prevent that deadlock.
This patch also fixes v4.0 trunking behavior by not marking the client
NFS_CS_READY until the clientid has been confirmed.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Bool initializations should use true and false. Bool tests don't need
comparisons.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs_client.cl_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>