JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL is missing commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which accounts for the differences in
ops structure dereferencing.
commit 0645fbe760afcc5332c858d1cbf416bf77ef3c29
Author: Jens Axboe <axboe@kernel.dk>
Date: Thu May 9 09:31:05 2024 -0600
net: have do_accept() take a struct proto_accept_arg argument
In preparation for passing in more information via this API, change
do_accept() to take a proto_accept_arg struct pointer rather than just
the file flags separately.
No functional changes in this patch.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit c04245328dd7e915e21ac6395ffd218616e22754
Author: Yajun Deng <yajun.deng@linux.dev>
Date: Fri Jun 10 17:10:17 2022 +0800
net: make __sys_accept4_file() static
__sys_accept4_file() isn't used outside of the file, make it static.
As the same time, move file_flags and nofile parameters into
__sys_accept4_file().
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL is missing commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which accounts for the differences in
ops structure dereferencing.
commit 92ef0fd55ac80dfc2e4654edfe5d1ddfa6e070fe
Author: Jens Axboe <axboe@kernel.dk>
Date: Thu May 9 09:20:08 2024 -0600
net: change proto and proto_ops accept type
Rather than pass in flags, error pointer, and whether this is a kernel
invocation or not, add a struct proto_accept_arg struct as the argument.
This then holds all of these arguments, and prepares accept for being
able to pass back more information.
No functional changes in this patch.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 8c9a6f549e65912825e31dc1e0e3f7995984649d
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 9 14:05:53 2024 -0700
io_uring: separate header for exported net bits
We're exporting some io_uring bits to networking, e.g. for implementing
a net callback for io_uring cmds, but we don't want to expose more than
needed. Add a separate header for networking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20240409210554.1878789-1-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5775
JIRA: https://issues.redhat.com/browse/RHEL-66687
Upstream Status: linux.git commit 631083143315d1b192bd7d915b967b37819e88ea
CVE: CVE-2024-50186
commit 631083143315d1b192bd7d915b967b37819e88ea
Author: Ignat Korchagin <ignat@cloudflare.com>
Date: Thu Oct 3 18:01:51 2024 +0100
net: explicitly clear the sk pointer, when pf->create fails
We have recently noticed the exact same KASAN splat as in commit
6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket
creation fails"). The problem is that commit did not fully address the
problem, as some pf->create implementations do not use sk_common_release
in their error paths.
For example, we can use the same reproducer as in the above commit, but
changing ping to arping. arping uses AF_PACKET socket and if packet_create
fails, it will just sk_free the allocated sk object.
While we could chase all the pf->create implementations and make sure they
NULL the freed sk object on error from the socket, we can't guarantee
future protocols will not make the same mistake.
So it is easier to just explicitly NULL the sk pointer upon return from
pf->create in __sock_create. We do know that pf->create always releases the
allocated sk object on error, so if the pointer is not NULL, it is
definitely dangling.
Fixes: 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails")
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Cc: stable@vger.kernel.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241003170151.69445-1-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-66687
Upstream Status: linux.git commit 631083143315d1b192bd7d915b967b37819e88ea
CVE: CVE-2024-50186
commit 631083143315d1b192bd7d915b967b37819e88ea
Author: Ignat Korchagin <ignat@cloudflare.com>
Date: Thu Oct 3 18:01:51 2024 +0100
net: explicitly clear the sk pointer, when pf->create fails
We have recently noticed the exact same KASAN splat as in commit
6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket
creation fails"). The problem is that commit did not fully address the
problem, as some pf->create implementations do not use sk_common_release
in their error paths.
For example, we can use the same reproducer as in the above commit, but
changing ping to arping. arping uses AF_PACKET socket and if packet_create
fails, it will just sk_free the allocated sk object.
While we could chase all the pf->create implementations and make sure they
NULL the freed sk object on error from the socket, we can't guarantee
future protocols will not make the same mistake.
So it is easier to just explicitly NULL the sk pointer upon return from
pf->create in __sock_create. We do know that pf->create always releases the
allocated sk object on error, so if the pointer is not NULL, it is
definitely dangling.
Fixes: 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails")
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Cc: stable@vger.kernel.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241003170151.69445-1-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-65205
commit 33f339a1ba54e56bba57ee9a77c71e385ab4825c
Author: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Date: Fri Aug 30 16:25:17 2024 +0800
bpf, net: Fix a potential race in do_sock_getsockopt()
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is
false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but
becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called.
This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving
an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`.
Scenario shown as below:
`process A` `process B`
----------- ------------
BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
enable CGROUP_GETSOCKOPT
BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)
To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and
directly uses `copy_from_sockptr` to ensure that `max_optlen` is always
set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked.
Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Co-developed-by: Yanghui Li <yanghui.li@mediatek.com>
Signed-off-by: Yanghui Li <yanghui.li@mediatek.com>
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://patch.msgid.link/20240830082518.23243-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: The cifs source has been moved in CentOS Stream so manually
apply rejected hunk to fs/smb/client/xattr.c.
Dropped hunks for ntfs3 because the source is not present in
the CentOS Stream source tree.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
to new xattrs.c file") moved ovl_own_xattr_set(), manually apply
changes.
CentOS Stream commit 67e2fcb2f3 ("evm: don't copy up
'security.evm' xattr") is present causing hunk #1 against
include/linux/evm.h to be rejected, manually apply.
Upstream commit 5d1ef2ce13a90 ("ima: Introduce
ima_get_current_hash_algo()") is not present in CentOS Stream
which causes fuzz 1 for hunk #1 against include/linux/ima.h.
There's a reject of hunk #1 for include/linux/lsm_hooks.h but
I can't see any reason for it, manually applied the hunk.
CentOS Stream does not have upstream commit ce5bb5a86e5eb
("ima: Return int in the functions to measure a buffer") which
results in a reject of hunk #2 against security/integrity/ima/ima.h
and hunks #8 and #11 against security/integrity/ima/ima_main.c, so
manually apply hunks. There also appears to be a whitespace
mismatch causing hunk #7 to report fuzz 2 on application.
CentOS Stream does not have upstream commit c7423dbdbc9ec
("ima: Handle -ESTALE returned by ima_filter_rule_match()")
which results in a reject of hunk #3 against
security/integrity/ima/ima_policy.c, so manually apply hunk.
commit 39f60c1ccee72caa0104145b5dbf5d37cce1ea39
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:23 2023 +0100
fs: port xattr to mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: CentOS Stream commit 3c29fadfb1 ("afs: split
afs_pagecache_valid() out of afs_validate()") is present, manually
adjust hunk #1 of fs/afs/internal.h.
For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
for ceph") alters the definition of _ceph_setattr(), adjust
manually.
CentOS Stream commit 34b2a2b5a3 {"ceph: add some fscrypt
guardrails") introduces a call to fscrypt_prepare_setattr() which
causes fuzz when applying.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/inode.c.
Upstream commit 5a646fb3a3e2d ("coda: avoid doing bad things on
inode type changes during revalidation") is not present which
causes fuzz in fs/coda/coda_linux.h.
Dropped hunks for ntfs3 because the source is not present in
the CentOS Stream source tree.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
to new xattrs.c file") is presnt so manually apply hunk.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present so it's ok to pass idmap to
setattr_prepare() and setattr_copy().
Update to add incremental changes needed due to CentOS Stream
commit 469e1d13f6 ("shmem: quota support").
Allow for CentOS Stream commit 6c3396a0d8 ("kernfs: Introduce
separate rwsem to protect inode attributes") which is already
present.
CentOS Stream commit f5219db0c0 ("KVM: fix Add KVM_CREATE_GUEST_MEMFD
ioctl() for guest-specific backing memory") updated the upstream commit
a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific
backing memory") to account for missing idmapping commits. Now we have
updated one of the two places these changes were made make one of the
needed adjustments to match the original upstream patch.
commit c1632a0f11209338fc300c66252bcc4686e609e8
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:11 2023 +0100
fs: port ->setattr() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit 67178fd066d53d5a6cada1cdc6399d02b68b708e
Author: David Howells <dhowells@redhat.com>
Date: Mon May 22 14:50:00 2023 +0100
net: Make sock_splice_read() use copy_splice_read() by default
Make sock_splice_read() use copy_splice_read() by default as
file_splice_read() will return immediately with 0 as a socket has no
pagecache and is a zero-size file.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: netdev@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-14-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: RHEL does not include commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which converts proto_ops to a const
accessed with READ_ONCE. Fix up the patch to apply, but keep the
READ_ONCE from 1ded5e5a5931.
commit 0b05b0cd78c92371fdde6333d006f39eaf9e0860
Author: Breno Leitao <leitao@debian.org>
Date: Mon Oct 16 06:47:42 2023 -0700
net/socket: Break down __sys_getsockopt
Split __sys_getsockopt() into two functions by removing the core
logic into a sub-function (do_sock_getsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_getsockopt() will be called by io_uring getsockopt() command
operation in the following patch.
The same was done for the setsockopt pair.
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-5-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: RHEL does not have commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which leads to context differences in
the patch.
commit 1406245c29454ff84919736be83e14cdaba7fec1
Author: Breno Leitao <leitao@debian.org>
Date: Mon Oct 16 06:47:41 2023 -0700
net/socket: Break down __sys_setsockopt
Split __sys_setsockopt() into two functions by removing the core
logic into a sub-function (do_sock_setsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_setsockopt() will be called by io_uring setsockopt() command
operation in the following patch.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-4-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
commit 3f31e0d14d44ad491a81b7c1f83f32fbc300a867
Author: Breno Leitao <leitao@debian.org>
Date: Mon Oct 16 06:47:40 2023 -0700
bpf: Add sockptr support for setsockopt
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in setsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-3-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
commit a615f67e1a426f35366b8398c11f31c148e7df48
Author: Breno Leitao <leitao@debian.org>
Date: Mon Oct 16 06:47:39 2023 -0700
bpf: Add sockptr support for getsockopt
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in getsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33410
JIRA: https://issues.redhat.com/browse/RHEL-30875
Upstream Status: net.git commit 0bdf399342c5acbd817c9098b6c7ed21f1974312
Conflicts:
- net/socket.c: don't use READ_ONCE() as we don't have upstream commit
1ded5e5a5931 ("net: annotate data-races around sock->ops")
commit 0bdf399342c5acbd817c9098b6c7ed21f1974312
Author: Jordan Rife <jrife@google.com>
Date: Mon Aug 21 16:45:23 2023 -0500
net: Avoid address overwrite in kernel_connect
BPF programs that run on connect can rewrite the connect address. For
the connect system call this isn't a problem, because a copy of the address
is made when it is moved into kernel space. However, kernel_connect
simply passes through the address it is given, so the caller may observe
its address value unexpectedly change.
A practical example where this is problematic is where NFS is combined
with a system such as Cilium which implements BPF-based load balancing.
A common pattern in software-defined storage systems is to have an NFS
mount that connects to a persistent virtual IP which in turn maps to an
ephemeral server IP. This is usually done to achieve high availability:
if your server goes down you can quickly spin up a replacement and remap
the virtual IP to that endpoint. With BPF-based load balancing, mounts
will forget the virtual IP address when the address rewrite occurs
because a pointer to the only copy of that address is passed down the
stack. Server failover then breaks, because clients have forgotten the
virtual IP address. Reconnects fail and mounts remain broken. This patch
was tested by setting up a scenario like this and ensuring that NFS
reconnects worked after applying the patch.
Signed-off-by: Jordan Rife <jrife@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-32107
Conflicts:
- context conflict due to existing backport of 66f7223039c0c ("net:
add NDOs for configuring hardware timestamping")
commit 97dc7cd92ac67f6e05df418df1772ba4a7fbf693
Author: Gerhard Engleder <gerhard@engleder-embedded.com>
Date: Fri May 6 22:01:40 2022 +0200
ptp: Support late timestamp determination
If a physical clock supports a free running cycle counter, then
timestamps shall be based on this time too. For TX it is known in
advance before the transmission if a timestamp based on the free running
cycle counter is needed. For RX it is impossible to know which timestamp
is needed before the packet is received and assigned to a socket.
Support late timestamp determination by a network device. Therefore, an
address/cookie is stored within the new netdev_data field of struct
skb_shared_hwtstamps. This address/cookie is provided to a new network
device function called ndo_get_tstamp(), which returns a timestamp based
on the normal/adjustable time or based on the free running cycle
counter. If function is not supported, then timestamp handling is not
changed.
This mechanism is intended for RX, but TX use is also possible.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-32107
commit d58809d854c9ee19e4cd41023e137e65e9dc3f94
Author: Gerhard Engleder <gerhard@engleder-embedded.com>
Date: Fri May 6 22:01:39 2022 +0200
ptp: Pass hwtstamp to ptp_convert_timestamp()
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.
Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-32107
commit 51eb7492af276b5b4d27cfa4474d40bdac7b9cf8
Author: Gerhard Engleder <gerhard@engleder-embedded.com>
Date: Fri May 6 22:01:38 2022 +0200
ptp: Request cycles for TX timestamp
The free running cycle counter of physical clocks called cycles shall be
used for hardware timestamps to enable synchronisation.
Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to
provide a TX timestamp based on cycles if cycles are supported.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-23643
commit 15fb6f2b6c4c3c129adc2412ae12ec15e60a6adb
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date: Tue Oct 31 14:56:25 2023 -0700
bpf: Add __bpf_hook_{start,end} macros
Not all uses of __diag_ignore_all(...) in BPF-related code in order to
suppress warnings are wrapping kfunc definitions. Some "hook point"
definitions - small functions meant to be used as attach points for
fentry and similar BPF progs - need to suppress -Wmissing-declarations.
We could use __bpf_kfunc_{start,end}_defs added in the previous patch in
such cases, but this might be confusing to someone unfamiliar with BPF
internals. Instead, this patch adds __bpf_hook_{start,end} macros,
currently having the same effect as __bpf_kfunc_{start,end}_defs, then
uses them to suppress warnings for two hook points in the kernel itself
and some bpf_testmod hook points as well.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231031215625.2343848-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Artem Savkov <asavkov@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-21447
Tested: LNST, Tier1
Upstream commit:
commit 01b2885d9415152bcb12ff1f7788f500a74ea0ed
Author: Marc Dionne <marc.dionne@auristor.com>
Date: Thu Dec 21 09:12:30 2023 -0400
net: Save and restore msg_namelen in sock_sendmsg
Commit 86a7e0b69bd5 ("net: prevent rewrite of msg_name in
sock_sendmsg()") made sock_sendmsg save the incoming msg_name pointer
and restore it before returning, to insulate the caller against
msg_name being changed by the called code. If the address length
was also changed however, we may return with an inconsistent structure
where the length doesn't match the address, and attempts to reuse it may
lead to lost packets.
For example, a kernel that doesn't have commit 1c5950fc6fe9 ("udp6: fix
potential access to stale information") will replace a v4 mapped address
with its ipv4 equivalent, and shorten namelen accordingly from 28 to 16.
If the caller attempts to reuse the resulting msg structure, it will have
the original ipv6 (v4 mapped) address but an incorrect v4 length.
Fixes: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3318
Update io_uring and its dependencies to upstream kernel version 6.6.
JIRA: https://issues.redhat.com/browse/RHEL-12076
JIRA: https://issues.redhat.com/browse/RHEL-14998
JIRA: https://issues.redhat.com/browse/RHEL-4447
CVE: CVE-2023-46862
Omitted-Fix: ab69838e7c75 ("io_uring/kbuf: Fix check of BID wrapping in provided buffers")
Omitted-Fix: f74c746e476b ("io_uring/kbuf: Allow the full buffer id space for provided buffers")
This is the list of new features available (includes upstream kernel versions 6.3-6.6):
User-specified ring buffer
Provided Buffers allocated by the kernel
Ability to register the ring fd
Multi-shot timeouts
ability to pass custom flags to the completion queue entry for ring messages
All of these features are covered by the liburing tests.
In my testing, no-mmap-inval.t failed because of a broken test. socket-uring-cmd.t also failed because of a missing selinux policy rule. Try running audit2allow if you see a failure in that test.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Scott Weaver <scweaver@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git
commit 6e6eda44b939c0931533d6681d9f2ed41b44cde9
Author: Yunhui Cui <cuiyunhui@bytedance.com>
Date: Wed Jan 11 14:59:30 2023 +0800
sock: add tracepoint for send recv length
Add 2 tracepoints to monitor the tcp/udp traffic
of per process and per cgroup.
Regarding monitoring the tcp/udp traffic of each process, there are two
existing solutions, the first one is https://www.atoptool.nl/netatop.php.
The second is via kprobe/kretprobe.
Netatop solution is implemented by registering the hook function at the
hook point provided by the netfilter framework.
These hook functions may be in the soft interrupt context and cannot
directly obtain the pid. Some data structures are added to bind packets
and processes. For example, struct taskinfobucket, struct taskinfo ...
Every time the process sends and receives packets it needs multiple
hashmaps,resulting in low performance and it has the problem fo inaccurate
tcp/udp traffic statistics(for example: multiple threads share sockets).
We can obtain the information with kretprobe, but as we know, kprobe gets
the result by trappig in an exception, which loses performance compared
to tracepoint.
We compared the performance of tracepoints with the above two methods, and
the results are as follows:
ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
without trace:
Time per request: 39.660 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)
netatop:
Time per request: 50.717 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)
kr:
Time per request: 43.168 [ms] (mean)
Time per request: 0.043 [ms] (mean, across all concurrent requests)
tracepoint:
Time per request: 41.004 [ms] (mean)
Time per request: 0.041 [ms] (mean, across all concurrent requests
It can be seen that tracepoint has better performance.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3305
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1, selftests, pktdrill
Rebase to the current upstream to bring in new features and
a lot of fixes. A good half of the long commit list touches
the self-tests only, and the remaining is self-contained in mptcp.
The only notable exception is:
tcp: get rid of sysctl_tcp_adv_win_scale
that is a pre requisite to a bunch of mptcp changes included here
and also uncontroversially a good thing (TM) for TCP.
Wider-scope data-races related changeset are included (possibly as
partial backport) only if they help to reduce conflict on later
changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 8e9fad0e70b7b62848e0aeb1a873903b9ce4d7c4
Author: Breno Leitao <leitao@debian.org>
Date: Tue Jun 27 06:44:24 2023 -0700
io_uring: Add io_uring command support for sockets
Enable io_uring commands on network sockets. Create two new
SOCKET_URING_OP commands that will operate on sockets.
In order to call ioctl on sockets, use the file_operations->io_uring_cmd
callbacks, and map it to a uring socket function, which handles the
SOCKET_URING_OP accordingly, and calls socket ioctls.
This patches was tested by creating a new test case in liburing.
Link: https://github.com/leitao/liburing/tree/io_uring_cmd
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230627134424.2784797-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit b841b901c452d92610f739a36e54978453528876
Author: David Howells <dhowells@redhat.com>
Date: Mon May 22 13:11:10 2023 +0100
net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
network protocol that it should splice pages from the source iterator
rather than copying the data if it can. This flag is added to a list that
is cleared by sendmsg syscalls on entry.
This is intended as a replacement for the ->sendpage() op, allowing a way
to splice in several multipage folios in one go.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit fe34db062b8036f72e97c2b9eaa7e9fbb725ead2
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue May 9 09:19:08 2023 -0600
net: set FMODE_NOWAIT for sockets
The socket read/write functions deal with O_NONBLOCK and IOCB_NOWAIT
just fine, so we can flag them as being FMODE_NOWAIT compliant. With
this, we can remove socket special casing in io_uring when checking
if a file type is sane for nonblocking IO, and it's also the defined
way to flag file types as such in the kernel.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230509151910.183637-2-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1
Upstream commit:
commit 0dd061a6a115f25132989cbd591a25afb2dee086
Author: Geliang Tang <geliang.tang@suse.com>
Date: Wed Aug 16 09:11:56 2023 +0800
bpf: Add update_socket_protocol hook
Add a hook named update_socket_protocol in __sys_socket(), for bpf
progs to attach to and update socket protocol. One user case is to
force legacy TCP apps to create and use MPTCP sockets instead of
TCP ones.
Define a fmod_ret set named bpf_mptcp_fmodret_ids, add the hook
update_socket_protocol into this set, and register it in
bpf_mptcp_kfunc_init().
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/ac84be00f97072a46f8a72b4e2be46cbb7fa5053.1692147782.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14364
Tested: LNST, Tier1
Conflicts: missing READ_ONCE annotation in the orig code, as rhel lacks\
the upstream commit 1ded5e5a5931 ("net: annotate data-races around \
sock->ops")
Upstream commit:
commit c889a99a21bf124c3db08d09df919f0eccc5ea4c
Author: Jordan Rife <jrife@google.com>
Date: Thu Sep 21 18:46:42 2023 -0500
net: prevent address rewrite in kernel_bind()
Similar to the change in commit 0bdf399342c5("net: Avoid address
overwrite in kernel_connect"), BPF hooks run on bind may rewrite the
address passed to kernel_bind(). This change
1) Makes a copy of the bind address in kernel_bind() to insulate
callers.
2) Replaces direct calls to sock->ops->bind() in net with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes: 4fbac77d2d ("bpf: Hooks for sys_bind")
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14364
Tested: LNST, Tier1
Upstream commit:
commit 86a7e0b69bd5b812e48a20c66c2161744f3caa16
Author: Jordan Rife <jrife@google.com>
Date: Thu Sep 21 18:46:41 2023 -0500
net: prevent rewrite of msg_name in sock_sendmsg()
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
space may observe their value of msg_name change in cases where BPF
sendmsg hooks rewrite the send address. This has been confirmed to break
NFS mounts running in UDP mode and has the potential to break other
systems.
This patch:
1) Creates a new function called __sock_sendmsg() with same logic as the
old sock_sendmsg() function.
2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
__sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
as these system calls are already protected.
3) Modifies sock_sendmsg() so that it makes a copy of msg_name if
present before passing it down the stack to insulate callers from
changes to the send address.
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes: 1cedee13d2 ("bpf: Hooks for sys_sendmsg")
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
Conflicts: I backported commit 72c531f8ef30 ("net: copy from user before
calling __get_compat_msghdr") and commit 7fa875b8e53c ("net: copy from
user before calling __copy_msghdr") before this one. Upstream this commit
went before those two (I think there was a merge that resolved the
conflicts, but I couldn't find it). I fixed up the conflict in this
patch.
commit 7c701d92b2b5e5175dbfec875816474b802b0c45
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Jul 12 21:52:29 2022 +0100
skbuff: carry external ubuf_info in msghdr
Make possible for network in-kernel callers like io_uring to pass in a
custom ubuf_info by setting it in a new field of struct msghdr.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
commit 7fa875b8e53c288d616234b9daf417b0650ce1cc
Author: Dylan Yudaken <dylany@fb.com>
Date: Thu Jul 14 04:02:56 2022 -0700
net: copy from user before calling __copy_msghdr
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.
functionally this should be a no-op.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490
commit 1228b34c8d0ecf6de18c4c95d22f60cc8607c50a
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Jun 22 15:02:20 2022 +0000
net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()
syzbot reported uninit-value in tcp_recvmsg() [1]
Issue here is that msg->msg_get_inq should have been cleared,
otherwise tcp_recvmsg() might read garbage and perform
more work than needed, or have undefined behavior.
Given CONFIG_INIT_STACK_ALL_ZERO=y is probably going to be
the default soon, I chose to change __sys_recvfrom() to clear
all fields but msghdr.addr which might be not NULL.
For __copy_msghdr_from_user(), I added an explicit clear
of kmsg->msg_get_inq.
[1]
BUG: KMSAN: uninit-value in tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
inet_recvmsg+0x13a/0x5a0 net/ipv4/af_inet.c:850
sock_recvmsg_nosec net/socket.c:995 [inline]
sock_recvmsg net/socket.c:1013 [inline]
__sys_recvfrom+0x696/0x900 net/socket.c:2176
__do_sys_recvfrom net/socket.c:2194 [inline]
__se_sys_recvfrom net/socket.c:2190 [inline]
__x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Local variable msg created at:
__sys_recvfrom+0x81/0x900 net/socket.c:2154
__do_sys_recvfrom net/socket.c:2194 [inline]
__se_sys_recvfrom net/socket.c:2190 [inline]
__x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190
CPU: 0 PID: 3493 Comm: syz-executor170 Not tainted 5.19.0-rc3-syzkaller-30868-g4b28366af7d9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: f94fd25cb0aa ("tcp: pass back data left in socket after receive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Alexander Potapenko<glider@google.com>
Link: https://lore.kernel.org/r/20220622150220.1091182-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490
commit da214a475f8bd1d3e9e7a19ddfeb4d1617551bab
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue Apr 12 14:22:39 2022 -0600
net: add __sys_socket_file()
This works like __sys_socket(), except instead of allocating and
returning a socket fd, it just returns the file associated with the
socket. No fd is installed into the process file table.
This is similar to do_accept(), and allows io_uring to use this without
instantiating a file descriptor in the process file table.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/20220412202240.234207-2-axboe@kernel.dk
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133057
Tested: LNST, Tier1
Upstream commit:
commit a5ef058dc4d9a3e60d1808a0700e18e0e37e408e
Author: Paolo Abeni <pabeni@redhat.com>
Date: Thu Oct 20 19:48:51 2022 +0200
net: introduce and use custom sockopt socket flag
We will soon introduce custom setsockopt for UDP sockets, too.
Instead of doing even more complex arbitrary checks inside
sock_use_custom_sol_socket(), add a new socket flag and set it
for the relevant socket types (currently only MPTCP).
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1142
# Merge Request Required Information
## Summary of Changes
Update the io_uring code base and its dependencies to v5.15. We will not enable the functionality at this time, this is only a preparatory patch series. The patch series does touch other subsystems, though:
91ef658fb8b8-namei-ignore-ERR-NULL-names-in-putname.patch
0ee50b47532a-namei-change-filename_parentat-calling-conventions.patch
584d3226d665-namei-make-do_mkdirat-take-struct-filename.patch
7797251bb5ab-namei-make-do_mknodat-take-struct-filename.patch
da2d0cede330-namei-make-do_symlinkat-take-struct-filename.patch
8228e2c31319-namei-add-getname_uflags.patch
020250f31c4c-namei-make-do_linkat-take-struct-filename.patch
45f30dab3957-namei-update-do_-helpers-to-return-ints.patch
d32f89da7fa8-net-add-accept-helper-not-installing-fd.patch
2112ff5ce0c1-iov_iter-track-truncated-size.patch
0766ec82e5fb-namei-Fix-use-after-free-in-kern_path_locked.patch
8fb0f47a9d7a-iov_iter-add-helper-to-save-iov_iter-state.patch
7dedd3e18077-Revert-iov_iter-track-truncated-size.patch
3a862cacf867-fs-add-anon_inode_getfile_secure-similar-to-anon_inode_getfd_secure.patch
As a result, file system, block and networking tests should be run.
Omitted-fix: 81132a39c152 ("fs: remove fget_many and fput_many interface")
This is outside the scope of this MR, and isn't a "fix" so much as a performance enhancement.
## Approved Bugzilla Ticket
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Ming Lei <ming.lei@redhat.com>
Approved-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134161
Tested: LNST, Tier1
Upstream commit:
commit 3c9ba81d72047f2e81bb535d42856517b613aba7
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Tue Aug 23 10:47:00 2022 -0700
net: Fix a data-race around sysctl_somaxconn.
While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2069046
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit aef2feda97b840ec38e9fa53d0065188453304e8
Author: Jakub Kicinski <kuba@kernel.org>
Date: Wed Dec 15 18:55:37 2021 -0800
add missing bpf-cgroup.h includes
We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency,
make sure those who actually need more than the definition of
struct cgroup_bpf include bpf-cgroup.h explicitly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656
commit d32f89da7fa8ccc8b3fb8f909d61e42b9bc39329
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed Aug 25 12:25:44 2021 +0100
net: add accept helper not installing fd
Introduce and reuse a helper that acts similarly to __sys_accept4_file()
but returns struct file instead of installing file descriptor. Will be
used by io_uring.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/c57b9e8e818d93683a3d24f8ca50ca038d1da8c4.1629888991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/854
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066451
The first patch is a dependence for patch 007747a984ea ("net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets"). So I also backported it. Others are ptp_kvm and ptp_vclock fixes
v2: add another fix b6b19a71c8bb ("ptp: free 'vclock_index' in ptp_clock_release()")
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Approved-by: Íñigo Huguet <ihuguet@redhat.com>
Approved-by: Kamal Heib <kheib@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/824
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081601
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2081260
Tested: Using bridge related self-tests
This MR updates the bridge network core to the upstream version v5.18.
List of commits:
```
4682048af0c8 ("net: bridge: remove fdb_notify forward declaration")
5f94a5e276ae ("net: bridge: remove fdb_insert forward declaration")
4731b6d6b257 ("net: bridge: rename fdb_insert to fdb_add_local")
f6814fdcfe1b ("net: bridge: rename br_fdb_insert to br_fdb_add_local")
9574fb558044 ("net: bridge: reduce indentation level in fdb_create")
5cda5272a460 ("net: bridge: move br_fdb_replay inside br_switchdev.c")
fab9eca88410 ("net: bridge: create a common function for populating switchdev FDB entries")
716a30a97a52 ("net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device")
c5f6e5ebc2af ("net: bridge: provide shim definition for br_vlan_flags")
4a6849e46173 ("net: bridge: move br_vlan_replay to br_switchdev.c")
9ae9ff994b0e ("net: bridge: split out the switchdev portion of br_mdb_notify")
9776457c784f ("net: bridge: mdb: move all switchdev logic to br_switchdev.c")
326b212e9cd6 ("net: bridge: switchdev: consistent function naming")
ae0393500e3b ("net: bridge: switchdev: fix shim definition for br_switchdev_mdb_notify")
cc0be1ad686f ("net: bridge: Slightly optimize 'find_portno()'")
520fbdf7fb19 ("net/bridge: replace simple_strtoul to kstrtol")
5a45ab3f248b ("net: bridge: Allow base 16 inputs in sysfs")
442b03c32ca1 ("bridge: use __set_bit in __br_vlan_set_default_pvid")
fd3a45900055 ("net: bridge: Get SIOCGIFBR/SIOCSIFBR ioctl working in compat mode")
dcb2c5c6ca9b ("net: bridge: vlan: fix single net device option dumping")
fd20d9738395 ("net: bridge: vlan: fix memory leak in __allowed_ingress")
d8c2858181cc ("net/switchdev: use struct_size over open coded arithmetic")
5454f5c28eca ("net: bridge: vlan: check for errors from __vlan_del in __vlan_flush")
b2bc58d41fde ("net: bridge: vlan: check early for lack of BRENTRY flag in br_vlan_add_existing")
3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag")
cab2cd770051 ("net: bridge: vlan: make __vlan_add_flags react only to PVID and UNTAGGED")
27c5f74c7ba7 ("net: bridge: vlan: notify switchdev only when something changed")
8d23a54f5bee ("net: bridge: switchdev: differentiate new VLANs from changed ones")
263029ae3172 ("net: bridge: make nbp_switchdev_unsync_objs() follow reverse order of sync()")
b28d580e2939 ("net: bridge: switchdev: replay all VLAN groups")
7b465f4cf39e ("net: switchdev: rename switchdev_lower_dev_find to switchdev_lower_dev_find_rcu")
c4076cdd21f8 ("net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces")
c832962ac972 ("net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled")
36a29fb6b22d ("bridge: switch br_net_exit to batch mode")
acd8df5880d7 ("net: switchdev: avoid infinite recursion from LAG to bridge with port object handler")
a21d9a670d81 ("net: bridge: Add support for bridge port in locked mode")
fa1c83342987 ("net: bridge: Add support for offloading of locked port flag")
b2b681a41251 ("selftests: forwarding: tests of locked port feature")
ec638740fce9 ("net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device")
ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode")
8c678d60562f ("net: bridge: mst: Allow changing a VLAN's MSTI")
122c29486e1f ("net: bridge: mst: Support setting and reporting MST port states")
87c167bb94ee ("net: bridge: mst: Notify switchdev drivers of MST mode changes")
6284c723d9b9 ("net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations")
7ae9147f4312 ("net: bridge: mst: Notify switchdev drivers of MST state changes")
cceac97afa09 ("net: bridge: mst: Add helper to map an MSTI to a VID set")
48d57b2e5f43 ("net: bridge: mst: Add helper to check if MST is enabled")
f54fd0e16306 ("net: bridge: mst: Add helper to query a port's MST state")
917b149ac3d5 ("selftests: forwarding: Disable learning before link up")
f70f5f1a8fff ("selftests: forwarding: Use same VRF for port and VLAN upper")
cde3fc244b3d ("net: bridge: mst: prevent NULL deref in br_mst_info_size()")
a911ad18a56a ("net: bridge: mst: Restrict info size queries to bridge ports")
7f40ea2145d9 ("net: bridge: switchdev: check br_vlan_group() return value")
```
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066451
Upstream Status: net.git commit 007747a984ea
commit 007747a984ea5e895b7d8b056b24ebf431e1e71d
Author: Miroslav Lichvar <mlichvar@redhat.com>
Date: Wed Jan 5 11:33:26 2022 +0100
net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets
When multiple sockets using the SOF_TIMESTAMPING_BIND_PHC flag received
a packet with a hardware timestamp (e.g. multiple PTP instances in
different PTP domains using the UDPv4/v6 multicast or L2 transport),
the timestamps received on some sockets were corrupted due to repeated
conversion of the same timestamp (by the same or different vclocks).
Fix ptp_convert_timestamp() to not modify the shared skb timestamp
and return the converted timestamp as a ktime_t instead. If the
conversion fails, return 0 to not confuse the application with
timestamps corresponding to an unexpected PHC.
Fixes: d7c0882655 ("net: socket: support hardware timestamp conversion to PHC bound")
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081601
commit fd3a459000557ff12c1d4b41f1bd30f439f6c942
Author: Remi Pommarel <repk@triplefau.lt>
Date: Fri Dec 24 12:46:40 2021 +0100
net: bridge: Get SIOCGIFBR/SIOCSIFBR ioctl working in compat mode
In compat mode SIOC{G,S}IFBR ioctls were only supporting
BRCTL_GET_VERSION returning an artificially version to spur userland
tool to use SIOCDEVPRIVATE instead. But some userland tools ignore that
and use SIOC{G,S}IFBR unconditionally as seen with busybox's brctl.
Example of non working 32-bit brctl with CONFIG_COMPAT=y:
$ brctl show
brctl: SIOCGIFBR: Invalid argument
Example of fixed 32-bit brctl with CONFIG_COMPAT=y:
$ brctl show
bridge name bridge id STP enabled interfaces
br0
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2013413
Conflicts:
1) Merge conflict in fs/xfs/xfs_icache.c due to missing upstream commit
182696fb021f ("xfs: rename _zone variables to _cache").
2) The hunk for fs/9p/vfs_inode.c is dropped due to merge conflict and
9P filesystem not supported in RHEL9.
3) The hunk for fs/ntfs3/super.c is dropped due to file not currently
present.
commit fd60b28842df833477c42da6a6d63d0d114a5fcc
Author: Muchun Song <songmuchun@bytedance.com>
Date: Tue, 22 Mar 2022 14:41:03 -0700
fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1
Conflicts: the upstream commit had some relevant conflicts due \
to the net-next commit 876f0bf9d0d5 ("net: socket: simplify \
dev_ifconf handling"), solved by the merge commit 29ce8f970107
("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net").
Upstream commit:
commit d0efb16294d145d157432feda83877ae9d7cdf37
Author: Peter Collingbourne <pcc@google.com>
Date: Thu Aug 26 12:46:01 2021 -0700
net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
A common implementation of isatty(3) involves calling a ioctl passing
a dummy struct argument and checking whether the syscall failed --
bionic and glibc use TCGETS (passing a struct termios), and musl uses
TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
copy sizeof(struct ifreq) bytes of data from the argument and return
-EFAULT if that fails. The result is that the isatty implementations
may return a non-POSIX-compliant value in errno in the case where part
of the dummy struct argument is inaccessible, as both struct termios
and struct winsize are smaller than struct ifreq (at least on arm64).
Although there is usually enough stack space following the argument
on the stack that this did not present a practical problem up to now,
with MTE stack instrumentation it's more likely for the copy to fail,
as the memory following the struct may have a different tag.
Fix the problem by adding an early check for whether the ioctl is a
valid socket ioctl, and return -ENOTTY if it isn't.
Fixes: 44c02a2c3d ("dev_ioctl(): move copyin/copyout to callers")
Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008927
commit ad2f99aedf8fa77f3ae647153284fa63c43d3055
Author: Arnd Bergmann <arnd@arndb.de>
Date: Tue Jul 27 15:45:16 2021 +0200
net: bridge: move bridge ioctls out of .ndo_do_ioctl
Working towards obsoleting the .ndo_do_ioctl operation entirely,
stop passing the SIOCBRADDIF/SIOCBRDELIF device ioctl commands
into this callback.
My first attempt was to add another ndo_siocbr() callback, but
as there is only a single driver that takes these commands and
there is already a hook mechanism to call directly into this
driver, extend this hook instead, and use it for both the
deviceless and the device specific ioctl commands.
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008927
commit 88fc023f7de22922c6c61e2f3d4c54befb8b3549
Author: Arnd Bergmann <arnd@arndb.de>
Date: Tue Jul 27 15:45:15 2021 +0200
net: socket: return changed ifreq from SIOCDEVPRIVATE
Some drivers that use SIOCDEVPRIVATE ioctl commands modify
the ifreq structure and expect it to be passed back to user
space, which has never really happened for compat mode
because the calling these drivers through ndo_do_ioctl
requires overwriting the ifr_data pointer.
Now that all drivers are converted to ndo_siocdevprivate,
change it to handle this correctly in both compat and
native mode.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>