607 Commits

Author SHA1 Message Date
Jeff Moyer 8ba710b1a7 net: have do_accept() take a struct proto_accept_arg argument
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL is missing commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which accounts for the differences in
ops structure dereferencing.

commit 0645fbe760afcc5332c858d1cbf416bf77ef3c29
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu May 9 09:31:05 2024 -0600

    net: have do_accept() take a struct proto_accept_arg argument
    
    In preparation for passing in more information via this API, change
    do_accept() to take a proto_accept_arg struct pointer rather than just
    the file flags separately.
    
    No functional changes in this patch.
    
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:12:47 -05:00
Jeff Moyer df326e77d1 net: make __sys_accept4_file() static
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit c04245328dd7e915e21ac6395ffd218616e22754
Author: Yajun Deng <yajun.deng@linux.dev>
Date:   Fri Jun 10 17:10:17 2022 +0800

    net: make __sys_accept4_file() static
    
    __sys_accept4_file() isn't used outside of the file, make it static.
    
    At the same time, move file_flags and nofile parameters into
    __sys_accept4_file().
    
    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:12:46 -05:00
Jeff Moyer c46aaba751 net: change proto and proto_ops accept type
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL is missing commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which accounts for the differences in
ops structure dereferencing.

commit 92ef0fd55ac80dfc2e4654edfe5d1ddfa6e070fe
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu May 9 09:20:08 2024 -0600

    net: change proto and proto_ops accept type
    
    Rather than pass in flags, error pointer, and whether this is a kernel
    invocation or not, add a struct proto_accept_arg struct as the argument.
    This then holds all of these arguments, and prepares accept for being
    able to pass back more information.
    
    No functional changes in this patch.
    
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:12:33 -05:00
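
The argument bundling described in the commit above can be sketched in user space. This is only an illustration: `accept_arg_sketch` and `toy_accept` are hypothetical names, and the fields simply mirror the three values the commit message lists (flags, error, kernel invocation), not the kernel's actual definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the argument bundle. */
struct accept_arg_sketch {
    int flags;   /* file flags supplied by the caller */
    int err;     /* error code written back by the callee */
    bool kern;   /* true when the caller is in-kernel */
};

/* Toy accept: returns a connection id (>= 0), or -1 with arg->err set. */
static int toy_accept(int pending, struct accept_arg_sketch *arg)
{
    assert(arg != NULL);
    if (pending <= 0) {
        arg->err = -EAGAIN;   /* nothing queued to accept */
        return -1;
    }
    arg->err = 0;
    return pending - 1;
}
```

Because everything travels in one struct, a later change can add a field without touching every accept implementation's signature again.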
Jeff Moyer 69fe4f1180 io_uring: separate header for exported net bits
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 8c9a6f549e65912825e31dc1e0e3f7995984649d
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Tue Apr 9 14:05:53 2024 -0700

    io_uring: separate header for exported net bits
    
    We're exporting some io_uring bits to networking, e.g. for implementing
    a net callback for io_uring cmds, but we don't want to expose more than
    needed. Add a separate header for networking.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: David Wei <dw@davidwei.uk>
    Link: https://lore.kernel.org/r/20240409210554.1878789-1-dw@davidwei.uk
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:20:44 -05:00
Rado Vrbovsky 8f1ff93483 Merge: net: explicitly clear the sk pointer, when pf->create fails
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5775

JIRA: https://issues.redhat.com/browse/RHEL-66687  
Upstream Status: linux.git commit 631083143315d1b192bd7d915b967b37819e88ea  
CVE: CVE-2024-50186
  
commit 631083143315d1b192bd7d915b967b37819e88ea  
Author: Ignat Korchagin <ignat@cloudflare.com>  
Date:   Thu Oct 3 18:01:51 2024 +0100  
  
    net: explicitly clear the sk pointer, when pf->create fails  
  
    We have recently noticed the exact same KASAN splat as in commit  
    6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket  
    creation fails"). The problem is that commit did not fully address the  
    problem, as some pf->create implementations do not use sk_common_release  
    in their error paths.  
  
    For example, we can use the same reproducer as in the above commit, but  
    changing ping to arping. arping uses AF_PACKET socket and if packet_create  
    fails, it will just sk_free the allocated sk object.  
  
    While we could chase all the pf->create implementations and make sure they  
    NULL the freed sk object on error from the socket, we can't guarantee  
    future protocols will not make the same mistake.  
  
    So it is easier to just explicitly NULL the sk pointer upon return from  
    pf->create in __sock_create. We do know that pf->create always releases the  
    allocated sk object on error, so if the pointer is not NULL, it is  
    definitely dangling.  
  
    Fixes: 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails")  
    Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>  
    Cc: stable@vger.kernel.org  
    Reviewed-by: Eric Dumazet <edumazet@google.com>  
    Link: https://patch.msgid.link/20241003170151.69445-1-ignat@cloudflare.com  
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
  
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-22 09:30:52 +00:00
Rado Vrbovsky 0e814ddac4 Merge: bpf: backports from upstream [9.6 phase 1]
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5622

JIRA: https://issues.redhat.com/browse/RHEL-65205  
JIRA: https://issues.redhat.com/browse/RHEL-63189  
JIRA: https://issues.redhat.com/browse/RHEL-54828  
JIRA: https://issues.redhat.com/browse/RHEL-65858  
CVE: CVE-2024-47710  
CVE: CVE-2024-43834  
CVE: CVE-2024-41010  
  
Backporting stable fixes from upstream.  
  
Omitted-fix: 517125f67494 ("selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again")  
    We have a different way of skipping broken selftests in CKI  
  
Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Toke Høiland-Jørgensen <toke@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-15 21:02:44 +00:00
Andrea Claudi 6b6b533653 net: explicitly clear the sk pointer, when pf->create fails
JIRA: https://issues.redhat.com/browse/RHEL-66687
Upstream Status: linux.git commit 631083143315d1b192bd7d915b967b37819e88ea
CVE: CVE-2024-50186

commit 631083143315d1b192bd7d915b967b37819e88ea
Author: Ignat Korchagin <ignat@cloudflare.com>
Date:   Thu Oct 3 18:01:51 2024 +0100

    net: explicitly clear the sk pointer, when pf->create fails

    We have recently noticed the exact same KASAN splat as in commit
    6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket
    creation fails"). The problem is that commit did not fully address the
    problem, as some pf->create implementations do not use sk_common_release
    in their error paths.

    For example, we can use the same reproducer as in the above commit, but
    changing ping to arping. arping uses AF_PACKET socket and if packet_create
    fails, it will just sk_free the allocated sk object.

    While we could chase all the pf->create implementations and make sure they
    NULL the freed sk object on error from the socket, we can't guarantee
    future protocols will not make the same mistake.

    So it is easier to just explicitly NULL the sk pointer upon return from
    pf->create in __sock_create. We do know that pf->create always releases the
    allocated sk object on error, so if the pointer is not NULL, it is
    definitely dangling.

    Fixes: 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails")
    Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20241003170151.69445-1-ignat@cloudflare.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
2024-11-13 19:08:54 +01:00
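
The fix described above (clear the pointer in the caller instead of trusting every create callback) can be modeled with a small user-space sketch. All names here (`toy_sk`, `toy_sock`, `toy_create_fails`, `toy_sock_create`) are hypothetical and only imitate the shape of the problem.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_sk { int proto; };
struct toy_sock { struct toy_sk *sk; };

/* Hypothetical create callback: it fails after allocating, frees the
 * object, but forgets to clear sock->sk. */
static int toy_create_fails(struct toy_sock *sock)
{
    sock->sk = malloc(sizeof(*sock->sk));
    if (!sock->sk)
        return -ENOMEM;
    free(sock->sk);           /* released, yet sock->sk still points at it */
    return -EINVAL;
}

/* Caller-side guard: on any error, drop the pointer unconditionally. */
static int toy_sock_create(struct toy_sock *sock,
                           int (*create)(struct toy_sock *))
{
    int err;

    assert(sock != NULL && create != NULL);
    err = create(sock);
    if (err < 0)
        sock->sk = NULL;      /* the callee released it; never leave it dangling */
    return err;
}
```

Putting the guard in the single caller covers callbacks written in the future as well, which is the argument the commit message makes.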
Felix Maurer 9895884fab bpf, net: Fix a potential race in do_sock_getsockopt()
JIRA: https://issues.redhat.com/browse/RHEL-65205

commit 33f339a1ba54e56bba57ee9a77c71e385ab4825c
Author: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Date:   Fri Aug 30 16:25:17 2024 +0800

    bpf, net: Fix a potential race in do_sock_getsockopt()
    
    There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is
    false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but
    becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called.
    This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving
    an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`.
    Scenario shown as below:
    
               `process A`                      `process B`
               -----------                      ------------
      BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
                                                enable CGROUP_GETSOCKOPT
      BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)
    
    To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and
    directly use `copy_from_sockptr` to ensure that `max_optlen` is always
    set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked.
    
    Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
    Co-developed-by: Yanghui Li <yanghui.li@mediatek.com>
    Signed-off-by: Yanghui Li <yanghui.li@mediatek.com>
    Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
    Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
    Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
    Acked-by: Stanislav Fomichev <sdf@fomichev.me>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://patch.msgid.link/20240830082518.23243-1-Tze-nan.Wu@mediatek.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 19:29:48 +01:00
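
The race in the commit above can be replayed deterministically in a single-threaded sketch, where a callback stands in for "process B" enabling the hook between the two checks. Every name below is hypothetical; `toy_hook_enabled` merely plays the role of the enabled check, and the real kernel macros are not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the "hooks enabled" check. */
static bool toy_hook_enabled;

/* Plays the part of "process B" enabling the hook. */
static void toy_enable_hook(void)
{
    toy_hook_enabled = true;
}

/* Toy hook: once enabled, it rejects a zero max_optlen. */
static int toy_run_hook(int max_optlen)
{
    assert(max_optlen >= 0);
    if (!toy_hook_enabled)
        return 0;
    return max_optlen > 0 ? 0 : -EFAULT;
}

/* Racy shape: max_optlen is read only if the hook was enabled at the
 * first check, but the hook consults the flag again later. */
static int toy_getsockopt_racy(int user_optlen, void (*between)(void))
{
    int max_optlen = 0;

    if (toy_hook_enabled)
        max_optlen = user_optlen;
    if (between)
        between();
    return toy_run_hook(max_optlen);
}

/* Fixed shape: always read max_optlen before the hook can run. */
static int toy_getsockopt_fixed(int user_optlen, void (*between)(void))
{
    int max_optlen = user_optlen;

    if (between)
        between();
    return toy_run_hook(max_optlen);
}
```

With the hook initially disabled and `toy_enable_hook` passed as the callback, the racy variant returns `-EFAULT` while the fixed variant succeeds.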
Ian Kent 92d69b838d fs: port xattr to mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: The cifs source has been moved in CentOS Stream so manually
	apply rejected hunk to fs/smb/client/xattr.c.
        Dropped hunks for ntfs3 because the source is not present in
        the CentOS Stream source tree.
	CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
	to new xattrs.c file") moved ovl_own_xattr_set(), manually apply
	changes.
	CentOS Stream commit 67e2fcb2f3 ("evm: don't copy up
	'security.evm' xattr") is present causing hunk #1 against
	include/linux/evm.h to be rejected, manually apply.
	Upstream commit 5d1ef2ce13a90 ("ima: Introduce
	ima_get_current_hash_algo()") is not present in CentOS Stream
	which causes fuzz 1 for hunk #1 against include/linux/ima.h.
	There's a reject of hunk #1 for include/linux/lsm_hooks.h but
	I can't see any reason for it, manually applied the hunk.
	CentOS Stream does not have upstream commit ce5bb5a86e5eb
	("ima: Return int in the functions to measure a buffer") which
	results in a reject of hunk #2 against security/integrity/ima/ima.h
	and hunks #8 and #11 against security/integrity/ima/ima_main.c, so
	manually apply hunks. There also appears to be a whitespace
	mismatch causing hunk #7 to report fuzz 2 on application.
	CentOS Stream does not have upstream commit c7423dbdbc9ec
	("ima: Handle -ESTALE returned by ima_filter_rule_match()")
	which results in a reject of hunk #3 against
	security/integrity/ima/ima_policy.c, so manually apply hunk.

commit 39f60c1ccee72caa0104145b5dbf5d37cce1ea39
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:23 2023 +0100

    fs: port xattr to mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:21 +08:00
Ian Kent 43ca440cdf fs: port ->setattr() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: CentOS Stream commit 3c29fadfb1 ("afs: split
	afs_pagecache_valid() out of afs_validate()") is present, manually
	adjust hunk #1 of fs/afs/internal.h.
	For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
	for ceph") alters the definition of _ceph_setattr(), adjust
	manually.
	CentOS Stream commit 34b2a2b5a3 ("ceph: add some fscrypt
	guardrails") introduces a call to fscrypt_prepare_setattr() which
	causes fuzz when applying.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/inode.c.
	Upstream commit 5a646fb3a3e2d ("coda: avoid doing bad things on
	inode type changes during revalidation") is not present which
	causes fuzz in fs/coda/coda_linux.h.
	Dropped hunks for ntfs3 because the source is not present in
	the CentOS Stream source tree.
	CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
	to new xattrs.c file") is present so manually apply hunk.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present so it's ok to pass idmap to
	setattr_prepare() and setattr_copy().
	Update to add incremental changes needed due to CentOS Stream
	commit 469e1d13f6 ("shmem: quota support").
	Allow for CentOS Stream commit 6c3396a0d8 ("kernfs: Introduce
	separate rwsem to protect inode attributes") which is already
	present.
	CentOS Stream commit f5219db0c0 ("KVM: fix Add KVM_CREATE_GUEST_MEMFD
	ioctl() for guest-specific backing memory") updated the upstream commit
	a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific
	backing memory") to account for missing idmapping commits. Now that we
	have updated one of the two places where these changes were made, make
	one of the needed adjustments to match the original upstream patch.

commit c1632a0f11209338fc300c66252bcc4686e609e8
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:11 2023 +0100

    fs: port ->setattr() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 09:07:05 +08:00
Rafael Aquini ded92bb26c net: Make sock_splice_read() use copy_splice_read() by default
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 67178fd066d53d5a6cada1cdc6399d02b68b708e
Author: David Howells <dhowells@redhat.com>
Date:   Mon May 22 14:50:00 2023 +0100

    net: Make sock_splice_read() use copy_splice_read() by default

    Make sock_splice_read() use copy_splice_read() by default as
    file_splice_read() will return immediately with 0 as a socket has no
    pagecache and is a zero-size file.

    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: "David S. Miller" <davem@davemloft.net>
    cc: Eric Dumazet <edumazet@google.com>
    cc: Jakub Kicinski <kuba@kernel.org>
    cc: Paolo Abeni <pabeni@redhat.com>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: netdev@vger.kernel.org
    cc: linux-block@vger.kernel.org
    cc: linux-mm@kvack.org
    Link: https://lore.kernel.org/r/20230522135018.2742245-14-dhowells@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:55 -04:00
Jeff Moyer 8d70cdbc58 net/socket: Break down __sys_getsockopt
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: RHEL does not include commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which converts proto_ops to a const
accessed with READ_ONCE.  Fix up the patch to apply, but keep the
READ_ONCE from 1ded5e5a5931.

commit 0b05b0cd78c92371fdde6333d006f39eaf9e0860
Author: Breno Leitao <leitao@debian.org>
Date:   Mon Oct 16 06:47:42 2023 -0700

    net/socket: Break down __sys_getsockopt
    
    Split __sys_getsockopt() into two functions by removing the core
    logic into a sub-function (do_sock_getsockopt()). This will avoid
    code duplication when doing the same operation in other callers, for
    instance.
    
    do_sock_getsockopt() will be called by io_uring getsockopt() command
    operation in the following patch.
    
    The same was done for the setsockopt pair.
    
    Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20231016134750.1381153-5-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 09:57:34 -04:00
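
The split described above, a thin descriptor-resolving wrapper around a reusable core, looks roughly like this in a user-space sketch. The names (`toy_socket`, `toy_fd_table`, `toy_do_sock_getsockopt`, `toy_sys_getsockopt`) are hypothetical and the single "option" returned is invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_socket { int rcvbuf; };

/* Hypothetical descriptor table with two open sockets. */
static struct toy_socket toy_fd_table[2] = { { 4096 }, { 8192 } };

/* Core logic: works on an already-resolved socket, so any caller that
 * holds a socket can reuse it without going through a descriptor. */
static int toy_do_sock_getsockopt(const struct toy_socket *sock, int *optval)
{
    assert(sock != NULL);
    if (!optval)
        return -EFAULT;
    *optval = sock->rcvbuf;
    return 0;
}

/* Syscall-style wrapper: resolves the descriptor, then delegates. */
static int toy_sys_getsockopt(int fd, int *optval)
{
    if (fd < 0 || fd >= 2)
        return -EBADF;
    return toy_do_sock_getsockopt(&toy_fd_table[fd], optval);
}
```

A second entry point that already has the socket in hand can call the core function directly, which is the reuse the commit message is after.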
Jeff Moyer ea15616bfd net/socket: Break down __sys_setsockopt
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: RHEL does not have commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which leads to context differences in
the patch.

commit 1406245c29454ff84919736be83e14cdaba7fec1
Author: Breno Leitao <leitao@debian.org>
Date:   Mon Oct 16 06:47:41 2023 -0700

    net/socket: Break down __sys_setsockopt
    
    Split __sys_setsockopt() into two functions by removing the core
    logic into a sub-function (do_sock_setsockopt()). This will avoid
    code duplication when doing the same operation in other callers, for
    instance.
    
    do_sock_setsockopt() will be called by io_uring setsockopt() command
    operation in the following patch.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20231016134750.1381153-4-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 09:56:34 -04:00
Jeff Moyer a5d32ca328 bpf: Add sockptr support for setsockopt
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 3f31e0d14d44ad491a81b7c1f83f32fbc300a867
Author: Breno Leitao <leitao@debian.org>
Date:   Mon Oct 16 06:47:40 2023 -0700

    bpf: Add sockptr support for setsockopt
    
    The whole network stack uses sockptr, and until it moves to
    something more modern, let's use sockptr in setsockopt BPF hooks, so it
    could be used by other callers.
    
    The main motivation for this change is to use it in the io_uring
    {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
    kernel value for optlen.
    
    Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
    
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20231016134750.1381153-3-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 09:55:34 -04:00
Jeff Moyer 4172e40e17 bpf: Add sockptr support for getsockopt
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit a615f67e1a426f35366b8398c11f31c148e7df48
Author: Breno Leitao <leitao@debian.org>
Date:   Mon Oct 16 06:47:39 2023 -0700

    bpf: Add sockptr support for getsockopt
    
    The whole network stack uses sockptr, and until it moves to
    something more modern, let's use sockptr in getsockopt BPF hooks, so it
    could be used by other callers.
    
    The main motivation for this change is to use it in the io_uring
    {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
    kernel value for optlen.
    
    Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
    
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20231016134750.1381153-2-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 09:54:34 -04:00
Davide Caratti 673222a076 net: Avoid address overwrite in kernel_connect
JIRA: https://issues.redhat.com/browse/RHEL-33410
JIRA: https://issues.redhat.com/browse/RHEL-30875
Upstream Status: net.git commit 0bdf399342c5acbd817c9098b6c7ed21f1974312
Conflicts:
 - net/socket.c: don't use READ_ONCE() as we don't have upstream commit
   1ded5e5a5931 ("net: annotate data-races around sock->ops")

commit 0bdf399342c5acbd817c9098b6c7ed21f1974312
Author: Jordan Rife <jrife@google.com>
Date:   Mon Aug 21 16:45:23 2023 -0500

    net: Avoid address overwrite in kernel_connect

    BPF programs that run on connect can rewrite the connect address. For
    the connect system call this isn't a problem, because a copy of the address
    is made when it is moved into kernel space. However, kernel_connect
    simply passes through the address it is given, so the caller may observe
    its address value unexpectedly change.

    A practical example where this is problematic is where NFS is combined
    with a system such as Cilium which implements BPF-based load balancing.
    A common pattern in software-defined storage systems is to have an NFS
    mount that connects to a persistent virtual IP which in turn maps to an
    ephemeral server IP. This is usually done to achieve high availability:
    if your server goes down you can quickly spin up a replacement and remap
    the virtual IP to that endpoint. With BPF-based load balancing, mounts
    will forget the virtual IP address when the address rewrite occurs
    because a pointer to the only copy of that address is passed down the
    stack. Server failover then breaks, because clients have forgotten the
    virtual IP address. Reconnects fail and mounts remain broken. This patch
    was tested by setting up a scenario like this and ensuring that NFS
    reconnects worked after applying the patch.

    Signed-off-by: Jordan Rife <jrife@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-06-06 11:51:41 +02:00
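
The idea in the commit above, copying the address before handing it to code that may rewrite it, can be sketched as follows. This is a user-space model under stated assumptions: `toy_addr`, `toy_rewrite_hook`, `toy_connect` and `toy_kernel_connect` are hypothetical names, and the hook simply overwrites one byte to imitate an address rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_addr {
    unsigned char bytes[16];
    size_t len;
};

/* Hypothetical hook that rewrites the address in place. */
static void toy_rewrite_hook(struct toy_addr *addr)
{
    addr->bytes[0] = 0xAA;
}

/* Lower layer: runs the hook, then "connects" to whatever it sees. */
static int toy_connect(struct toy_addr *addr)
{
    toy_rewrite_hook(addr);
    return addr->bytes[0];
}

/* In-kernel style entry point: work on a private copy so the caller's
 * only copy of the address is never modified. */
static int toy_kernel_connect(const struct toy_addr *addr)
{
    struct toy_addr copy;

    assert(addr != NULL);
    memcpy(&copy, addr, sizeof(copy));
    return toy_connect(&copy);
}
```

The caller keeps its original (virtual) address and can reuse it on reconnect, while the lower layer still connects to the rewritten one.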
Lucas Zampieri a7bb707d81 Merge: CNB95: ptp: Support hardware clocks with additional free running cycle counter
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4001

JIRA: https://issues.redhat.com/browse/RHEL-32107  
Tested: Just built... igc driver update is needed to verify  

Commits:
```
42704b26b0f1d ("ptp: Add cycles support for virtual clocks")
51eb7492af276 ("ptp: Request cycles for TX timestamp")
d58809d854c9e ("ptp: Pass hwtstamp to ptp_convert_timestamp()")
97dc7cd92ac67 ("ptp: Support late timestamp determination")
fcf308e50928a ("ptp: Speed up vclock lookup")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Corinna Vinschen <vinschen@redhat.com>
Approved-by: Hangbin Liu <haliu@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-04-22 12:45:16 +00:00
Ivan Vecera aa98375a76 ptp: Support late timestamp determination
JIRA: https://issues.redhat.com/browse/RHEL-32107

Conflicts:
- context conflict due to existing backport of 66f7223039c0c ("net:
  add NDOs for configuring hardware timestamping")

commit 97dc7cd92ac67f6e05df418df1772ba4a7fbf693
Author: Gerhard Engleder <gerhard@engleder-embedded.com>
Date:   Fri May 6 22:01:40 2022 +0200

    ptp: Support late timestamp determination

    If a physical clock supports a free running cycle counter, then
    timestamps shall be based on this time too. For TX it is known in
    advance before the transmission if a timestamp based on the free running
    cycle counter is needed. For RX it is impossible to know which timestamp
    is needed before the packet is received and assigned to a socket.

    Support late timestamp determination by a network device. Therefore, an
    address/cookie is stored within the new netdev_data field of struct
    skb_shared_hwtstamps. This address/cookie is provided to a new network
    device function called ndo_get_tstamp(), which returns a timestamp based
    on the normal/adjustable time or based on the free running cycle
    counter. If the function is not supported, then timestamp handling is not
    changed.

    This mechanism is intended for RX, but TX use is also possible.

    Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
    Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
    Acked-by: Richard Cochran <richardcochran@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-08 13:04:21 +02:00
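
Late timestamp determination as described above amounts to an optional driver callback with a fallback. The sketch below is a loose user-space model: `toy_ktime`, `toy_rx_record`, `toy_hwtstamps`, `toy_netdev` and both functions are hypothetical, and the struct layout is an assumption made for the example rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long long toy_ktime;

/* Hypothetical per-packet record a driver might point the cookie at. */
struct toy_rx_record {
    toy_ktime adjusted;   /* normal/adjustable time */
    toy_ktime cycles;     /* free running cycle counter */
};

struct toy_hwtstamps {
    toy_ktime hwtstamp;   /* timestamp decided up front */
    void *netdev_data;    /* address/cookie for late determination */
};

struct toy_netdev {
    /* Optional callback, modeled on the ndo_get_tstamp() idea. */
    toy_ktime (*get_tstamp)(const struct toy_hwtstamps *ts, bool cycles);
};

/* Example driver callback: picks one of the two recorded timestamps. */
static toy_ktime toy_driver_get_tstamp(const struct toy_hwtstamps *ts,
                                       bool cycles)
{
    const struct toy_rx_record *rec = ts->netdev_data;

    return cycles ? rec->cycles : rec->adjusted;
}

/* Resolve late if the device supports it; otherwise keep old behavior. */
static toy_ktime toy_resolve_tstamp(const struct toy_netdev *dev,
                                    const struct toy_hwtstamps *ts,
                                    bool cycles)
{
    assert(dev != NULL && ts != NULL);
    if (dev->get_tstamp && ts->netdev_data)
        return dev->get_tstamp(ts, cycles);
    return ts->hwtstamp;
}
```

The choice between the two time bases is deferred until the receiving socket is known, and devices without the callback are unaffected.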
Ivan Vecera 6faa5af62c ptp: Pass hwtstamp to ptp_convert_timestamp()
JIRA: https://issues.redhat.com/browse/RHEL-32107

commit d58809d854c9ee19e4cd41023e137e65e9dc3f94
Author: Gerhard Engleder <gerhard@engleder-embedded.com>
Date:   Fri May 6 22:01:39 2022 +0200

    ptp: Pass hwtstamp to ptp_convert_timestamp()

    ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
    a field of the argument with the type struct skb_shared_hwtstamps *. So
    a pointer to the hwtstamp field of this structure is sufficient.

    Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
    This allows adding additional timestamp manipulation stages before the
    call of ptp_convert_timestamp().

    Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
    Acked-by: Richard Cochran <richardcochran@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-08 13:04:17 +02:00
Ivan Vecera 14b7388015 ptp: Request cycles for TX timestamp
JIRA: https://issues.redhat.com/browse/RHEL-32107

commit 51eb7492af276b5b4d27cfa4474d40bdac7b9cf8
Author: Gerhard Engleder <gerhard@engleder-embedded.com>
Date:   Fri May 6 22:01:38 2022 +0200

    ptp: Request cycles for TX timestamp

    The free running cycle counter of physical clocks called cycles shall be
    used for hardware timestamps to enable synchronisation.

    Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals the driver to
    provide a TX timestamp based on cycles if cycles are supported.

    Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
    Acked-by: Richard Cochran <richardcochran@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-08 13:04:12 +02:00
Artem Savkov ad2c2df99e bpf: Add __bpf_hook_{start,end} macros
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 15fb6f2b6c4c3c129adc2412ae12ec15e60a6adb
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Oct 31 14:56:25 2023 -0700

    bpf: Add __bpf_hook_{start,end} macros
    
    Not all uses of __diag_ignore_all(...) in BPF-related code in order to
    suppress warnings are wrapping kfunc definitions. Some "hook point"
    definitions - small functions meant to be used as attach points for
    fentry and similar BPF progs - need to suppress -Wmissing-declarations.
    
    We could use __bpf_kfunc_{start,end}_defs added in the previous patch in
    such cases, but this might be confusing to someone unfamiliar with BPF
    internals. Instead, this patch adds __bpf_hook_{start,end} macros,
    currently having the same effect as __bpf_kfunc_{start,end}_defs, then
    uses them to suppress warnings for two hook points in the kernel itself
    and some bpf_testmod hook points as well.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Cc: Yafang Shao <laoar.shao@gmail.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20231031215625.2343848-2-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 11:23:42 +01:00
Paolo Abeni 0235a0b825 net: Save and restore msg_namelen in sock_sendmsg
JIRA: https://issues.redhat.com/browse/RHEL-21447
Tested: LNST, Tier1

Upstream commit:
commit 01b2885d9415152bcb12ff1f7788f500a74ea0ed
Author: Marc Dionne <marc.dionne@auristor.com>
Date:   Thu Dec 21 09:12:30 2023 -0400

    net: Save and restore msg_namelen in sock_sendmsg

    Commit 86a7e0b69bd5 ("net: prevent rewrite of msg_name in
    sock_sendmsg()") made sock_sendmsg save the incoming msg_name pointer
    and restore it before returning, to insulate the caller against
    msg_name being changed by the called code.  If the address length
    was also changed however, we may return with an inconsistent structure
    where the length doesn't match the address, and attempts to reuse it may
    lead to lost packets.

    For example, a kernel that doesn't have commit 1c5950fc6fe9 ("udp6: fix
    potential access to stale information") will replace a v4 mapped address
    with its ipv4 equivalent, and shorten namelen accordingly from 28 to 16.
    If the caller attempts to reuse the resulting msg structure, it will have
    the original ipv6 (v4 mapped) address but an incorrect v4 length.

    Fixes: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()")
    Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-12 17:13:23 +01:00
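
The save-and-restore pattern from the commit above can be sketched in user space. The names (`toy_msg`, `toy_send_lower`, `toy_sendmsg`) are hypothetical; the lower layer deliberately swaps the address and shortens the length from 28 to 16 to imitate the v4-mapped case the commit message mentions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_msg {
    void *name;     /* destination address */
    int namelen;    /* length of that address */
};

/* Hypothetical lower layer that replaces the address with a shorter one. */
static int toy_send_lower(struct toy_msg *msg)
{
    static char v4_addr[16];

    msg->name = v4_addr;
    msg->namelen = (int)sizeof(v4_addr);
    return 0;
}

/* Wrapper: save both the pointer and its length, and restore both, so
 * the caller never sees an address paired with the wrong length. */
static int toy_sendmsg(struct toy_msg *msg)
{
    void *save_name;
    int save_len;
    int ret;

    assert(msg != NULL);
    save_name = msg->name;
    save_len = msg->namelen;
    ret = toy_send_lower(msg);
    msg->name = save_name;
    msg->namelen = save_len;
    return ret;
}
```

Restoring only the pointer would leave the original 28-byte address paired with a 16-byte length, which is the inconsistency the fix removes.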
Scott Weaver c6519990cd Merge: net: visibility patches
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3447

JIRA: https://issues.redhat.com/browse/RHEL-17413

A set of various visibility / debuggability improvements related to the net stack.

Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>

Signed-off-by: Scott Weaver <scweaver@redhat.com>
2024-01-02 10:35:00 -05:00
Scott Weaver 8d95883db0 Merge: io_uring: update to upstream v6.6
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3318

Update io_uring and its dependencies to upstream kernel version 6.6.

JIRA: https://issues.redhat.com/browse/RHEL-12076
JIRA: https://issues.redhat.com/browse/RHEL-14998
JIRA: https://issues.redhat.com/browse/RHEL-4447
CVE: CVE-2023-46862

Omitted-Fix: ab69838e7c75 ("io_uring/kbuf: Fix check of BID wrapping in provided buffers")
Omitted-Fix: f74c746e476b ("io_uring/kbuf: Allow the full buffer id space for provided buffers")

This is the list of new features available (includes upstream kernel versions 6.3-6.6):

    User-specified ring buffer
    Provided Buffers allocated by the kernel
    Ability to register the ring fd
    Multi-shot timeouts
    Ability to pass custom flags to the completion queue entry for ring messages

All of these features are covered by the liburing tests.

In my testing, no-mmap-inval.t failed because of a broken test.  socket-uring-cmd.t also failed because of a missing selinux policy rule.  Try running audit2allow if you see a failure in that test.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Ming Lei <ming.lei@redhat.com>

Signed-off-by: Scott Weaver <scweaver@redhat.com>
2023-12-16 14:38:47 -05:00
Antoine Tenart 0335a6303f sock: add tracepoint for send recv length
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git

commit 6e6eda44b939c0931533d6681d9f2ed41b44cde9
Author: Yunhui Cui <cuiyunhui@bytedance.com>
Date:   Wed Jan 11 14:59:30 2023 +0800

    sock: add tracepoint for send recv length

    Add 2 tracepoints to monitor per-process and per-cgroup
    tcp/udp traffic.

    Regarding monitoring the tcp/udp traffic of each process, there are two
    existing solutions, the first one is https://www.atoptool.nl/netatop.php.
    The second is via kprobe/kretprobe.

    Netatop solution is implemented by registering the hook function at the
    hook point provided by the netfilter framework.

    These hook functions may be in the soft interrupt context and cannot
    directly obtain the pid. Some data structures are added to bind packets
    and processes. For example, struct taskinfobucket, struct taskinfo ...

    Every time the process sends and receives packets it needs multiple
    hashmaps, resulting in low performance, and it has the problem of inaccurate
    tcp/udp traffic statistics (for example: multiple threads share sockets).

    We can obtain the information with kretprobe, but as we know, kprobe gets
    the result by trapping in an exception, which loses performance compared
    to tracepoint.

    We compared the performance of tracepoints with the above two methods, and
    the results are as follows:

    ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
    without trace:
    Time per request: 39.660 [ms] (mean)
    Time per request: 0.040 [ms] (mean, across all concurrent requests)

    netatop:
    Time per request: 50.717 [ms] (mean)
    Time per request: 0.051 [ms] (mean, across all concurrent requests)

    kr:
    Time per request: 43.168 [ms] (mean)
    Time per request: 0.043 [ms] (mean, across all concurrent requests)

    tracepoint:
    Time per request: 41.004 [ms] (mean)
    Time per request: 0.041 [ms] (mean, across all concurrent requests)

    It can be seen that tracepoint has better performance.

    Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
    Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-12-11 11:12:03 +01:00
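Note on 0335a6303f: the benchmark figures in the commit message above are easier to compare as relative overhead against the untraced run. The short calculation below uses only the four mean times quoted there; the dictionary and function names are made up for this illustration.

```python
# Mean time per request in ms, copied from the commit message above.
MEAN_MS = {
    "without trace": 39.660,
    "netatop": 50.717,
    "kr": 43.168,
    "tracepoint": 41.004,
}


def overhead_percent(method, baseline="without trace"):
    """Relative slowdown of `method` versus the baseline, in percent."""
    base = MEAN_MS[baseline]
    return (MEAN_MS[method] - base) / base * 100.0


for method in ("netatop", "kr", "tracepoint"):
    print(f"{method}: +{overhead_percent(method):.1f}%")
```

On these numbers the overhead is roughly 28% for netatop, 9% for the kretprobe run and 3% for the tracepoint, which is the comparison behind the commit's conclusion.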
Jan Stancek 063f72e7e5 Merge: mptcp: rebase to Linux 6.7
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3305

JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1, selftests, pktdrill

Rebase to the current upstream to bring in new features and
a lot of fixes. A good half of the long commit list touches
the self-tests only, and the remaining is self-contained in mptcp.

The only notable exception is:

tcp: get rid of sysctl_tcp_adv_win_scale

that is a prerequisite for a bunch of mptcp changes included here
and also uncontroversially a good thing (TM) for TCP.

Wider-scope data-race related changesets are included (possibly as
partial backports) only if they help to reduce conflicts in later
changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-11-20 21:50:53 +01:00
Jeff Moyer 0970f7f058 io_uring: Add io_uring command support for sockets
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit 8e9fad0e70b7b62848e0aeb1a873903b9ce4d7c4
Author: Breno Leitao <leitao@debian.org>
Date:   Tue Jun 27 06:44:24 2023 -0700

    io_uring: Add io_uring command support for sockets
    
    Enable io_uring commands on network sockets. Create two new
    SOCKET_URING_OP commands that will operate on sockets.
    
    In order to call ioctl on sockets, use the file_operations->io_uring_cmd
    callbacks, and map it to a uring socket function, which handles the
    SOCKET_URING_OP accordingly, and calls socket ioctls.
    
    This patch was tested by creating a new test case in liburing.
    Link: https://github.com/leitao/liburing/tree/io_uring_cmd
    
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Link: https://lore.kernel.org/r/20230627134424.2784797-1-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:32:16 -04:00
Jeff Moyer 4d4952f4d4 net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit b841b901c452d92610f739a36e54978453528876
Author: David Howells <dhowells@redhat.com>
Date:   Mon May 22 13:11:10 2023 +0100

    net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
    
    Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
    network protocol that it should splice pages from the source iterator
    rather than copying the data if it can.  This flag is added to a list that
    is cleared by sendmsg syscalls on entry.
    
    This is intended as a replacement for the ->sendpage() op, allowing a way
    to splice in several multipage folios in one go.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:53 -04:00
Jeff Moyer 830aa61a71 net: set FMODE_NOWAIT for sockets
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit fe34db062b8036f72e97c2b9eaa7e9fbb725ead2
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue May 9 09:19:08 2023 -0600

    net: set FMODE_NOWAIT for sockets
    
    The socket read/write functions deal with O_NONBLOCK and IOCB_NOWAIT
    just fine, so we can flag them as being FMODE_NOWAIT compliant. With
    this, we can remove socket special casing in io_uring when checking
    if a file type is sane for nonblocking IO, and it's also the defined
    way to flag file types as such in the kernel.
    
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: netdev@vger.kernel.org
    Reviewed-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20230509151910.183637-2-axboe@kernel.dk
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:52 -04:00
Paolo Abeni d894b64984 bpf: Add update_socket_protocol hook
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 0dd061a6a115f25132989cbd591a25afb2dee086
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Wed Aug 16 09:11:56 2023 +0800

    bpf: Add update_socket_protocol hook

    Add a hook named update_socket_protocol in __sys_socket(), for bpf
    progs to attach to and update socket protocol. One use case is to
    force legacy TCP apps to create and use MPTCP sockets instead of
    TCP ones.

    Define a fmod_ret set named bpf_mptcp_fmodret_ids, add the hook
    update_socket_protocol into this set, and register it in
    bpf_mptcp_kfunc_init().

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
    Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Link: https://lore.kernel.org/r/ac84be00f97072a46f8a72b4e2be46cbb7fa5053.1692147782.git.geliang.tang@suse.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
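Note on d894b64984: the use case named in the commit message above (making legacy TCP applications create MPTCP sockets) comes down to a small decision on the `socket()` arguments. The Python model below is a sketch of that decision only. It is not a BPF program; the function name `mptcpify` and the exact matching conditions are assumptions made for illustration, and `IPPROTO_MPTCP` is defined locally with its Linux value of 262.

```python
import socket

IPPROTO_MPTCP = 262  # Linux protocol number for MPTCP, defined locally here


def update_socket_protocol(family, sock_type, protocol):
    """Model of the hook with nothing attached: the protocol is unchanged."""
    return protocol


def mptcpify(family, sock_type, protocol):
    """Hypothetical attached program: turn plain TCP requests into MPTCP."""
    if (family in (socket.AF_INET, socket.AF_INET6)
            and sock_type == socket.SOCK_STREAM
            and protocol in (0, socket.IPPROTO_TCP)):
        return IPPROTO_MPTCP
    return protocol
```

Anything that is not an IPv4/IPv6 stream socket asking for TCP (or the default protocol) passes through untouched in this model, so UDP or AF_UNIX sockets are not affected.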
Paolo Abeni d95dec3bf4 net: prevent address rewrite in kernel_bind()
JIRA: https://issues.redhat.com/browse/RHEL-14364
Tested: LNST, Tier1
Conflicts: missing READ_ONCE annotation in the orig code, as rhel lacks\
  the upstream commit 1ded5e5a5931 ("net: annotate data-races around \
  sock->ops")

Upstream commit:
commit c889a99a21bf124c3db08d09df919f0eccc5ea4c
Author: Jordan Rife <jrife@google.com>
Date:   Thu Sep 21 18:46:42 2023 -0500

    net: prevent address rewrite in kernel_bind()

    Similar to the change in commit 0bdf399342c5 ("net: Avoid address
    overwrite in kernel_connect"), BPF hooks run on bind may rewrite the
    address passed to kernel_bind(). This change

    1) Makes a copy of the bind address in kernel_bind() to insulate
       callers.
    2) Replaces direct calls to sock->ops->bind() in net with kernel_bind()

    Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
    Fixes: 4fbac77d2d ("bpf: Hooks for sys_bind")
    Cc: stable@vger.kernel.org
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Jordan Rife <jrife@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-20 13:55:59 +02:00
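Note on d95dec3bf4: step 1 of the commit message above (copy the bind address so callers are insulated) can be modelled in userspace as follows. This is a sketch, not the kernel implementation: `bpf_bind_hook`, `bind_direct` and `bind_with_copy` are hypothetical names, and the 4-byte buffers stand in for a full socket address.

```python
def bpf_bind_hook(addr):
    """Hypothetical hook that rewrites the bind address in place."""
    addr[:] = b"\x0a\x00\x00\x01"  # 10.0.0.1


def bind_direct(addr):
    """Model of calling the bind op directly: the caller's buffer is exposed."""
    bpf_bind_hook(addr)
    return bytes(addr)


def bind_with_copy(addr):
    """Model of the fixed path: the hook only ever sees a private copy."""
    local = bytearray(addr)
    bpf_bind_hook(local)
    return bytes(local)
```

In both cases the bind proceeds with the rewritten address; the difference is whether the caller's own buffer still holds what the caller put there.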
Paolo Abeni 8005437711 net: prevent rewrite of msg_name in sock_sendmsg()
JIRA: https://issues.redhat.com/browse/RHEL-14364
Tested: LNST, Tier1

Upstream commit:
commit 86a7e0b69bd5b812e48a20c66c2161744f3caa16
Author: Jordan Rife <jrife@google.com>
Date:   Thu Sep 21 18:46:41 2023 -0500

    net: prevent rewrite of msg_name in sock_sendmsg()

    Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
    space may observe their value of msg_name change in cases where BPF
    sendmsg hooks rewrite the send address. This has been confirmed to break
    NFS mounts running in UDP mode and has the potential to break other
    systems.

    This patch:

    1) Creates a new function called __sock_sendmsg() with same logic as the
       old sock_sendmsg() function.
    2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
       __sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
       as these system calls are already protected.
    3) Modifies sock_sendmsg() so that it makes a copy of msg_name if
       present before passing it down the stack to insulate callers from
       changes to the send address.

    Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
    Fixes: 1cedee13d2 ("bpf: Hooks for sys_sendmsg")
    Cc: stable@vger.kernel.org
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Jordan Rife <jrife@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-20 13:54:02 +02:00
Jeff Moyer dfc60afe37 skbuff: carry external ubuf_info in msghdr
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237
Conflicts: I backported commit 72c531f8ef30 ("net: copy from user before
  calling __get_compat_msghdr") and commit 7fa875b8e53c ("net: copy from
  user before calling __copy_msghdr") before this one.  Upstream this commit
  went before those two (I think there was a merge that resolved the
  conflicts, but I couldn't find it).  I fixed up the conflict in this
  patch.

commit 7c701d92b2b5e5175dbfec875816474b802b0c45
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Tue Jul 12 21:52:29 2022 +0100

    skbuff: carry external ubuf_info in msghdr
    
    Make it possible for in-kernel network callers like io_uring to pass in a
    custom ubuf_info by setting it in a new field of struct msghdr.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 08:05:02 -04:00
Jeff Moyer 79c5b9e709 net: copy from user before calling __copy_msghdr
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 7fa875b8e53c288d616234b9daf417b0650ce1cc
Author: Dylan Yudaken <dylany@fb.com>
Date:   Thu Jul 14 04:02:56 2022 -0700

    net: copy from user before calling __copy_msghdr
    
    This is in preparation for multishot receive from io_uring, where it needs
    to have access to the original struct user_msghdr.
    
    Functionally this should be a no-op.
    
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Dylan Yudaken <dylany@fb.com>
    Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 07:34:02 -04:00
Jeff Moyer bbfff3a273 net: avoid double iput when sock_alloc_file fails
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490

commit 649c15c7691e9b13cbe9bf6c65c365350e056067
Author: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Date:   Tue Mar 7 14:37:07 2023 -0300

    net: avoid double iput when sock_alloc_file fails
    
    When sock_alloc_file fails to allocate a file, it will call sock_release.
    __sys_socket_file should then not call sock_release again, otherwise there
    will be a double free.
    
    [   89.319884] ------------[ cut here ]------------
    [   89.320286] kernel BUG at fs/inode.c:1764!
    [   89.320656] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    [   89.321051] CPU: 7 PID: 125 Comm: iou-sqp-124 Not tainted 6.2.0+ #361
    [   89.321535] RIP: 0010:iput+0x1ff/0x240
    [   89.321808] Code: d1 83 e1 03 48 83 f9 02 75 09 48 81 fa 00 10 00 00 77 05 83 e2 01 75 1f 4c 89 ef e8 fb d2 ba 00 e9 80 fe ff ff c3 cc cc cc cc <0f> 0b 0f 0b e9 d0 fe ff ff 0f 0b eb 8d 49 8d b4 24 08 01 00 00 48
    [   89.322760] RSP: 0018:ffffbdd60068bd50 EFLAGS: 00010202
    [   89.323036] RAX: 0000000000000000 RBX: ffff9d7ad3cacac0 RCX: 0000000000001107
    [   89.323412] RDX: 000000000003af00 RSI: 0000000000000000 RDI: ffff9d7ad3cacb40
    [   89.323785] RBP: ffffbdd60068bd68 R08: ffffffffffffffff R09: ffffffffab606438
    [   89.324157] R10: ffffffffacb3dfa0 R11: 6465686361657256 R12: ffff9d7ad3cacb40
    [   89.324529] R13: 0000000080000001 R14: 0000000080000001 R15: 0000000000000002
    [   89.324904] FS:  00007f7b28516740(0000) GS:ffff9d7aeb1c0000(0000) knlGS:0000000000000000
    [   89.325328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   89.325629] CR2: 00007f0af52e96c0 CR3: 0000000002a02006 CR4: 0000000000770ee0
    [   89.326004] PKRU: 55555554
    [   89.326161] Call Trace:
    [   89.326298]  <TASK>
    [   89.326419]  __sock_release+0xb5/0xc0
    [   89.326632]  __sys_socket_file+0xb2/0xd0
    [   89.326844]  io_socket+0x88/0x100
    [   89.327039]  ? io_issue_sqe+0x6a/0x430
    [   89.327258]  io_issue_sqe+0x67/0x430
    [   89.327450]  io_submit_sqes+0x1fe/0x670
    [   89.327661]  io_sq_thread+0x2e6/0x530
    [   89.327859]  ? __pfx_autoremove_wake_function+0x10/0x10
    [   89.328145]  ? __pfx_io_sq_thread+0x10/0x10
    [   89.328367]  ret_from_fork+0x29/0x50
    [   89.328576] RIP: 0033:0x0
    [   89.328732] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    [   89.329073] RSP: 002b:0000000000000000 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9
    [   89.329477] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7b28637a3d
    [   89.329845] RDX: 00007fff4e4318a8 RSI: 00007fff4e4318b0 RDI: 0000000000000400
    [   89.330216] RBP: 00007fff4e431830 R08: 00007fff4e431711 R09: 00007fff4e4318b0
    [   89.330584] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff4e441b38
    [   89.330950] R13: 0000563835e3e725 R14: 0000563835e40d10 R15: 00007f7b28784040
    [   89.331318]  </TASK>
    [   89.331441] Modules linked in:
    [   89.331617] ---[ end trace 0000000000000000 ]---
    
    Fixes: da214a475f8b ("net: add __sys_socket_file()")
    Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20230307173707.468744-1-cascardo@canonical.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-28 11:00:20 -04:00
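Note on bbfff3a273: the ownership rule in the commit message above (the file allocation helper releases the socket itself on failure, so the caller must not release it again) can be modelled with a small userspace sketch. All names below are hypothetical stand-ins, and the exception takes the place of the kernel BUG shown in the trace.

```python
class Sock:
    """Toy socket that raises if it is released twice."""

    def __init__(self):
        self.released = False

    def release(self):
        if self.released:
            raise RuntimeError("double release")
        self.released = True


def alloc_file(sock, fail):
    """Stand-in for the file allocation step: on failure it releases the socket."""
    if fail:
        sock.release()
        return None
    return object()


def socket_file_buggy(sock, fail):
    """Caller that releases again after a failed allocation."""
    file = alloc_file(sock, fail)
    if file is None:
        sock.release()  # second release of the same socket
    return file


def socket_file_fixed(sock, fail):
    """Caller that relies on the helper's cleanup and only propagates the failure."""
    return alloc_file(sock, fail)
```

Only the buggy caller trips the double-release check when allocation fails; on the success path both callers behave identically.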
Jeff Moyer a3ace3f26f net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490

commit 1228b34c8d0ecf6de18c4c95d22f60cc8607c50a
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Jun 22 15:02:20 2022 +0000

    net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()
    
    syzbot reported uninit-value in tcp_recvmsg() [1]
    
    Issue here is that msg->msg_get_inq should have been cleared,
    otherwise tcp_recvmsg() might read garbage and perform
    more work than needed, or have undefined behavior.
    
    Given CONFIG_INIT_STACK_ALL_ZERO=y is probably going to be
    the default soon, I chose to change __sys_recvfrom() to clear
    all fields but msghdr.addr which might be not NULL.
    
    For __copy_msghdr_from_user(), I added an explicit clear
    of kmsg->msg_get_inq.
    
    [1]
    BUG: KMSAN: uninit-value in tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
    tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
    inet_recvmsg+0x13a/0x5a0 net/ipv4/af_inet.c:850
    sock_recvmsg_nosec net/socket.c:995 [inline]
    sock_recvmsg net/socket.c:1013 [inline]
    __sys_recvfrom+0x696/0x900 net/socket.c:2176
    __do_sys_recvfrom net/socket.c:2194 [inline]
    __se_sys_recvfrom net/socket.c:2190 [inline]
    __x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x46/0xb0
    
    Local variable msg created at:
    __sys_recvfrom+0x81/0x900 net/socket.c:2154
    __do_sys_recvfrom net/socket.c:2194 [inline]
    __se_sys_recvfrom net/socket.c:2190 [inline]
    __x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190
    
    CPU: 0 PID: 3493 Comm: syz-executor170 Not tainted 5.19.0-rc3-syzkaller-30868-g4b28366af7d9 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    
    Fixes: f94fd25cb0aa ("tcp: pass back data left in socket after receive")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Tested-by: Alexander Potapenko <glider@google.com>
    Link: https://lore.kernel.org/r/20220622150220.1091182-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-16 08:38:43 -04:00
Jeff Moyer 7384cba1f5 net: add __sys_socket_file()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490

commit da214a475f8bd1d3e9e7a19ddfeb4d1617551bab
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Apr 12 14:22:39 2022 -0600

    net: add __sys_socket_file()
    
    This works like __sys_socket(), except instead of allocating and
    returning a socket fd, it just returns the file associated with the
    socket. No fd is installed into the process file table.
    
    This is similar to do_accept(), and allows io_uring to use this without
    instantiating a file descriptor in the process file table.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Acked-by: David S. Miller <davem@davemloft.net>
    Link: https://lore.kernel.org/r/20220412202240.234207-2-axboe@kernel.dk

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-16 08:29:43 -04:00
Paolo Abeni 630836aece net: introduce and use custom sockopt socket flag
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133057
Tested: LNST, Tier1

Upstream commit:
commit a5ef058dc4d9a3e60d1808a0700e18e0e37e408e
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Oct 20 19:48:51 2022 +0200

    net: introduce and use custom sockopt socket flag

    We will soon introduce custom setsockopt for UDP sockets, too.
    Instead of doing even more complex arbitrary checks inside
    sock_use_custom_sol_socket(), add a new socket flag and set it
    for the relevant socket types (currently only MPTCP).

    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-28 09:38:26 +02:00
Frantisek Hrbata f422c448a1 Merge: io_uring: update to v5.15
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1142

# Merge Request Required Information

## Summary of Changes
Update the io_uring code base and its dependencies to v5.15.  We will not enable the functionality at this time, this is only a preparatory patch series.  The patch series does touch other subsystems, though:

91ef658fb8b8-namei-ignore-ERR-NULL-names-in-putname.patch
0ee50b47532a-namei-change-filename_parentat-calling-conventions.patch
584d3226d665-namei-make-do_mkdirat-take-struct-filename.patch
7797251bb5ab-namei-make-do_mknodat-take-struct-filename.patch
da2d0cede330-namei-make-do_symlinkat-take-struct-filename.patch
8228e2c31319-namei-add-getname_uflags.patch
020250f31c4c-namei-make-do_linkat-take-struct-filename.patch
45f30dab3957-namei-update-do_-helpers-to-return-ints.patch
d32f89da7fa8-net-add-accept-helper-not-installing-fd.patch
2112ff5ce0c1-iov_iter-track-truncated-size.patch
0766ec82e5fb-namei-Fix-use-after-free-in-kern_path_locked.patch
8fb0f47a9d7a-iov_iter-add-helper-to-save-iov_iter-state.patch
7dedd3e18077-Revert-iov_iter-track-truncated-size.patch
3a862cacf867-fs-add-anon_inode_getfile_secure-similar-to-anon_inode_getfd_secure.patch

As a result, file system, block and networking tests should be run.

Omitted-fix: 81132a39c152 ("fs: remove fget_many and fput_many interface")
             This is outside the scope of this MR, and isn't a "fix" so much as a performance enhancement.

## Approved Bugzilla Ticket
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Ming Lei <ming.lei@redhat.com>
Approved-by: Brian Foster <bfoster@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-10-21 09:47:33 -04:00
Paolo Abeni 1fae73a2ba net: Fix a data-race around sysctl_somaxconn.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134161
Tested: LNST, Tier1

Upstream commit:
commit 3c9ba81d72047f2e81bb535d42856517b613aba7
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Tue Aug 23 10:47:00 2022 -0700

    net: Fix a data-race around sysctl_somaxconn.

    While reading sysctl_somaxconn, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its reader.

    Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-13 13:00:04 +02:00
Artem Savkov 75a645a56c add missing bpf-cgroup.h includes
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit aef2feda97b840ec38e9fa53d0065188453304e8
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Dec 15 18:55:37 2021 -0800

    add missing bpf-cgroup.h includes

    We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency,
    make sure those who actually need more than the definition of
    struct cgroup_bpf include bpf-cgroup.h explicitly.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:49 +02:00
Jeff Moyer 9dd666f0d9 net: add accept helper not installing fd
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656

commit d32f89da7fa8ccc8b3fb8f909d61e42b9bc39329
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Wed Aug 25 12:25:44 2021 +0100

    net: add accept helper not installing fd
    
    Introduce and reuse a helper that acts similarly to __sys_accept4_file()
    but returns struct file instead of installing file descriptor. Will be
    used by io_uring.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Acked-by: David S. Miller <davem@davemloft.net>
    Link: https://lore.kernel.org/r/c57b9e8e818d93683a3d24f8ca50ca038d1da8c4.1629888991.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2022-07-15 14:58:36 -04:00
Patrick Talbert 4333f7706e Merge: PTP: backport fixes from upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/854

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066451

The first patch is a dependency of patch 007747a984ea ("net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets"), so I also backported it. The others are ptp_kvm and ptp_vclock fixes.

v2: add another fix b6b19a71c8bb ("ptp: free 'vclock_index' in ptp_clock_release()")

Signed-off-by: Hangbin Liu <haliu@redhat.com>

Approved-by: Íñigo Huguet <ihuguet@redhat.com>
Approved-by: Kamal Heib <kheib@redhat.com>
Approved-by: Antoine Tenart <atenart@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-06-07 09:43:11 +02:00
Patrick Talbert 811a08e92c Merge: bridge: update bridge and switchdev to upstream v5.18
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/824

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081601
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2081260
Tested: Using bridge related self-tests

This MR updates the bridge network core to the upstream version v5.18.

List of commits:
```
4682048af0c8 ("net: bridge: remove fdb_notify forward declaration")
5f94a5e276ae ("net: bridge: remove fdb_insert forward declaration")
4731b6d6b257 ("net: bridge: rename fdb_insert to fdb_add_local")
f6814fdcfe1b ("net: bridge: rename br_fdb_insert to br_fdb_add_local")
9574fb558044 ("net: bridge: reduce indentation level in fdb_create")
5cda5272a460 ("net: bridge: move br_fdb_replay inside br_switchdev.c")
fab9eca88410 ("net: bridge: create a common function for populating switchdev FDB entries")
716a30a97a52 ("net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device")
c5f6e5ebc2af ("net: bridge: provide shim definition for br_vlan_flags")
4a6849e46173 ("net: bridge: move br_vlan_replay to br_switchdev.c")
9ae9ff994b0e ("net: bridge: split out the switchdev portion of br_mdb_notify")
9776457c784f ("net: bridge: mdb: move all switchdev logic to br_switchdev.c")
326b212e9cd6 ("net: bridge: switchdev: consistent function naming")
ae0393500e3b ("net: bridge: switchdev: fix shim definition for br_switchdev_mdb_notify")
cc0be1ad686f ("net: bridge: Slightly optimize 'find_portno()'")
520fbdf7fb19 ("net/bridge: replace simple_strtoul to kstrtol")
5a45ab3f248b ("net: bridge: Allow base 16 inputs in sysfs")
442b03c32ca1 ("bridge: use __set_bit in __br_vlan_set_default_pvid")
fd3a45900055 ("net: bridge: Get SIOCGIFBR/SIOCSIFBR ioctl working in compat mode")
dcb2c5c6ca9b ("net: bridge: vlan: fix single net device option dumping")
fd20d9738395 ("net: bridge: vlan: fix memory leak in __allowed_ingress")
d8c2858181cc ("net/switchdev: use struct_size over open coded arithmetic")
5454f5c28eca ("net: bridge: vlan: check for errors from __vlan_del in __vlan_flush")
b2bc58d41fde ("net: bridge: vlan: check early for lack of BRENTRY flag in br_vlan_add_existing")
3116ad0696dd ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag")
cab2cd770051 ("net: bridge: vlan: make __vlan_add_flags react only to PVID and UNTAGGED")
27c5f74c7ba7 ("net: bridge: vlan: notify switchdev only when something changed")
8d23a54f5bee ("net: bridge: switchdev: differentiate new VLANs from changed ones")
263029ae3172 ("net: bridge: make nbp_switchdev_unsync_objs() follow reverse order of sync()")
b28d580e2939 ("net: bridge: switchdev: replay all VLAN groups")
7b465f4cf39e ("net: switchdev: rename switchdev_lower_dev_find to switchdev_lower_dev_find_rcu")
c4076cdd21f8 ("net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces")
c832962ac972 ("net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled")
36a29fb6b22d ("bridge: switch br_net_exit to batch mode")
acd8df5880d7 ("net: switchdev: avoid infinite recursion from LAG to bridge with port object handler")
a21d9a670d81 ("net: bridge: Add support for bridge port in locked mode")
fa1c83342987 ("net: bridge: Add support for offloading of locked port flag")
b2b681a41251 ("selftests: forwarding: tests of locked port feature")
ec638740fce9 ("net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device")
ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode")
8c678d60562f ("net: bridge: mst: Allow changing a VLAN's MSTI")
122c29486e1f ("net: bridge: mst: Support setting and reporting MST port states")
87c167bb94ee ("net: bridge: mst: Notify switchdev drivers of MST mode changes")
6284c723d9b9 ("net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations")
7ae9147f4312 ("net: bridge: mst: Notify switchdev drivers of MST state changes")
cceac97afa09 ("net: bridge: mst: Add helper to map an MSTI to a VID set")
48d57b2e5f43 ("net: bridge: mst: Add helper to check if MST is enabled")
f54fd0e16306 ("net: bridge: mst: Add helper to query a port's MST state")
917b149ac3d5 ("selftests: forwarding: Disable learning before link up")
f70f5f1a8fff ("selftests: forwarding: Use same VRF for port and VLAN upper")
cde3fc244b3d ("net: bridge: mst: prevent NULL deref in br_mst_info_size()")
a911ad18a56a ("net: bridge: mst: Restrict info size queries to bridge ports")
7f40ea2145d9 ("net: bridge: switchdev: check br_vlan_group() return value")
```

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Approved-by: Petr Oros <poros@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Hangbin Liu <haliu@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-05-23 09:32:29 +02:00
Hangbin Liu b32b81ce3f net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066451
Upstream Status: net.git commit 007747a984ea

commit 007747a984ea5e895b7d8b056b24ebf431e1e71d
Author: Miroslav Lichvar <mlichvar@redhat.com>
Date:   Wed Jan 5 11:33:26 2022 +0100

    net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets

    When multiple sockets using the SOF_TIMESTAMPING_BIND_PHC flag received
    a packet with a hardware timestamp (e.g. multiple PTP instances in
    different PTP domains using the UDPv4/v6 multicast or L2 transport),
    the timestamps received on some sockets were corrupted due to repeated
    conversion of the same timestamp (by the same or different vclocks).

    Fix ptp_convert_timestamp() to not modify the shared skb timestamp
    and return the converted timestamp as a ktime_t instead. If the
    conversion fails, return 0 to not confuse the application with
    timestamps corresponding to an unexpected PHC.

    Fixes: d7c0882655 ("net: socket: support hardware timestamp conversion to PHC bound")
    Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
    Cc: Yangbo Lu <yangbo.lu@nxp.com>
    Cc: Richard Cochran <richardcochran@gmail.com>
    Acked-by: Richard Cochran <richardcochran@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Hangbin Liu <haliu@redhat.com>
2022-05-16 12:23:46 +08:00
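The SOF_TIMESTAMPING_BIND_PHC fix in the entry above can be pictured with a small user-space model. This is only a sketch under stated assumptions: `model_skb`, `model_vclock`, and `model_convert_timestamp()` are hypothetical stand-ins, not the kernel's `ptp_convert_timestamp()`, and a vclock is reduced to a fixed offset. It shows the two properties the commit message describes: the converter returns the converted value without writing to the shared timestamp, and it returns 0 when conversion fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet that several sockets receive. */
struct model_skb {
    int64_t hwtstamp_ns;    /* raw hardware timestamp, shared by all readers */
};

/* Hypothetical stand-in for a virtual clock: a fixed offset from the raw time. */
struct model_vclock {
    int     index;
    int64_t offset_ns;
};

/*
 * Convert the shared timestamp for the vclock with the given index.
 * The skb is const: the shared timestamp is never modified, so a second
 * socket converting the same packet starts from the same raw value.
 * Returns 0 if no matching vclock exists.
 */
static int64_t model_convert_timestamp(const struct model_skb *skb,
                                       const struct model_vclock *clocks,
                                       size_t nclocks, int vclock_index)
{
    assert(skb != NULL && clocks != NULL);

    for (size_t i = 0; i < nclocks; i++) {
        if (clocks[i].index == vclock_index)
            return skb->hwtstamp_ns + clocks[i].offset_ns;
    }
    return 0;   /* conversion failed: report "no timestamp" */
}
```

Calling the helper for two different vclock indexes on the same `model_skb` yields two independent results, which is the behavior the old in-place conversion could not provide.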
Ivan Vecera 854ad3de76 net: bridge: Get SIOCGIFBR/SIOCSIFBR ioctl working in compat mode
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081601

commit fd3a459000557ff12c1d4b41f1bd30f439f6c942
Author: Remi Pommarel <repk@triplefau.lt>
Date:   Fri Dec 24 12:46:40 2021 +0100

    net: bridge: Get SIOCGIFBR/SIOCSIFBR ioctl working in compat mode

    In compat mode SIOC{G,S}IFBR ioctls were only supporting
    BRCTL_GET_VERSION, returning an artificial version to spur userland
    tools to use SIOCDEVPRIVATE instead. But some userland tools ignore that
    and use SIOC{G,S}IFBR unconditionally, as seen with busybox's brctl.

    Example of non-working 32-bit brctl with CONFIG_COMPAT=y:
    $ brctl show
    brctl: SIOCGIFBR: Invalid argument

    Example of fixed 32-bit brctl with CONFIG_COMPAT=y:
    $ brctl show
    bridge name     bridge id               STP enabled     interfaces
    br0

    Signed-off-by: Remi Pommarel <repk@triplefau.lt>
    Co-developed-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-05-04 09:55:34 +02:00
Waiman Long bda0da4d09 fs: allocate inode by using alloc_inode_sb()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2013413
Conflicts:
 1) Merge conflict in fs/xfs/xfs_icache.c due to missing upstream commit
    182696fb021f ("xfs: rename _zone variables to _cache").
 2) The hunk for fs/9p/vfs_inode.c is dropped due to merge conflict and
    9P filesystem not supported in RHEL9.
 3) The hunk for fs/ntfs3/super.c is dropped due to file not currently
    present.

commit fd60b28842df833477c42da6a6d63d0d114a5fcc
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Tue Mar 22 14:41:03 2022 -0700

    fs: allocate inode by using alloc_inode_sb()

    The inode allocation is supposed to use alloc_inode_sb(), so convert
    kmem_cache_alloc() of all filesystems to alloc_inode_sb().

    Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Acked-by: Theodore Ts'o <tytso@mit.edu>         [ext4]
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Alex Shi <alexs@kernel.org>
    Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Fam Zheng <fam.zheng@bytedance.com>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kari Argillander <kari.argillander@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Qi Zheng <zhengqi.arch@bytedance.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Wei Yang <richard.weiyang@gmail.com>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-04-07 14:11:13 -04:00
Paolo Abeni 2e122ac9c0 net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028276
Tested: LNST, Tier1
Conflicts: the upstream commit had some relevant conflicts due
  to the net-next commit 876f0bf9d0d5 ("net: socket: simplify
  dev_ifconf handling"), solved by the merge commit 29ce8f970107
  ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net").

Upstream commit:
commit d0efb16294d145d157432feda83877ae9d7cdf37
Author: Peter Collingbourne <pcc@google.com>
Date:   Thu Aug 26 12:46:01 2021 -0700

    net: don't unconditionally copy_from_user a struct ifreq for socket ioctls

    A common implementation of isatty(3) involves calling an ioctl passing
    a dummy struct argument and checking whether the syscall failed --
    bionic and glibc use TCGETS (passing a struct termios), and musl uses
    TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
    copy sizeof(struct ifreq) bytes of data from the argument and return
    -EFAULT if that fails. The result is that the isatty implementations
    may return a non-POSIX-compliant value in errno in the case where part
    of the dummy struct argument is inaccessible, as both struct termios
    and struct winsize are smaller than struct ifreq (at least on arm64).

    Although there is usually enough stack space following the argument
    on the stack that this did not present a practical problem up to now,
    with MTE stack instrumentation it's more likely for the copy to fail,
    as the memory following the struct may have a different tag.

    Fix the problem by adding an early check for whether the ioctl is a
    valid socket ioctl, and return -ENOTTY if it isn't.

    Fixes: 44c02a2c3d ("dev_ioctl(): move copyin/copyout to callers")
    Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d
    Signed-off-by: Peter Collingbourne <pcc@google.com>
    Cc: <stable@vger.kernel.org> # 4.19
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2021-12-09 10:44:08 +01:00
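The early check described in the entry above can be sketched in user space. Everything here is an assumption made for illustration: the `MODEL_*` macros and `model_*` functions are hypothetical, the type byte is taken from bits 8..15 of the command (the generic Linux ioctl encoding), 0x89 is assumed as the type byte shared by SIOC* commands, and 40 bytes stands in for `sizeof(struct ifreq)`. The sketch models "accessible bytes at the user pointer" with an explicit length.

```c
#include <assert.h>   /* used by callers that check the results */
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed generic Linux encoding: the ioctl "type" byte is bits 8..15. */
#define MODEL_IOC_TYPE(cmd)   (((cmd) >> 8) & 0xffu)
#define MODEL_SOCK_IOC_TYPE   0x89u   /* assumed type byte of SIOC* commands */
#define MODEL_IFREQ_SIZE      40u     /* stand-in for sizeof(struct ifreq) */

/* Hypothetical helper: is this command in the socket ioctl range? */
static int model_is_socket_ioctl_cmd(unsigned int cmd)
{
    return MODEL_IOC_TYPE(cmd) == MODEL_SOCK_IOC_TYPE;
}

/*
 * Hypothetical model of the socket ioctl entry point.
 * arg_len models how many bytes at arg are readable.
 */
static int model_sock_ioctl(unsigned int cmd, const void *arg, size_t arg_len)
{
    unsigned char ifr[MODEL_IFREQ_SIZE];

    /* Early check: reject foreign commands before touching the argument. */
    if (!model_is_socket_ioctl_cmd(cmd))
        return -ENOTTY;

    /* Only now copy the ifreq-sized argument; a short buffer is a fault. */
    if (arg == NULL || arg_len < sizeof(ifr))
        return -EFAULT;
    memcpy(ifr, arg, sizeof(ifr));
    (void)ifr;  /* a real handler would dispatch on cmd here */
    return 0;
}
```

With the check in place, a terminal command such as 0x5401 (TCGETS on common architectures) passed with a short buffer returns `-ENOTTY` rather than `-EFAULT`, which is the value an isatty(3) implementation expects for a non-terminal descriptor.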
Ivan Vecera cc390e2186 net: bridge: move bridge ioctls out of .ndo_do_ioctl
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008927

commit ad2f99aedf8fa77f3ae647153284fa63c43d3055
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Jul 27 15:45:16 2021 +0200

    net: bridge: move bridge ioctls out of .ndo_do_ioctl

    Working towards obsoleting the .ndo_do_ioctl operation entirely,
    stop passing the SIOCBRADDIF/SIOCBRDELIF device ioctl commands
    into this callback.

    My first attempt was to add another ndo_siocbr() callback, but
    as there is only a single driver that takes these commands and
    there is already a hook mechanism to call directly into this
    driver, extend this hook instead, and use it for both the
    deviceless and the device specific ioctl commands.

    Cc: Roopa Prabhu <roopa@nvidia.com>
    Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
    Cc: bridge@lists.linux-foundation.org
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2021-10-11 15:44:05 +02:00
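The single-hook design described in the entry above can be sketched as follows. This is a user-space illustration, not the kernel interface: `model_bridge`, the `MODEL_*` command values, `model_br_hook`, and the choice of `-ENOSYS` for an unregistered hook are all assumptions. One function pointer serves both the deviceless commands (no bridge pointer) and the device-specific commands (a bridge pointer is supplied).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bridge device: only tracks how many ports it has. */
struct model_bridge {
    int nports;
};

/* Hypothetical command numbers for this sketch. */
enum {
    MODEL_GET_VERSION = 1,  /* deviceless: no bridge needed */
    MODEL_ADD_IF      = 2,  /* device-specific */
    MODEL_DEL_IF      = 3   /* device-specific */
};

/* One hook for both kinds of command; br is NULL for deviceless ones. */
typedef int (*model_br_hook_t)(struct model_bridge *br, unsigned int cmd);
static model_br_hook_t model_br_hook;   /* NULL until the driver registers */

/* The single driver that implements the hook. */
static int model_bridge_ioctl(struct model_bridge *br, unsigned int cmd)
{
    switch (cmd) {
    case MODEL_GET_VERSION:
        return br ? -EINVAL : 0;
    case MODEL_ADD_IF:
        if (!br)
            return -EINVAL;
        br->nports++;
        return 0;
    case MODEL_DEL_IF:
        if (!br || br->nports == 0)
            return -EINVAL;
        br->nports--;
        return 0;
    default:
        return -EOPNOTSUPP;
    }
}

/* Core-side caller: goes straight through the hook, no per-device callback. */
static int model_call_br_hook(struct model_bridge *br, unsigned int cmd)
{
    if (!model_br_hook)
        return -ENOSYS;     /* sketch choice for "driver not loaded" */
    return model_br_hook(br, cmd);
}
```

Because only one driver handles these commands, routing both paths through the same pointer avoids adding a second per-device callback, which matches the reasoning given in the commit message.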
Ivan Vecera d42cb6adbc net: socket: return changed ifreq from SIOCDEVPRIVATE
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008927

commit 88fc023f7de22922c6c61e2f3d4c54befb8b3549
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Jul 27 15:45:15 2021 +0200

    net: socket: return changed ifreq from SIOCDEVPRIVATE

    Some drivers that use SIOCDEVPRIVATE ioctl commands modify
    the ifreq structure and expect it to be passed back to user
    space, which has never really happened for compat mode
    because calling these drivers through ndo_do_ioctl
    requires overwriting the ifr_data pointer.

    Now that all drivers are converted to ndo_siocdevprivate,
    change it to handle this correctly in both compat and
    native mode.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2021-10-11 15:44:04 +02:00
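The copy-in, modify, copy-back pattern that the entry above relies on can be sketched in user space. The names are hypothetical (`model_ifreq`, `model_siocdevprivate()`, and the two handlers are not kernel symbols), plain `memcpy()` stands in for the user-copy helpers, and copying back only on success is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified request structure. */
struct model_ifreq {
    char name[16];
    int  value;
};

/* Hypothetical driver callback that may modify the request in place. */
typedef int (*model_priv_handler)(struct model_ifreq *ifr);

/*
 * Copy the caller's request into a private copy, let the driver work on
 * that copy, and copy it back so the caller sees the driver's changes.
 */
static int model_siocdevprivate(struct model_ifreq *user_ifr,
                                model_priv_handler handler)
{
    struct model_ifreq kifr;
    int ret;

    assert(user_ifr != NULL && handler != NULL);

    memcpy(&kifr, user_ifr, sizeof(kifr));      /* copy in */
    ret = handler(&kifr);                       /* driver may modify kifr */
    if (ret == 0)
        memcpy(user_ifr, &kifr, sizeof(kifr));  /* copy back on success */
    return ret;
}

/* Example handler: succeeds and writes a result into the request. */
static int model_driver_fill(struct model_ifreq *ifr)
{
    ifr->value = 42;
    return 0;
}

/* Example handler: scribbles on the request, then fails. */
static int model_driver_fail(struct model_ifreq *ifr)
{
    ifr->value = -1;
    return -1;
}
```

After a successful call with `model_driver_fill`, the caller's structure carries the driver's value; after a failed call with `model_driver_fail`, it is left unchanged.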