Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
CKI Backport Bot	9bb50ec24d	net: fix data-races around sk->sk_forward_alloc JIRA: https://issues.redhat.com/browse/RHEL-69689 CVE: CVE-2024-53124 commit 073d89808c065ac4c672c0a613a71b27a80691cb Author: Wang Liang <wangliang74@huawei.com> Date: Thu Nov 7 10:34:05 2024 +0800 net: fix data-races around sk->sk_forward_alloc Syzkaller reported this warning: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 16 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x1c5/0x1e0 Modules linked in: CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc5 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:inet_sock_destruct+0x1c5/0x1e0 Code: 24 12 4c 89 e2 5b 48 c7 c7 98 ec bb 82 41 5c e9 d1 18 17 ff 4c 89 e6 5b 48 c7 c7 d0 ec bb 82 41 5c e9 bf 18 17 ff 0f 0b eb 83 <0f> 0b eb 97 0f 0b eb 87 0f 0b e9 68 ff ff ff 66 66 2e 0f 1f 84 00 RSP: 0018:ffffc9000008bd90 EFLAGS: 00010206 RAX: 0000000000000300 RBX: ffff88810b172a90 RCX: 0000000000000007 RDX: 0000000000000002 RSI: 0000000000000300 RDI: ffff88810b172a00 RBP: ffff88810b172a00 R08: ffff888104273c00 R09: 0000000000100007 R10: 0000000000020000 R11: 0000000000000006 R12: ffff88810b172a00 R13: 0000000000000004 R14: 0000000000000000 R15: ffff888237c31f78 FS: 0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffc63fecac8 CR3: 000000000342e000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x88/0x130 ? inet_sock_destruct+0x1c5/0x1e0 ? report_bug+0x18e/0x1a0 ? handle_bug+0x53/0x90 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? inet_sock_destruct+0x1c5/0x1e0 __sk_destruct+0x2a/0x200 rcu_do_batch+0x1aa/0x530 ? rcu_do_batch+0x13b/0x530 rcu_core+0x159/0x2f0 handle_softirqs+0xd3/0x2b0 ? 
__pfx_smpboot_thread_fn+0x10/0x10 run_ksoftirqd+0x25/0x30 smpboot_thread_fn+0xdd/0x1d0 kthread+0xd3/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- It's possible that two threads call tcp_v6_do_rcv()/sk_forward_alloc_add() concurrently when sk->sk_state == TCP_LISTEN with sk->sk_lock unlocked, which triggers a data-race around sk->sk_forward_alloc: tcp_v6_rcv tcp_v6_do_rcv skb_clone_and_charge_r sk_rmem_schedule __sk_mem_schedule sk_forward_alloc_add() skb_set_owner_r sk_mem_charge sk_forward_alloc_add() __kfree_skb skb_release_all skb_release_head_state sock_rfree sk_mem_uncharge sk_forward_alloc_add() sk_mem_reclaim // set local var reclaimable __sk_mem_reclaim sk_forward_alloc_add() In this syzkaller testcase, two threads call tcp_v6_do_rcv() with skb->truesize=768, the sk_forward_alloc changes like this: (cpu 1) | (cpu 2) | sk_forward_alloc ... | ... | 0 __sk_mem_schedule() | | +4096 = 4096 | __sk_mem_schedule() | +4096 = 8192 sk_mem_charge() | | -768 = 7424 | sk_mem_charge() | -768 = 6656 ... | ... | sk_mem_uncharge() | | +768 = 7424 reclaimable=7424 | | | sk_mem_uncharge() | +768 = 8192 | reclaimable=8192 | __sk_mem_reclaim() | | -4096 = 4096 | __sk_mem_reclaim() | -8192 = -4096 != 0 The skb_clone_and_charge_r() should not be called in tcp_v6_do_rcv() when sk->sk_state is TCP_LISTEN; it happens later in tcp_v6_syn_recv_sock(). Fix the same issue in dccp_v6_do_rcv(). Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Fixes: `e994b2f0fb` ("tcp: do not lock listener to process SYN packets") Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20241107023405.889239-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2024-12-02 14:40:01 +00:00
Paolo Abeni	14cb20c8b2	tcp: fix race in tcp_v6_syn_recv_sock() JIRA: https://issues.redhat.com/browse/RHEL-62865 Tested: LNST, Tier1 Upstream commit: commit d37fe4255abe8e7b419b90c5847e8ec2b8debb08 Author: Eric Dumazet <edumazet@google.com> Date: Thu Jun 6 15:46:51 2024 +0000 tcp: fix race in tcp_v6_syn_recv_sock() tcp_v6_syn_recv_sock() calls ip6_dst_store() before inet_sk(newsk)->pinet6 has been set up. This means ip6_dst_store() writes over the parent (listener) np->dst_cookie. This is racy because multiple threads could share the same parent and their final np->dst_cookie could be wrong. Move ip6_dst_store() call after inet_sk(newsk)->pinet6 has been changed and after the copy of parent ipv6_pinfo. Fixes: `e994b2f0fb` ("tcp: do not lock listener to process SYN packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-16 19:09:14 +02:00
Paolo Abeni	fdad6e7a51	tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field JIRA: https://issues.redhat.com/browse/RHEL-62865 Tested: LNST, Tier1 Conflicts: different context in tcp_conn_request(), as rhel-9 \ lacks the TCP AO support. Upstream commit: commit 41eecbd712b73f0d5dcf1152b9a1c27b1f238028 Author: Eric Dumazet <edumazet@google.com> Date: Sun Apr 7 09:33:22 2024 +0000 tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field TCP can transform a TIMEWAIT socket into a SYN_RECV one from a SYN packet, and the ISN of the SYNACK packet is normally generated using TIMEWAIT tw_snd_nxt : tcp_timewait_state_process() ... u32 isn = tcptw->tw_snd_nxt + 65535 + 2; if (isn == 0) isn++; TCP_SKB_CB(skb)->tcp_tw_isn = isn; return TCP_TW_SYN; This SYN packet also bypasses normal checks against listen queue being full or not. tcp_conn_request() ... __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn; ... /* TW buckets are converted to open requests without * limitations, they conserve resources and peer is * evidently real one. */ if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) { want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name); if (!want_cookie) goto drop; } This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb. Unfortunately this field has been accidentally cleared after the call to tcp_timewait_state_process() returning TCP_TW_SYN. Using a field in TCP_SKB_CB(skb) for a temporary state is overkill. Switch instead to a per-cpu variable. As a bonus, we do not have to clear tcp_tw_isn in TCP receive fast path. It is temporarily set then cleared only in the TCP_TW_SYN dance. Fixes: `4ad19de877` ("net: tcp6: fix double call of tcp_v6_fill_cb()") Fixes: `eeea10b83a` ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-16 19:08:41 +02:00
Paolo Abeni	4cd846284a	tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() JIRA: https://issues.redhat.com/browse/RHEL-62865 Tested: LNST, Tier1 Upstream commit: commit b9e810405880c99baafd550ada7043e86465396e Author: Eric Dumazet <edumazet@google.com> Date: Sun Apr 7 09:33:21 2024 +0000 tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() tcp_v6_init_req() reads TCP_SKB_CB(skb)->tcp_tw_isn to find out if the request socket is created by a SYN hitting a TIMEWAIT socket. This has been buggy for a decade, let's directly pass the information from tcp_conn_request(). This is a preparatory patch to make the following one easier to review. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-16 19:07:53 +02:00
Florian Westphal	23f780623d	tcp: annotate data-races around tw->tw_ts_recent and tw->tw_ts_recent_stamp JIRA: https://issues.redhat.com/browse/RHEL-9279 Upstream Status: commit 69e0b33a7fce CS9 lacks both support for TCP Authentication option and usec resolution for TCP timestamps. Both features are out of scope, so do needed context fixups. This change was added to reduce conflicts in the followup patch. commit 69e0b33a7fce4d96649b9fa32e56b696921aa48e Author: Eric Dumazet <edumazet@google.com> Date: Mon Jun 3 15:51:06 2024 +0000 tcp: annotate data-races around tw->tw_ts_recent and tw->tw_ts_recent_stamp These fields can be read and written locklessly, add annotations around these minor races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Conflicts: net/ipv4/tcp_ipv4.c net/ipv6/tcp_ipv6.c Signed-off-by: Florian Westphal <fwestpha@redhat.com>	2024-08-21 16:55:25 +02:00
Lucas Zampieri	55f96777fb	Merge: net: backport visibility improvements MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4765 JIRA: https://issues.redhat.com/browse/RHEL-48648 Various visibility improvements; mainly around drop reasons, reset reason and improved tracepoints this time. Signed-off-by: Antoine Tenart <atenart@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-08-12 16:18:50 +00:00
Antoine Tenart	3a0f9f0ce0	tcp: use sk_skb_reason_drop to free rx packets JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: net-next.git commit 46a02aa357529d7b038096955976b14f7c44aa23 Author: Yan Zhai <yan@cloudflare.com> Date: Mon Jun 17 11:09:20 2024 -0700 tcp: use sk_skb_reason_drop to free rx packets Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011539.jhwBd7DX-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:42 +02:00
Antoine Tenart	0bc1f777a4	tcp: rstreason: handle timewait cases in the receive path JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git commit 22a32557758a7100e46dfa8f383a401125e60b16 Author: Jason Xing <kernelxing@tencent.com> Date: Fri May 10 20:25:01 2024 +0800 tcp: rstreason: handle timewait cases in the receive path There are two possible cases where TCP layer can send an RST. Since they happen in the same place, I think using one independent reason is enough to identify this special situation. Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-5-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:42 +02:00
Antoine Tenart	51c78f9a4a	rstreason: make it work in trace world JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git commit b533fb9cf4f7c6ca2aa255a5a1fdcde49fff2b24 Author: Jason Xing <kernelxing@tencent.com> Date: Thu Apr 25 11:13:40 2024 +0800 rstreason: make it work in trace world At last, we should let it work by introducing this reset reason in trace world. One of the possible expected outputs is: ... tcp_send_reset: skbaddr=xxx skaddr=xxx src=xxx dest=xxx state=TCP_ESTABLISHED reason=NOT_SPECIFIED Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:41 +02:00
Antoine Tenart	8ea5cff87d	tcp: support rstreason for passive reset JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git commit 120391ef9ca8fe8f82ea3f2961ad802043468226 Author: Jason Xing <kernelxing@tencent.com> Date: Thu Apr 25 11:13:37 2024 +0800 tcp: support rstreason for passive reset Reuse the dropreason logic to show the exact reason of tcp reset, so we can finally display the corresponding item in enum sk_reset_reason instead of reinventing new reset reasons. This patch replaces all the prior NOT_SPECIFIED reasons. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:41 +02:00
Antoine Tenart	25344d90dd	rstreason: prepare for passive reset JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git Conflicts:\ - Context differences due to missing upstream commits ba7783ad45c8 ("net/tcp: Add AO sign to RST packets") and d5dfbfa2f88e ("mptcp: drop duplicate header inclusions") in c9s. commit 6be49deaa09576c141002a2e6f816a1709bc2c86 Author: Jason Xing <kernelxing@tencent.com> Date: Thu Apr 25 11:13:35 2024 +0800 rstreason: prepare for passive reset Adjust the parameter and support passing reason of reset which is for now NOT_SPECIFIED. No functional changes. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:41 +02:00
Antoine Tenart	528623fc31	trace: tcp: fully support trace_tcp_send_reset JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git Conflicts:\ - Context differences due to missing upstream commits ba7783ad45c8 ("net/tcp: Add AO sign to RST packets") and 3cccda8db2cf ("ipv6: move np->repflow to atomic flags") in c9s. commit 19822a980e1956a6572998887a7df5a0607a32f6 Author: Jason Xing <kernelxing@tencent.com> Date: Mon Apr 1 15:36:05 2024 +0800 trace: tcp: fully support trace_tcp_send_reset Prior to this patch, what we can see by enabling trace_tcp_send only happens under two circumstances: 1) active rst mode 2) non-active rst mode and based on the full socket That means an inconsistency occurs if we use tcpdump and trace simultaneously to see how rst happens. It's necessary that we take other cases into consideration, say: 1) time-wait socket 2) no socket ... By parsing the incoming skb and reversing its 4-tuple we can know the exact 'flow', which might not exist. Samples after applying this patch: 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port state=TCP_ESTABLISHED 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port state=UNKNOWN Note: 1) UNKNOWN means we cannot extract the right information from skb. 2) skbaddr/skaddr could be 0 Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240401073605.37335-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:41 +02:00
Antoine Tenart	8e320d89a7	tcp: make dropreason in tcp_child_process() work JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git commit ee01defe25bad09a37b68dd051a7e931d1e4cd91 Author: Jason Xing <kernelxing@tencent.com> Date: Mon Feb 26 11:22:27 2024 +0800 tcp: make dropreason in tcp_child_process() work It's time to let it work right now. We've already prepared for this:) Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:40 +02:00
Antoine Tenart	8f346a11e7	tcp: make the dropreason really work when calling tcp_rcv_state_process() JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git commit b9825695930546af725b1e686b8eaf4c71201728 Author: Jason Xing <kernelxing@tencent.com> Date: Mon Feb 26 11:22:26 2024 +0800 tcp: make the dropreason really work when calling tcp_rcv_state_process() Update three callers including both ipv4 and ipv6 and let the dropreason mechanism work in reality. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:40 +02:00
Antoine Tenart	1042a45152	tcp: directly drop skb in cookie check for ipv6 JIRA: https://issues.redhat.com/browse/RHEL-48648 Upstream Status: linux.git Conflicts:\ - Context difference due to missing upstream commits efce3d1fdff5 ("tcp: Don't initialise tp->tsoffset in tcp_get_cookie_sock().") and 8e7bab6b9652 ("tcp: Factorise cookie-dependent fields initialisation in cookie_v[46]_check()"). commit ed43e76cdcc497e2b27d84db27e7df5612be2643 Author: Jason Xing <kernelxing@tencent.com> Date: Mon Feb 26 11:22:21 2024 +0800 tcp: directly drop skb in cookie check for ipv6 Like previous patch does, only moving skb drop logical code to cookie_v6_check() for later refinement. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-07-16 17:29:40 +02:00
Felix Maurer	8fc3cda22c	net/tcp: refactor tcp_inet6_sk() JIRA: https://issues.redhat.com/browse/RHEL-30902 commit fe79bd65c819cc520aa66de65caae8e4cea29c5a Author: Pavel Begunkov <asml.silence@gmail.com> Date: Fri May 19 14:30:36 2023 +0100 net/tcp: refactor tcp_inet6_sk() Don't keep hand coded offset calculations and replace them with container_of(). It should be type safer and a bit less confusing. It also makes it a macro instead of an inline function to preserve constness, which was previously cast away, as in the case of tcp_v6_send_synack(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2024-06-26 17:17:16 +02:00
Hangbin Liu	1446db33d3	ipv6: remove hard coded limitation on ipv6_pinfo JIRA: https://issues.redhat.com/browse/RHEL-31050 Upstream Status: net.git commit f5f80e32de12 Conflicts: context conflicts due to missing commit 67fb43308f4b ("udp: Set NULL to sk->sk_prot->h.udp_table."). commit f5f80e32de12fad2813d37270e8364a03e6d3ef0 Author: Eric Dumazet <edumazet@google.com> Date: Thu Jul 20 11:09:01 2023 +0000 ipv6: remove hard coded limitation on ipv6_pinfo IPv6 inet sockets are supposed to have a "struct ipv6_pinfo" field at the end of their definition, so that inet6_sk_generic() can derive from socket size the offset of the "struct ipv6_pinfo". This is very fragile, and prevents adding bigger alignment in sockets, because inet6_sk_generic() does not work if the compiler adds padding after the ipv6_pinfo component. We are currently working on a patch series to reorganize TCP structures for better data locality and found issues similar to the one fixed in commit `f5d547676c` ("tcp: fix tcp_inet6_sk() for 32bit kernels") Alternative would be to force an alignment on "struct ipv6_pinfo", greater or equal to __alignof__(any ipv6 sock) to ensure there is no padding. This does not look great. v2: fix typo in mptcp_proto_v6_init() (Paolo) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chao Wu <wwchao@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Hangbin Liu <haliu@redhat.com>	2024-04-02 17:50:46 +08:00
Antoine Tenart	a8adbce266	net: ipv6: fix skb hash for some RST packets Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2214966 Upstream Status: linux.git commit dc6456e938e938d64ffb6383a286b2ac9790a37f Author: Antoine Tenart <atenart@kernel.org> Date: Thu Apr 27 11:21:59 2023 +0200 net: ipv6: fix skb hash for some RST packets The skb hash comes from sk->sk_txhash when using TCP, except for some IPv6 RST packets. This is because in tcp_v6_send_reset when not in TIME_WAIT the hash is taken from sk->sk_hash, while it should come from sk->sk_txhash as those two hashes are not computed the same way. Packetdrill script to test the above, 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > (flowlabel 0x1) S 0:0(0) <...> // Wrong ack seq, trigger a rst. +0 < S. 0:0(0) ack 0 win 4000 // Check the flowlabel matches prior one from SYN. +0 > (flowlabel 0x1) R 0:0(0) <...> Fixes: 9258b8b1be2e ("ipv6: tcp: send consistent autoflowlabel in RST packets") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2023-06-16 10:55:22 +02:00
Antoine Tenart	b4f1329917	ipv6: tcp: send consistent autoflowlabel in RST packets Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2214966 Upstream Status: linux.git commit 9258b8b1be2e1e241baf8aa703aba1086069ee0f Author: Eric Dumazet <edumazet@google.com> Date: Thu Sep 22 09:50:36 2022 -0700 ipv6: tcp: send consistent autoflowlabel in RST packets Blamed commit added a txhash parameter to tcp_v6_send_response() but forgot to update tcp_v6_send_reset() accordingly. Fixes: aa51b80e1af4 ("ipv6: tcp: send consistent autoflowlabel in SYN_RECV state") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220922165036.1795862-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2023-06-16 10:55:21 +02:00
Antoine Tenart	db80d3e170	ipv6: tcp: send consistent autoflowlabel in SYN_RECV state Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2214966 Upstream Status: linux.git commit aa51b80e1af47b3781abb1fb1666445a7616f0cd Author: Eric Dumazet <edumazet@google.com> Date: Wed Aug 31 13:37:29 2022 -0700 ipv6: tcp: send consistent autoflowlabel in SYN_RECV state This is a followup of commit `c67b85558f` ("ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state"), but for SYN_RECV state. In some cases, TCP sends a challenge ACK on behalf of a SYN_RECV request. When this happens, we want to use the flow label that was used when the prior SYNACK packet was sent, instead of another one. After this patch, the following packetdrill script passes: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.2 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > (flowlabel 0x11) S. 0:0(0) ack 1 <...> // Test if a challenge ack is properly sent (same flowlabel as prior SYNACK) +.01 < . 4000000000:4000000000(0) ack 1 win 320 +0 > (flowlabel 0x11) . 1:1(0) ack 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220831203729.458000-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2023-06-16 10:55:11 +02:00
Antoine Tenart	30b200a890	tcp: add TCP_MINTTL drop reason Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184073 Upstream Status: linux.git Conflicts:\ - Context difference due to missing upstream commits 020e71a3cf7f ("ipv4: guard IP_MINTTL with a static key") and 14834c4f4eb3 ("ipv4: annotate data races arount inet->min_ttl") in c9s. commit 2798e36dc233a409a5d3f26f73029596dc504020 Author: Eric Dumazet <edumazet@google.com> Date: Wed Feb 1 17:43:45 2023 +0000 tcp: add TCP_MINTTL drop reason In the unlikely case incoming packets are dropped because of IP_MINTTL / IPV6_MINHOPCOUNT constraints... Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230201174345.2708943-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2023-06-06 11:23:15 +02:00
Jan Stancek	fa72082f2d	Merge: net: core: stable backports for 9.3 phase 1 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2408 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188560 Depends: !2404 A bunch of fixes from upstream, affecting the core networking implementation. This also includes a couple of fixes for tun/tap, strictly tied to commit "net: add sock_init_data_uid()" Signed-off-by: Paolo Abeni <pabeni@redhat.com> Approved-by: Sabrina Dubroca <sdubroca@redhat.com> Approved-by: Andrea Claudi <aclaudi@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com>	2023-05-16 11:49:41 +02:00
Jan Stancek	cb3b1a532c	Merge: tcp: stable backport for 9.3 phase 1 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2407 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188561 A bunch of minor TCP fixes, with no behavioral changes intended. The only exception is commit `8dda5cd012` ("tcp: minor optimization in tcp_add_backlog()"), which is not a fix but a needed pre-req to avoid conflict in the next one. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: Hangbin Liu <haliu@redhat.com> Approved-by: Andrea Claudi <aclaudi@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com>	2023-05-16 11:49:40 +02:00
Paolo Abeni	e46d2d95f0	dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188560 Tested: LNST, Tier1 Upstream commit: commit ca43ccf41224b023fc290073d5603a755fd12eed Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Thu Feb 9 16:22:01 2023 -0800 dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions. Eric Dumazet pointed out [0] that when we call skb_set_owner_r() for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called, resulting in a negative sk_forward_alloc. We add a new helper which clones a skb and sets its owner only when sk_rmem_schedule() succeeds. Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv() because tcp_send_synack() can make sk_forward_alloc negative before ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases. [0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/ Fixes: `323fbd0edf` ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()") Fixes: `3df80d9320` ("[DCCP]: Introduce DCCPv6") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-02 19:07:41 +02:00
Hangbin Liu	653e992bed	ipv6: Fix tcp socket connection with DSCP. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186064 Upstream Status: net.git commit 8230680f36fd commit 8230680f36fd1525303d1117768c8852314c488c Author: Guillaume Nault <gnault@redhat.com> Date: Wed Feb 8 18:14:03 2023 +0100 ipv6: Fix tcp socket connection with DSCP. Take into account the IPV6_TCLASS socket option (DSCP) in tcp_v6_connect(). Otherwise fib6_rule_match() can't properly match the DSCP value, resulting in invalid route lookup. For example: ip route add unreachable table main 2001:db8::10/124 ip route add table 100 2001:db8::10/124 dev eth0 ip -6 rule add dsfield 0x04 table 100 echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04 Without this patch, socat fails at connect() time ("No route to host") because the fib-rule doesn't jump to table 100 and the lookup ends up being done in the main table. Fixes: `2cc67cc731` ("[IPV6] ROUTE: Routing by Traffic Class.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <haliu@redhat.com>	2023-04-27 10:04:56 +08:00
Paolo Abeni	220a990332	dccp/tcp: Reset saddr on failure after inet6?_hash_connect(). Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188561 Tested: LNST, Tier1 Upstream commit: commit 77934dc6db0d2b111a8f2759e9ad2fb67f5cffa5 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Nov 18 17:49:11 2022 -0800 dccp/tcp: Reset saddr on failure after inet6?_hash_connect(). When connect() is called on a socket bound to the wildcard address, we change the socket's saddr to a local address. If the socket fails to connect() to the destination, we have to reset the saddr. However, when an error occurs after inet_hash6?_connect() in (dccp|tcp)_v[46]_connect(), we forget to reset saddr and leave the socket bound to the address. From the user's point of view, whether saddr is reset or not varies with errno. Let's fix this inconsistent behaviour. Note that after this patch, the repro [0] will trigger the WARN_ON() in inet_csk_get_port() again, but this patch is not buggy and rather fixes a bug papering over the bhash2's bug for which we need another fix. 
For the record, the repro causes -EADDRNOTAVAIL in inet_hash6_connect() by this sequence: s1 = socket() s1.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s1.bind(('127.0.0.1', 10000)) s1.sendto(b'hello', MSG_FASTOPEN, (('127.0.0.1', 10000))) # or s1.connect(('127.0.0.1', 10000)) s2 = socket() s2.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s2.bind(('0.0.0.0', 10000)) s2.connect(('127.0.0.1', 10000)) # -EADDRNOTAVAIL s2.listen(32) # WARN_ON(inet_csk(sk)->icsk_bind2_hash != tb2); [0]: https://syzkaller.appspot.com/bug?extid=015d756bbd1f8b5c8f09 Fixes: `3df80d9320` ("[DCCP]: Introduce DCCPv6") Fixes: `7c657876b6` ("[DCCP]: Initial implementation") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-04-21 09:57:10 +02:00
Guillaume Nault	9194a37d24	tcp/udp: Make early_demux back namespacified. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186795 Upstream Status: linux.git commit 11052589cf5c0bab3b4884d423d5f60c38fcf25d Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Wed Jul 13 10:52:07 2022 -0700 tcp/udp: Make early_demux back namespacified. Commit `e21145a987` ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit `dddb64bcb3` ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns. We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob. This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time. 
Fixes: `dddb64bcb3` ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Guillaume Nault <gnault@redhat.com>	2023-04-14 16:19:36 +02:00
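The "CAN execute" versus "DO execute" distinction in the commit above can be modeled in a few lines of C. This is only a sketch of the decision logic as the message describes it, not kernel code: `struct demo_netns` and both functions are hypothetical, and the global handler is reduced to a flag that follows init_net's knob.

```c
/* Hypothetical per-netns knobs, named after the sysctls in the text. */
struct demo_netns {
    int ip_early_demux;
    int tcp_early_demux;
};

/* Before the fix, as described: the handler is global and follows
 * init_net's tcp_early_demux; the packet's netns only contributes its
 * ip_early_demux knob. */
static int demo_tcp_demux_before(const struct demo_netns *init_net,
                                 const struct demo_netns *net)
{
    int handler_present = init_net->tcp_early_demux;

    return net->ip_early_demux && handler_present;
}

/* After the fix: both knobs are read from the packet's own netns. */
static int demo_tcp_demux_after(const struct demo_netns *net)
{
    return net->ip_early_demux && net->tcp_early_demux;
}
```

With init_net set to {1, 1} and a child netns set to {1, 0}, the "before" model still runs early demux in the child, while the "after" model honours the child's own setting.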
Herton R. Krzesinski	f7d56c83b6	Merge: sctp: backports from upstream, 2nd phase MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1878 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160516 Tested: lksctp-tools func_tests v1->v2: - add the whole patchset of "inet6: Remove inet6_destroy_sock() calls" instead of only 2 of its patches, as Davide suggested. v2->v3: - drop patch "sctp: delete free member from struct sctp_sched_ops", which is a code improvement, as it may cause a hang. v3->v4: - drop patch "sctp: fix memory leak in sctp_stream_outq_migrate()", which causes another hang. Signed-off-by: Xin Long <lxin@redhat.com> Approved-by: Davide Caratti <dcaratti@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Herton R. Krzesinski <herton@redhat.com>	2023-02-23 12:33:18 +00:00
Xin Long	af4a12dd88	inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160516 Tested: compile only Conflicts: - context difference due to missing commit 0ffe2412531e from upstream. commit b5fc29233d28be7a3322848ebe73ac327559cdb9 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Wed Oct 19 15:35:59 2022 -0700 inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources. Now we can remove unnecessary inet6_destroy_sock() calls in sk->sk_prot->destroy(). DCCP and SCTP have their own sk->sk_destruct() function, so we change them separately in the following patches. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Xin Long <lxin@redhat.com>	2023-02-01 15:05:51 -05:00
Guillaume Nault	04b96d8fcf	tcp: Fix data-races around sysctl_tcp_reflect_tos. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160073 Upstream Status: linux.git commit 870e3a634b6a6cb1543b359007aca73fe6a03ac5 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Jul 22 11:22:04 2022 -0700 tcp: Fix data-races around sysctl_tcp_reflect_tos. While reading sysctl_tcp_reflect_tos, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `ac8f1710c1` ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Guillaume Nault <gnault@redhat.com>	2023-01-17 12:25:15 +01:00
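The reader-side pattern this commit applies can be sketched in user-space C. The macros below are simplified volatile-cast stand-ins (an assumption; the kernel's own definitions carry additional checks), and `sysctl_reflect_tos_demo`, `set_reflect_tos()` and `reflected_tos()` are hypothetical names used only for illustration.

```c
/* Simplified stand-ins for the kernel macros. The volatile cast asks the
 * compiler to perform exactly one load or store of x at this point. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical knob that a sysctl writer may change at any time. */
static int sysctl_reflect_tos_demo;

/* Writer side: store the new value in a single access. */
static void set_reflect_tos(int on)
{
    WRITE_ONCE(sysctl_reflect_tos_demo, on);
}

/* Reader side: take one snapshot and use only the snapshot, so every
 * later use sees the same value even if the knob flips concurrently. */
static int reflected_tos(unsigned char syn_tos)
{
    int enabled = READ_ONCE(sysctl_reflect_tos_demo);

    return enabled ? syn_tos : 0;
}
```

The point of the snapshot is that the compiler may otherwise reload a plain global between uses, which is where a concurrent write could make two reads disagree.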
Ivan Vecera	5dc8d666ee	ipv6: Remove __ipv6_only_sock(). Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2144847 Since commit `9fe516ba3f` ("inet: move ipv6only in sock_common"), ipv6_only_sock() and __ipv6_only_sock() are the same macro. Let's remove the latter. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 89e9c7280075f6733b22dd0740daeddeb1256ebf) Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2022-11-24 09:54:19 +01:00
Frantisek Hrbata	1269719102	Merge: BPF and XDP rebase to v5.18 Merge conflicts: ----------------- arch/x86/net/bpf_jit_comp.c - bpf_arch_text_poke() HEAD(!1464) contains `b73b002f7f` ("x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline") Resolved in favour of !1464, but keep the return statement from !1477 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1477 Bugzilla: https://bugzilla.redhat.com/2120966 Rebase BPF and XDP to the upstream kernel version 5.18 Patch applied, then reverted: ``` 544356 selftests/bpf: switch to new libbpf XDP APIs 0bfb95 selftests, bpf: Do not yet switch to new libbpf XDP APIs ``` Taken in the perf rebase: ``` 23fcfc perf: use generic bpf_program__set_type() to set BPF prog type ``` Unsupported arches: ``` 5c1011 libbpf: Fix riscv register names cf0b5b libbpf: Fix accessing syscall arguments on riscv ``` Depends on changes of other subsystems: ``` 7fc8c3 s390/bpf: encode register within extable entry aebfd1 x86/ibt,ftrace: Search for __fentry__ location 589127 x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline ``` Broken selftest: ``` edae34 selftests net: add UDP GRO fraglist + bpf self-tests cf6783 selftests net: fix bpf build error 7b92aa selftests net: fix kselftest net fatal error ``` Out of scope: ``` baebdf net: dev: Makes sure netif_rx() can be invoked in any context. 
5c8166 kbuild: replace $(if A,A,B) with $(or A,B) 1a97ce perf maps: Use a pointer for kmaps 967747 uaccess: remove CONFIG_SET_FS 42b01a s390: always use the packed stack layout bf0882 flow_dissector: Add support for HSR d09a30 s390/extable: move EX_TABLE define to asm-extable.h 3d6671 s390/extable: convert to relative table with data 4efd41 s390: raise minimum supported machine generation to z10 f65e58 flow_dissector: Add support for HSRv0 1a6d7a netdevsim: Introduce support for L3 offload xstats 9b1894 selftests: netdevsim: hw_stats_l3: Add a new test 84005b perf ftrace latency: Add -n/--use-nsec option 36c4a7 kasan, arm64: don't tag executable vmalloc allocations 8df013 docs: netdev: move the netdev-FAQ to the process pages 4d4d00 perf tools: Update copy of libbpf's hashmap.c 0df6ad perf evlist: Rename cpus to user_requested_cpus 1b8089 flow_dissector: fix false-positive __read_overflow2_field() warning 0ae065 perf build: Fix check for btf__load_from_kernel_by_id() in libbpf 8994e9 perf test bpf: Skip test if clang is not present 735346 perf build: Fix btf__load_from_kernel_by_id() feature check f037ac s390/stack: merge empty stack frame slots 335220 docs: netdev: update maintainer-netdev.rst reference a0b098 s390/nospec: remove unneeded header includes 34513a netdevsim: Fix hwstats debugfs file permissions ``` Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Approved-by: John W. Linville <linville@redhat.com> Approved-by: Wander Lairson Costa <wander@redhat.com> Approved-by: Torez Smith <torez@redhat.com> Approved-by: Jan Stancek <jstancek@redhat.com> Approved-by: Prarit Bhargava <prarit@redhat.com> Approved-by: Felix Maurer <fmaurer@redhat.com> Approved-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-11-21 05:30:47 -05:00
Davide Caratti	728983215c	tcp: Access &tcp_hashinfo via net. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858 Upstream Status: net.git commit 4461568aa4e5 Conflicts: - net/ipv4/tcp_ipv4.c: context mismatch as we don't have upstream commit 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") and 08eaef904031 ("tcp: Clean up some functions.") - net/ipv6/tcp_ipv6.c: context mismatch as we don't have upstream commit 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") - net/ipv4/tcp_minisocks.c: hunk applied manually to fix a build issue caused by missing upstream commit 08eaef904031 ("tcp: Clean up some functions.") commit 4461568aa4e565de2c336f4875ddf912f26da8a5 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Wed Sep 7 18:10:20 2022 -0700 tcp: Access &tcp_hashinfo via net. We will soon introduce an optional per-netns ehash. This means we cannot use tcp_hashinfo directly in most places. Instead, access it via net->ipv4.tcp_death_row.hashinfo. The access will be valid only while initialising tcp_hashinfo itself and creating/destroying each netns. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2022-11-08 17:10:59 +01:00
Davide Caratti	9aac6c4346	net: add per_cpu_fw_alloc field to struct proto Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858 Upstream Status: net.git commit 0defbb0af775 Conflicts: - net/core/sock.c: context mismatch because of missing backport of upstream commit f20cfd662a62 ("net: add sanity check in proto_register()") commit 0defbb0af775ef037913786048d099bbe8b9a2c2 Author: Eric Dumazet <edumazet@google.com> Date: Wed Jun 8 23:34:08 2022 -0700 net: add per_cpu_fw_alloc field to struct proto Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use. Instead of having reserved bytes per socket, we want to have per-cpu reserves. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2022-11-08 17:10:55 +01:00
Frantisek Hrbata	0c3a22328a	Merge: IPv6: 9.2 P1 backport from upstream MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1488 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135319 Signed-off-by: Hangbin Liu <haliu@redhat.com> Approved-by: Davide Caratti <dcaratti@redhat.com> Approved-by: Sabrina Dubroca <sdubroca@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-10-27 08:26:02 -04:00
Jiri Benc	6619cf0a37	net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp Bugzilla: https://bugzilla.redhat.com/2120966 Conflicts: - [minor] different context in tcp_fragment() due to missing a52fe46ef160 ("tcp: factorize ip_summed setting") commit a1ac9c8acec1605c6b43af418f79facafdced680 Author: Martin KaFai Lau <kafai@fb.com> Date: Wed Mar 2 11:55:25 2022 -0800 net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp skb->tstamp was first used as the (rcv) timestamp. The major usage is to report it to the user (e.g. SO_TIMESTAMP). Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP) during egress and used by the qdisc (e.g. sch_fq) to decide when the skb can be passed to the dev. Currently, there is no way to tell whether skb->tstamp holds the (rcv) timestamp or the delivery_time, so it is always reset to 0 whenever forwarded between egress and ingress. While it makes sense to always clear the (rcv) timestamp in skb->tstamp to avoid confusing sch_fq that expects the delivery_time, it is a performance issue [0] to clear the delivery_time if the skb finally egresses to a fq@phy-dev. For example, when forwarding from egress to ingress and then finally back to egress: tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns ^ ^ reset reset This patch adds one bit skb->mono_delivery_time to flag that skb->tstamp is storing the mono delivery_time (EDT) instead of the (rcv) timestamp. The current use case is to keep the TCP mono delivery_time (EDT) and to be used with sch_fq. A later patch will also allow tc-bpf@ingress to read and change the mono delivery_time. In the future, another bit (e.g. skb->user_delivery_time) can be added for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid. [ This patch is prep work. The following patches will get the other parts of the stack ready first. Then another patch after that will finally set the skb->mono_delivery_time. 
] skb_set_delivery_time() function is added. It is used by tcp_output.c and during ip[6] fragmentation to assign the delivery_time to the skb->tstamp and also set the skb->mono_delivery_time. A note on the change in ip_send_unicast_reply() in ip_output.c. It is only used by TCP to send reset/ack out of a ctl_sk. Like the new skb_set_delivery_time(), this patch sets the skb->mono_delivery_time to 0 for now as a placeholder. It will be enabled in a later patch. A similar case in tcp_ipv6 can be done with skb_set_delivery_time() in tcp_v6_send_response(). [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Benc <jbenc@redhat.com>	2022-10-25 14:57:59 +02:00
Hangbin Liu	d109429414	tcp: Fix data races around icsk->icsk_af_ops. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2135319 Upstream Status: net.git commit f49cd2f4d617 Conflicts: context conflicts due to missing upstream commit 34704ef024ae ("bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument"). commit f49cd2f4d6170d27a2c61f1fecb03d8a70c91f57 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Thu Oct 6 11:53:49 2022 -0700 tcp: Fix data races around icsk->icsk_af_ops. setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops under lock_sock(), but tcp_(get\|set)sockopt() read it locklessly. To avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads and writes. Thanks to Eric Dumazet for providing the syzbot report: BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0: tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240 __inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660 inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724 __sys_connect_file net/socket.c:1976 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1993 __do_sys_connect net/socket.c:2003 [inline] __se_sys_connect net/socket.c:2000 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:2000 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1: tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585 __sys_setsockopt+0x212/0x2b0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 [inline] __se_sys_setsockopt net/socket.c:2260 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8 
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted 6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <haliu@redhat.com>	2022-10-18 11:41:13 +08:00
Antoine Tenart	4600abbb3e	tcp_ipv6: set the drop_reason in the right place Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit dc7769244e03e932262a4f10eeab11657cb601c7 Author: Jakub Kicinski <kuba@kernel.org> Date: Thu May 19 19:13:47 2022 -0700 tcp_ipv6: set the drop_reason in the right place Looks like the IPv6 version of the patch under Fixes was a copy/paste of the IPv4 but hit the wrong spot. It is tcp_v6_rcv() which uses drop_reason as a boolean, and needs to be protected against reason == 0 before calling free. tcp_v6_do_rcv() has a pretty straightforward flow. The resulting warning looks like this: WARNING: CPU: 1 PID: 0 at net/core/skbuff.c:775 Call Trace: tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1767) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438) ip6_input_finish (include/linux/rcupdate.h:726) ip6_input (include/linux/netfilter.h:307) Fixes: f8319dfd1b3b ("net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv()") Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20220520021347.2270207-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-14 17:40:26 +02:00
Antoine Tenart	626c678449	net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit f8319dfd1b3b3be6c08795017fc30f880f8bc861 Author: Menglong Dong <imagedong@tencent.com> Date: Fri May 13 11:03:39 2022 +0800 net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv() The 'drop_reason' that is passed to kfree_skb_reason() in tcp_v4_rcv() and tcp_v6_rcv() can be SKB_NOT_DROPPED_YET(0), as it is used as the return value of tcp_inbound_md5_hash(). And it can panic the kernel with a NULL pointer dereference in net_dm_packet_report_size() if the reason is 0, as drop_reasons[0] is NULL. Fixes: 1330b6ef3313 ("skb: make drop reason booleanable") Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-14 17:40:26 +02:00
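The failure mode described above (a name table with no entry at index 0, indexed by a reason that can legitimately be 0) can be illustrated with a small model. Everything here is hypothetical and only mirrors the shape of the problem: the enum, the table and both helpers are illustrative names, not the kernel's.

```c
#include <stddef.h>

/* Hypothetical miniature of a drop-reason enum: 0 means "not dropped". */
enum demo_drop_reason {
    DEMO_NOT_DROPPED_YET = 0,
    DEMO_DROP_NOT_SPECIFIED,
    DEMO_DROP_TCP_MD5FAILURE,
    DEMO_DROP_MAX,
};

/* Name table with no entry for 0, so slot 0 is implicitly NULL. */
static const char *const demo_reason_names[DEMO_DROP_MAX] = {
    [DEMO_DROP_NOT_SPECIFIED]  = "NOT_SPECIFIED",
    [DEMO_DROP_TCP_MD5FAILURE] = "TCP_MD5FAILURE",
};

/* Guard applied before freeing: a 0 left over from a "no drop" return
 * value is replaced with NOT_SPECIFIED. */
static enum demo_drop_reason demo_sanitize_reason(enum demo_drop_reason r)
{
    return r == DEMO_NOT_DROPPED_YET ? DEMO_DROP_NOT_SPECIFIED : r;
}

/* Reporting side: safe to dereference only because of the guard. */
static const char *demo_reason_name(enum demo_drop_reason r)
{
    return demo_reason_names[demo_sanitize_reason(r)];
}
```

Without the sanitizing step, a consumer that reads the name for reason 0 would receive the NULL slot.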
Antoine Tenart	04f4917aca	skb: make drop reason booleanable Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit 1330b6ef3313fcec577d2b020c290dc8b9f11f1a Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Mar 7 16:44:21 2022 -0800 skb: make drop reason booleanable We have a number of cases where function returns drop/no drop decision as a boolean. Now that we want to report the reason code as well we have to pass extra output arguments. We can make the reason code evaluate correctly as bool. I believe we're good to reorder the reasons as they are reported to user space as strings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:24 +02:00
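What "booleanable" buys can be shown with a minimal sketch: when the "not dropped" code is 0, a single return value carries both the yes/no decision and the reason, with no extra output argument. The enum values and function names below are hypothetical.

```c
/* Hypothetical reason codes; the key property is that "keep" is 0. */
enum demo_reason {
    DEMO_KEEP = 0,
    DEMO_DROP_TOO_SHORT,
    DEMO_DROP_BAD_CSUM,
};

/* Returns 0 to keep the packet, or the reason it must be dropped. */
static enum demo_reason demo_check(unsigned int len, int csum_ok)
{
    if (len < 20)
        return DEMO_DROP_TOO_SHORT;
    if (!csum_ok)
        return DEMO_DROP_BAD_CSUM;
    return DEMO_KEEP;
}

/* A caller that only needs the decision tests the value as a boolean. */
static int demo_should_drop(unsigned int len, int csum_ok)
{
    return demo_check(len, csum_ok) ? 1 : 0;
}
```

A caller that also wants the reason keeps the return value of `demo_check()` and passes it on when it is non-zero.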
Antoine Tenart	997d93a49f	net/tcp: Merge TCP-MD5 inbound callbacks Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit 7bbb765b73496699a165d505ecdce962f903b422 Author: Dmitry Safonov <0x7f454c46@gmail.com> Date: Wed Feb 23 17:57:40 2022 +0000 net/tcp: Merge TCP-MD5 inbound callbacks The functions do essentially the same work to verify the TCP-MD5 signature. Code can be merged into one family-independent function in order to reduce copy'n'paste and generated code. Later, with the TCP-AO option added, this will allow creating one function that's responsible for segment verification, that will have all the different checks for MD5/AO/non-signed packets, which in turn will help to see checks for all corner-cases in one function, rather than spread around different families and functions. Cc: Eric Dumazet <edumazet@google.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220223175740.452397-1-dima@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:24 +02:00
Antoine Tenart	7e7867a749	net: tcp: use kfree_skb_reason() for tcp_v{4,6}_do_rcv() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit 8eba65fa5f06519042b98564089b942d795e3f8d Author: Menglong Dong <imagedong@tencent.com> Date: Sun Feb 20 15:06:34 2022 +0800 net: tcp: use kfree_skb_reason() for tcp_v{4,6}_do_rcv() Replace kfree_skb() used in tcp_v4_do_rcv() and tcp_v6_do_rcv() with kfree_skb_reason(). Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:22 +02:00
Antoine Tenart	0b99c6c861	net: tcp: add skb drop reasons to tcp_add_backlog() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git Conflicts: - In tcp.h due to missing commit f35f821935d8 ("tcp: defer skb freeing after socket lock is released") in C9S; which is fine btw as the chunk in tcp.h was later removed upstream by commit 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists"). commit 7a26dc9e7b43f5a24c4b843713e728582adf1c38 Author: Menglong Dong <imagedong@tencent.com> Date: Sun Feb 20 15:06:33 2022 +0800 net: tcp: add skb drop reasons to tcp_add_backlog() Pass the address of drop_reason to tcp_add_backlog() to store the reasons for skb drops when it fails. The following drop reason is introduced: SKB_DROP_REASON_SOCKET_BACKLOG Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:22 +02:00
Antoine Tenart	de5f3d75e9	net: tcp: add skb drop reasons to tcp_v{4,6}_inbound_md5_hash() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit 643b622b51f1f0015e0a80f90b4ef9032e6ddb1b Author: Menglong Dong <imagedong@tencent.com> Date: Sun Feb 20 15:06:32 2022 +0800 net: tcp: add skb drop reasons to tcp_v{4,6}_inbound_md5_hash() Pass the address of drop reason to tcp_v4_inbound_md5_hash() and tcp_v6_inbound_md5_hash() to store the reasons for skb drops when this function fails. Therefore, the drop reason can be passed to kfree_skb_reason() when the skb needs to be freed. Following drop reasons are added: SKB_DROP_REASON_TCP_MD5NOTFOUND SKB_DROP_REASON_TCP_MD5UNEXPECTED SKB_DROP_REASON_TCP_MD5FAILURE SKB_DROP_REASON_TCP_MD5* above correspond to LINUX_MIB_TCPMD5* Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:22 +02:00
Antoine Tenart	b55f02222f	net: tcp: use kfree_skb_reason() for tcp_v6_rcv() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161 Upstream Status: linux.git commit c0e3154d9c889e1aa1af098f40301395f2e33d8a Author: Menglong Dong <imagedong@tencent.com> Date: Sun Feb 20 15:06:31 2022 +0800 net: tcp: use kfree_skb_reason() for tcp_v6_rcv() Replace kfree_skb() used in tcp_v6_rcv() with kfree_skb_reason(). Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-10-13 14:53:22 +02:00
Felix Maurer	de20724127	net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620 commit 91a760b26926265a60c77ddf016529bcf3e17a04 Author: Menglong Dong <imagedong@tencent.com> Date: Thu Jan 6 21:20:20 2022 +0800 net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in __inet_bind() is not handled properly. When the return value is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and exit: err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk); if (err) { inet->inet_saddr = inet->inet_rcv_saddr = 0; goto out_release_sock; } Let's take UDP for example and see what will happen. A UDP socket will be added to 'udp_prot.h.udp_table->hash' and 'udp_prot.h.udp_table->hash2' after the call to sk->sk_prot->get_port() succeeds. If 'inet->inet_rcv_saddr' is specified here, then 'sk' will be in the 'hslot2' of 'hash2' that it does not belong to (because inet_saddr is changed to 0), and UDP packets received will not be passed to this sock. If 'inet->inet_rcv_saddr' is not specified here, the sock will work fine, as it can receive packets properly, which is weird, as the 'bind()' has already failed. To undo the get_port() operation, introduce the 'put_port' field for 'struct proto'. For TCP proto, it is inet_put_port(); for UDP proto, it is udp_lib_unhash(); for icmp proto, it is ping_unhash(). Therefore, after sys_bind() fails because of BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), the socket will be unbound, which means that it can try to be bound to another port. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220106132022.3470772-2-imagedong@tencent.com Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2022-08-24 16:53:48 +02:00
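The undo step this commit introduces can be modeled in plain C: when a hook that runs after get_port() fails, the bind path releases the port through a per-protocol put_port callback instead of leaving the socket half-bound. This is a sketch under stated assumptions; the `demo_*` types and functions are hypothetical, and `post_bind_err` stands in for the return value of the BPF post-bind hook.

```c
/* Hypothetical miniature of a protocol ops table with the undo hook. */
struct demo_sock;

struct demo_proto {
    int  (*get_port)(struct demo_sock *sk, unsigned short port);
    void (*put_port)(struct demo_sock *sk);   /* undoes get_port() */
};

struct demo_sock {
    const struct demo_proto *prot;
    unsigned short bound_port;                /* 0 means "not bound" */
    unsigned int rcv_saddr;
};

static int demo_get_port(struct demo_sock *sk, unsigned short port)
{
    sk->bound_port = port;
    return 0;
}

static void demo_put_port(struct demo_sock *sk)
{
    sk->bound_port = 0;
}

static const struct demo_proto demo_prot = {
    .get_port = demo_get_port,
    .put_port = demo_put_port,
};

/* post_bind_err stands in for the post-bind hook's return value. */
static int demo_bind(struct demo_sock *sk, unsigned int addr,
                     unsigned short port, int post_bind_err)
{
    sk->rcv_saddr = addr;
    if (sk->prot->get_port(sk, port))
        return -1;
    if (post_bind_err) {
        sk->rcv_saddr = 0;
        if (sk->prot->put_port)
            sk->prot->put_port(sk);           /* release the port again */
        return post_bind_err;
    }
    return 0;
}
```

After a failed `demo_bind()`, the socket holds neither an address nor a port, so a later bind to a different port starts from a clean state.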
Paolo Abeni	036c0e121e	tcp: add accessors to read/set tp->snd_cwnd Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101465 Tested: LNST, Tier1 Upstream commit: commit 40570375356c874b1578e05c1dcc3ff7c1322dbe Author: Eric Dumazet <edumazet@google.com> Date: Tue Apr 5 16:35:38 2022 -0700 tcp: add accessors to read/set tp->snd_cwnd We had various bugs over the years with code breaking the assumption that tp->snd_cwnd is greater than zero. Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added in commit `8b8a321ff7` ("tcp: fix zero cwnd in tcp_cwnd_reduction") can trigger, and without a repro we would have to spend considerable time finding the bug. Instead of complaining too late, we want to catch where and when tp->snd_cwnd is set to an illegal value. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-27 16:43:55 +02:00
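The idea of catching an illegal value at the point where it is written, rather than where it is later consumed, can be sketched with a pair of accessors. This is an illustrative user-space model, not the kernel helpers: `struct demo_tcp_sock`, the function names and the counter are hypothetical, and a counter plus a stderr message stands in for a kernel warning.

```c
#include <stdio.h>

/* Hypothetical stand-in for the TCP socket state. */
struct demo_tcp_sock {
    unsigned int snd_cwnd;
};

/* Counts rejected values so misuse is observable. */
static unsigned int demo_bad_cwnd_sets;

/* Read accessor: the only place that touches the field for reads. */
static unsigned int demo_snd_cwnd(const struct demo_tcp_sock *tp)
{
    return tp->snd_cwnd;
}

/* Write accessor: flags zero (or a value that is negative when viewed
 * as int) at the moment it is stored. */
static void demo_snd_cwnd_set(struct demo_tcp_sock *tp, unsigned int val)
{
    if ((int)val <= 0) {
        demo_bad_cwnd_sets++;
        fprintf(stderr, "illegal snd_cwnd %u\n", val);
    }
    tp->snd_cwnd = val;
}
```

Routing every write through one function gives a single place to report the offending caller.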
Paolo Abeni	bae902a610	inet: fully convert sk->sk_rx_dst to RCU rules Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079411 Tested: LNST, Tier1 Conflicts: - sk_rx_dst location inside struct sock is slightly different from upstream as rhel-9 already has commit 43f51df41729 ("net: move early demux fields close to sk_refcnt") Upstream commit: commit 8f905c0e7354ef261360fb7535ea079b1082c105 Author: Eric Dumazet <edumazet@google.com> Date: Mon Dec 20 06:33:30 2021 -0800 inet: fully convert sk->sk_rx_dst to RCU rules syzbot reported various issues around early demux, one being included in this changelog [1] sk->sk_rx_dst is using RCU protection without clearly documenting it. And the following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv() are not following standard RCU rules. [a] dst_release(dst); [b] sk->sk_rx_dst = NULL; They look wrong because a delete operation of an RCU-protected pointer is supposed to clear the pointer before the call_rcu()/synchronize_rcu() guarding actual memory freeing. In some cases indeed, dst could be freed before [b] is done. We could cheat by clearing sk_rx_dst before calling dst_release(), but this seems the right time to stick to standard RCU annotations and debugging facilities. 
[1] BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline] BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792 Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204 CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:450 dst_check include/net/dst.h:470 [inline] tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792 ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline] ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline] __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556 __netif_receive_skb_list net/core/dev.c:5608 [inline] netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699 gro_normal_list net/core/dev.c:5853 [inline] gro_normal_list net/core/dev.c:5849 [inline] napi_complete_done+0x1f1/0x880 net/core/dev.c:6590 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557 __napi_poll+0xaf/0x440 net/core/dev.c:7023 napi_poll net/core/dev.c:7090 [inline] net_rx_action+0x801/0xb40 net/core/dev.c:7177 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240 asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629 RIP: 0033:0x7f5e972bfd57 Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 
8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73 RSP: 002b:00007fff8a413210 EFLAGS: 00000283 RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45 RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45 RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9 R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0 R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019 </TASK> Allocated by task 13: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467 kasan_slab_alloc include/linux/kasan.h:259 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slub.c:3234 [inline] slab_alloc mm/slub.c:3242 [inline] kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247 dst_alloc+0x146/0x1f0 net/core/dst.c:92 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613 ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340 ip_route_input_rcu net/ipv4/route.c:2470 [inline] ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415 ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline] ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline] __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556 __netif_receive_skb_list net/core/dev.c:5608 [inline] netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699 gro_normal_list net/core/dev.c:5853 [inline] gro_normal_list net/core/dev.c:5849 [inline] napi_complete_done+0x1f1/0x880 net/core/dev.c:6590 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557 __napi_poll+0xaf/0x440 net/core/dev.c:7023 napi_poll 
net/core/dev.c:7090 [inline] net_rx_action+0x801/0xb40 net/core/dev.c:7177 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 Freed by task 13: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:235 [inline] slab_free_hook mm/slub.c:1723 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749 slab_free mm/slub.c:3513 [inline] kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530 dst_destroy+0x2d6/0x3f0 net/core/dst.c:127 rcu_do_batch kernel/rcu/tree.c:2506 [inline] rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 Last potentially related work creation: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348 __call_rcu kernel/rcu/tree.c:2985 [inline] call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065 dst_release net/core/dst.c:177 [inline] dst_release+0x79/0xe0 net/core/dst.c:167 tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2768 release_sock+0x54/0x1b0 net/core/sock.c:3300 tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_write_iter+0x289/0x3c0 net/socket.c:1057 call_write_iter include/linux/fs.h:2162 [inline] new_sync_write+0x429/0x660 fs/read_write.c:503 vfs_write+0x7cd/0xae0 fs/read_write.c:590 ksys_write+0x1ee/0x250 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88807f1cb700 which belongs to the cache ip_dst_cache of size 176 
The buggy address is located 58 bytes inside of 176-byte region [ffff88807f1cb700, ffff88807f1cb7b0) The buggy address belongs to the page: page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb flags: 0xfff00000000200(slab\|node=0\|zone=1\|lastcpupid=0x7ff) raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC\|__GFP_NOWARN\|__GFP_NORETRY\|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062 prep_new_page mm/page_alloc.c:2418 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369 alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191 alloc_slab_page mm/slub.c:1793 [inline] allocate_slab mm/slub.c:1930 [inline] new_slab+0x32d/0x4a0 mm/slub.c:1993 ___slab_alloc+0x918/0xfe0 mm/slub.c:3022 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109 slab_alloc_node mm/slub.c:3200 [inline] slab_alloc mm/slub.c:3242 [inline] kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247 dst_alloc+0x146/0x1f0 net/core/dst.c:92 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613 __mkroute_output net/ipv4/route.c:2564 [inline] ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791 ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619 __ip_route_output_key include/net/route.h:126 [inline] ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850 ip_route_output_key include/net/route.h:142 [inline] geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809 geneve_xmit_skb drivers/net/geneve.c:899 [inline] geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082 __netdev_start_xmit include/linux/netdevice.h:4994 [inline] netdev_start_xmit include/linux/netdevice.h:5008 [inline] xmit_one net/core/dev.c:3590 [inline] dev_hard_start_xmit+0x1eb/0x920 
net/core/dev.c:3606 __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1338 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389 free_unref_page_prepare mm/page_alloc.c:3309 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3388 qlink_free mm/kasan/quarantine.c:146 [inline] qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444 kasan_slab_alloc include/linux/kasan.h:259 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slub.c:3234 [inline] kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270 __alloc_skb+0x215/0x340 net/core/skbuff.c:414 alloc_skb include/linux/skbuff.h:1126 [inline] alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078 sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575 mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754 add_grhead+0x265/0x330 net/ipv6/mcast.c:1857 add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995 mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242 mld_send_initial_cr net/ipv6/mcast.c:1232 [inline] mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 Memory state around the buggy address: ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: `41063e9dd1` ("ipv4: Early TCP socket demux.") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni 
<pabeni@redhat.com>	2022-05-12 16:55:33 +02:00
Antoine Tenart	496fd6c98c	ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041382 Upstream Status: linux.git Tested: ENRT commit ef57c1610dd8fba5031bf71e0db73356190de151 Author: Eric Dumazet <edumazet@google.com> Date: Mon Oct 25 09:48:17 2021 -0700 ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie Increase cache locality by moving rx_dst_coookie next to sk->sk_rx_dst This removes one or two cache line misses in IPv6 early demux (TCP/UDP) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-01-21 11:10:05 +01:00
Antoine Tenart	ffc4c3163b	tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2041382 Upstream Status: linux.git Tested: ENRT commit 0c0a5ef809f9150e9229e7b13e43183b681b7a39 Author: Eric Dumazet <edumazet@google.com> Date: Mon Oct 25 09:48:16 2021 -0700 tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex Increase cache locality by moving rx_dst_ifindex next to sk->sk_rx_dst This is part of an effort to reduce cache line misses in TCP fast path. This removes one cache line miss in early demux. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2022-01-21 11:10:01 +01:00

731 Commits