JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git
commit 0598f8f3bb77893a13105d47bb7dfe42f1dc1f4e
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Feb 27 09:24:09 2024 +0000
inet: annotate devconf data-races
Add READ_ONCE() in ipv4_devconf_get() and corresponding
WRITE_ONCE() in ipv4_devconf_set()
Add IPV4_DEVCONF_RO() and IPV4_DEVCONF_ALL_RO() macros,
and use them when reading devconf fields.
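The annotation pattern can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: the `READ_ONCE()`/`WRITE_ONCE()` macros below are simplified stand-ins built on a volatile access, and `struct devconf_sketch`, `devconf_get()`, `devconf_set()` and `DEVCONF_MAX` are hypothetical names.

```c
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * The volatile access makes the compiler emit exactly one load or store.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define DEVCONF_MAX 4 /* hypothetical size, for illustration only */

struct devconf_sketch {
	int32_t data[DEVCONF_MAX];
};

/* Reader side: may run without the lock that serializes writers. */
static inline int32_t devconf_get(const struct devconf_sketch *cnf, int index)
{
	return READ_ONCE(cnf->data[index]);
}

/* Writer side: pairs with the READ_ONCE() in devconf_get(). */
static inline void devconf_set(struct devconf_sketch *cnf, int index, int32_t val)
{
	WRITE_ONCE(cnf->data[index], val);
}
```

The annotations do not add ordering or atomicity beyond a single untorn access; they document the lockless access and keep the compiler from repeating or splitting it.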
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240227092411.2315725-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5284
JIRA: https://issues.redhat.com/browse/RHEL-59754
RHEL-57748 backported `ec20b2830093 ("ipv4: Set scope explicitly in ip_route_output().")`, which makes `ip_route_output` pass a tos of 0. This breaks bonding ARP monitoring, because `ip_rt_fix_tos` later resets the scope to RT_SCOPE_UNIVERSE when `tos` is 0. The backported patch 16a28267774c ("ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().") fixes this issue, as the scope is no longer reset to RT_SCOPE_UNIVERSE.
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Approved-by: Guillaume Nault <gnault@redhat.com>
Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61380
Upstream Status: linux.git
commit cc73bbab4b1fb8a4f53a24645871dafa5f81266a
Author: Ido Schimmel <idosch@nvidia.com>
Date: Thu Jul 18 15:34:07 2024 +0300
ipv4: Fix incorrect source address in Record Route option
The Record Route IP option records the addresses of the routers that
routed the packet. In the case of forwarded packets, the kernel performs
a route lookup via fib_lookup() and fills in the preferred source
address of the matched route.
The lookup is performed with the DS field of the forwarded packet, but
using the RT_TOS() macro which only masks one of the two ECN bits. If
the packet is ECT(0) or CE, the matched route might be different than
the route via which the packet was forwarded as the input path masks
both of the ECN bits, resulting in the wrong address being filled in the
Record Route option.
Fix by masking both of the ECN bits.
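The difference between the two masks can be checked with a small userspace sketch. The mask values are the ones from the kernel headers (`IPTOS_TOS_MASK`, `IPTOS_RT_MASK`, `RT_TOS()`); the helper names `tos_key_old()` and `tos_key_fixed()` are hypothetical and only illustrate the before/after lookup key.

```c
#include <stdint.h>

#define IPTOS_TOS_MASK 0x1E                  /* keeps ECN bit 1, clears bit 0 */
#define IPTOS_RT_MASK  (IPTOS_TOS_MASK & ~3) /* 0x1C: both ECN bits cleared */
#define RT_TOS(tos)    ((tos) & IPTOS_TOS_MASK)

/* ECN codepoints (RFC 3168): the two low-order bits of the DS field. */
#define ECN_NOT_ECT 0x00
#define ECN_ECT_1   0x01
#define ECN_ECT_0   0x02
#define ECN_CE      0x03

/* Lookup key as computed before the fix: one ECN bit can leak through. */
static inline uint8_t tos_key_old(uint8_t ds_field)
{
	return RT_TOS(ds_field);
}

/* Lookup key with both ECN bits masked, matching the input path. */
static inline uint8_t tos_key_fixed(uint8_t ds_field)
{
	return ds_field & IPTOS_RT_MASK;
}
```

With `RT_TOS()`, ECT(0) and CE both leave bit 1 set in the key, which is why only those two codepoints could select a different route.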
Fixes: 8e36360ae8 ("ipv4: Remove route key identity dependencies in ip_rt_get_source().")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20240718123407.434778-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61380
Upstream Status: linux.git
commit f036e68212c11e5a7edbb59b5e25299341829485
Author: Ido Schimmel <idosch@nvidia.com>
Date: Mon Jul 15 17:23:54 2024 +0300
ipv4: Fix incorrect TOS in fibmatch route get reply
The TOS value that is returned to user space in the route get reply is
the one with which the lookup was performed ('fl4->flowi4_tos'). This is
fine when the matched route is configured with a TOS as it would not
match if its TOS value did not match the one with which the lookup was
performed.
However, matching on TOS is only performed when the route's TOS is not
zero. It is therefore possible to have the kernel incorrectly return a
non-zero TOS:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 tos 0x1c dev dummy1 proto kernel scope link src 192.0.2.1
Fix by instead returning the DSCP field from the FIB result structure
which was populated during the route lookup.
Output after the patch:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1
Extend the existing selftests to not only verify that the correct route
is returned, but that it is also returned with correct "tos" value (or
without it).
Fixes: b61798130f ("net: ipv4: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61380
Upstream Status: linux.git
commit 338bb57e4c2a1c2c6fc92f9c0bd35be7587adca7
Author: Ido Schimmel <idosch@nvidia.com>
Date: Mon Jul 15 17:23:53 2024 +0300
ipv4: Fix incorrect TOS in route get reply
The TOS value that is returned to user space in the route get reply is
the one with which the lookup was performed ('fl4->flowi4_tos'). This is
fine when the matched route is configured with a TOS as it would not
match if its TOS value did not match the one with which the lookup was
performed.
However, matching on TOS is only performed when the route's TOS is not
zero. It is therefore possible to have the kernel incorrectly return a
non-zero TOS:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get 192.0.2.2 tos 0xfc
192.0.2.2 tos 0x1c dev dummy1 src 192.0.2.1 uid 0
cache
Fix by adding a DSCP field to the FIB result structure (inside an
existing 4 bytes hole), populating it in the route lookup and using it
when filling the route get reply.
Output after the patch:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get 192.0.2.2 tos 0xfc
192.0.2.2 dev dummy1 src 192.0.2.1 uid 0
cache
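The behaviour described above can be modelled with a short sketch. It assumes the lookup key is the requested TOS masked with `IPTOS_RT_MASK`, which is consistent with the `0x1c` shown for a request of `0xfc`. All struct and function names are hypothetical stand-ins, not the kernel's.

```c
#include <stdint.h>

#define IPTOS_RT_MASK 0x1C /* mask assumed to be applied to the requested TOS */

struct fib_route_sketch {
	uint8_t tos;   /* configured TOS; 0 means "match any TOS" */
};

struct fib_result_sketch {
	int matched;
	uint8_t dscp;  /* value taken from the matched route */
};

/* TOS is compared only when the route's TOS is non-zero. */
static inline struct fib_result_sketch
fib_lookup_sketch(const struct fib_route_sketch *rt, uint8_t flowi4_tos)
{
	struct fib_result_sketch res = { 0, 0 };

	if (rt->tos && rt->tos != flowi4_tos)
		return res;
	res.matched = 1;
	res.dscp = rt->tos;
	return res;
}

/* Before the fix: the reply echoes the TOS the lookup was performed with. */
static inline uint8_t reply_tos_old(uint8_t flowi4_tos)
{
	return flowi4_tos;
}

/* After the fix: the reply reports the value recorded in the lookup result. */
static inline uint8_t reply_tos_fixed(const struct fib_result_sketch *res)
{
	return res->dscp;
}
```

For a route configured without a TOS, the old reply carries the masked request value while the fixed reply carries 0, so no "tos" attribute is printed.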
Fixes: 1a00fee4ff ("ipv4: Remove rt_key_{src,dst,tos} from struct rtable.")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61380
Upstream Status: linux.git
Conflicts: (context) Missing upstream commit e6175a2ed1f1 ("xfrm: fix
"disable_policy" flag use when arriving from different
devices"):
CentOS Stream 9 doesn't have the IPSKB_NOPOLICY flag in
struct inet_skb_parm (include/net/ip.h).
commit 6ac66cb03ae306c2e288a9be18226310529f5b25
Author: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Date: Thu Aug 31 10:03:30 2023 +0200
ipv4: ignore dst hint for multipath routes
Route hints, when the nexthop is part of a multipath group, cause packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IPSKB_MULTIPATH is introduced for this purpose. Set the
flag when the route is looked up in ip_mkroute_input() and check for it
in ip_extract_route_hint().
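The flag handling can be sketched as below. This is a simplified model: `struct skb_sketch`, the two functions and the bit position of `IPSKB_MULTIPATH_SKETCH` are hypothetical, and the real `ip_extract_route_hint()` checks further conditions that are left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define IPSKB_MULTIPATH_SKETCH (1u << 0) /* hypothetical bit position */

struct skb_sketch {
	uint16_t flags; /* stand-in for the per-packet IP control block flags */
};

/* Route lookup on input: mark packets routed via a multipath group. */
static inline void mkroute_input_sketch(struct skb_sketch *skb, int nexthops)
{
	if (nexthops > 1)
		skb->flags |= IPSKB_MULTIPATH_SKETCH;
}

/* A packet may serve as a route hint for the rest of the batch only if
 * it was not routed through a multipath group.
 */
static inline struct skb_sketch *extract_route_hint_sketch(struct skb_sketch *skb)
{
	if (skb->flags & IPSKB_MULTIPATH_SKETCH)
		return NULL;
	return skb;
}
```

Returning NULL means every packet in the batch gets its own route lookup, so the multipath hash is evaluated per packet.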
Fixes: 02b2494161 ("ipv4: use dst hint for ipv4 list receive")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59754
Upstream Status: net.git commit 16a28267774c
commit 16a28267774cd9f85405ef83d4afcbd0355e5817
Author: Guillaume Nault <gnault@redhat.com>
Date: Thu Apr 21 01:21:24 2022 +0200
ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
either by manual field assignment, memset(0) of the whole structure or
implicit structure initialisation of on-stack variables
(RT_SCOPE_UNIVERSE actually equals 0).
Therefore, we don't need to always initialise ->flowi4_scope in
ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
the special RTO_ONLINK flag is present in the tos.
This will allow some code simplification, like removing
ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
entirely by properly initialising ->flowi4_scope, instead of
overloading ->flowi4_tos with a special flag. Eventually, this will
allow to convert ->flowi4_tos to dscp_t.
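A before/after sketch of the scope handling, written as a userspace model. The constants carry the kernel's values; `struct flowi4_sketch`, `fix_tos_old()` and `fix_tos_new()` are hypothetical names for illustration.

```c
#include <stdint.h>

#define IPTOS_RT_MASK     0x1C
#define RTO_ONLINK        0x01
#define RT_SCOPE_UNIVERSE 0
#define RT_SCOPE_LINK     253

struct flowi4_sketch {
	uint8_t flowi4_tos;
	uint8_t flowi4_scope;
};

/* Old behaviour: the scope is always rewritten from the tos flag, so a
 * scope set explicitly by the caller is lost when RTO_ONLINK is absent.
 */
static inline void fix_tos_old(struct flowi4_sketch *fl4)
{
	uint8_t tos = fl4->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK);

	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
	fl4->flowi4_scope = (tos & RTO_ONLINK) ? RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
}

/* New behaviour: only reduce the scope when RTO_ONLINK is present. */
static inline void fix_tos_new(struct flowi4_sketch *fl4)
{
	uint8_t tos = fl4->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK);

	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
	if (tos & RTO_ONLINK)
		fl4->flowi4_scope = RT_SCOPE_LINK;
}
```

A caller that passes tos 0 with an explicit RT_SCOPE_LINK keeps that scope only with the new variant.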
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59087
commit 3e453ca122d483eb519f934b6624215f0536301c
Author: Petr Machata <petrm@nvidia.com>
Date: Fri Jun 7 17:13:53 2024 +0200
net: ipv4,ipv6: Pass multipath hash computation through a helper
The following patches will add a sysctl to control multipath hash
seed. In order to centralize the hash computation, add a helper,
fib_multipath_hash_from_keys(), and have all IPv4 and IPv6 route.c
invocations of flow_hash_from_keys() go through this helper instead.
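The shape of such a helper can be sketched as a thin wrapper. Everything here is an assumption for illustration: the struct layouts are invented, and `flow_hash_sketch()` uses FNV-1a as a stand-in, which is not the hash the kernel's flow_hash_from_keys() computes.

```c
#include <stddef.h>
#include <stdint.h>

struct net_sketch {
	uint32_t unused; /* placeholder for per-namespace state */
};

struct flow_keys_sketch {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Stand-in hash (FNV-1a over the key fields), for illustration only. */
static inline uint32_t flow_hash_sketch(const struct flow_keys_sketch *k)
{
	uint32_t fields[5] = { k->src, k->dst, k->sport, k->dport, k->proto };
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < 5; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (fields[i] >> (8 * b)) & 0xff;
			h *= 16777619u;
		}
	}
	return h;
}

/* Single entry point for multipath hashing. The namespace argument is
 * unused at this stage; it is the place where a per-namespace setting
 * could later be consulted without touching the callers.
 */
static inline uint32_t multipath_hash_from_keys_sketch(const struct net_sketch *net,
							const struct flow_keys_sketch *keys)
{
	(void)net;
	return flow_hash_sketch(keys);
}
```

Routing every call through one helper means a later change to the hash input only has to be made in one place.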
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607151357.421181-2-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-41185
CVE: CVE-2024-36971
Tested: compile only
Conflicts:
- context difference in include/net/dst_ops.h due to missing
43c2817225fc from upstream.
commit 92f1655aa2b2294d0b49925f3b875a634bd3b59e
Author: Eric Dumazet <edumazet@google.com>
Date: Tue May 28 11:43:53 2024 +0000
net: fix __dst_negative_advice() race
__dst_negative_advice() does not enforce proper RCU rules when
sk->sk_dst_cache must be cleared, leading to possible UAF.
RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).
Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.
Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.
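The ordering rule can be sketched in userspace with C11 atomics. This model only shows the "unpublish first, release second" order and the split between the central NULL check and the callback; it does not model RCU grace periods, and all names ending in `_sketch` are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct dst_sketch {
	atomic_int refcnt;
	int obsolete;
};

struct sock_sketch {
	_Atomic(struct dst_sketch *) dst_cache;
};

static inline void dst_release_sketch(struct dst_sketch *dst)
{
	if (dst)
		atomic_fetch_sub(&dst->refcnt, 1);
}

/* Correct order: clear the cached pointer first, then drop the reference. */
static inline void sk_dst_reset_sketch(struct sock_sketch *sk)
{
	struct dst_sketch *old =
		atomic_exchange(&sk->dst_cache, (struct dst_sketch *)NULL);

	dst_release_sketch(old);
}

/* A negative_advice-style callback performs the reset itself. */
static inline void negative_advice_sketch(struct sock_sketch *sk,
					  struct dst_sketch *dst)
{
	if (dst->obsolete)
		sk_dst_reset_sketch(sk);
}

/* The NULL check lives in one place, so callbacks need not repeat it. */
static inline void dst_negative_advice_sketch(struct sock_sketch *sk)
{
	struct dst_sketch *dst = atomic_load(&sk->dst_cache);

	if (dst)
		negative_advice_sketch(sk, dst);
}
```

Because the pointer is cleared before the reference is dropped, a concurrent reader can no longer pick up the entry from the socket once its release has started.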
Many thanks to Clement Lecigne for tracking this issue.
This old bug became visible after the blamed commit, using UDP sockets.
Fixes: a87cb3e48e ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Xin Long <lxin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-40130
commit bf3fcbf7e7a08015d3b169bad6281b29d45c272d
Author: Beniamino Galvani <b.galvani@gmail.com>
Date: Mon Oct 16 09:15:20 2023 +0200
ipv4: rename and move ip_route_output_tunnel()
At the moment ip_route_output_tunnel() is used only by bareudp.
Ideally, other UDP tunnel implementations should use it, but to do so
the function needs to accept new parameters that are specific for UDP
tunnels, such as the ports.
Prepare for these changes by renaming the function to
udp_tunnel_dst_lookup() and move it to file
net/ipv4/udp_tunnel_core.c.
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15695
commit 418a73074da9182f571e467eaded03ea501f3281
Author: Maxime Bizon <mbizon@freebox.fr>
Date: Thu Apr 20 20:25:08 2023 +0200
net: dst: fix missing initialization of rt_uncached
xfrm_alloc_dst() followed by xfrm4_dst_destroy(), without a
xfrm4_fill_dst() call in between, causes the following BUG:
BUG: spinlock bad magic on CPU#0, fbxhostapd/732
lock: 0x890b7668, .magic: 890b7668, .owner: <none>/-1, .owner_cpu: 0
CPU: 0 PID: 732 Comm: fbxhostapd Not tainted 6.3.0-rc6-next-20230414-00613-ge8de66369925-dirty #9
Hardware name: Marvell Kirkwood (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x28/0x30
dump_stack_lvl from do_raw_spin_lock+0x20/0x80
do_raw_spin_lock from rt_del_uncached_list+0x30/0x64
rt_del_uncached_list from xfrm4_dst_destroy+0x3c/0xbc
xfrm4_dst_destroy from dst_destroy+0x5c/0xb0
dst_destroy from rcu_process_callbacks+0xc4/0xec
rcu_process_callbacks from __do_softirq+0xb4/0x22c
__do_softirq from call_with_stack+0x1c/0x24
call_with_stack from do_softirq+0x60/0x6c
do_softirq from __local_bh_enable_ip+0xa0/0xcc
Patch "net: dst: Prevent false sharing vs. dst_entry:: __refcnt" moved
rt_uncached and rt_uncached_list fields from rtable struct to dst
struct, so they are no longer zeroed by memset_after(xdst, 0, u.dst) in
xfrm_alloc_dst().
Note that rt_uncached (list_head) was never properly initialized at
alloc time, but xfrm[46]_dst_destroy() is written in such a way that
it was not an issue thanks to the memset:
    if (xdst->u.rt.dst.rt_uncached_list)
        rt_del_uncached_list(&xdst->u.rt);
The route code does it the other way around: rt_uncached_list is
assumed to be valid if and only if the rt_uncached list_head is not empty:
    void rt_del_uncached_list(struct rtable *rt)
    {
        if (!list_empty(&rt->dst.rt_uncached)) {
            struct uncached_list *ul = rt->dst.rt_uncached_list;

            spin_lock_bh(&ul->lock);
            list_del_init(&rt->dst.rt_uncached);
            spin_unlock_bh(&ul->lock);
        }
    }
This patch adds mandatory rt_uncached list_head initialization in
generic dst_init(), and adapts the xfrm[46]_dst_destroy() logic to match the
rest of the code.
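Why an uninitialized list_head is a problem can be shown with a minimal userspace re-implementation of the list primitives. The list helpers follow the kernel's semantics; `struct dst_sketch`, `dst_init_sketch()` and `del_uncached_sketch()` are hypothetical, and locking is omitted.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Empty means "points to itself", not "is all zeroes". */
static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static inline void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static inline void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

struct dst_sketch {
	struct list_head rt_uncached;
	void *rt_uncached_list;
};

/* Mandatory initialization at allocation time. */
static inline void dst_init_sketch(struct dst_sketch *dst)
{
	INIT_LIST_HEAD(&dst->rt_uncached);
	dst->rt_uncached_list = NULL;
}

/* Returns 1 if the entry was on a list and has been unlinked. */
static inline int del_uncached_sketch(struct dst_sketch *dst)
{
	if (list_empty(&dst->rt_uncached))
		return 0;
	list_del_init(&dst->rt_uncached);
	return 1;
}
```

A zeroed list_head has `next == NULL`, so `list_empty()` reports it as non-empty; in the kernel that leads straight to the lock dereference on an invalid rt_uncached_list seen in the trace.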
Fixes: d288a162dd1c ("net: dst: Prevent false sharing vs. dst_entry:: __refcnt")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304162125.18b7bcdd-oliver.sang@intel.com
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
CC: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Link: https://lore.kernel.org/r/20230420182508.2417582-1-mbizon@freebox.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15695
Conflicts:
- include/net/dst.h: We have the kABI padding added at the end of dst_entry;
keep it at the end.
commit d288a162dd1c73507da582966f17dd226e34a0c0
Author: Wangyang Guo <wangyang.guo@intel.com>
Date: Thu Mar 23 21:55:29 2023 +0100
net: dst: Prevent false sharing vs. dst_entry:: __refcnt
dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.
Aside from the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.
perf top identified two affected read accesses. dst_entry::lwtstate and
rtable::rt_genid.
dst_entry::__refcnt is located at offset 64 of dst_entry, which puts it into
a separate cacheline vs. the read mostly members located at the beginning
of the struct.
That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also
dst_entry::lwtstate
which is located after the reference count and in the same cache line. This
member is read after a reference count has been acquired.
struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which
follow the dst member share the same cache line as dst_entry::__refcnt.
Especially
rtable::rt_genid
is also read by the contexts which have a reference count acquired
already.
When dst_entry::__refcnt is incremented or decremented via an atomic
operation these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremely.
Move the rt[6i]_uncached[_list] members out of struct rtable and struct
rt6_info into struct dst_entry to provide padding and move the lwtstate
member after that so it ends up in the same cache line.
The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.
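The layout idea can be checked with offsetof() on a simplified model. The struct below is a loose approximation with fixed-width fields so the offsets are deterministic; it is not the real dst_entry layout, and it assumes 64-byte cache lines and a cache-line-aligned object.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SKETCH 64 /* assumed cache line size */

struct dst_sketch {
	uint8_t  read_mostly[64];  /* offset 0: read-mostly members */
	uint32_t refcnt;           /* offset 64: contended reference count */
	uint32_t use;              /* offset 68 */
	uint64_t lastuse;          /* offset 72 */
	uint64_t rcu_head[2];      /* offset 80 */
	uint64_t misc;             /* offset 96: error, tclassid and similar */
	uint64_t rt_uncached[2];   /* offset 104: acts as padding in this line */
	uint64_t rt_uncached_list; /* offset 120 */
	uint64_t lwtstate;         /* offset 128: read after taking a reference */
};

/* Index of the cache line an offset falls into, relative to the object. */
static inline size_t cacheline_index(size_t offset)
{
	return offset / CACHELINE_SKETCH;
}
```

In this model the rarely touched rt_uncached members fill the remainder of the reference count's cache line, so lwtstate and whatever follows the embedded struct start on the next line and are not invalidated by refcount updates.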
[ tglx: Rearrange struct ]
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230323102800.042297517@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-16999
commit b071af523579df7341cabf0f16fc661125e9a13f
Author: Eric Dumazet <edumazet@google.com>
Date: Mon Mar 13 20:17:31 2023 +0000
neighbour: annotate lockless accesses to n->nud_state
We have many lockless accesses to n->nud_state.
Before adding another one in the following patch,
add annotations to readers and writers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2843
JIRA: https://issues.redhat.com/browse/RHEL-1848
Already in CS9
Omitted-fix: 327b18b7aaed ("mm/kfence: select random number before taking raw lock")
Omitted-fix: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages")
Omitted-fix: ac8db824ead0 ("NFSD: Fix reads with a non-zero offset that don't end on a page boundary")
Omitted-fix: b3719108ae60 ("perf kmem: Support legacy tracepoints")
Omitted-fix: dce088ab0d51 ("perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring")
Omitted-fix: c18c20f16219 ("mm, slab: remove duplicate kernel-doc comment for ksize()")
Omitted-fix: cfccd2e63e7e ("mm, compaction: finish pageblocks on complete migration failure")
Omitted-fix: 6342140db660 ("selftests/timens: add a test for vfork+exit")
Omitted-fix: be6667b0db97 ("selftests/vm: dedup hugepage allocation logic")
Omitted-fix: 9d0d94684007 ("selftests/vm: add selftest to verify multi THP collapse")
Omitted-fix: 1370a21fe470 ("selftests/vm: add selftest to verify recollapse of THPs")
Omitted-fix: b25806dcd3d5 ("mm: memcontrol: deprecate swapaccounting=0 mode")
Omitted-fix: b94c4e949c36 ("mm: memcontrol: use do_memsw_account() in a few more places")
Omitted-fix: e55b9f96860f ("mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol")
Omitted-fix: 6f777dcef774 ("docs: kmsan: fix formatting of "Example report"")
Omitted-fix: 26e1a0c3277d ("mm: use pmdp_get_lockless() without surplus barrier()")
Omitted-fix: 0cb8fd4d1416 ("mm/migrate: remove cruft from migration_entry_wait()s")
patches resulting in empty commits after conflict resolution
Omitted-fix: 4a7e922587d2 ("selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh")
patches that are functionally identical
Omitted-fix: 6f777dcef774 ("docs: kmsan: fix formatting of "Example report"")
Is identical to 436fa4a699bc ("docs: kmsan: fix formatting of "Example report"")
Defer to crypto group
Omitted-fix: f900fde28883 ("crypto: testmgr - fix RNG performance in fuzz tests")
Not including since we're specifically excluding the Maple Tree VMA Iterator
Omitted-fix: 524e00b36e8c ("mm: remove rb tree.")
'series' patches that won't be addressed by this MR
Omitted-fix: 9905eed48e82 ("Merge branch 'af_unix-OOB-fixes'")
Omitted-fix: 2e4b231ac125 ("scsi: NCR5380: Use sc_data_direction instead of rq_data_dir()")
Omitted-fix: 40e16ce7b6fa ("scsi: advansys: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 11bf4ec58073 ("scsi: aha1542: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 3ada9c791b1d ("scsi: dpt_i2o: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 240ec1197786 ("scsi: ips: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: ce425dd7dbc9 ("scsi: mvumi: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 2fd8f23aae36 ("scsi: myrb: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 43b2d1b14ed0 ("scsi: myrs: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 0f8f3ea84a89 ("scsi: ncr53c8xx: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 3f5e62c5e074 ("scsi: qla1280: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: ba4baf0951bb ("scsi: qlogicpti: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: ec808ef9b838 ("scsi: snic: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: bbfa8d7d1283 ("scsi: stex: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 6c5d5422c533 ("scsi: sun3_scsi: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 77ff7756c73e ("scsi: sym53c8xx: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 80ca10b6052d ("scsi: xen-scsifront: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 332f606b32b6 ("ovl: enable RCU'd ->get_acl()")
Omitted-fix: b3b6f5b92255 ("btrfs: handle idmaps in btrfs_new_inode()")
Omitted-fix: ca07274c3da9 ("btrfs: allow idmapped rename inode op")
Omitted-fix: c020d2eaf1a8 ("btrfs: allow idmapped getattr inode op")
Omitted-fix: 72105277dcfc ("btrfs: allow idmapped mknod inode op")
Omitted-fix: e93ca491d03f ("btrfs: allow idmapped create inode op")
Omitted-fix: b0b3e44d346c ("btrfs: allow idmapped mkdir inode op")
Omitted-fix: 5a0521086e5f ("btrfs: allow idmapped symlink inode op")
Omitted-fix: 98b6ab5fc098 ("btrfs: allow idmapped tmpfile inode op")
Omitted-fix: d4d094646142 ("btrfs: allow idmapped setattr inode op")
Omitted-fix: 3bc71ba02cf5 ("btrfs: allow idmapped permission inode op")
Omitted-fix: 5474bf400f16 ("btrfs: check whether fsgid/fsuid are mapped during subvolume creation")
Omitted-fix: 4d4340c912cc ("btrfs: allow idmapped SNAP_CREATE/SUBVOL_CREATE ioctls")
Omitted-fix: c4ed533bdc79 ("btrfs: allow idmapped SNAP_DESTROY ioctls")
Omitted-fix: aabb34e7a31c ("btrfs: relax restrictions for SNAP_DESTROY_V2 with subvolids")
Omitted-fix: e4fed17a32b6 ("btrfs: allow idmapped SET_RECEIVED_SUBVOL ioctls")
Omitted-fix: 39e1674ff035 ("btrfs: allow idmapped SUBVOL_SETFLAGS ioctl")
Omitted-fix: 6623d9a0b0ce ("btrfs: allow idmapped INO_LOOKUP_USER ioctl")
Omitted-fix: 4a8b34afa9c9 ("btrfs: handle ACLs on idmapped mounts")
Omitted-fix: 5b9b26f5d0b8 ("btrfs: allow idmapped mount")
Omitted-fix: 8cc5c54de44c ("docs: update mapping documentation")
Omitted-fix: 02e407991350 ("fs: remove unused low-level mapping helpers")
Omitted-fix: ce70fd9a551a ("scsi: core: Remove the cmd field from struct scsi_request")
Omitted-fix: 5b794f98074a ("scsi: core: Remove the sense and sense_len fields from struct scsi_request")
Omitted-fix: a9a4ea1166d6 ("scsi: core: Move the resid_len field from struct scsi_request to struct scsi_cmnd")
Omitted-fix: dbb4c84d87af ("scsi: core: Move the result field from struct scsi_request to struct scsi_cmnd")
Omitted-fix: 6aded12b10e0 ("scsi: core: Remove struct scsi_request")
Omitted-fix: 264403033105 ("scsi: core: Remove <scsi/scsi_request.h>")
Omitted-fix: cd4b46cdb491 ("scsi: 53c700: Use scsi_cmd_to_rq() instead of scsi_cmnd.request")
Omitted-fix: 417c434aa1b4 ("docs/zh_CN: core-api: Update the translation of cachetlb.rst to 5.19-rc3")
Omitted-fix: 1ebfae49fd44 ("docs/zh_CN: core-api: Update the translation of cpu_hotplug.rst to 5.19-rc3")
Omitted-fix: 722ecdbce68a ("docs/zh_CN: core-api: Update the translation of irq/irq-domain.rst to 5.19-rc3")
Omitted-fix: b2fdf7f080b4 ("docs/zh_CN: core-api: Update the translation of kernel-api.rst to 5.19-rc3")
Omitted-fix: e86a0e297f0b ("docs/zh_CN: core-api: Update the translation of printk-format.rst to 5.19-rc3")
Omitted-fix: c290f175e73f ("docs/zh_CN: core-api: Update the translation of workqueue.rst to 5.19-rc3")
Omitted-fix: 4a6d00a43ef7 ("docs/zh_CN: core-api: Update the translation of xarray.rst to 5.19-rc3")
Omitted-fix: e8f60cd7db24 ("Merge tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm/linux/ker…")
Omitted-fix: 3a761d72fa62 ("exportfs: support idmapped mounts")
Omitted-fix: 22f289ce1f8b ("ovl: use ovl_lookup_upper() wrapper")
Omitted-fix: 50db8d027355 ("ovl: handle idmappings for layer fileattrs")
Omitted-fix: c85bcc912f4f ("kselftests: memcg: update the oom group leaf events test")
Omitted-fix: be74553f250f ("kselftests: memcg: speed up the memory.high test")
Omitted-fix: 1bd1a4dd3e8c ("MAINTAINERS: add corresponding kselftests to cgroup entry")
Omitted-fix: cdc69458a5f3 ("cgroup: account for memory_recursiveprot in test_memcg_low()")
Omitted-fix: 72b1e03aa725 ("cgroup: account for memory_localevents in test_memcg_oom_group_leaf_events()")
Omitted-fix: 830316807e02 ("cgroup: remove racy check in test_memcg_sock()")
Omitted-fix: c1a31a2f7a9c ("cgroup: fix racy check in alloc_pagecache_max_30M() helper function")
Omitted-fix: c01d4d0a82b7 ("random: quiet urandom warning ratelimit suppression message")
Omitted-fix: 21873bd66b6e ("Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux")
Omitted-fix: ff3b72a5d614 ("selftests: memcg: fix compilation")
Omitted-fix: 1d09069f5313 ("selftests: memcg: expect no low events in unprotected sibling")
Omitted-fix: 63fbdd3c77ec ("net: use DEBUG_NET_WARN_ON_ONCE() in __release_sock()")
Omitted-fix: 76458faeb285 ("net: use DEBUG_NET_WARN_ON_ONCE() in dev_loopback_xmit()")
Omitted-fix: 3e7f2b8d3088 ("net: use WARN_ON_ONCE() in inet_sock_destruct()")
Omitted-fix: 7890e2f09d43 ("net: use DEBUG_NET_WARN_ON_ONCE() in skb_release_head_state()")
Omitted-fix: ee2640df2393 ("net: add debug checks in napi_consume_skb and __napi_alloc_skb()")
Omitted-fix: 39e0f991a62e ("random: mark bootloader randomness code as __init")
Omitted-fix: cf21b355ccb3 ("af_unix: Optimise hash table layout.")
Omitted-fix: c12db92d62bf ("ovl: port to vfs{g,u}id_t and associated helpers")
Omitted-fix: 73db6a063c78 ("ovl: port to vfs{g,u}id_t and associated helpers")
Omitted-fix: 1e8a9191ccc2 ("f2fs: port to vfs{g,u}id_t and associated helpers")
Omitted-fix: a03a972b26da ("fuse: port to vfs{g,u}id_t and associated helpers")
Omitted-fix: 00d369bc2de5 ("fuse: port to vfs{g,u}id_t and associated helpers")
Omitted-fix: 276a3f7cf1d9 ("ksmbd: port to vfs{g,u}id_t and associated helpers")
Omitted-fix: 45c311501c77 ("fs: use mount types in iattr")
Omitted-fix: 1f36146a5a3d ("fs: introduce tiny iattr ownership update helpers")
Omitted-fix: 35faf3109a78 ("fs: port to iattr ownership update helpers")
Omitted-fix: 71e7b535b890 ("quota: port quota helpers mount ids")
Omitted-fix: b27c82e12965 ("attr: port attribute changes to new types")
Omitted-fix: e95ab1d85289 ("selftests: net: af_unix: Test connect() with different netns.")
Omitted-fix: 169005eae2af ("docs/zh_CN: Update the translation of mm-api to 6.1-rc8")
Omitted-fix: 659797dc4d64 ("Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8")
Omitted-fix: 6a5057e9dc13 ("Docs/zh_CN: Update the translation of sparse to 5.19-rc8")
Omitted-fix: 63c1d2516b05 ("Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8")
Omitted-fix: 83b41bb27b25 ("Docs/zh_CN: Update the translation of usage to 5.19-rc8")
Omitted-fix: c78478e164d4 ("Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8")
Omitted-fix: ce1120076c53 ("Docs/zh_CN: Update the translation of pci to 5.19-rc8")
Omitted-fix: 4116ff79749d ("Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8")
Omitted-fix: 7f02464739da ("9p: convert to advancing variant of iov_iter_get_pages_alloc()")
Omitted-fix: 5b09c9fec086 ("do_proc_readlink(): constify path")
Omitted-fix: ea4af4aa03c3 ("nd_jump_link(): constify path")
Omitted-fix: 20f45ad50d65 ("spufs: constify path")
Omitted-fix: 88569546e8a1 ("ecryptfs: constify path")
Omitted-fix: 9204a97f7ae8 ("sched: Change wait_task_inactive()s match_state")
Omitted-fix: 04c6b79ae4f0 ("btrfs: convert __process_pages_contig() to use filemap_get_folios_contig()")
Omitted-fix: a75b81c3f63b ("btrfs: convert end_compressed_writeback() to use filemap_get_folios()")
Omitted-fix: 47d554199513 ("btrfs: convert process_page_range() to use filemap_get_folios_contig()")
Omitted-fix: 24a1efb4a912 ("nilfs2: convert nilfs_find_uncommited_extent() to use filemap_get_folios_contig()")
Omitted-fix: 7c18b64bba3b ("mips: ralink: mt7621: do not use kzalloc too early")
Omitted-fix: 7d37539037c2 ("fuse: implement ->tmpfile()")
Omitted-fix: f743f16c548b ("treewide: use get_random_{u8,u16}() when possible, part 2")
Omitted-fix: 6ab587e8e8b4 ("docs/zh_CN: Update the translation of delay-accounting to 6.1-rc8")
Omitted-fix: cf306a26cb3a ("docs/zh_CN: Update the translation of kernel-api to 6.1-rc8")
Omitted-fix: e07e9f22259e ("docs/zh_CN: Update the translation of testing-overview to 6.1-rc8")
Omitted-fix: ffdd9bd7a278 ("docs/zh_CN: Update the translation of reclaim to 6.1-rc8")
Omitted-fix: 9a833802a04d ("docs/zh_CN: Update the translation of start to 6.1-rc8")
Omitted-fix: 7cb52d4b3724 ("docs/zh_CN: Update the translation of usage to 6.1-rc8")
Omitted-fix: 03474d581df3 ("docs/zh_CN: Update the translation of msi-howto to 6.1-rc8")
Omitted-fix: 7df047be4363 ("docs/zh_CN: Update the translation of energy-model to 6.1-rc8")
Omitted-fix: e0068090095c ("docs/zh_CN: Update the translation of highmem to 6.1-rc8")
Omitted-fix: 0f3d70cb01da ("docs/zh_CN: Update the translation of ksm to 6.1-rc8")
Omitted-fix: 11018ef90ce7 ("s390/checksum: remove not needed uaccess.h include")
Omitted-fix: 2ea3498980f5 ("mm/damon/core: split out DAMOS-charged region skip logic into a new function")
Omitted-fix: e63a30c51f84 ("mm/damon/core: split damos application logic into a new function")
Omitted-fix: d1cbbf621fc2 ("mm/damon/core: split out scheme stat update logic into a new function")
Omitted-fix: 898810e5ca54 ("mm/damon/core: split out scheme quota adjustment logic into a new function")
Omitted-fix: 789a230613c8 ("mm/damon/sysfs: use damon_addr_range for region's start and end values")
Omitted-fix: 1f71981408ef ("mm/damon/sysfs: remove parameters of damon_sysfs_region_alloc()")
Omitted-fix: 39240595917e ("mm/damon/sysfs: move sysfs_lock to common module")
Omitted-fix: d332fe11debe ("mm/damon/sysfs: move unsigned long range directory to common module")
Omitted-fix: 4acd715ff57f ("mm/damon/sysfs: split out kdamond-independent schemes stats update logic into a new function")
Omitted-fix: c8e7b4d0ba34 ("mm/damon/sysfs: split out schemes directory implementation to separate file")
Omitted-fix: dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN")
Omitted-fix: e42ac7789df6 ("s390/checksum: always use cksm instruction")
Omitted-fix: 1a167ddd3c56 ("x86: kmsan: pgtable: reduce vmalloc space")
Omitted-fix: 7cf8f44a5a1c ("x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS")
Omitted-fix: 1468c6f4558b ("mm: fs: initialize fsdata passed to write_begin/write_end interface")
Omitted-fix: 0aa8ea3c5d35 ("mm/compaction: correct comment of fast_find_migrateblock in isolate_migratepages")
Omitted-fix: 42855f588e18 ("x86/purgatory: disable KMSAN instrumentation")
Omitted-fix: 11385b261200 ("x86/uaccess: instrument copy_from_user_nmi()")
Omitted-fix: f70da5ee8fe1 ("mm/damon: convert damon_pa_mark_accessed_or_deactivate() to use folios")
Omitted-fix: 5a9e34747c9f ("mm/swap: convert deactivate_page() to folio_deactivate()")
Omitted fix: de1f5055523e ("mm/mempolicy: convert queue_pages_pmd() to queue_folios_pmd()")
Omitted fix: 3dae02bbd07f ("mm/mempolicy: convert queue_pages_pte_range() to queue_folios_pte_range()")
Omitted fix: 0a2c1e818316 ("mm/mempolicy: convert queue_pages_hugetlb() to queue_folios_hugetlb()")
Omitted fix: d451b89dcd18 ("mm/mempolicy: convert queue_pages_required() to queue_folio_required()")
Omitted fix: 4a64981dfee9 ("mm/mempolicy: convert migrate_page_add() to migrate_folio_add()")
Omitted fix: 46c475bd676b ("mm/pgtable: kmap_local_page() instead of kmap_atomic()")
Omitted fix: 0d940a9b270b ("mm/pgtable: allow pte_offset_map[_lock]() to fail")
Omitted fix: 65747aaf42b7 ("mm/filemap: allow pte_offset_map_lock() to fail")
Omitted fix: 45fe85e9811e ("mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk()")
Omitted fix: 90f43b0a13cd ("mm/page_vma_mapped: reformat map_pte() with less indentation")
Omitted fix: 2798bbe75b9c ("mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr()")
Omitted fix: 7780d04046a2 ("mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails")
Omitted fix: be872f83bf57 ("mm/pagewalk: walk_pte_range() allow for pte_offset_map()")
Omitted fix: e5ad581c7f1c ("mm/vmwgfx: simplify pmd & pud mapping dirty helpers")
Omitted fix: 0d1c81edc61e ("mm/vmalloc: vmalloc_to_page() use pte_offset_kernel()")
Omitted fix: 6ec1905f6ec7 ("mm/hmm: retry if pte_offset_map() fails")
Omitted fix: 2b683a4ff6ee ("mm/userfaultfd: retry if pte_offset_map() fails")
Omitted fix: 3622d3cde308 ("mm/userfaultfd: allow pte_offset_map_lock() to fail")
Omitted fix: 9f2bad096d2f ("mm/debug_vm_pgtable,page_table_check: warn pte map fails")
Omitted fix: 04dee9e85cf5 ("mm/various: give up if pte_offset_map[_lock]() fails")
Omitted fix: 670ddd8cdcbd ("mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()")
Omitted fix: a5be621ee292 ("mm/mremap: retry if either pte_offset_map_*lock() fails")
Omitted fix: 179d3e4f3bfa ("mm/madvise: clean up force_shm_swapin_readahead()")
Omitted fix: d850fa729873 ("mm/swapoff: allow pte_offset_map[_lock]() to fail")
Omitted fix: 52fc048320ad ("mm/mglru: allow pte_offset_map_nolock() to fail")
Omitted fix: 4b56069c95d6 ("mm/migrate_device: allow pte_offset_map_lock() to fail")
Omitted fix: 2378118bd9da ("mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable()")
Omitted fix: c9c1ee20ee84 ("mm/huge_memory: split huge pmd under one pte_offset_map()")
Omitted fix: 895f5ee464cc ("mm/khugepaged: allow pte_offset_map[_lock]() to fail")
Omitted fix: 3db82b9374ca ("mm/memory: allow pte_offset_map[_lock]() to fail")
Omitted fix: c7ad08804fae ("mm/memory: handle_pte_fault() use pte_offset_map_nolock()")
Omitted fix: 20b18aada185 ("madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check")
Coming Soon:
Omitted-fix: 6f0df8e16eb5 ("memcontrol: ensure memcg acquired by id is properly set up")
Omitted-fix: ee40d543e97d ("mm/pagewalk: fix bootstopping regression from extra pte_unmap()")
Omitted-fix: ab048302026d ("ovl: fix failed copyup of fileattr on a symlink")
Omitted-fix: 92fe9dcbe4e1 ("hugetlbfs: clear resv_map pointer if mmap fails")
Omitted-fix: bf4916922c60 ("hugetlbfs: extend hugetlb_vma_lock to private VMAs")
Omitted-fix: 2820b0f09be9 ("hugetlbfs: close race between MADV_DONTNEED and page fault")
Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=56452800
Tested: KT1+mm regression: https://beaker.engineering.redhat.com/jobs/8467307
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Jan Stancek <jstancek@redhat.com>
Approved-by: Mika Penttilä <mpenttil@redhat.com>
Approved-by: Jerry Snitselaar <jsnitsel@redhat.com>
Approved-by: Alex Gladkov <agladkov@redhat.com>
Approved-by: Vladis Dronov <vdronov@redhat.com>
Approved-by: Dean Nelson <dnelson@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Signed-off-by: Scott Weaver <scweaver@redhat.com>
Conflicts:
drivers/gpu/drm/tests/drm_buddy_test.c
drivers/gpu/drm/tests/drm_mm_test.c - We already have
ce28ab1380e8 ("drm/tests: Add back seed value information")
so keep calls to kunit_info.
drop changes to drivers/misc/habanalabs/gaudi2/gaudi2.c
fs/ntfs3/fslog.c - files not in CS9
net/sunrpc/auth_gss/gss_krb5_wrap.c - We already have
7f675ca7757b ("SUNRPC: Improve Kerberos confounder generation")
so code to change is gone.
drivers/gpu/drm/i915/i915_gem_gtt.c
drivers/gpu/drm/i915/selftests/i915_selftest.c
drivers/gpu/drm/tests/drm_buddy_test.c
drivers/gpu/drm/tests/drm_mm_test.c
change added under
4cb818386e ("Merge DRM changes from upstream v6.0.8..v6.1")
JIRA: https://issues.redhat.com/browse/RHEL-1848
commit a251c17aa558d8e3128a528af5cf8b9d7caae4fd
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date: Wed Oct 5 17:43:22 2022 +0200
treewide: use get_random_u32() when possible
The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14295
Upstream Status: linux.git
commit 0add5c597f3253a9c6108a0a81d57f44ab0d9d30
Author: Benjamin Poirier <bpoirier@nvidia.com>
Date: Tue Sep 26 14:27:30 2023 -0400
ipv4: Set offload_failed flag in fibmatch results
Due to a small omission, the offload_failed flag is missing from ipv4
fibmatch results. Make sure it is set correctly.
The issue can be witnessed using the following commands:
echo "1 1" > /sys/bus/netdevsim/new_device
ip link add dummy1 up type dummy
ip route add 192.0.2.0/24 dev dummy1
echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/fib/fail_route_offload
ip route add 198.51.100.0/24 dev dummy1
ip route
# 192.0.2.0/24 has rt_trap
# 198.51.100.0/24 has rt_offload_failed
ip route get 192.0.2.1 fibmatch
# Result has rt_trap
ip route get 198.51.100.1 fibmatch
# Result differs from the route shown by `ip route`, it is missing
# rt_offload_failed
ip link del dev dummy1
echo 1 > /sys/bus/netdevsim/del_device
Fixes: 36c5100e85 ("IPv4: Add "offload failed" indication to routes")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230926182730.231208-1-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-5426
CVE: CVE-2023-42754
commit 0113d9c9d1ccc07f5a3710dac4aa24b6d711278c
Author: Kyle Zeng <zengyhkyle@gmail.com>
Date: Thu Sep 14 22:12:57 2023 -0700
ipv4: fix null-deref in ipv4_link_failure
Currently, we assume the skb is associated with a device before calling
__ip_options_compile, which is not always the case if it is re-routed by
ipvs.
When skb->dev is NULL, dev_net(skb->dev) dereferences a NULL pointer.
This patch adds a check for the edge case and switches to using the net_device
from the rtable when skb->dev is NULL.
Fixes: ed0de45a10 ("ipv4: recompile ip options in ipv4_link_failure")
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Cc: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
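The fallback described in this commit can be sketched in a few lines of Python. This is only an illustration of the idea, not the kernel code (which is C). The `Device` and `Packet` classes and the `netns_for_options()` helper are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    netns: str  # stand-in for the network namespace the device lives in


@dataclass
class Packet:
    dev: Optional[Device]  # may be None, e.g. after re-routing
    route_dev: Device      # device recorded in the packet's route entry


def netns_for_options(pkt: Packet) -> str:
    """Pick the namespace without dereferencing a missing device."""
    # Prefer the packet's own device; fall back to the route's device.
    dev = pkt.dev if pkt.dev is not None else pkt.route_dev
    return dev.netns
```

With `pkt.dev` unset, the helper still returns the namespace of the route's device instead of failing.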
Bugzilla: https://bugzilla.redhat.com/2175258
Conflicts:
- Removed chunks of unsupported protocol AX.25
- Renamed the functions also in ipvlan. Commit 40b9d1ab63f5 ("ipvlan: hold lower
dev to avoid possible use-after-free") was backported out of order so it had
to use the old function names.
commit d62607c3fe45911b2331fac073355a8c914bbde2
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Jun 7 21:39:55 2022 -0700
net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.
Rename:
dev_hold_track() -> netdev_hold()
dev_put_track() -> netdev_put()
dev_replace_track() -> netdev_ref_replace()
Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2180612
Tested: compile only
commit 29e5375d7fcb5f88b438d74d537bbfd67ac75a64
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Feb 10 13:42:31 2022 -0800
ipv4: add (struct uncached_list)->quarantine list
This is an optimization to keep the per-cpu lists as short as possible:
Whenever rt_flush_dev() changes one rtable dst.dev
matching the disappearing device, it can transfer the object
to a quarantine list, waiting for a final rt_del_uncached_list().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Xin Long <lxin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160073
Upstream Status: linux.git
commit 8895a9c2ac76fb9d3922fed4fe092c8ec5e5cccc
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Mon Jul 18 10:26:41 2022 -0700
ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
While reading sysctl_fib_multipath_hash_fields, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: ce5c9c20d3 ("ipv4: Add a sysctl to control multipath hash fields")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2160073
Upstream Status: linux.git
commit 7998c12a08c97cc26660532c9f90a34bd7d8da5a
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Mon Jul 18 10:26:40 2022 -0700
ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
While reading sysctl_fib_multipath_hash_policy, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: bf4e0a3db9 ("net: ipv4: add support for ECMP hash policy choice")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2149949
Upstream Status: linux.git
Conflicts: (context) Missing upstream commit ac6627a28dbf ("net: ipv4:
Consolidate ipv4_mtu and ip_dst_mtu_maybe_forward"):
Centos Stream returns immediately in the if condition.
commit 60c158dc7b1f0558f6cadd5b50d0386da0000d50
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Wed Jul 13 13:51:53 2022 -0700
ip: Fix data-races around sysctl_ip_fwd_use_pmtu.
While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: f87c10a8aa ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2140160
Conflicts:
- removed n/a hunk for unsupported prestera driver
commit 888ade8f90d7dbbdc8552ae9b23d311f9e61ab0e
Author: Guillaume Nault <gnault@redhat.com>
Date: Fri Apr 8 22:08:37 2022 +0200
ipv4: Use dscp_t in struct fib_rt_info
Use the new dscp_t type to replace the tos field of struct fib_rt_info.
This ensures ECN bits are ignored and makes it compatible with the
fa_dscp field of struct fib_alias.
This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2140160
commit 32ccf1107980e8ed5c62cf6666da7a47a4fc7ecf
Author: Guillaume Nault <gnault@redhat.com>
Date: Fri Feb 4 14:58:19 2022 +0100
ipv4: Use dscp_t in struct fib_alias
Use the new dscp_t type to replace the fa_tos field of fib_alias. This
ensures ECN bits are ignored and makes the field compatible with the
fc_dscp field of struct fib_config.
Converting old *tos variables and fields to dscp_t allows sparse to
flag incorrect uses of DSCP and ECN bits. This patch is entirely about
type annotation and shouldn't change any existing behaviour.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
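To illustrate what "ECN bits are ignored" means for the two dscp_t conversions above, here is a minimal Python sketch. It assumes the usual layout of the IPv4 TOS byte (DSCP in the upper six bits, ECN in the lower two). The helper name `tos_to_dscp()` is hypothetical and is not the kernel's API.

```python
DSCP_MASK = 0xFC  # upper six bits of the TOS byte carry the DSCP
ECN_MASK = 0x03   # lower two bits carry ECN


def tos_to_dscp(tos: int) -> int:
    """Drop the ECN bits so only the DSCP part takes part in matching."""
    return tos & DSCP_MASK


# Two TOS values that differ only in their ECN bits map to the same DSCP.
same_dscp = tos_to_dscp(0x28) == tos_to_dscp(0x2B)
```

A dedicated type makes it harder to compare a raw TOS byte with a masked value by accident, which is the kind of mistake sparse can then flag.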
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2129093
Upstream Status: commit 40867d74c374b
commit 40867d74c374b235e14d839f3a77f26684feefe5
Author: David Ahern <dsahern@kernel.org>
Date: Mon Mar 14 14:45:51 2022 -0600
net: Add l3mdev index to flow struct and avoid oif reset for port devices
The fundamental premise of VRF and l3mdev core code is binding a socket
to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
Legacy code resets flowi_oif to the l3mdev losing any original port
device binding. Ben (among others) has demonstrated use cases where the
original port device binding is important and needs to be retained.
This patch handles that by adding a new entry to the common flow struct
that can indicate the l3mdev index for later rule and table matching
avoiding the need to reset flowi_oif.
In addition to allowing more use cases that require port device binds,
this patch brings a few datapath simplifications:
1. l3mdev_fib_rule_match is only called when walking fib rules and
always after l3mdev_update_flow. That allows an optimization to bail
early for non-VRF type use cases when flowi_l3mdev is not set. Also,
only that index needs to be checked for the FIB table id.
2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
(e.g., VRF) device. By resetting flowi_oif only for this case the
FLOWI_FLAG_SKIP_NH_OIF flag is no longer needed and can be removed,
removing several checks in the datapath. The flowi_iif path can be
simplified to only be called if it is not loopback (loopback can
not be assigned to an L3 domain) and the l3mdev index is not already
set.
3. Avoid another device lookup in the output path when the fib lookup
returns a reject failure.
Note: 2 functional tests for local traffic with reject fib rules are
updated to reflect the new direct failure at FIB lookup time for ping
rather than the failure on packet path. The current code fails like this:
HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: Warning: source address might be selected on device other than: eth1
PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.
--- 172.16.3.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
where the test now directly fails:
HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: connect: No route to host
Signed-off-by: David Ahern <dsahern@kernel.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Phil Sutter <psutter@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059161
Upstream Status: linux.git
commit c4eb664191b4a5ff6856478f903924176697719e
Author: Menglong Dong <imagedong@tencent.com>
Date: Wed Apr 13 16:15:53 2022 +0800
net: ipv4: add skb drop reasons to ip_error()
Eventually, I found the handler function for input route lookup
failures: ip_error().
The drop reasons used in ip_error() correspond closely to
IPSTATS_MIB_*, and the following new reasons are introduced:
SKB_DROP_REASON_IP_INADDRERRORS
SKB_DROP_REASON_IP_INNOROUTES
Aren't the names SKB_DROP_REASON_IP_HOSTUNREACH and
SKB_DROP_REASON_IP_NETUNREACH more accurate? To keep them aligned
with IPSTATS_MIB_*, the names are left as they are.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111270
Upstream Status: commit 49ecc2e9c3ab
commit 49ecc2e9c3abd269951972fa8b23a4d081111b80
Author: Eric Dumazet <edumazet@google.com>
Date: Mon Nov 15 09:23:03 2021 -0800
net: align static siphash keys
siphash keys use 16 bytes.
Define siphash_aligned_key_t macro so that we can make sure they
are not crossing a cache line boundary.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Florian Westphal <fwestpha@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2096377
commit 9038c320001dd07f60736018edf608ac5baca0ab
Author: Eric Dumazet <edumazet@google.com>
Date: Sat Dec 4 20:22:03 2021 -0800
net: dst: add net device refcount tracking to dst_entry
We want to track all dev_hold()/dev_put() to ease leak hunting.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081383
Upstream Status: linux.git
commit 544b4dd568e3b09c1ab38a759d3187e7abda11a0
Author: Guillaume Nault <gnault@redhat.com>
Date: Thu Mar 17 13:45:09 2022 +0100
ipv4: Fix route lookups when handling ICMP redirects and PMTU updates
The PMTU update and ICMP redirect helper functions initialise their fl4
variable with either __build_flow_key() or build_sk_flow_key(). These
initialisation functions always set ->flowi4_scope with
RT_SCOPE_UNIVERSE and might set the ECN bits of ->flowi4_tos. This is
not a problem when the route lookup is later done via
ip_route_output_key_hash(), which properly clears the ECN bits from
->flowi4_tos and initialises ->flowi4_scope based on the RTO_ONLINK
flag. However, some helpers call fib_lookup() directly, without
sanitising the tos and scope fields, so the route lookup can fail and,
as a result, the ICMP redirect or PMTU update aren't taken into
account.
Fix this by extracting the ->flowi4_tos and ->flowi4_scope sanitisation
code into ip_rt_fix_tos(), then use this function in handlers that call
fib_lookup() directly.
Note 1: We can't sanitise ->flowi4_tos and ->flowi4_scope in a central
place (like __build_flow_key() or flowi4_init_output()), because
ip_route_output_key_hash() expects non-sanitised values. When called
with sanitised values, it can erroneously overwrite RT_SCOPE_LINK with
RT_SCOPE_UNIVERSE in ->flowi4_scope. Therefore we have to be careful to
sanitise the values only for those paths that don't call
ip_route_output_key_hash().
Note 2: The problem is mostly about sanitising ->flowi4_tos. Having
->flowi4_scope initialised with RT_SCOPE_UNIVERSE instead of
RT_SCOPE_LINK probably wasn't really a problem: sockets with the
SOCK_LOCALROUTE flag set (those that'd result in RTO_ONLINK being set)
normally shouldn't receive ICMP redirects or PMTU updates.
Fixes: 4895c771c7 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
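The sanitisation step and the pitfall from Note 1 can be sketched in Python as follows. This is a userspace illustration under stated assumptions, not the kernel's ip_rt_fix_tos(). The constant values are what I understand the kernel headers to define and should be treated as assumptions. The `fix_tos()` helper is a hypothetical name.

```python
# Assumed values, mirroring the kernel headers as far as I know them.
IPTOS_RT_MASK = 0x1C      # TOS bits used for routing, ECN bits excluded
RTO_ONLINK = 0x01         # flag carried in the low bit of the tos field
RT_SCOPE_UNIVERSE = 0
RT_SCOPE_LINK = 253


def fix_tos(tos: int) -> tuple[int, int]:
    """Return (sanitised tos, scope) for a raw, non-sanitised tos value."""
    relevant = tos & (IPTOS_RT_MASK | RTO_ONLINK)
    scope = RT_SCOPE_LINK if relevant & RTO_ONLINK else RT_SCOPE_UNIVERSE
    return relevant & IPTOS_RT_MASK, scope


# First pass: RTO_ONLINK is still present, so the scope becomes link.
tos1, scope1 = fix_tos(0x1D)
# Second pass on the already sanitised value: the flag is gone, so the
# scope silently falls back to universe (the problem described in Note 1).
tos2, scope2 = fix_tos(tos1)
```

The second call shows why the sanitisation must run only once per lookup path: the scope is derived from a flag that the sanitisation itself removes.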
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2081383
Upstream Status: linux.git
commit 9fcf986cc4bc6a3a39f23fbcbbc3a9e52d3c24fd
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Feb 16 09:32:16 2022 -0800
ipv4: fix data races in fib_alias_hw_flags_set
fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.
We need to annotate accesses to following fields of struct fib_alias:
offload, trap, offload_failed
Because of READ_ONCE()/WRITE_ONCE() limitations, make these
fields u8.
BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
process_scheduled_works kernel/workqueue.c:2370 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
process_scheduled_works kernel/workqueue.c:2370 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
value changed: 0x00 -> 0x02
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 90b93f1b31 ("ipv4: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2037315
Upstream commit(s):
commit 1160dfa178eb848327e9dec39960a735f4dc1685
Author: Yajun Deng <yajun.deng@linux.dev>
Date: Thu Aug 5 19:55:27 2021 +0800
net: Remove redundant if statements
The 'if (dev)' check has already moved into dev_{put,hold}(), so remove
the redundant if statements.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Petr Oros <poros@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2015112
Upstream Status: linux.git
Tested: LNST
CVE: CVE-2021-20322
commit 67d6d681e15b578c1725bad8ad079e05d1c48a8e
Author: Eric Dumazet <edumazet@google.com>
Date: Sun Aug 29 15:16:15 2021 -0700
ipv4: make exception cache less predictible
Even after commit 6457378fe7 ("ipv4: use siphash instead of Jenkins in
fnhe_hashfun()"), an attacker can still use brute force to learn
some secrets from a victim linux host.
One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.
Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.
After the patch, each bucket contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.
This slightly increases the memory used by the hash table,
by 50% on average; we do not expect this to be a problem.
This patch is more complex than the prior one (IPv6 equivalent),
because IPv4 was reusing the oldest entry.
Since we need to be able to evict more than one entry per
update_or_create_fnhe() call, I had to replace
fnhe_oldest() with fnhe_remove_oldest().
Also note that we will queue extra kfree_rcu() calls under stress,
which hopefully won't be too big an issue.
Fixes: 4895c771c7 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antoine Tenart <atenart@redhat.com>
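The randomised bucket depth can be sketched in Python as below. The 6 to 10 range is taken from the commit message. Everything else is an assumption made for illustration and may differ from the kernel implementation: the `insert_exception()` name, the dict-based entries, and the choice to draw a fresh limit on every insert.

```python
import random

MIN_DEPTH, MAX_DEPTH = 6, 10  # range quoted in the commit message


def insert_exception(bucket: list, entry: dict, rng=random) -> list:
    """Append an entry, then evict oldest entries beyond a random limit."""
    bucket.append(entry)
    # Draw the limit at random so the bucket depth is not a fixed,
    # observable constant.
    limit = rng.randint(MIN_DEPTH, MAX_DEPTH)
    evicted = []
    while len(bucket) > limit:
        # More than one entry may have to go in a single call.
        oldest = min(bucket, key=lambda e: e["stamp"])
        bucket.remove(oldest)
        evicted.append(oldest)
    return evicted
```

Because the limit varies between calls, a single insert may evict several entries, which matches the commit's remark that evicting only one oldest entry per update is no longer enough.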
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2024572
Upstream Status: linux.git
commit 92548b0ee220e000d81c27ac9a80e0ede895a881
Author: Eric Dumazet <edumazet@google.com>
Date: Mon Aug 30 19:02:10 2021 -0700
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
The UDP length field should be in network order.
This removes the following sparse error:
net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types)
net/ipv4/route.c:3173:27: expected restricted __be16 [usertype] len
net/ipv4/route.c:3173:27: got unsigned long
Fixes: 404eb77ea7 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
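As an illustration of why the byte order of the UDP length field matters, the Python sketch below packs a UDP header with every field in network (big-endian) order. The `udp_header()` helper is a hypothetical name, and the checksum is simply left at zero. This is a userspace analogue, not the code in inet_rtm_getroute_build_skb().

```python
import struct

UDP_HEADER_LEN = 8  # bytes: source port, destination port, length, checksum


def udp_header(sport: int, dport: int, payload_len: int = 0) -> bytes:
    """Build a UDP header with all 16-bit fields in network byte order."""
    length = UDP_HEADER_LEN + payload_len
    # "!" selects network (big-endian) order for every field.
    return struct.pack("!HHHH", sport, dport, length, 0)


hdr = udp_header(1, 2)
# Reading the length back in host order on a little-endian machine would
# give 0x0800 instead of 8, which is the class of bug sparse flags.
length_field = struct.unpack("!H", hdr[4:6])[0]
```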
A group of security researchers brought to our attention
the weakness of hash function used in fnhe_hashfun().
Let's use siphash instead of Jenkins Hash, to considerably
reduce security risks.
Also remove the inline keyword, this really is distracting.
Fixes: d546c62154 ("ipv4: harden fnhe_hashfun()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial conflict in net/netfilter/nf_tables_api.c.
Duplicate fix in tools/testing/selftests/net/devlink_port_split.py
- take the net-next version.
skmsg, and L4 bpf - keep the bpf code but remove the flags
and err params.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 14972cbd34 ("net: lwtunnel: Handle fragmentation") moved
fragmentation logic away from lwtunnel by carrying the encap headroom and
using it in the output MTU calculation. But the forwarding part was not
covered, which created a difference in MTU between output and forwarding
and led to silent drops on the IPv4 forwarding path. Fix it by taking
into account the lwtunnel encap headroom.
The same commit also introduced a difference in how RTAX_MTU is treated
in IPv4 and IPv6, where the latter explicitly removes the lwtunnel encap
headroom from the route MTU. Make the IPv4 version do the same.
Fixes: 14972cbd34 ("net: lwtunnel: Handle fragmentation")
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
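The arithmetic behind the fix can be sketched in Python: the output and forwarding paths should both subtract the tunnel encap headroom from the route MTU. The `effective_mtu()` name and the guard against an oversized headroom are assumptions made for this sketch and do not reproduce the kernel helpers.

```python
def effective_mtu(route_mtu: int, encap_headroom: int) -> int:
    """MTU left for the inner packet once encap headroom is reserved."""
    # Assumption: a headroom that would consume the whole MTU is ignored
    # rather than producing a zero or negative result.
    if encap_headroom >= route_mtu:
        return route_mtu
    return route_mtu - encap_headroom


# If output and forwarding use the same helper, they agree on the MTU,
# so a packet accepted on one path is not silently dropped on the other.
output_mtu = effective_mtu(1500, 40)
forward_mtu = effective_mtu(1500, 40)
```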
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oliver reported a use case where deleting a VRF device can hang
waiting for the refcnt to drop to 0. The root cause is that the dst
is allocated against the VRF device but cached on the loopback
device.
The use case (added to the selftests) has an implicit VRF crossing
due to the ordering of the FIB rules (lookup local is before the
l3mdev rule, but the problem occurs even if the FIB rules are
re-ordered with local after l3mdev because the VRF table does not
have a default route to terminate the lookup). The end result
is that the FIB lookup returns the loopback device as the nexthop,
but the ingress device is in a VRF. The mismatch causes the dst to be
allocated against the VRF device but then cached on the loopback.
The fix is to bring the trick used for IPv6 (see ip6_rt_get_dev_rcu):
pick the dst alloc device based on the fib lookup result but with checks
that the result has a nexthop device (e.g., not an unreachable or
prohibit entry).
Fixes: f5a0aab84b ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Reported-by: Oliver Herms <oliver.peter.herms@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.
The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.
In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
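The "skip dissection when no field of that group is requested" logic can be sketched in Python with a bitmask. The bit positions, the two masks, and the `dissection_plan()` helper are assumptions chosen for illustration. They are modelled on the fib_multipath_hash_fields sysctl but are not guaranteed to match the kernel's definitions.

```python
# Assumed layout: six outer fields in the low bits, six inner fields above.
SRC_IP = 1 << 0
DST_IP = 1 << 1
IP_PROTO = 1 << 2
FLOWLABEL = 1 << 3
SRC_PORT = 1 << 4
DST_PORT = 1 << 5
INNER_SRC_IP = 1 << 6
INNER_DST_IP = 1 << 7
INNER_IP_PROTO = 1 << 8
INNER_FLOWLABEL = 1 << 9
INNER_SRC_PORT = 1 << 10
INNER_DST_PORT = 1 << 11

OUTER_MASK = 0x003F
INNER_MASK = 0x0FC0


def dissection_plan(fields: int, have_skb: bool = True) -> tuple[bool, bool]:
    """Return (dissect outer flow?, dissect inner flow?)."""
    need_outer = bool(fields & OUTER_MASK)
    # Without an skb only the flow key is available, so inner fields
    # cannot be considered.
    need_inner = have_skb and bool(fields & INNER_MASK)
    return need_outer, need_inner
```

A configuration that selects only inner fields therefore skips the outer dissection, and one that selects only outer fields skips the inner one.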
A subsequent patch will add another multipath hash policy where the
multipath hash is calculated directly by the policy specific code and
not outside of the switch statement.
Prepare for this change by moving the multipath hash calculation inside
the switch statement.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 73f156a6e8 ("inetpeer: get rid of ip_id_count")
I used a very small hash table that could be abused
by patient attackers to reveal sensitive information.
Switch to dynamic sizing, depending on RAM size.
Typical big hosts will now use 128x more storage (2 MB)
to get a similar increase in security and reduction
of hash collisions.
As a bonus, use of alloc_large_system_hash() spreads
allocated memory among all NUMA nodes.
Fixes: 73f156a6e8 ("inetpeer: get rid of ip_id_count")
Reported-by: Amit Klein <aksecurity@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
proc_create_seq() that directly takes a struct seq_operations,
and deal with network namespaces in ->open.
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>