Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Petr Oros	5597fb4160	net: fix crash when config small gso_max_size/gso_ipv4_max_size JIRA: https://issues.redhat.com/browse/RHEL-57756 CVE: CVE-2024-50258 Upstream commit(s): commit 9ab5cf19fb0e4680f95e506d6c544259bf1111c4 Author: Wang Liang <wangliang74@huawei.com> Date: Wed Oct 23 11:52:13 2024 +0800 net: fix crash when config small gso_max_size/gso_ipv4_max_size Config a small gso_max_size/gso_ipv4_max_size will lead to an underflow in sk_dst_gso_max_size(), which may trigger a BUG_ON crash, because sk->sk_gso_max_size would be much bigger than device limits. Call Trace: tcp_write_xmit tso_segs = tcp_init_tso_segs(skb, mss_now); tcp_set_skb_tso_segs tcp_skb_pcount_set // skb->len = 524288, mss_now = 8 // u16 tso_segs = 524288/8 = 65536 -> 0 tso_segs = DIV_ROUND_UP(skb->len, mss_now) BUG_ON(!tso_segs) Add check for the minimum value of gso_max_size and gso_ipv4_max_size. Fixes: `46e6b992c2` ("rtnetlink: allow GSO maximums to be set on device creation") Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241023035213.517386-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:56 +01:00
Petr Oros	7ed4c990cc	rtnetlink: Add bulk registration helpers for rtnetlink message handlers. JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 07cc7b0b942bf55ef1a471470ecda8d2a6a6541f Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Tue Oct 8 11:47:32 2024 -0700 rtnetlink: Add bulk registration helpers for rtnetlink message handlers. Before commit `addf9b90de` ("net: rtnetlink: use rcu to free rtnl message handlers"), once rtnl_msg_handlers[protocol] was allocated, the following rtnl_register_module() for the same protocol never failed. However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to be allocated in each rtnl_register_module(), so each call could fail. Many callers of rtnl_register_module() do not handle the returned error, and we need to add many error handlings. To handle that easily, let's add wrapper functions for bulk registration of rtnetlink message handlers. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:56 +01:00
Petr Oros	b24948627d	rtnetlink: delete redundant judgment statements JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 2d522384fb5b8187cb7f8fe7d05c119ac38fd8f3 Author: Li Zetao <lizetao1@huawei.com> Date: Thu Aug 22 12:32:46 2024 +0800 rtnetlink: delete redundant judgment statements The initial value of err is -ENOBUFS, and err is guaranteed to be less than 0 before all goto errout. Therefore, on the error path of errout, there is no need to repeatedly judge that err is less than 0, and delete redundant judgments to make the code more concise. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:55 +01:00
Petr Oros	42016f1fd6	net: reduce rtnetlink_rcv_msg() stack usage JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit cef4902b0fadfc4181176ef5713f0b7cf2a40d8f Author: Eric Dumazet <edumazet@google.com> Date: Wed Jul 10 15:16:53 2024 +0000 net: reduce rtnetlink_rcv_msg() stack usage IFLA_MAX is increasing slowly but surely. Some compilers use more than 512 bytes of stack in rtnetlink_rcv_msg() because it calls rtnl_calcit() for RTM_GETLINK message. Use noinline_for_stack attribute to not inline rtnl_calcit(), and directly use nla_for_each_attr_type() (Jakub suggestion) because we only care about IFLA_EXT_MASK at this stage. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240710151653.3786604-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:55 +01:00
Petr Oros	c976657153	rtnetlink: move rtnl_lock handling out of af_netlink JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 5380d64f8d766576ac5c0f627418b2d0e1d2641f Author: Jakub Kicinski <kuba@kernel.org> Date: Thu Jun 6 12:29:05 2024 -0700 rtnetlink: move rtnl_lock handling out of af_netlink Now that we have an intermediate layer of code for handling rtnl-level netlink dump quirks, we can move the rtnl_lock taking there. For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can avoid taking rtnl_lock just to generate NLM_DONE, once again. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:55 +01:00
Petr Oros	5f99ca47f6	rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 9cf621bd5fcbeadc2804951d13d487e22e95b363 Author: Eric Dumazet <edumazet@google.com> Date: Fri May 3 19:20:59 2024 +0000 rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. All rtnl_link_ops->get_link_net() methods already using dev_net() are ready. I added READ_ONCE() annotations on others. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	1f803ff5bd	rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb() JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 979aad40da9217d5e907ee4ad7c7f0dc555944a7 Author: Eric Dumazet <edumazet@google.com> Date: Fri May 3 19:20:58 2024 +0000 rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb() dev->xdp_prog is protected by RCU, we can lift RTNL requirement from rtnl_xdp_prog_skb(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	861345bba2	rtnetlink: do not depend on RTNL in rtnl_fill_proto_down() JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 6890ab31d1a35444741e6150db19d64797db2919 Author: Eric Dumazet <edumazet@google.com> Date: Fri May 3 19:20:57 2024 +0000 rtnetlink: do not depend on RTNL in rtnl_fill_proto_down() Change dev_change_proto_down() and dev_change_proto_down_reason() to write once on dev->proto_down and dev->proto_down_reason. Then rtnl_fill_proto_down() can use READ_ONCE() annotations and run locklessly. rtnl_proto_down_size() should assume worst case, because reading dev->proto_down_reason multiple times would be racy without RTNL in the future. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	0f51110863	rtnetlink: do not depend on RTNL for many attributes JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 6747a5d4990b8c8d7392f7a06b7a4bb5f4ada80e Author: Eric Dumazet <edumazet@google.com> Date: Fri May 3 19:20:56 2024 +0000 rtnetlink: do not depend on RTNL for many attributes Following device fields can be read locklessly in rtnl_fill_ifinfo() : type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, tso_max_segs, num_rx_queues. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	ce54afa357	rtnetlink: do not depend on RTNL for IFLA_TXQLEN output JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit ad13b5b0d1f9eb8e048394919e6393e520b14552 Author: Eric Dumazet <edumazet@google.com> Date: Fri May 3 19:20:54 2024 +0000 rtnetlink: do not depend on RTNL for IFLA_TXQLEN output rtnl_fill_ifinfo() can read dev->tx_queue_len locklessly, granted we add corresponding READ_ONCE()/WRITE_ONCE() annotations. Add missing READ_ONCE(dev->tx_queue_len) in teql_enqueue() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	0db2362ca1	rtnetlink: do not depend on RTNL for IFLA_IFNAME output JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 8a58268133622c3d50155ac5798ad1d51d6bd3be Author: Eric Dumazet <edumazet@google.com> Date: Fri May 3 19:20:53 2024 +0000 rtnetlink: do not depend on RTNL for IFLA_IFNAME output We can use netdev_copy_name() to no longer rely on RTNL to fetch dev->name. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	83d54ab0fe	rtnetlink: do not depend on RTNL for IFLA_QDISC output JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 698419ffb6fc83dd7b0359d9e8476e732967eed2 Author: Eric Dumazet <edumazet@google.com> Date: Fri May 3 19:20:52 2024 +0000 rtnetlink: do not depend on RTNL for IFLA_QDISC output dev->qdisc can be read using RCU protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	5c66564865	rtnetlink: use for_each_netdev_dump() in rtnl_stats_dump() JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 0feb396f7428b95710ea72c1dc33ae363019fae5 Author: Eric Dumazet <edumazet@google.com> Date: Thu May 2 11:37:48 2024 +0000 rtnetlink: use for_each_netdev_dump() in rtnl_stats_dump() Switch rtnl_stats_dump() to use for_each_netdev_dump() instead of net->dev_index_head[] hash table. This makes the code much easier to read, and fixes scalability issues. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240502113748.1622637-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	a50ab4a87a	rtnetlink: change rtnl_stats_dump() return value JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 136c2a9a2a8760d8dae83ae7c882c50be02bdb63 Author: Eric Dumazet <edumazet@google.com> Date: Thu May 2 11:37:47 2024 +0000 rtnetlink: change rtnl_stats_dump() return value By returning 0 (or an error) instead of skb->len, we allow NLMSG_DONE to be appended to the current skb at the end of a dump, saving a couple of recvmsg() system calls. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240502113748.1622637-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:54 +01:00
Petr Oros	b5e456d0ed	netlink: let core handle error cases in dump operations JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 02e24903e5a46b7a7fca44bcfe0cd6fa5b240c34 Author: Eric Dumazet <edumazet@google.com> Date: Wed Mar 6 10:24:26 2024 +0000 netlink: let core handle error cases in dump operations After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors in the core"), we can remove some code that was not 100 % correct anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306102426.245689-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:52 +01:00
Rado Vrbovsky	fb874c9815	Merge: CNB96: netlink/devlink: update devlink & netlink to the v6.9 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5257 JIRA: https://issues.redhat.com/browse/RHEL-57755 Depends: !5414 Depends: !4753 Signed-off-by: Petr Oros <poros@redhat.com> Approved-by: Ivan Vecera <ivecera@redhat.com> Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-27 11:19:20 +00:00
Rado Vrbovsky	18484e6ffa	Merge: CNB96: net: RTNL pressure reduction MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5605 A series of patches reducing RTNL pressure in net, namely the following upstream series and their prerequisites / fixes / related changes: - 3cbab89268c6 Merge branch 'inet-implement-lockless-rtm_getnetconf-ops' - 9f780efa6eaa Merge branch 'ipv6-devconf-lockless' - e96082570933 Merge branch 'inet_dump_ifaddr-no-rtnl' - 570c86ed60cc Merge branch 'ipv6-lockless-dump-addrs' Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5414 JIRA: https://issues.redhat.com/browse/RHEL-62205 JIRA: https://issues.redhat.com/browse/RHEL-62204 JIRA: https://issues.redhat.com/browse/RHEL-62203 JIRA: https://issues.redhat.com/browse/RHEL-62202 Signed-off-by: Antoine Tenart <atenart@redhat.com> Approved-by: Sabrina Dubroca <sdubroca@redhat.com> Approved-by: Ivan Vecera <ivecera@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-22 09:20:48 +00:00
Petr Oros	f70dcaae94	net: make dev_unreg_count global JIRA: https://issues.redhat.com/browse/RHEL-57755 Upstream commit(s): commit ffabe98cb576097b77d404d39e8b3df03caa986a Author: Eric Dumazet <edumazet@google.com> Date: Fri Feb 2 10:11:06 2024 +0000 net: make dev_unreg_count global We can use a global dev_unreg_count counter instead of a per netns one. As a bonus we can factorize the changes done on it for bulk device removals. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-11-20 10:13:42 +01:00
Paolo Abeni	d49c5b08c8	rtnetlink: Don't ignore IFLA_TARGET_NETNSID when ifname is specified in rtnl_dellink(). JIRA: https://issues.redhat.com/browse/RHEL-62849 Tested: LNST, Tier1 Upstream commit: commit 9415d375d8520e0ed55f0c0b058928da9a5b5b3d Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Jul 26 17:19:53 2024 -0700 rtnetlink: Don't ignore IFLA_TARGET_NETNSID when ifname is specified in rtnl_dellink(). The cited commit accidentally replaced tgt_net with net in rtnl_dellink(). As a result, IFLA_TARGET_NETNSID is ignored if the interface is specified with IFLA_IFNAME or IFLA_ALT_IFNAME. Let's pass tgt_net to rtnl_dev_get(). Fixes: `cc6090e985` ("net: rtnetlink: introduce helper to get net_device instance by ifname") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-15 09:21:35 +01:00
Antoine Tenart	ba114b046d	rtnetlink: make the "split" NLM_DONE handling generic JIRA: https://issues.redhat.com/browse/RHEL-62204 Upstream Status: linux.git commit 5b4b62a169e10401cca34a6e7ac39161986f5605 Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Jun 3 11:48:26 2024 -0700 rtnetlink: make the "split" NLM_DONE handling generic Jaroslav reports Dell's OMSA Systems Management Data Engine expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo() and inet_dump_ifaddr(). We already added a similar fix previously in commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again") Instead of modifying all the dump handlers, and making them look different than modern for_each_netdev_dump()-based dump handlers - put the workaround in rtnetlink code. This will also help us move the custom rtnl-locking from af_netlink in the future (in net-next). Note that this change is not touching rtnl_dump_all(). rtnl_dump_all() is different kettle of fish and a potential problem. We now mix families in a single recvmsg(), but NLM_DONE is not coalesced. Tested: ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \ --dump getaddr --json '{"ifa-family": 2}' ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}' ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \ --dump getlink Fixes: 3e41af90767d ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()") Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()") Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-11-14 10:16:49 +01:00
Antoine Tenart	7ab8c5dc6d	rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo() JIRA: https://issues.redhat.com/browse/RHEL-62204 Upstream Status: linux.git commit 3e41af90767dcf8e5ca91cfbbbcb772584940df9 Author: Eric Dumazet <edumazet@google.com> Date: Sun Feb 11 21:44:04 2024 +0000 rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo() Adopt net->dev_by_index as I did in commit 0e0939c0adf9 ("net-procfs: use xarray iterator to implement /proc/net/dev") This makes sure an existing device is always visible in the dump, regardless of concurrent insertions/deletions. v2: added suggestions from Jakub Kicinski and Ido Schimmel, thanks for the help ! Link: https://lore.kernel.org/all/20240209142441.6c56435b@kernel.org/ Link: https://lore.kernel.org/all/ZckR-XOsULLI9EHc@shredder/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240211214404.1882191-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@redhat.com>	2024-11-14 10:16:49 +01:00
Ivan Vecera	dd746452b2	rtnetlink: provide RCU protection to rtnl_fill_prop_list() JIRA: https://issues.redhat.com/browse/RHEL-62123 commit 0ec4e48c3a233820e0bce1f5ba9ed3e4520f90e9 Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:21 2024 +0000 rtnetlink: provide RCU protection to rtnl_fill_prop_list() We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. dev->name_node items are already rcu protected. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:44 +02:00
Ivan Vecera	938311f7cc	rtnetlink: make rtnl_fill_link_ifmap() RCU ready JIRA: https://issues.redhat.com/browse/RHEL-62123 commit 74808e72e0b2d7cac886151198c0330daadaee70 Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:20 2024 +0000 rtnetlink: make rtnl_fill_link_ifmap() RCU ready Use READ_ONCE() to read the following device fields: dev->mem_start dev->mem_end dev->base_addr dev->irq dev->dma dev->if_port Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:43 +02:00
Ivan Vecera	8da88cd9da	rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag JIRA: https://issues.redhat.com/browse/RHEL-62123 commit 386520e0ecc01004d3a29c70c5a77d4bbf8a8420 Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:15 2024 +0000 rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:43 +02:00
Ivan Vecera	f183fb3c8a	rtnetlink: prepare nla_put_iflink() to run under RCU JIRA: https://issues.redhat.com/browse/RHEL-62123 Conflicts: * drivers/net/netkit.c - hunk omitted as the driver is not present in RHEL * net/dsa/user.c - the hunk applied in dsa/slave.c due to absence of DSA deps commit e353ea9ce471331c13edffd5977eadd602d1bb80 Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:08 2024 +0000 rtnetlink: prepare nla_put_iflink() to run under RCU We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:43 +02:00
Rado Vrbovsky	f177edd8c5	Merge: CNB96: netdev_features: start cleaning netdev_features_t up MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5362 JIRA: https://issues.redhat.com/browse/RHEL-59091 Explanation from the upstream cover letter by Alexander Lobakin: > NETDEV_FEATURE_COUNT is currently 64, which means we can't add any new > features as netdev_features_t is u64. > As per several discussions, instead of converting netdev_features_t to > a bitmap, which would mean A LOT of changes, we can try cleaning up > netdev feature bits. > There's a bunch of bits which don't really mean features, rather device > attributes/properties that can't be changed via Ethtool in any of the > drivers. Such attributes can be moved to netdev private flags without > losing any functionality. > > Start converting some read-only netdev features to private flags from > the ones that are most obvious, like lockless Tx, inability to change > network namespace etc. I was able to reduce NETDEV_FEATURE_COUNT from > 64 to 60, which mean 4 free slots for new features. There are obviously > more read-only features to convert, such as highDMA, "challenged VLAN", > HSR (4 bits) - this will be done in subsequent series. > Please note that netdev features are not uAPI/ABI by any means. Ethtool > passes their names and bits to the userspace separately and there are no > hardcoded names/bits in the userspace, so that new Ethtool could work > on older kernels and vice versa. Even shell scripts won't most likely > break since the removed bits were always read-only, meaning nobody would > try touching them from a script. I proposed a Release Note Text in the Jira to document that "tx-lockless", "netns-local", "fcoe-mtu" will no longer appear in "ethtool -k". Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com> Approved-by: Ivan Vecera <ivecera@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: Eric Chanudet <echanude@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-10-20 09:09:03 +00:00
Michal Schmidt	12a989692f	netdevice: convert private flags > BIT(31) to bitfields JIRA: https://issues.redhat.com/browse/RHEL-59091 commit beb5a9bea8239cdf4adf6b62672e30db3e9fa5ce Author: Alexander Lobakin <aleksander.lobakin@intel.com> Date: Thu Aug 29 14:33:36 2024 +0200 netdevice: convert private flags > BIT(31) to bitfields Make dev->priv_flags `u32` back and define bits higher than 31 as bitfield booleans as per Jakub's suggestion. This simplifies code which accesses these bits with no optimization loss (testb both before/after), allows to not extend &netdev_priv_flags each time, but also scales better as bits > 63 in the future would only add a new u64 to the structure with no complications, comparing to that extending ::priv_flags would require converting it to a bitmap. Note that I picked `unsigned long :1` to not lose any potential optimizations comparing to `bool :1` etc. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Conflicts: drivers/net/ethernet/microchip/lan966x/lan966x_main.c - Driver not present in RHEL 9. Signed-off-by: Michal Schmidt <mschmidt@redhat.com>	2024-10-03 17:59:39 +02:00
Michal Schmidt	8e7994801b	netlink: introduce type-checking attribute iteration JIRA: https://issues.redhat.com/browse/RHEL-57750 commit e8058a49e67fe7bc7e4a0308851a3ca3a6d2e45d Author: Johannes Berg <johannes.berg@intel.com> Date: Thu Mar 28 20:31:45 2024 +0100 netlink: introduce type-checking attribute iteration There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case. Also convert many instances using this spatch: @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org> Conflicts: drivers/net/ethernet/netronome/nfp/nfp_net_common.c - The driver lacks .ndo_bridge_setlink implementation in RHEL 9. net/core/bpf_sk_storage.c - Missing commit bcc29b7f5af6 ("bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing") Signed-off-by: Michal Schmidt <mschmidt@redhat.com>	2024-10-01 12:19:13 +02:00
Ivan Vecera	c1b0641934	net: remove dev_base_lock from do_setlink() JIRA: https://issues.redhat.com/browse/RHEL-59100 commit 2dd4d828d648e101aaf19326afcdfee8667cb185 Author: Eric Dumazet <edumazet@google.com> Date: Tue Feb 13 06:32:43 2024 +0000 net: remove dev_base_lock from do_setlink() We hold RTNL here, and dev->link_mode readers already are using READ_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-09-17 12:17:18 +02:00
Ivan Vecera	2e6db4aa04	net: add netdev_set_operstate() helper JIRA: https://issues.redhat.com/browse/RHEL-59100 commit 6a2968ee1ee2cc6fce30f6f5724442b34b1483b3 Author: Eric Dumazet <edumazet@google.com> Date: Tue Feb 13 06:32:42 2024 +0000 net: add netdev_set_operstate() helper dev_base_lock is going away, add netdev_set_operstate() helper so that hsr does not have to know core internals. Remove dev_base_lock acquisition from rfc2863_policy() v3: use an "unsigned int" for dev->operstate, so that try_cmpxchg() can work on all arches. ( https://lore.kernel.org/oe-kbuild-all/202402081918.OLyGaea3-lkp@intel.com/ ) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-09-17 12:17:17 +02:00
Ivan Vecera	116bf3b894	net-sysfs: convert dev->operstate reads to lockless ones JIRA: https://issues.redhat.com/browse/RHEL-59100 commit 004d138364fd10dd5ff8ceb54cfdc2d792a7b338 Author: Eric Dumazet <edumazet@google.com> Date: Tue Feb 13 06:32:39 2024 +0000 net-sysfs: convert dev->operstate reads to lockless ones operstate_show() can omit dev_base_lock acquisition only to read dev->operstate. Annotate accesses to dev->operstate. Writers still acquire dev_base_lock for mutual exclusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-09-17 12:17:15 +02:00
Ivan Vecera	218f188cbf	dev: annotate accesses to dev->link JIRA: https://issues.redhat.com/browse/RHEL-59100 commit a6473fe9b623f6667af72d972b87cd9a5ff87e21 Author: Eric Dumazet <edumazet@google.com> Date: Tue Feb 13 06:32:35 2024 +0000 dev: annotate accesses to dev->link Following patch will read dev->link locklessly, annotate the write from do_setlink(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-09-17 12:17:12 +02:00
Ivan Vecera	4863bafaf6	net: core: synchronize link-watch when carrier is queried JIRA: https://issues.redhat.com/browse/RHEL-59100 commit facd15dfd69122042502d99ab8c9f888b48ee994 Author: Johannes Berg <johannes.berg@intel.com> Date: Mon Dec 4 21:47:07 2023 +0100 net: core: synchronize link-watch when carrier is queried There are multiple ways to query for the carrier state: through rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch work before these operations so that we don't have a situation where userspace queries the carrier state between the driver's carrier off->on transition and linkwatch running and expects it to work, when really (at least) TX cannot work until linkwatch has run. I previously posted a longer explanation of how this applies to wireless [1] but with this wireless can simply query the state before sending data, to ensure the kernel is ready for it. [1] https://lore.kernel.org/all/346b21d87c69f817ea3c37caceb34f1f56255884.camel@sipsolutions.net/ Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231204214706.303c62768415.I1caedccae72ee5a45c9085c5eb49c145ce1c0dd5@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-09-17 12:17:10 +02:00
Davide Caratti	1809a32ff2	rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validation JIRA: https://issues.redhat.com/browse/RHEL-39715 CVE: CVE-2024-36017 Upstream Status: net.git commit 1aec77b2bb2ed1db0f5efc61c4c1ca3813307489 commit 1aec77b2bb2ed1db0f5efc61c4c1ca3813307489 Author: Roded Zats <rzats@paloaltonetworks.com> Date: Thu May 2 18:57:51 2024 +0300 rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validation Each attribute inside a nested IFLA_VF_VLAN_LIST is assumed to be a struct ifla_vf_vlan_info so the size of such attribute needs to be at least of sizeof(struct ifla_vf_vlan_info) which is 14 bytes. The current size validation in do_setvfinfo is against NLA_HDRLEN (4 bytes) which is less than sizeof(struct ifla_vf_vlan_info) so this validation is not enough and a too small attribute might be cast to a struct ifla_vf_vlan_info, this might result in an out-of-bounds read access when accessing the saved (casted) entry in ivvl. Fixes: `79aab093a0` ("net: Update API for VF vlan protocol 802.1ad support") Signed-off-by: Roded Zats <rzats@paloaltonetworks.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240502155751.75705-1-rzats@paloaltonetworks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2024-06-20 11:16:44 +02:00
Lucas Zampieri	a1c1d84297	Merge: rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4431 JIRA: https://issues.redhat.com/browse/RHEL-36874 CVE: CVE-2024-27414 Upstream Status: all mainline in net.git Conflicts: None Tested: boot-tested only Signed-off-by: Davide Caratti <dcaratti@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: Xin Long <lxin@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-06-19 18:25:40 +00:00
Lucas Zampieri	1cc33b9d3b	Merge: CNB95: bridge: update bridge core to upstream v6.8 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4261 JIRA: https://issues.redhat.com/browse/RHEL-36219 Depends: !4249 Tested: using existing bridge self-tests Commits: ``` 29cfb2aaa442 ("bridge: Add backup nexthop ID support") b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") cbf51acbc5d5 ("net: bridge: Set BR_FDB_ADDED_BY_USER early in fdb_add_entry") bdb4dfda3b41 ("net: bridge: Track and limit dynamically learned FDB entries") ddd1ad68826d ("net: bridge: Add netlink knobs for number / max learned FDB entries") 19297c3ab23c ("net: bridge: Set strict_start_type for br_policy") 6f84090333bb ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest") ee6f05dcd672 ("br_netfilter: use single forward hook for ip and arp") b9109b5b77f0 ("bridge: mcast: Dump MDB entries even when snooping is disabled") 1b6d993509c1 ("bridge: mcast: Account for missing attributes") 62ef9cba98a2 ("bridge: mcast: Factor out a helper for PG entry size calculation") 6d0259dd6c53 ("bridge: mcast: Rename MDB entry get function") ff97d2a956a1 ("vxlan: mdb: Adjust function arguments") 14c32a46d992 ("vxlan: mdb: Factor out a helper for remote entry size calculation") 68b380a395a7 ("bridge: mcast: Add MDB get support") 32d9673e96dc ("vxlan: mdb: Add MDB get support") ddd17a54e692 ("rtnetlink: Add MDB get support") e8bba9e83c88 ("selftests: bridge_mdb: Use MDB get instead of dump") 0514dd05939a ("selftests: vxlan_mdb: Use MDB get instead of dump") 6808918343a8 ("net: bridge: fill in MODULE_DESCRIPTION()") e8a4195d843f ("docs: bridge: update doc format to rst") 8ebe06611666 ("net: bridge: add document for IFLA_BR enum") 8c4bafdb01cc ("net: bridge: add document for IFLA_BRPORT enum") bcc1f84e4d34 ("docs: bridge: Add kAPI/uAPI fields") 567d2608209f ("docs: bridge: add STP doc") 041a6ac4bf79 ("docs: bridge: add VLAN doc") 75ceac88efb8 
("docs: bridge: add multicast doc") 3c37f17d6ca9 ("docs: bridge: add switchdev doc") 1b1a4c7e82ae ("docs: bridge: add netfilter doc") d2afc2cd7f1f ("docs: bridge: add other features") 25ae948b4478 ("selftests/net: add lib.sh") 4624a78c18c6 ("selftests/net: convert test_bridge_backup_port.sh to run it in unique namespace") 312abe3d93a3 ("selftests/net: convert test_bridge_neigh_suppress.sh to run it in unique namespace") e37a11fca418 ("bridge: add MDB state mask uAPI attribute") a6acb535afb2 ("bridge: mdb: Add MDB bulk deletion support") 4cde72fead4c ("vxlan: mdb: Add MDB bulk deletion support") bd2dcb94c81e ("selftests: bridge_mdb: Add MDB bulk deletion test") c3e87a7fcd0b ("selftests: vxlan_mdb: Add MDB bulk deletion test") c2b2ee36250d ("bridge: cfm: fix enum typo in br_cc_ccm_tx_parse") 2114e83381d3 ("selftests: forwarding: Avoid failures to source net/lib.sh") 49078c1b80b6 ("selftests: forwarding: Remove executable bits from lib.sh") fc836129f708 ("selftests/net/lib: update busywait timeout value") f5c3eb4b7251 ("bridge: mcast: fix disabled snooping after long uptime") b40f873a7c80 ("selftests: net: Add missing matchall classifier") 96cd5ac4c0e6 ("selftests: forwarding: List helper scripts in TEST_FILES Makefile variable") 38ee0cb2a2e2 ("selftests: net: Fix bridge backup port test flakiness") 93590849a05e ("selftests: forwarding: Fix layer 2 miss test flakiness") 7399e2ce4d42 ("selftests: forwarding: Fix bridge MDB test flakiness") dd6b34589441 ("selftests: forwarding: Suppress grep warnings") f97f1fcc9690 ("selftests: forwarding: Fix bridge locked port test flakiness") dc489f86257c ("net: bridge: switchdev: Skip MDB replays of deferred events on offload") f7a70d650b0b ("net: bridge: switchdev: Ensure deferred event delivery on unoffload") 9adcac650618 ("netlink: specs: Add missing bridge linkinfo attrs") 83e93942796d ("selftests/net/lib: no need to record ns name if it already exist") ``` Signed-off-by: Ivan Vecera <ivecera@redhat.com> Approved-by: Hangbin Liu 
<haliu@redhat.com> Approved-by: Kamal Heib <kheib@redhat.com> Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-06-10 13:42:40 +00:00
Davide Caratti	db1c39363a	rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back JIRA: https://issues.redhat.com/browse/RHEL-36874 CVE: CVE-2024-27414 Upstream Status: net.git commit 743ad091fb46e622f1b690385bb15e3cd3daf874 commit 743ad091fb46e622f1b690385bb15e3cd3daf874 Author: Lin Ma <linma@zju.edu.cn> Date: Tue Feb 27 20:11:28 2024 +0800 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back In the commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic in the function `rtnl_bridge_setlink` to enable the loop to also check the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment removed the `break` statement and led to an error logic of the flags writing back at the end of this function. if (have_flags) memcpy(nla_data(attr), &flags, sizeof(flags)); // attr should point to IFLA_BRIDGE_FLAGS NLA !!! Before the mentioned commit, the `attr` is granted to be IFLA_BRIDGE_FLAGS. However, this is not necessarily true for now as the updated loop will let the attr point to the last NLA, even an invalid NLA which could cause overflow writes. This patch introduces a new variable `br_flag` to save the NLA pointer that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned error logic. Fixes: d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length") Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240227121128.608110-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com>	2024-06-06 11:14:39 +02:00
Lucas Zampieri	71cdbf2b82	Merge: net: dst: Improve concurrency performance of dst_entry MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3031 JIRA: https://issues.redhat.com/browse/RHEL-15695 Tested: verified performance improvement with memcached/memtier_bench with one thread per core each. The patches improve the performance of parallel local connections. Because the receive side of the connection is handled on the same cpu as the data was sent for local connections, contention and false sharing was observed between the sending core and the receiving core. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Approved-by: Antoine Tenart <atenart@redhat.com> Approved-by: Paolo Abeni <pabeni@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-06-03 19:53:47 +00:00
Felix Maurer	934e8cc341	net: dst: Switch to rcuref_t reference counting JIRA: https://issues.redhat.com/browse/RHEL-15695 commit bc9d3a9f2afca189a6ae40225b6985e3c775375e Author: Thomas Gleixner <tglx@linutronix.de> Date: Thu Mar 23 21:55:32 2023 +0100 net: dst: Switch to rcuref_t reference counting Under high contention dst_entry::__refcnt becomes a significant bottleneck. atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into high retry rates on contention. Switch the reference count to rcuref_t which results in a significant performance gain. Rename the reference count member to __rcuref to reflect the change. The gain depends on the micro-architecture and the number of concurrent operations and has been measured in the range of +25% to +130% with a localhost memtier/memcached benchmark which amplifies the problem massively. Running the memtier/memcached benchmark over a real (1Gb) network connection the conversion on top of the false sharing fix for struct dst_entry::__refcnt results in a total gain in the 2%-5% range over the upstream baseline. Reported-by: Wangyang Guo <wangyang.guo@intel.com> Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230307125538.989175656@linutronix.de Link: https://lore.kernel.org/r/20230323102800.215027837@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Felix Maurer <fmaurer@redhat.com>	2024-05-21 17:19:20 +02:00
Ivan Vecera	f90088128e	rtnetlink: Add MDB get support JIRA: https://issues.redhat.com/browse/RHEL-36219 commit ddd17a54e692bef1b646febf5242db10982e1965 Author: Ido Schimmel <idosch@nvidia.com> Date: Wed Oct 25 15:30:18 2023 +0300 rtnetlink: Add MDB get support Now that both the bridge and VXLAN drivers implement the MDB get net device operation, expose the functionality to user space by registering a handler for RTM_GETMDB messages. Derive the net device from the ifindex specified in the ancillary header and invoke its MDB get NDO. Note that unlike other get handlers, the allocation of the skb containing the response is not performed in the common rtnetlink code as the size is variable and needs to be determined by the respective driver. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-05-17 13:48:17 +02:00
Ivan Vecera	a83d6a5553	bridge: Add backup nexthop ID support JIRA: https://issues.redhat.com/browse/RHEL-36219 commit 29cfb2aaa4425a608651a05b9b875bc445394443 Author: Ido Schimmel <idosch@nvidia.com> Date: Mon Jul 17 11:12:28 2023 +0300 bridge: Add backup nexthop ID support Add a new bridge port attribute that allows attaching a nexthop object ID to an skb that is redirected to a backup bridge port with VLAN tunneling enabled. Specifically, when redirecting a known unicast packet, read the backup nexthop ID from the bridge port that lost its carrier and set it in the bridge control block of the skb before forwarding it via the backup port. Note that reading the ID from the bridge port should not result in a cache miss as the ID is added next to the 'backup_port' field that was already accessed. After this change, the 'state' field still stays on the first cache line, together with other data path related fields such as 'flags' and 'vlgrp': struct net_bridge_port { struct net_bridge * br; /* 0 8 */ struct net_device * dev; /* 8 8 */ netdevice_tracker dev_tracker; /* 16 0 */ struct list_head list; /* 16 16 */ long unsigned int flags; /* 32 8 */ struct net_bridge_vlan_group * vlgrp; /* 40 8 */ struct net_bridge_port * backup_port; /* 48 8 */ u32 backup_nhid; /* 56 4 */ u8 priority; /* 60 1 */ u8 state; /* 61 1 */ u16 port_no; /* 62 2 */ /* --- cacheline 1 boundary (64 bytes) --- */ [...] } __attribute__((__aligned__(8))); When forwarding an skb via a bridge port that has VLAN tunneling enabled, check if the backup nexthop ID stored in the bridge control block is valid (i.e., not zero). If so, instead of attaching the pre-allocated metadata (that only has the tunnel key set), allocate a new metadata, set both the tunnel key and the nexthop object ID and attach it to the skb. By default, do not dump the new attribute to user space as a value of zero is an invalid nexthop object ID. The above is useful for EVPN multihoming. 
When one of the links composing an Ethernet Segment (ES) fails, traffic needs to be redirected towards the host via one of the other ES peers. For example, if a host is multihomed to three different VTEPs, the backup port of each ES link needs to be set to the VXLAN device and the backup nexthop ID needs to point to an FDB nexthop group that includes the IP addresses of the other two VTEPs. The VXLAN driver will extract the ID from the metadata of the redirected skb, calculate its flow hash and forward it towards one of the other VTEPs. If the ID does not exist, or represents an invalid nexthop object, the VXLAN driver will drop the skb. This relieves the bridge driver from the need to validate the ID. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-05-17 13:47:59 +02:00
Petr Oros	149ecfe407	dpll: move all dpll<>netdev helpers to dpll code JIRA: https://issues.redhat.com/browse/RHEL-32098 Conflicts: - drivers/net/ethernet/mellanox/mlx5/core/dpll.c: chunk omitted due to missing 496fd0a26bbf73 ("mlx5: Implement SyncE support using DPLL infrastructure") Upstream commit(s): commit 289e922582af5b4721ba02e86bde4d9ba918158a Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Mar 4 17:35:32 2024 -0800 dpll: move all dpll<>netdev helpers to dpll code Older versions of GCC really want to know the full definition of the type involved in rcu_assign_pointer(). struct dpll_pin is defined in a local header, net/core can't reach it. Move all the netdev <> dpll code into dpll, where the type is known. Otherwise we'd need multiple function calls to jump between the compilation units. This is the same problem the commit under fixes was trying to address, but with rcu_assign_pointer() not rcu_dereference(). Some of the exports are not needed, networking core can't be a module, we only need exports for the helpers used by drivers. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/all/35a869c8-52e8-177-1d4d-e57578b99b6@linux-m68k.org/ Fixes: 640f41ed33b5 ("dpll: fix build failure due to rcu_dereference_check() on unknown type") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240305013532.694866-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-05-16 20:47:06 +02:00
Petr Oros	866233764b	net: add rcu safety to rtnl_prop_list_size() JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 9f30831390ede02d9fcd54fd9ea5a585ab649f4a Author: Eric Dumazet <edumazet@google.com> Date: Fri Feb 9 18:12:48 2024 +0000 net: add rcu safety to rtnl_prop_list_size() rtnl_prop_list_size() can be called while alternative names are added or removed concurrently. if_nlmsg_size() / rtnl_calcit() can indeed be called without RTNL held. Use explicit RCU protection to avoid UAF. Fixes: `88f4fb0c74` ("net: rtnetlink: put alternative names to getlink message") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240209181248.96637-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:11 +02:00
Petr Oros	74ee6c85d9	rtnetlink: bridge: Enable MDB bulk deletion JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 2601e9c4b1176253e33025ca24e56ed67c8d434f Author: Ido Schimmel <idosch@nvidia.com> Date: Sun Dec 17 10:32:42 2023 +0200 rtnetlink: bridge: Enable MDB bulk deletion Now that both the common code as well as individual drivers support MDB bulk deletion, allow user space to make such requests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:10 +02:00
Petr Oros	0dc57cd5b3	rtnetlink: bridge: Invoke MDB bulk deletion when needed JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit d8e81f131178dad603c6817421056030ed2f4ac2 Author: Ido Schimmel <idosch@nvidia.com> Date: Sun Dec 17 10:32:39 2023 +0200 rtnetlink: bridge: Invoke MDB bulk deletion when needed Invoke the new MDB bulk deletion device operation when the 'NLM_F_BULK' flag is set in the netlink message header. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:10 +02:00
Petr Oros	3574764a85	rtnetlink: bridge: Use a different policy for MDB bulk delete JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit e0cd06f7fcb51b8acd6e68e64cc805be1283de9d Author: Ido Schimmel <idosch@nvidia.com> Date: Sun Dec 17 10:32:37 2023 +0200 rtnetlink: bridge: Use a different policy for MDB bulk delete For MDB bulk delete we will need to validate 'MDBA_SET_ENTRY' differently compared to regular delete. Specifically, allow the ifindex to be zero (in case not filtering on bridge port) and force the address to be zero as bulk delete based on address is not supported. Do that by introducing a new policy and choosing the correct policy based on the presence of the 'NLM_F_BULK' flag in the netlink message header. Use nlmsg_parse() for strict validation. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:10 +02:00
Petr Oros	2e71d91d3c	net: rtnl: use rcu_replace_pointer_rtnl in rtnl_unregister_* JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 174523479aae31b17c043de127c87ff2aef3d54e Author: Pedro Tammela <pctammela@mojatatu.com> Date: Fri Dec 15 14:57:11 2023 -0300 net: rtnl: use rcu_replace_pointer_rtnl in rtnl_unregister_* With the introduction of the rcu_replace_pointer_rtnl helper, cleanup the rtnl_unregister_* functions to use the helper instead of open coding it. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:08 +02:00
Petr Oros	147089bf66	rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit ac40916a3f7243efbe6e129ebf495b5c33a3adfe Author: Li RongQing <lirongqing@baidu.com> Date: Wed Nov 15 20:01:08 2023 +0800 rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink if a PF has 256 or more VFs, ip link command will allocate an order 3 memory or more, and maybe trigger OOM due to memory fragment, the VFs needed memory size is computed in rtnl_vfinfo_size. so introduce nlmsg_new_large which calls netlink_alloc_large_skb in which vmalloc is used for large memory, to avoid the failure of allocating memory ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0 CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P OE Call Trace: dump_stack+0x57/0x6a dump_header+0x4a/0x210 oom_kill_process+0xe4/0x140 out_of_memory+0x3e8/0x790 __alloc_pages_slowpath.constprop.116+0x953/0xc50 __alloc_pages_nodemask+0x2af/0x310 kmalloc_large_node+0x38/0xf0 __kmalloc_node_track_caller+0x417/0x4d0 __kmalloc_reserve.isra.61+0x2e/0x80 __alloc_skb+0x82/0x1c0 rtnl_getlink+0x24f/0x370 rtnetlink_rcv_msg+0x12c/0x350 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1b2/0x280 netlink_sendmsg+0x355/0x4a0 sock_sendmsg+0x5b/0x60 ____sys_sendmsg+0x1ea/0x250 ___sys_sendmsg+0x88/0xd0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f95a65a5b70 Cc: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20231115120108.3711-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:06 +02:00
Petr Oros	e45e8ed090	net: Handle bulk delete policy in bridge driver JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 38985e8c278b82e6d4d62d4acd57c761cc23ce63 Author: Amit Cohen <amcohen@nvidia.com> Date: Mon Oct 9 13:06:08 2023 +0300 net: Handle bulk delete policy in bridge driver The merge commit 92716869375b ("Merge branch 'br-flush-filtering'") added support for FDB flushing in bridge driver. The following patches will extend VXLAN driver to support FDB flushing as well. The netlink message for bulk delete is shared between the drivers. With the existing implementation, there is no way to prevent user from flushing with attributes that are not supported per driver. For example, when VNI will be added, user will not get an error for flush FDB entries in bridge with VNI, although this attribute is not relevant for bridge. As preparation for support of FDB flush in VXLAN driver, move the policy to be handled in bridge driver, later a new policy for VXLAN will be added in VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(), as this field is relevant only for bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:01 +02:00
Ivan Vecera	7d341ab9ea	net: validate veth and vxcan peer ifindexes JIRA: https://issues.redhat.com/browse/RHEL-30656 commit f534f6581ec084fe94d6759f7672bd009794b07e Author: Jakub Kicinski <kuba@kernel.org> Date: Fri Aug 18 18:26:02 2023 -0700 net: validate veth and vxcan peer ifindexes veth and vxcan need to make sure the ifindexes of the peer are not negative, core does not validate this. Using iproute2 with user-space-level checking removed: Before: # ./ip link add index 10 type veth peer index -1 # ip link show 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff 10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff Now: $ ./ip link add index 10 type veth peer index -1 Error: ifindex can't be negative. This problem surfaced in net-next because an explicit WARN() was added, the root cause is older. Fixes: `e6f8f1a739` ("veth: Allow to create peer link with given ifindex") Fixes: `a8f820a380` ("can: add Virtual CAN Tunnel driver (vxcan)") Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:31 +02:00


798 Commits