Centos-kernel-stream-9


Author	SHA1	Message	Date
Benjamin Coddington	82eb8441b8	net: add a refcount tracker for kernel sockets JIRA: https://issues.redhat.com/browse/RHEL-73723 Conflicts: the __netns_tracker_alloc interface has been updated upstream b6d7c0eb2dcbd, but in RHEL the hunk for notrefcnt_tracker was not included (See RHEL commit `3b0a87ad0e`, RHEL-24101). We merge it in here. Also, we've dropped the rds hunk, as that seems unmaintained in RHEL and is missing the path where that hunk should operate. commit 0cafd77dcd032d1687efaba5598cf07bce85997f Author: Eric Dumazet <edumazet@google.com> Date: Thu Oct 20 23:20:18 2022 +0000 net: add a refcount tracker for kernel sockets Commit ffa84b5ffb37 ("net: add netns refcount tracker to struct sock") added a tracker to sockets, but did not track kernel sockets. We still have syzbot reports hinting about netns being destroyed while some kernel TCP sockets had not been dismantled. This patch tracks kernel sockets, and adds a ref_tracker_dir_print() call to net_free() right before the netns is freed. Normally, each layer is responsible for properly releasing its kernel sockets before last call to net_free(). This debugging facility is enabled with CONFIG_NET_NS_REFCNT_TRACKER=y Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>	2025-01-31 06:45:48 -05:00
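The tracking idea in this commit (record each reference a kernel socket holds on its netns, then report any that are still outstanding just before the netns is freed) can be illustrated with a minimal userspace sketch. `struct tracker_dir`, `tracker_alloc()`, `tracker_free()` and `tracker_dir_print()` are hypothetical names chosen for illustration; this is not the kernel's `ref_tracker` API, which also records allocation stack traces.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TRACKERS 16

/* Hypothetical stand-in for a tracker directory: one slot per
 * outstanding reference, tagged with the name of its owner. */
struct tracker_dir {
    const char *owner[MAX_TRACKERS];
    int used[MAX_TRACKERS];
};

static void tracker_dir_init(struct tracker_dir *dir)
{
    memset(dir, 0, sizeof(*dir));
}

/* Record a new reference; returns a slot index, or -1 if full. */
static int tracker_alloc(struct tracker_dir *dir, const char *owner)
{
    for (int i = 0; i < MAX_TRACKERS; i++) {
        if (!dir->used[i]) {
            dir->used[i] = 1;
            dir->owner[i] = owner;
            return i;
        }
    }
    return -1;
}

/* Drop a reference previously returned by tracker_alloc(). */
static void tracker_free(struct tracker_dir *dir, int slot)
{
    if (slot >= 0 && slot < MAX_TRACKERS)
        dir->used[slot] = 0;
}

/* Print every reference still held and return how many there are,
 * similar in spirit to a report made right before the netns is freed. */
static int tracker_dir_print(const struct tracker_dir *dir)
{
    int leaked = 0;

    for (int i = 0; i < MAX_TRACKERS; i++) {
        if (dir->used[i]) {
            fprintf(stderr, "leaked reference held by %s\n", dir->owner[i]);
            leaked++;
        }
    }
    return leaked;
}
```

Each kernel socket would take one tracker when created and release it when dismantled, so a non-zero count at teardown points at the layer that forgot to clean up.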
Petr Oros	c2f50d8bdf	netlink: fix false positive warning in extack during dumps JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 3bf39fa849ab8ed52abb6715922e6102d3df9f97 Author: Jakub Kicinski <kuba@kernel.org> Date: Tue Nov 19 14:44:31 2024 -0800 netlink: fix false positive warning in extack during dumps Commit under fixes extended extack reporting to dumps. It works under normal conditions, because extack errors are usually reported during ->start() or the first ->dump(), it's quite rare that the dump starts okay but fails later. If the dump does fail later, however, the input skb will already have the initiating message pulled, so checking if bad attr falls within skb->data will fail. Switch the check to using nlh, which is always valid. syzbot found a way to hit that scenario by filling up the receive queue. In this case we initiate a dump but don't call ->dump() until there is read space for an skb. WARNING: CPU: 1 PID: 5845 at net/netlink/af_netlink.c:2210 netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209 RIP: 0010:netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209 Call Trace: <TASK> netlink_dump_done+0x513/0x970 net/netlink/af_netlink.c:2250 netlink_dump+0x91f/0xe10 net/netlink/af_netlink.c:2351 netlink_recvmsg+0x6bb/0x11d0 net/netlink/af_netlink.c:1983 sock_recvmsg_nosec net/socket.c:1051 [inline] sock_recvmsg+0x22f/0x280 net/socket.c:1073 __sys_recvfrom+0x246/0x3d0 net/socket.c:2267 __do_sys_recvfrom net/socket.c:2285 [inline] __se_sys_recvfrom net/socket.c:2281 [inline] __x64_sys_recvfrom+0xde/0x100 net/socket.c:2281 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff37dd17a79 Reported-by: syzbot+d4373fa8042c06cefa84@syzkaller.appspotmail.com Fixes: 8af4f60472fc ("netlink: support all extack types in dumps") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: 
https://patch.msgid.link/20241119224432.1713040-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:56 +01:00
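A minimal sketch of the bounds check the fix describes, assuming the bad attribute is reported as a pointer into the original request: the offset is computed relative to the request's `struct nlmsghdr` rather than to the current `skb->data`, so it stays valid after the header has been pulled. `bad_attr_offset()` is a hypothetical helper, not a function from af_netlink.c.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdint.h>

/* Hypothetical helper: return the byte offset of 'bad_attr' inside the
 * request that started the operation, or -1 if it does not point into
 * that message. The check is anchored on the original nlmsghdr, which
 * stays valid for the whole dump, instead of on a data pointer that may
 * already have been advanced past the header. */
static long bad_attr_offset(const struct nlmsghdr *nlh, const void *bad_attr)
{
    uintptr_t start = (uintptr_t)nlh;
    uintptr_t attr = (uintptr_t)bad_attr;

    if (attr < start || attr >= start + nlh->nlmsg_len)
        return -1;
    return (long)(attr - start);
}
```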
Petr Oros	d2063b54dd	netlink: terminate outstanding dump on socket close JIRA: https://issues.redhat.com/browse/RHEL-57756 CVE: CVE-2024-53140 Upstream commit(s): commit 1904fb9ebf911441f90a68e96b22aa73e4410505 Author: Jakub Kicinski <kuba@kernel.org> Date: Tue Nov 5 17:52:34 2024 -0800 netlink: terminate outstanding dump on socket close Netlink supports iterative dumping of data. It provides the families the following ops: - start - (optional) kicks off the dumping process - dump - actual dump helper, keeps getting called until it returns 0 - done - (optional) pairs with .start, can be used for cleanup The whole process is asynchronous and the repeated calls to .dump don't actually happen in a tight loop, but rather are triggered in response to recvmsg() on the socket. This gives the user full control over the dump, but also means that the user can close the socket without getting to the end of the dump. To make sure .start is always paired with .done we check if there is an ongoing dump before freeing the socket, and if so call .done. The complication is that sockets can get freed from BH and .done is allowed to sleep. So we use a workqueue to defer the call, when needed. Unfortunately this does not work correctly. What we defer is not the cleanup but rather releasing a reference on the socket. We have no guarantee that we own the last reference, if someone else holds the socket they may release it in BH and we're back to square one. The whole dance, however, appears to be unnecessary. Only the user can interact with dumps, so we can clean up when socket is closed. And close always happens in process context. Some async code may still access the socket after close, queue notification skbs to it etc. but no dumps can start, end or otherwise make progress. Delete the workqueue and flush the dump state directly from the release handler. 
Note that further cleanup is possible in -next, for instance we now always call .done before releasing the main module reference, so dump doesn't have to take a reference of its own. Reported-by: syzkaller <syzkaller@googlegroups.com> Fixes: `ed5d7788a9` ("netlink: Do not schedule work from sk_destruct") Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241106015235.2458807-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:56 +01:00
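The start/dump/done contract, and the decision to finish an abandoned dump from the close path, can be modeled as a small state machine. This is a hypothetical userspace sketch; `struct dump_ctx`, `dump_step()` and `sock_release_model()` do not exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical model of one socket's dump state. */
struct dump_ctx {
    bool running;     /* .start has run and .done has not */
    int  remaining;   /* items left to emit */
    int  done_calls;  /* how many times .done ran (should end at 1) */
};

static void dump_start(struct dump_ctx *c, int items)
{
    c->running = true;
    c->remaining = items;
}

static void dump_done(struct dump_ctx *c)
{
    c->running = false;
    c->done_calls++;
}

/* One recvmsg()-driven step: emit up to 'batch' items. Returns 1 while
 * more data is pending and 0 once the dump is finished, at which point
 * .start is paired with .done. */
static int dump_step(struct dump_ctx *c, int batch)
{
    if (!c->running)
        return 0;
    c->remaining -= (batch < c->remaining) ? batch : c->remaining;
    if (c->remaining == 0) {
        dump_done(c);
        return 0;
    }
    return 1;
}

/* Close path: it runs in process context, so it can call .done
 * directly instead of deferring to a workqueue. */
static void sock_release_model(struct dump_ctx *c)
{
    if (c->running)
        dump_done(c);
}
```

Whether the user reads the dump to the end or closes the socket halfway, `.done` runs exactly once.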
Petr Oros	485d2c677f	genetlink: hold RCU in genlmsg_mcast() JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 56440d7ec28d60f8da3bfa09062b3368ff9b16db Author: Eric Dumazet <edumazet@google.com> Date: Fri Oct 11 17:12:17 2024 +0000 genetlink: hold RCU in genlmsg_mcast() While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw one lockdep splat [1]. genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU. Instead of letting all callers guard genlmsg_multicast_allns() with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast(). This also means the @flags parameter is useless, we need to always use GFP_ATOMIC. [1] [10882.424136] ============================= [10882.424166] WARNING: suspicious RCU usage [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted [10882.424400] ----------------------------- [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!! [10882.424469] other info that might help us debug this: [10882.424500] rcu_scheduler_active = 2, debug_locks = 1 [10882.424744] 2 locks held by ip/15677: [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219) [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209) [10882.426465] stack backtrace: [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156 [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [10882.427046] Call Trace: [10882.427131] <TASK> [10882.427244] dump_stack_lvl (lib/dump_stack.c:123) [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7)) [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink 
[10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115) [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210) [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201) [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551) [10882.428069] genl_rcv (net/netlink/genetlink.c:1220) [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357) [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901) [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1)) Fixes: `33f72e6f0c` ("l2tp : multicast notification to the registered listeners") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Tom Parkin <tparkin@katalix.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:56 +01:00
Petr Oros	3b136d4a05	net: netlink: Remove the dump_cb_mutex field from struct netlink_sock JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 18aaa82bd36ae3d4eaa3f1d1d8cf643e39f151cd Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Date: Thu Aug 22 09:03:20 2024 +0200 net: netlink: Remove the dump_cb_mutex field from struct netlink_sock Commit 5fbf57a937f4 ("net: netlink: remove the cb_mutex "injection" from netlink core") has removed the usage of the 'dump_cb_mutex' field from the struct netlink_sock. Remove the field itself now. It saves a few bytes in the structure. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:56 +01:00
Petr Oros	210b1b6518	net: netlink: remove the cb_mutex "injection" from netlink core JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 5fbf57a937f418fe204f9dbb7735e91984f4ee6a Author: Jakub Kicinski <kuba@kernel.org> Date: Thu Jun 6 12:29:06 2024 -0700 net: netlink: remove the cb_mutex "injection" from netlink core Back in 2007, in commit `af65bdfce9` ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it") netlink core was extended to allow subsystems to replace the dump mutex lock with its own lock. The mechanism was used by rtnetlink to take rtnl_lock but it isn't sufficiently flexible for other users. Over the 17 years since it was added no other user appeared. Since rtnetlink needs conditional locking now, and doesn't use it either, axe this feature completely. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:55 +01:00
Petr Oros	c976657153	rtnetlink: move rtnl_lock handling out of af_netlink JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 5380d64f8d766576ac5c0f627418b2d0e1d2641f Author: Jakub Kicinski <kuba@kernel.org> Date: Thu Jun 6 12:29:05 2024 -0700 rtnetlink: move rtnl_lock handling out of af_netlink Now that we have an intermediate layer of code for handling rtnl-level netlink dump quirks, we can move the rtnl_lock taking there. For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can avoid taking rtnl_lock just to generate NLM_DONE, once again. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:55 +01:00
Petr Oros	79e50be9d3	netlink: support all extack types in dumps JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 8af4f60472fce1f22db5068107b37bcc1a65eabd Author: Jakub Kicinski <kuba@kernel.org> Date: Fri Apr 19 19:35:41 2024 -0700 netlink: support all extack types in dumps Note that when this commit message refers to netlink dump it only means the actual dumping part, the parsing / dump start is handled by the same code as "doit". Commit `4a19edb60d` ("netlink: Pass extack to dump handlers") added support for returning extack messages from dump handlers, but left out other extack info, e.g. bad attribute. This used to be fine because until YNL we had little practical use for the machine readable attributes, and only messages were used in practice. YNL flips the preference 180 degrees, it's now much more useful to point to a bad attr with NL_SET_BAD_ATTR() than type an English message saying "attribute XYZ is $reason-why-bad". Support all of extack. The fact that extack only gets added if it fits remains unaddressed. Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240420023543.3300306-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:53 +01:00
Petr Oros	8d84730795	netlink: move extack writing helpers JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 652332e3f1d6209dab372e0dfc7a5bbe209bf698 Author: Jakub Kicinski <kuba@kernel.org> Date: Fri Apr 19 19:35:40 2024 -0700 netlink: move extack writing helpers Next change will need them in netlink_dump_done(), pure move. Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240420023543.3300306-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:53 +01:00
Petr Oros	388833a034	netlink: create a new header for internal genetlink symbols JIRA: https://issues.redhat.com/browse/RHEL-57756 Upstream commit(s): commit 5bc63d3a6f466add504f283d9f743f20ca9ec334 Author: Jakub Kicinski <kuba@kernel.org> Date: Fri Mar 29 10:57:08 2024 -0700 netlink: create a new header for internal genetlink symbols There are things in linux/genetlink.h which are only used under net/netlink/. Move them to a new local header. A new header with just 2 externs isn't great, but alternative would be to include af_netlink.h in genetlink.c which feels even worse. Link: https://lore.kernel.org/r/20240329175710.291749-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-12-10 10:37:53 +01:00
Petr Oros	ebe2c68d7c	net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID JIRA: https://issues.redhat.com/browse/RHEL-57755 Upstream commit(s): commit 8b6d307f4391c20cfd76bbb15f8b3784d36e0755 Author: Juntong Deng <juntong.deng@outlook.com> Date: Fri Mar 8 11:33:04 2024 +0000 net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID, and we are unable to get the value of NETLINK_LISTEN_ALL_NSID socket option through getsockopt. This patch adds getsockopt support for NETLINK_LISTEN_ALL_NSID. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Link: https://lore.kernel.org/r/AM6PR03MB58482322B7B335308DA56FE599272@AM6PR03MB5848.eurprd03.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-11-20 10:13:46 +01:00
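With this change, user space can read the option back. A minimal sketch using the standard socket API; on kernels that lack the patch the `getsockopt()` call is expected to fail (most likely with `ENOPROTOOPT`), which the function reports as a negative errno. `query_listen_all_nsid()` is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/* Returns 1 or 0 for the current NETLINK_LISTEN_ALL_NSID setting of a
 * fresh NETLINK_ROUTE socket, or -errno if the socket cannot be created
 * or the kernel does not support reading this option. */
static int query_listen_all_nsid(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -errno;

    int val = 0;
    socklen_t len = sizeof(val);
    int ret = getsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID, &val, &len);
    int err = errno;   /* save before close() can overwrite it */

    close(fd);
    return ret < 0 ? -err : val;
}
```

A freshly created socket has not enabled the option, so on a kernel with this patch the expected result is 0.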
Petr Oros	a6919b30a1	genetlink: fit NLMSG_DONE into same read() as families JIRA: https://issues.redhat.com/browse/RHEL-57755 Upstream commit(s): commit 87d381973e49404f658d6923a617932eeda9415f Author: Jakub Kicinski <kuba@kernel.org> Date: Sat Mar 2 21:24:08 2024 -0800 genetlink: fit NLMSG_DONE into same read() as families Make sure ctrl_fill_info() returns sensible error codes and propagate them out to netlink core. Let netlink core decide when to return skb->len and when to treat the exit as an error. Netlink core does a better job at it: if we always return skb->len, the core doesn't know when we're done dumping and NLMSG_DONE ends up in a separate read(). Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-11-20 10:13:44 +01:00
Petr Oros	a811b8aeb9	netlink: handle EMSGSIZE errors in the core JIRA: https://issues.redhat.com/browse/RHEL-57755 Upstream commit(s): commit b5a899154aa94cc573db3ae1f61dabe7bfe8b579 Author: Jakub Kicinski <kuba@kernel.org> Date: Sat Mar 2 21:24:06 2024 -0800 netlink: handle EMSGSIZE errors in the core Eric points out that our current suggested way of handling EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will break if we didn't fit even a single object into the buffer provided by the user. This should not happen for well behaved applications, but we can fix that, and free netlink families from dealing with that completely by moving error handling into the core. Let's assume from now on that all EMSGSIZE errors in dumps are because we run out of skb space. Families can now propagate the error nla_put_*() etc generated and not worry about any return value magic. If some family really wants to send EMSGSIZE to user space, assuming it generates the same error on the next dump iteration the skb->len should be 0, and user space should still see the EMSGSIZE. This should simplify families and prevent mistakes in return values which lead to DONE being forced into a separate recv() call as discovered by Ido some time ago. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-11-20 10:13:44 +01:00
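The core-side rule described in this commit can be sketched as a pure function. This is a hypothetical model, not the af_netlink.c code: `family_ret` is what a family's dump callback returned and `skb_len` is how many bytes it managed to place in the current buffer.

```c
#include <errno.h>

/* Hypothetical model of how the core interprets a dump callback result:
 *   > 0 : more data pending; send this buffer and call .dump again
 *     0 : dump complete; NLMSG_DONE can go into the same buffer
 *   < 0 : a real error to report to user space */
static int core_dump_result(int family_ret, int skb_len)
{
    /* -EMSGSIZE is taken to mean "buffer full" as long as something fit.
     * If nothing fit at all, the error passes through so user space
     * still sees EMSGSIZE. */
    if (family_ret == -EMSGSIZE && skb_len > 0)
        return skb_len;
    return family_ret;
}
```

Families can then return whatever `nla_put_*()` reported without the `(err == -EMSGSIZE) ? skb->len : err` idiom.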
Rado Vrbovsky	88c4382a10	Merge: thermal/intel_hfi: update to upstream v6.12 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5328 JIRA: https://issues.redhat.com/browse/RHEL-20130 Signed-off-by: David Arcari <darcari@redhat.com> Approved-by: Andrew Halaney <ahalaney@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Approved-by: Lenny Szubowicz <lszubowi@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-19 13:50:47 +00:00
Rado Vrbovsky	d639f49798	Merge: CVE-2024-50024: net: Fix an unsafe loop on the list MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5544 JIRA: https://issues.redhat.com/browse/RHEL-63844 CVE: CVE-2024-50024 ``` net: Fix an unsafe loop on the list The kernel may crash when deleting a genetlink family if there are still listeners for that family: Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0 LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0 Call Trace: __netlink_clear_multicast_users+0x74/0xc0 genl_unregister_family+0xd4/0x2d0 Change the unsafe loop on the list to a safe one, because inside the loop there is an element removal from this list. Fixes: `b8273570f8` ("genetlink: fix netns vs. netlink table locking (2)") Cc: stable@vger.kernel.org Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9) ``` Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> --- <small>Created 2024-10-22 12:45 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small> Approved-by: Hangbin Liu <haliu@redhat.com> Approved-by: Xin Long <lxin@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-11 08:40:52 +00:00
David Arcari	db4a2096ba	genetlink: Add per family bind/unbind callbacks JIRA: https://issues.redhat.com/browse/RHEL-20130 commit 3de21a8990d3c2cc507e9cc4ed00f36358d5b93e Author: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Date: Mon Feb 12 17:16:13 2024 +0100 genetlink: Add per family bind/unbind callbacks Add genetlink family bind()/unbind() callbacks when adding/removing multicast group to/from netlink client socket via setsockopt() or bind() syscall. They can be used to track if consumers of netlink multicast messages emerge or disappear. Thus, a client implementing callbacks, can now send events only when there are active consumers, preventing unnecessary work when none exist. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240212161615.161935-2-stanislaw.gruszka@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David Arcari <darcari@redhat.com>	2024-10-25 14:16:36 -04:00
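A minimal sketch of how a family might use such callbacks to skip event generation when nobody is subscribed. `struct family_model` and its functions are hypothetical and only model the counting idea; they do not reproduce the genetlink callback signatures.

```c
#include <stdbool.h>

/* Hypothetical family that counts multicast subscribers through
 * bind/unbind-style callbacks. */
struct family_model {
    int listeners;
    int events_built;
};

static int family_bind(struct family_model *f, int group)
{
    (void)group;            /* a single group in this sketch */
    f->listeners++;
    return 0;
}

static void family_unbind(struct family_model *f, int group)
{
    (void)group;
    if (f->listeners > 0)
        f->listeners--;
}

/* Build and send an event only if at least one consumer exists. */
static bool family_notify(struct family_model *f)
{
    if (f->listeners == 0)
        return false;       /* nobody listening: skip the work */
    f->events_built++;
    return true;
}
```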
Ivan Vecera	8da88cd9da	rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag JIRA: https://issues.redhat.com/browse/RHEL-62123 commit 386520e0ecc01004d3a29c70c5a77d4bbf8a8420 Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:15 2024 +0000 rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:43 +02:00
Ivan Vecera	2240d90442	rtnetlink: change nlk->cb_mutex role JIRA: https://issues.redhat.com/browse/RHEL-62123 commit e39951d965bf58b5aba7f61dc1140dcb8271af22 Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:14 2024 +0000 rtnetlink: change nlk->cb_mutex role In commit `af65bdfce9` ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it"), Patrick McHardy used a common mutex to protect both nlk->cb and the dump() operations. The override is used for rtnl dumps, registered with rtnl_register() and rtnl_register_module(). We want to be able to opt out some dump() operations from acquiring RTNL, so we need to protect nlk->cb with a per socket mutex. This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex. The optional pointer to the mutex used to protect dump() call is stored in nlk->dump_cb_mutex. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:43 +02:00
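The split described here (a per-socket mutex that always protects the callback state, plus an optional pointer to a second lock that serializes `dump()` itself) can be modeled with POSIX mutexes. All names below are hypothetical; `big_lock` merely stands in for RTNL, and a NULL `dump_lock` models a handler that opted out (as `RTNL_FLAG_DUMP_UNLOCKED` allows).

```c
#include <pthread.h>
#include <stddef.h>

/* Shared lock standing in for RTNL in this sketch. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct nl_sock_model {
    pthread_mutex_t cb_mutex;   /* always taken: protects callback state */
    pthread_mutex_t *dump_lock; /* NULL when the handler opted out */
    int dumps;
};

static void nl_sock_model_init(struct nl_sock_model *s, pthread_mutex_t *dump_lock)
{
    pthread_mutex_init(&s->cb_mutex, NULL);
    s->dump_lock = dump_lock;
    s->dumps = 0;
}

static void run_dump(struct nl_sock_model *s)
{
    pthread_mutex_lock(&s->cb_mutex);
    if (s->dump_lock)
        pthread_mutex_lock(s->dump_lock);
    s->dumps++;                 /* the dump() callback would run here */
    if (s->dump_lock)
        pthread_mutex_unlock(s->dump_lock);
    pthread_mutex_unlock(&s->cb_mutex);
}
```

An opted-out dump can run even while another thread holds `big_lock`, which is the point of making the outer lock optional.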
Ivan Vecera	25c3484119	netlink: hold nlk->cb_mutex longer in __netlink_dump_start() JIRA: https://issues.redhat.com/browse/RHEL-62123 commit b5590270068c4324dac4a2b5a4a156e02e21339f Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:13 2024 +0000 netlink: hold nlk->cb_mutex longer in __netlink_dump_start() __netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again. This seems dangerous, even if KASAN did not bother yet. Add a @lock_taken parameter to netlink_dump() to let it grab the mutex if called from netlink_recvmsg() only. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:43 +02:00
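A minimal sketch of the `@lock_taken` idea with a POSIX mutex: the start path keeps the mutex held across the first dump instead of dropping and retaking it, while the recvmsg path acquires it itself. `cb_mutex_model`, `do_dump()`, `dump_start_path()` and `recvmsg_path()` are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t cb_mutex_model = PTHREAD_MUTEX_INITIALIZER;
static int dump_runs;

/* Entered either with the mutex already held (start path) or without
 * it (recvmsg path); always returns with the mutex released. */
static void do_dump(bool lock_taken)
{
    if (!lock_taken)
        pthread_mutex_lock(&cb_mutex_model);
    dump_runs++;                /* callback state is protected here */
    pthread_mutex_unlock(&cb_mutex_model);
}

/* Start path: install state and run the first dump without releasing
 * the mutex in between, so no other thread can slip into the gap. */
static void dump_start_path(void)
{
    pthread_mutex_lock(&cb_mutex_model);
    /* ... install callback state ... */
    do_dump(true);
}

static void recvmsg_path(void)
{
    do_dump(false);
}
```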
Ivan Vecera	e4d9647420	netlink: fix netlink_diag_dump() return value JIRA: https://issues.redhat.com/browse/RHEL-62123 commit 6647b338fc5c6741736fe51a25fc2c0bec6398b8 Author: Eric Dumazet <edumazet@google.com> Date: Thu Feb 22 10:50:12 2024 +0000 netlink: fix netlink_diag_dump() return value __netlink_diag_dump() returns 1 if the dump is not complete, zero if no error occurred. If err variable is zero, this means the dump is complete: We should not return skb->len in this case, but 0. This allows NLMSG_DONE to be appended to the skb. User space does not have to call us again only to get NLMSG_DONE. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-10-24 16:14:43 +02:00
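The corrected return-value mapping can be written as a small function. This is a hypothetical sketch that assumes the inner walker returns 1 for "stopped early, more to come", 0 for "complete", and a negative errno on failure, as the commit message describes.

```c
/* Hypothetical mapping from the inner walker's result to the value a
 * dump callback hands back to the netlink core. */
static int diag_dump_ret(int inner, int skb_len)
{
    if (inner < 0)
        return inner;           /* propagate real errors */
    if (inner == 1)
        return skb_len;         /* incomplete: core will call us again */
    return 0;                   /* complete: NLMSG_DONE fits in this skb */
}
```

Returning 0 on completion, rather than `skb->len`, is what lets NLMSG_DONE share the final read with the last batch of data.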
CKI Backport Bot	2bfc0a91e9	net: Fix an unsafe loop on the list JIRA: https://issues.redhat.com/browse/RHEL-63844 CVE: CVE-2024-50024 commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9 Author: Anastasia Kovaleva <a.kovaleva@yadro.com> Date: Thu Oct 3 13:44:31 2024 +0300 net: Fix an unsafe loop on the list The kernel may crash when deleting a genetlink family if there are still listeners for that family: Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0 LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0 Call Trace: __netlink_clear_multicast_users+0x74/0xc0 genl_unregister_family+0xd4/0x2d0 Change the unsafe loop on the list to a safe one, because inside the loop there is an element removal from this list. Fixes: `b8273570f8` ("genetlink: fix netns vs. netlink table locking (2)") Cc: stable@vger.kernel.org Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2024-10-22 12:45:46 +00:00
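The bug class here is removing the current element while walking a list with a non-safe iterator. A minimal userspace sketch of the safe pattern on a singly linked list (the kernel fix itself switches to a `_safe` variant of the socket iteration macro): the next pointer is saved before the current node can be freed. `struct node`, `push()` and `remove_group()` are hypothetical.

```c
#include <stdlib.h>

struct node {
    int group;
    struct node *next;
};

/* Prepend a node; returns the new head (unchanged on allocation failure). */
static struct node *push(struct node *head, int group)
{
    struct node *n = malloc(sizeof(*n));
    if (!n)
        return head;
    n->group = group;
    n->next = head;
    return n;
}

/* Remove and free every node subscribed to 'group'. 'next' is read
 * before the current node may be freed; reading cur->next after
 * free(cur) would be a use-after-free. Returns the number removed. */
static int remove_group(struct node **head, int group)
{
    struct node **link = head;
    struct node *cur, *next;
    int removed = 0;

    for (cur = *head; cur; cur = next) {
        next = cur->next;       /* saved before any removal */
        if (cur->group == group) {
            *link = next;
            free(cur);
            removed++;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}
```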
Petr Oros	14b592f444	netlink: use kvmalloc() in netlink_alloc_large_skb() JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit f8cbf6bde4c8d5d32330bcceafa7b139fec89f97 Author: Eric Dumazet <edumazet@google.com> Date: Sat Feb 24 09:06:30 2024 +0000 netlink: use kvmalloc() in netlink_alloc_large_skb() This is a followup of commit 234ec0b6034b ("netlink: fix potential sleeping issue in mqueue_flush_file"), because vfree_atomic() overhead is unfortunate for medium sized allocations. 1) If the allocation is smaller than PAGE_SIZE, do not bother with vmalloc() at all. Some arches have 64KB PAGE_SIZE, while NLMSG_GOODSIZE is smaller than 8KB. 2) Use kvmalloc(), which might allocate one high order page instead of vmalloc if memory is not too fragmented. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-30 12:47:44 +02:00
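The allocation policy in this commit can be summarized as a decision function. This is a hypothetical model: `page_size` and `contiguous_ok` are plain inputs standing in for the real page size and for whether a high-order allocation would succeed; in the kernel, the second decision happens inside `kvmalloc()`.

```c
#include <stddef.h>

enum alloc_kind { ALLOC_SLAB, ALLOC_HIGH_ORDER, ALLOC_VMALLOC };

/* Hypothetical policy: small buffers never go to vmalloc; larger ones
 * first try physically contiguous pages and only fall back to vmalloc
 * when memory is too fragmented. */
static enum alloc_kind pick_allocator(size_t size, size_t page_size, int contiguous_ok)
{
    if (size <= page_size)
        return ALLOC_SLAB;
    if (contiguous_ok)
        return ALLOC_HIGH_ORDER;
    return ALLOC_VMALLOC;
}
```

On a 64KB-page system, typical netlink buffers (under 8KB) fall into the first branch and avoid `vfree_atomic()` entirely.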
Petr Oros	7a9f656ed8	netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 661779e1fcafe1b74b3f3fe8e980c1e207fea1fd Author: Ryosuke Yasuoka <ryasuoka@redhat.com> Date: Wed Feb 21 16:40:48 2024 +0900 netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter syzbot reported the following uninit-value access issue [1]: netlink_to_full_skb() creates a new `skb` and puts the `skb->data` passed as a 1st arg of netlink_to_full_skb() onto new `skb`. The data size is specified as `len` and passed to skb_put_data(). This `len` is based on `skb->end` that is not data offset but buffer offset. The `skb->end` contains data and tailroom. Since the tailroom is not initialized when the new `skb` is created, KMSAN detects uninitialized memory area when copying the data. This patch resolves this issue by correcting the len from `skb->end` to `skb->len`, which is the actual data offset. BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 copy_to_iter include/linux/uio.h:197 [inline] simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532 __skb_datagram_iter+0x123/0xdc0 
net/core/datagram.c:420 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546 skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline] packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg net/socket.c:1066 [inline] sock_read_iter+0x467/0x580 net/socket.c:1136 call_read_iter include/linux/fs.h:2014 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x8f6/0xe00 fs/read_write.c:470 ksys_read+0x20f/0x4c0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x93/0xd0 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was stored to memory at: skb_put_data include/linux/skbuff.h:2622 [inline] netlink_to_full_skb net/netlink/af_netlink.c:181 [inline] __netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline] __netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325 netlink_deliver_tap net/netlink/af_netlink.c:338 [inline] netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline] netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: free_pages_prepare mm/page_alloc.c:1087 [inline] free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347 
free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533 release_pages+0x23d3/0x2410 mm/swap.c:1042 free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316 tlb_batch_pages_flush mm/mmu_gather.c:98 [inline] tlb_flush_mmu_free mm/mmu_gather.c:293 [inline] tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300 tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392 exit_mmap+0x49e/0xd30 mm/mmap.c:3321 __mmput+0x13f/0x530 kernel/fork.c:1349 mmput+0x8a/0xa0 kernel/fork.c:1371 exit_mm+0x1b8/0x360 kernel/exit.c:567 do_exit+0xd57/0x4080 kernel/exit.c:858 do_group_exit+0x2fd/0x390 kernel/exit.c:1021 __do_sys_exit_group kernel/exit.c:1032 [inline] __se_sys_exit_group kernel/exit.c:1030 [inline] __x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Bytes 3852-3903 of 3904 are uninitialized Memory access of size 3904 starts at ffff88812ea1e000 Data copied to user address 0000000020003280 CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53ffe5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 Fixes: `1853c94964` ("netlink, mmap: transform mmap skb into full skb on taps") Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1] Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:11 +02:00
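The essence of the fix is to copy only the bytes that were actually written, not the whole allocation. A hypothetical userspace sketch, where `len` plays the role of `skb->len` and `cap` the role of the buffer end; `struct buf` and `clone_payload()` are illustrative names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer: 'len' bytes of valid data inside a larger
 * allocation of 'cap' bytes whose tail may be uninitialized. */
struct buf {
    unsigned char *data;
    size_t len;   /* bytes actually written (like skb->len) */
    size_t cap;   /* end of the allocation (like the skb end offset) */
};

/* Copy only the initialized payload. Copying src->cap bytes instead
 * would also copy the uninitialized tailroom, the kind of read KMSAN
 * reports. Returns 0 on success, -1 on allocation failure. */
static int clone_payload(const struct buf *src, struct buf *dst)
{
    dst->data = malloc(src->len ? src->len : 1);
    if (!dst->data)
        return -1;
    memcpy(dst->data, src->data, src->len);
    dst->len = src->len;
    dst->cap = src->len;
    return 0;
}
```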
Petr Oros	991a5ee575	netlink: fix potential sleeping issue in mqueue_flush_file JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 234ec0b6034b16869d45128b8cd2dc6ffe596f04 Author: Zhengchao Shao <shaozhengchao@huawei.com> Date: Mon Jan 22 09:18:07 2024 +0800 netlink: fix potential sleeping issue in mqueue_flush_file I analyze the potential sleeping issue of the following processes: Thread A Thread B ... netlink_create //ref = 1 do_mq_notify ... sock = netlink_getsockbyfilp ... //ref = 2 info->notify_sock = sock; ... ... netlink_sendmsg ... skb = netlink_alloc_large_skb //skb->head is vmalloced ... netlink_unicast ... sk = netlink_getsockbyportid //ref = 3 ... netlink_sendskb ... __netlink_sendskb ... skb_queue_tail //put skb to sk_receive_queue ... sock_put //ref = 2 ... ... ... netlink_release ... deferred_put_nlk_sk //ref = 1 mqueue_flush_file spin_lock remove_notification netlink_sendskb sock_put //ref = 0 sk_free ... __sk_destruct netlink_sock_destruct skb_queue_purge //get skb from sk_receive_queue ... __skb_queue_purge_reason kfree_skb_reason __kfree_skb ... skb_release_all skb_release_head_state netlink_skb_destructor vfree(skb->head) //sleeping while holding spinlock In netlink_sendmsg, if the memory pointed to by skb->head is allocated by vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. When the mqueue executes flush, the sleeping bug will occur. Use vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue. Fixes: `c05cdb1b86` ("netlink: allow large data transfers from user-space") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:11 +02:00
Petr Oros	91f18bfb6c	genetlink: Use internal flags for multicast groups JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit cd4d7263d58ab98fd4dee876776e4da6c328faa3 Author: Ido Schimmel <idosch@nvidia.com> Date: Wed Dec 20 17:43:58 2023 +0200 genetlink: Use internal flags for multicast groups As explained in commit e03781879a0d ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group"), the "flags" field in the multicast group structure reuses uAPI flags despite the field not being exposed to user space. This makes it impossible to extend its use without adding new uAPI flags, which is inappropriate for internal kernel checks. Solve this by adding internal flags (i.e., "GENL_MCAST_*") and convert the existing users to use them instead of the uAPI flags. Tested using the reproducers in commit 44ec98ea5ea9 ("psample: Require 'CAP_NET_ADMIN' when joining "packets" group") and commit e03781879a0d ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group"). No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:10 +02:00
Petr Oros	d60d7ef2e0	netlink: introduce typedef for filter function JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 403863e985e8eba608d53b2907caaf37b6176290 Author: Jiri Pirko <jiri@nvidia.com> Date: Sat Dec 16 13:29:58 2023 +0100 netlink: introduce typedef for filter function Make the code using filter function a bit nicer by consolidating the filter function arguments using typedef. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:10 +02:00
Petr Oros	54dcfdcba1	genetlink: introduce per-sock family private storage JIRA: https://issues.redhat.com/browse/RHEL-30145 Conflicts: - adjusted context conflict due to `4760ca4bff` ("net: add reserved fields to genl_family") Upstream commit(s): commit a731132424adeda4d5383ef61afae2e804063fb7 Author: Jiri Pirko <jiri@nvidia.com> Date: Sat Dec 16 13:29:57 2023 +0100 genetlink: introduce per-sock family private storage Introduce an xarray for Generic netlink family to store per-socket private. Initialize this xarray only if family uses per-socket privs. Introduce genl_sk_priv_get() to get the socket priv pointer for a family and initialize it in case it does not exist. Introduce __genl_sk_priv_get() to obtain socket priv pointer for a family under RCU read lock. Allow family to specify the priv size, init() and destroy() callbacks. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:10 +02:00
Petr Oros	147089bf66	rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit ac40916a3f7243efbe6e129ebf495b5c33a3adfe Author: Li RongQing <lirongqing@baidu.com> Date: Wed Nov 15 20:01:08 2023 +0800 rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink if a PF has 256 or more VFs, ip link command will allocate an order 3 memory or more, and maybe trigger OOM due to memory fragment, the VFs needed memory size is computed in rtnl_vfinfo_size. so introduce nlmsg_new_large which calls netlink_alloc_large_skb in which vmalloc is used for large memory, to avoid the failure of allocating memory ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL\|__GFP_NOWARN\|\ __GFP_COMP\|__GFP_NOMEMALLOC), order=3, oom_score_adj=0 CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P OE Call Trace: dump_stack+0x57/0x6a dump_header+0x4a/0x210 oom_kill_process+0xe4/0x140 out_of_memory+0x3e8/0x790 __alloc_pages_slowpath.constprop.116+0x953/0xc50 __alloc_pages_nodemask+0x2af/0x310 kmalloc_large_node+0x38/0xf0 __kmalloc_node_track_caller+0x417/0x4d0 __kmalloc_reserve.isra.61+0x2e/0x80 __alloc_skb+0x82/0x1c0 rtnl_getlink+0x24f/0x370 rtnetlink_rcv_msg+0x12c/0x350 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1b2/0x280 netlink_sendmsg+0x355/0x4a0 sock_sendmsg+0x5b/0x60 ____sys_sendmsg+0x1ea/0x250 ___sys_sendmsg+0x88/0xd0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f95a65a5b70 Cc: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20231115120108.3711-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:06 +02:00
Petr Oros	d16936e50a	netlink: fill in missing MODULE_DESCRIPTION() JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 016b9332a3346e97a6cacffea0f9dc10e1235a75 Author: Jakub Kicinski <kuba@kernel.org> Date: Wed Nov 1 21:57:24 2023 -0700 netlink: fill in missing MODULE_DESCRIPTION() W=1 builds now warn if a module is built without a MODULE_DESCRIPTION(). Fill it in for sock_diag. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:05 +02:00
Petr Oros	bedce06bfb	genetlink: don't merge dumpit split op for different cmds into single iter JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit f862ed2d0bf0cf51c28c1a69e3c2a1558d5a2978 Author: Jiri Pirko <jiri@nvidia.com> Date: Sat Oct 21 13:27:02 2023 +0200 genetlink: don't merge dumpit split op for different cmds into single iter Currently, split ops of doit and dumpit are merged into a single iter item when they are subsequent. However, there is no guarantee that the dumpit op is for the same cmd as doit op. Fix this by checking if cmd is the same for both. This problem does not occur in existing families. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231021112711.660606-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:03 +02:00
Petr Oros	e942232c80	netlink: add variable-length / auto integers JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit 374d345d9b5e13380c66d7042f9533a6ac6d1195 Author: Jakub Kicinski <kuba@kernel.org> Date: Wed Oct 18 14:39:20 2023 -0700 netlink: add variable-length / auto integers We currently push everyone to use padding to align 64b values in netlink. Un-padded nla_put_u64() doesn't even exist any more. The story behind this possibly start with this thread: https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/ where DaveM was concerned about the alignment of a structure containing 64b stats. If user space tries to access such struct directly: struct some_stats *stats = nla_data(attr); printf("A: %llu", stats->a); lack of alignment may become problematic for some architectures. These days we most often put every single member in a separate attribute, meaning that the code above would use a helper like nla_get_u64(), which can deal with alignment internally. Even for arches which don't have good unaligned access - access aligned to 4B should be pretty efficient. Kernel and well known libraries deal with unaligned input already. Padded 64b is quite space-inefficient (64b + pad means at worst 16B per attr vs 32b which takes 8B). It is also more typing: if (nla_put_u64_pad(rsp, NETDEV_A_SOMETHING_SOMETHING, value, NETDEV_A_SOMETHING_PAD)) Create a new attribute type which will use 32 bits at netlink level if value is small enough (probably most of the time?), and (4B-aligned) 64 bits otherwise. Kernel API is just: if (nla_put_uint(rsp, NETDEV_A_SOMETHING_SOMETHING, value)) Calling this new type "just" sint / uint with no specific size will hopefully also make people more comfortable with using it. Currently telling people "don't use u8, you may need the bits, and netlink will round up to 4B, anyway" is the #1 comment we give to newcomers. 
In terms of netlink layout it looks like this: 0 4 8 12 16 32b: [nlattr][ u32 ] 64b: [ pad ][nlattr][ u64 ] uint(32) [nlattr][ u32 ] uint(64) [nlattr][ u64 ] Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:03 +02:00
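The space argument for the new uint type can be illustrated with a small user-space sketch. It uses a hypothetical `struct attr_hdr` in place of `struct nlattr` (a 2-byte length followed by a 2-byte type) and follows the rule described in the message: a 4-byte payload when the value fits in 32 bits, otherwise an 8-byte payload with no pad attribute. It is an illustration only, not the kernel's `nla_put_uint()`/`nla_get_uint()`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct nlattr: 2-byte length, 2-byte type. */
struct attr_hdr {
	uint16_t len;   /* header + payload, excluding trailing alignment */
	uint16_t type;
};

#define ATTR_HDRLEN ((size_t)sizeof(struct attr_hdr))  /* 4 bytes */

/*
 * Encode value as a "uint" attribute: a 4-byte payload if it fits in
 * 32 bits, otherwise an 8-byte payload with no pad attribute.
 * Returns the number of bytes written to buf.
 */
static size_t put_uint_attr(unsigned char *buf, uint16_t type, uint64_t value)
{
	struct attr_hdr hdr;
	uint32_t v32 = (uint32_t)value;
	size_t plen = (value == v32) ? sizeof(uint32_t) : sizeof(uint64_t);

	hdr.len = (uint16_t)(ATTR_HDRLEN + plen);
	hdr.type = type;
	memcpy(buf, &hdr, ATTR_HDRLEN);
	if (plen == sizeof(uint32_t))
		memcpy(buf + ATTR_HDRLEN, &v32, sizeof(v32));
	else
		memcpy(buf + ATTR_HDRLEN, &value, sizeof(value));
	return ATTR_HDRLEN + plen;
}

/* Decode: the payload width is inferred from the length field. */
static uint64_t get_uint_attr(const unsigned char *buf)
{
	struct attr_hdr hdr;

	memcpy(&hdr, buf, ATTR_HDRLEN);
	if (hdr.len - ATTR_HDRLEN == sizeof(uint32_t)) {
		uint32_t v32;

		memcpy(&v32, buf + ATTR_HDRLEN, sizeof(v32));
		return v32;
	}
	uint64_t v64;

	memcpy(&v64, buf + ATTR_HDRLEN, sizeof(v64));
	return v64;
}
```

With this encoding a small value costs 8 bytes and a large one 12, against up to 16 for a padded u64. `memcpy()` is used for the payload because a 64-bit value placed after a 4-byte header is only guaranteed 4-byte alignment.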
Petr Oros	ea990c71f1	netlink: Annotate struct netlink_policy_dump_state with __counted_by JIRA: https://issues.redhat.com/browse/RHEL-30145 Upstream commit(s): commit eaede99c3aeb38613c40a150f676f772faf2b42b Author: Kees Cook <keescook@chromium.org> Date: Tue Oct 3 16:21:02 2023 -0700 netlink: Annotate struct netlink_policy_dump_state with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct netlink_policy_dump_state. Additionally update the size of the usage array length before accessing it. This requires remembering the old size for the memset() and later assignments. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: netdev@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Petr Oros <poros@redhat.com>	2024-04-26 17:16:00 +02:00
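The ordering point in the `__counted_by` change (update the count before touching the array, but keep the old size for the `memset()`) can be sketched in user space. The `COUNTED_BY` macro, `struct dump_state` and `grow()` are hypothetical names, not the kernel's; the attribute is applied only when the compiler reports `counted_by` support through `__has_attribute`, and expands to nothing otherwise.

```c
#include <stdlib.h>
#include <string.h>

/* Apply the counted_by attribute only when the compiler supports it. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

/* Hypothetical state object with a flexible array bounded by n_alloc. */
struct dump_state {
	unsigned int n_alloc;
	unsigned int items[] COUNTED_BY(n_alloc);
};

/*
 * Grow s (which may be NULL) to new_n entries and zero the new slots.
 * Shrinking is not handled: s is returned unchanged. On allocation
 * failure NULL is returned and s is left valid for the caller.
 */
static struct dump_state *grow(struct dump_state *s, unsigned int new_n)
{
	unsigned int old_n = s ? s->n_alloc : 0;  /* remember the old size */
	struct dump_state *ns;

	if (new_n <= old_n)
		return s;

	ns = realloc(s, sizeof(*ns) + (size_t)new_n * sizeof(ns->items[0]));
	if (!ns)
		return NULL;

	/* Update the counter first so bounds checks see the new size... */
	ns->n_alloc = new_n;
	/* ...then clear only the newly added tail, using the old size. */
	memset(&ns->items[old_n], 0,
	       (size_t)(new_n - old_n) * sizeof(ns->items[0]));
	return ns;
}
```

Assigning the `realloc()` result to a separate pointer keeps the original allocation reachable if the resize fails.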
Ivan Vecera	556894708a	netlink: annotate data-races around sk->sk_err JIRA: https://issues.redhat.com/browse/RHEL-30656 commit d0f95894fda7d4f895b29c1097f92d7fee278cb2 Author: Eric Dumazet <edumazet@google.com> Date: Tue Oct 3 18:34:55 2023 +0000 netlink: annotate data-races around sk->sk_err syzbot caught another data-race in netlink when setting sk->sk_err. Annotate all of them for good measure. BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00000000 -> 0x00000016 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:34 +02:00
Ivan Vecera	51411db015	genetlink: add a family pointer to struct genl_info JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 5c670a010de46687ed27553602d8131ce4d7a9fb Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Aug 14 14:47:19 2023 -0700 genetlink: add a family pointer to struct genl_info Having family in struct genl_info is quite useful. It cuts down the number of arguments which need to be passed to helpers which already take struct genl_info. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:30 +02:00
Ivan Vecera	821b40dbfa	genetlink: use attrs from struct genl_info JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 7288dd2fd4888c85c687f8ded69c280938d1a7b6 Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Aug 14 14:47:18 2023 -0700 genetlink: use attrs from struct genl_info Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:30 +02:00
Ivan Vecera	5e510e57b4	genetlink: add struct genl_info to struct genl_dumpit_info JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 9272af109fe65d1a13f28c5c13777b62d3e97e8c Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Aug 14 14:47:17 2023 -0700 genetlink: add struct genl_info to struct genl_dumpit_info Netlink GET implementations must currently juggle struct genl_info and struct netlink_callback, depending on whether they were called from doit or dumpit. Add genl_info to the dump state and populate the fields. This way implementations can simply pass struct genl_info around. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:30 +02:00
Ivan Vecera	25a5e1ea3a	genetlink: remove userhdr from struct genl_info JIRA: https://issues.redhat.com/browse/RHEL-30656 commit bffcc6882a1bb2be8c9420184966f4c2c822078e Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Aug 14 14:47:16 2023 -0700 genetlink: remove userhdr from struct genl_info Only three families use info->userhdr today and going forward we discourage using fixed headers in new families. So having the pointer to user header in struct genl_info is an overkill. Compute the header pointer at runtime. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:30 +02:00
Ivan Vecera	8a3425c399	genetlink: push conditional locking into dumpit/done JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 84817d8c6042e6261ea45c53fe8b5a0bd55c3993 Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Aug 14 14:47:14 2023 -0700 genetlink: push conditional locking into dumpit/done Add helpers which take/release the genl mutex based on family->parallel_ops. Remove the separation between handling of ops in locked and parallel families. Future patches would make the duplicated code grow even more. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:30 +02:00
Ivan Vecera	374f808933	netlink: convert nlk->flags to atomic flags JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 8fe08d70a2b61b35a0a1235c78cf321e7528351f Author: Eric Dumazet <edumazet@google.com> Date: Fri Aug 11 07:22:26 2023 +0000 netlink: convert nlk->flags to atomic flags sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt() and others use nlk->flags without correct locking. Use set_bit(), clear_bit(), test_bit(), assign_bit() to remove data-races. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:29 +02:00
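The set_bit()/clear_bit()/test_bit()/assign_bit() pattern adopted here can be approximated in user space with C11 `<stdatomic.h>`. This is an analogue, not the kernel bitops API: the flag numbers and helper functions are hypothetical, and each read-modify-write is a single atomic operation so concurrent updates to different bits cannot lose each other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers; nr must be < bits in unsigned long. */
enum { DEMO_FLAG_A = 0, DEMO_FLAG_B = 1 };

/* Atomically set bit nr (analogue of set_bit()). */
static void flag_set(atomic_ulong *flags, unsigned int nr)
{
	atomic_fetch_or(flags, 1UL << nr);
}

/* Atomically clear bit nr (analogue of clear_bit()). */
static void flag_clear(atomic_ulong *flags, unsigned int nr)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

/* Read bit nr with an atomic load (analogue of test_bit()). */
static bool flag_test(atomic_ulong *flags, unsigned int nr)
{
	return (atomic_load(flags) >> nr) & 1UL;
}

/* Set or clear bit nr depending on val (analogue of assign_bit()). */
static void flag_assign(atomic_ulong *flags, unsigned int nr, bool val)
{
	if (val)
		flag_set(flags, nr);
	else
		flag_clear(flags, nr);
}
```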
Ivan Vecera	c6218efc01	netlink: Add new netlink_release function JIRA: https://issues.redhat.com/browse/RHEL-30656 commit a4c9a56e6a2cdeeab7caef1f496b7bfefd95b50e Author: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Date: Wed Jul 19 13:18:17 2023 -0700 netlink: Add new netlink_release function A new function netlink_release is added in netlink_sock to store the protocol's release function. This is called when the socket is deleted. This can be supplied by the protocol via the release function in netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR protocol, so it can free it's data when socket is deleted. Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:27 +02:00
Ivan Vecera	cf823aa591	genetlink: add explicit ordering break check for split ops JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 5766946ea5117e4edeb78c80cac367fb06854cc1 Author: Jiri Pirko <jiri@nvidia.com> Date: Thu Jul 20 13:13:54 2023 +0200 genetlink: add explicit ordering break check for split ops Currently, if cmd in the split ops array is of lower value than the previous one, genl_validate_ops() continues to do the checks as if the values are equal. This may result in non-obvious WARN_ON() hit in these check. Instead, check the incorrect ordering explicitly and put a WARN_ON() in case it is broken. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230720111354.562242-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:27 +02:00
Ivan Vecera	e9b850700c	netlink: Make use of __assign_bit() API JIRA: https://issues.redhat.com/browse/RHEL-30656 commit b8e39b38487e68c6503419db6e4a851a0ef56de7 Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Date: Mon Jul 10 13:08:30 2023 +0300 netlink: Make use of __assign_bit() API We have for some time the __assign_bit() API to replace open coded if (foo) __set_bit(n, bar); else __clear_bit(n, bar); Use this API in the code. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Message-ID: <20230710100830.89936-2-andriy.shevchenko@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:27 +02:00
Ivan Vecera	764a373f7a	netlink: Add __sock_i_ino() for __netlink_diag_dump(). JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 25a9c8a4431c364f97f75558cb346d2ad3f53fbb Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Mon Jun 26 09:43:13 2023 -0700 netlink: Add __sock_i_ino() for __netlink_diag_dump(). syzbot reported a warning in __local_bh_enable_ip(). [0] Commit 8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()") converted read_lock(&nl_table_lock) to read_lock_irqsave() in __netlink_diag_dump() to prevent a deadlock. However, __netlink_diag_dump() calls sock_i_ino() that uses read_lock_bh() and read_unlock_bh(). If CONFIG_TRACE_IRQFLAGS=y, read_unlock_bh() finally enables IRQ even though it should stay disabled until the following read_unlock_irqrestore(). Using read_lock() in sock_i_ino() would trigger a lockdep splat in another place that was fixed in commit `f064af1e50` ("net: fix a lockdep splat"), so let's add __sock_i_ino() that would be safe to use under BH disabled. 
[0]: WARNING: CPU: 0 PID: 5012 at kernel/softirq.c:376 __local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 Modules linked in: CPU: 0 PID: 5012 Comm: syz-executor487 Not tainted 6.4.0-rc7-syzkaller-00202-g6f68fc395f49 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:__local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 Code: 45 bf 01 00 00 00 e8 91 5b 0a 00 e8 3c 15 3d 00 fb 65 8b 05 ec e9 b5 7e 85 c0 74 58 5b 5d c3 65 8b 05 b2 b6 b4 7e 85 c0 75 a2 <0f> 0b eb 9e e8 89 15 3d 00 eb 9f 48 89 ef e8 6f 49 18 00 eb a8 0f RSP: 0018:ffffc90003a1f3d0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: 1ffffffff1cf5996 RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffff8805c6f3 RBP: ffffffff8805c6f3 R08: 0000000000000001 R09: ffff8880152b03a3 R10: ffffed1002a56074 R11: 0000000000000005 R12: 00000000000073e4 R13: dffffc0000000000 R14: 0000000000000002 R15: 0000000000000000 FS: 0000555556726300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 000000007c646000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sock_i_ino+0x83/0xa0 net/core/sock.c:2559 __netlink_diag_dump+0x45c/0x790 net/netlink/diag.c:171 netlink_diag_dump+0xd6/0x230 net/netlink/diag.c:207 netlink_dump+0x570/0xc50 net/netlink/af_netlink.c:2269 __netlink_dump_start+0x64b/0x910 net/netlink/af_netlink.c:2374 netlink_dump_start include/linux/netlink.h:329 [inline] netlink_diag_handler_dump+0x1ae/0x250 net/netlink/diag.c:238 __sock_diag_cmd net/core/sock_diag.c:238 [inline] sock_diag_rcv_msg+0x31e/0x440 net/core/sock_diag.c:269 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2547 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 
net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 ____sys_sendmsg+0x71c/0x900 net/socket.c:2503 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2557 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2586 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f5303aaabb9 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc7506e548 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5303aaabb9 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00007f5303a6ed60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5303a6edf0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Fixes: 8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()") Reported-by: syzbot+5da61cf6a9bc1902d422@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=5da61cf6a9bc1902d422 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230626164313.52528-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:27 +02:00
Ivan Vecera	e385a6038e	netlink: fix potential deadlock in netlink_set_err() JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 8d61f926d42045961e6b65191c09e3678d86a9cf Author: Eric Dumazet <edumazet@google.com> Date: Wed Jun 21 15:43:37 2023 +0000 netlink: fix potential deadlock in netlink_set_err() syzbot reported a possible deadlock in netlink_set_err() [1] A similar issue was fixed in commit `1d482e666b` ("netlink: disable IRQs for netlink_lock_table()") in netlink_lock_table() This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump() which were not covered by cited commit. [1] WARNING: possible irq lock inversion dependency detected 6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted syz-executor.2/23011 just changed the state of lock: ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612 but this lock was taken by another, SOFTIRQ-safe lock in the past: (&local->queue_stop_reason_lock){..-.}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(nl_table_lock); local_irq_disable(); lock(&local->queue_stop_reason_lock); lock(nl_table_lock); <Interrupt> lock(&local->queue_stop_reason_lock); * DEADLOCK * Fixes: `1d482e666b` ("netlink: disable IRQs for netlink_lock_table()") Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20230621154337.1668594-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:26 +02:00
Ivan Vecera	587fa7af5a	net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report JIRA: https://issues.redhat.com/browse/RHEL-30656 commit f4e4534850a9d18c250a93f8d7fbb51310828110 Author: Pedro Tammela <pctammela@mojatatu.com> Date: Mon May 29 12:33:35 2023 -0300 net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report The current code for the length calculation wrongly truncates the reported length of the groups array, causing an under report of the subscribed groups. To fix this, use 'BITS_TO_BYTES()' which rounds up the division by 8. Fixes: `b42be38b27` ("netlink: add API to retrieve all group memberships") Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230529153335.389815-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:22 +02:00
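The under-report is easiest to see numerically. The sketch below re-defines `DIV_ROUND_UP`, `BITS_TO_BYTES` and a hypothetical `ALIGN_UP` locally and assumes, for illustration, that the reported length is padded to a multiple of 4 bytes; the two helper functions are hypothetical and are not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Local helpers for this sketch; the kernel has its own definitions. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_BYTES(nr)  DIV_ROUND_UP(nr, 8)
#define ALIGN_UP(x, a)     (DIV_ROUND_UP(x, a) * (a))

/* Truncating division drops a partially used trailing byte. */
static size_t groups_len_truncating(size_t ngroups)
{
	return ALIGN_UP(ngroups / 8, sizeof(uint32_t));
}

/* Rounding the bit count up to whole bytes keeps every group. */
static size_t groups_len_rounding(size_t ngroups)
{
	return ALIGN_UP(BITS_TO_BYTES(ngroups), sizeof(uint32_t));
}
```

With 33 groups, truncating division gives 33 / 8 = 4 bytes, which covers only 32 groups, while rounding up gives 5 bytes, padded to 8. When the group count is a multiple of 8 both give the same result, which is why the bug only shows up for some group counts.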
Ivan Vecera	8fcbe46cde	netlink: annotate accesses to nlk->cb_running JIRA: https://issues.redhat.com/browse/RHEL-30656 commit a939d14919b799e6fff8a9c80296ca229ba2f8a4 Author: Eric Dumazet <edumazet@google.com> Date: Tue May 9 16:56:34 2023 +0000 netlink: annotate accesses to nlk->cb_running Both netlink_recvmsg() and netlink_native_seq_show() read nlk->cb_running locklessly. Use READ_ONCE() there. Add corresponding WRITE_ONCE() to netlink_dump() and __netlink_dump_start() syzbot reported: BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0: __netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399 netlink_dump_start include/linux/netlink.h:308 [inline] rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] sock_write_iter+0x1aa/0x230 net/socket.c:1138 call_write_iter include/linux/fs.h:1851 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x463/0x760 fs/read_write.c:584 ksys_write+0xeb/0x1a0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x42/0x50 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1: netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022 sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017 ____sys_recvmsg+0x2db/0x310 net/socket.c:2718 ___sys_recvmsg net/socket.c:2762 [inline] do_recvmmsg+0x2e5/0x710 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00 -> 0x01 Fixes: `16b304f340` ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:21 +02:00
Ivan Vecera	8a3f7c7be9	netlink: Use copy_to_user() for optval in netlink_getsockopt(). JIRA: https://issues.redhat.com/browse/RHEL-30656 commit d913d32cc2707e9cd24fe6fa6d7d470e9c728980 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Apr 21 11:52:55 2023 -0700 netlink: Use copy_to_user() for optval in netlink_getsockopt(). Brad Spencer provided a detailed report [0] that when calling getsockopt() for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such options require at least sizeof(int) as length. The options return a flag value that fits into 1 byte, but such behaviour confuses users who do not initialise the variable before calling getsockopt() and do not strictly check the returned value as char. Currently, netlink_getsockopt() uses put_user() to copy data to optlen and optval, but put_user() casts the data based on the pointer, char *optval. As a result, only 1 byte is set to optval. To avoid this behaviour, we need to use copy_to_user() or cast optval for put_user(). Note that this changes the behaviour on big-endian systems, but we document that the size of optval is int in the man page. $ man 7 netlink ... Socket options To set or get a netlink socket option, call getsockopt(2) to read or setsockopt(2) to write the option with the option level argument set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to an int. 
Fixes: `9a4595bc7e` ("[NETLINK]: Add set/getsockopt options to support more than 32 groups") Fixes: `be0c22a46c` ("netlink: add NETLINK_BROADCAST_ERROR socket option") Fixes: `38938bfe34` ("netlink: add NETLINK_NO_ENOBUFS socket flag") Fixes: `0a6a3a23ea` ("netlink: add NETLINK_CAP_ACK socket option") Fixes: `2d4bc93368` ("netlink: extended ACK reporting") Fixes: `89d35528d1` ("netlink: Add new socket option to enable strict checking on dumps") Reported-by: Brad Spencer <bspencer@blackberry.com> Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:21 +02:00
Ivan Vecera	bf93242e27	netlink: remove unused 'compare' function JIRA: https://issues.redhat.com/browse/RHEL-30656 commit 6978052448f9eb19f7b03243ac0416104e5ee50d Author: Florian Westphal <fw@strlen.de> Date: Wed Mar 8 15:20:06 2023 +0100 netlink: remove unused 'compare' function No users in the tree. Tested with allmodconfig build. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230308142006.20879-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:19:19 +02:00
Ivan Vecera	44d65dbd81	netlink: Reverse the patch which removed filtering JIRA: https://issues.redhat.com/browse/RHEL-30344 commit a3377386b56420d78a4c0a931a40f9a25c3ca2bd Author: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Date: Wed Jul 19 13:18:16 2023 -0700 netlink: Reverse the patch which removed filtering To use filtering at the connector & cn_proc layers, we need to enable filtering in the netlink layer. This reverses the patch which removed netlink filtering - commit ID for that patch: 549017aa1bb7 (netlink: remove netlink_broadcast_filtered). Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-10 09:16:00 +02:00
Ivan Vecera	676be2326e	netlink: annotate lockless accesses to nlk->max_recvmsg_len JIRA: https://issues.redhat.com/browse/RHEL-30344 commit a1865f2e7d10dde00d35a2122b38d2e469ae67ed Author: Eric Dumazet <edumazet@google.com> Date: Mon Apr 3 21:46:43 2023 +0000 netlink: annotate lockless accesses to nlk->max_recvmsg_len syzbot reported a data-race in netlink_recvmsg() [1] Indeed, netlink_recvmsg() can be run concurrently, and netlink_dump() also needs protection. [1] BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0: netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg net/socket.c:1038 [inline] __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194 __do_sys_recvfrom net/socket.c:2212 [inline] __se_sys_recvfrom net/socket.c:2208 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1: netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg net/socket.c:1038 [inline] ____sys_recvmsg+0x156/0x310 net/socket.c:2720 ___sys_recvmsg net/socket.c:2762 [inline] do_recvmmsg+0x2e5/0x710 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000000000 -> 0x0000000000001000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 Fixes: `9063e21fb0` ("netlink: autosize skb lengthes") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com>	2024-04-02 11:15:41 +02:00

731 Commits