Commit Graph

731 Commits

Author SHA1 Message Date
Benjamin Coddington 82eb8441b8 net: add a refcount tracker for kernel sockets
JIRA: https://issues.redhat.com/browse/RHEL-73723
Conflicts: the __netns_tracker_alloc interface has been updated upstream
b6d7c0eb2dcbd, but in RHEL the hunk for notrefcnt_tracker was not included
(See RHEL commit 3b0a87ad0e, RHEL-24101).  We merge it in here.  Also,
we've dropped the rds hunk, as that seems unmaintained in RHEL and is missing
the path where that hunk should operate.

commit 0cafd77dcd032d1687efaba5598cf07bce85997f
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Oct 20 23:20:18 2022 +0000

    net: add a refcount tracker for kernel sockets

    Commit ffa84b5ffb37 ("net: add netns refcount tracker to struct sock")
    added a tracker to sockets, but did not track kernel sockets.

    We still have syzbot reports hinting about netns being destroyed
    while some kernel TCP sockets had not been dismantled.

    This patch tracks kernel sockets, and adds a ref_tracker_dir_print()
    call to net_free() right before the netns is freed.

    Normally, each layer is responsible for properly releasing its
    kernel sockets before last call to net_free().

    This debugging facility is enabled with CONFIG_NET_NS_REFCNT_TRACKER=y

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Tested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
2025-01-31 06:45:48 -05:00
Petr Oros c2f50d8bdf netlink: fix false positive warning in extack during dumps
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 3bf39fa849ab8ed52abb6715922e6102d3df9f97
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Nov 19 14:44:31 2024 -0800

    netlink: fix false positive warning in extack during dumps

    Commit under fixes extended extack reporting to dumps.
    It works under normal conditions, because extack errors are
    usually reported during ->start() or the first ->dump(),
    it's quite rare that the dump starts okay but fails later.
    If the dump does fail later, however, the input skb will
    already have the initiating message pulled, so checking
    if bad attr falls within skb->data will fail.

    Switch the check to using nlh, which is always valid.

    syzbot found a way to hit that scenario by filling up
    the receive queue. In this case we initiate a dump
    but don't call ->dump() until there is read space for
    an skb.

    WARNING: CPU: 1 PID: 5845 at net/netlink/af_netlink.c:2210 netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209
    RIP: 0010:netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209
    Call Trace:
     <TASK>
     netlink_dump_done+0x513/0x970 net/netlink/af_netlink.c:2250
     netlink_dump+0x91f/0xe10 net/netlink/af_netlink.c:2351
     netlink_recvmsg+0x6bb/0x11d0 net/netlink/af_netlink.c:1983
     sock_recvmsg_nosec net/socket.c:1051 [inline]
     sock_recvmsg+0x22f/0x280 net/socket.c:1073
     __sys_recvfrom+0x246/0x3d0 net/socket.c:2267
     __do_sys_recvfrom net/socket.c:2285 [inline]
     __se_sys_recvfrom net/socket.c:2281 [inline]
     __x64_sys_recvfrom+0xde/0x100 net/socket.c:2281
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
     RIP: 0033:0x7ff37dd17a79

    Reported-by: syzbot+d4373fa8042c06cefa84@syzkaller.appspotmail.com
    Fixes: 8af4f60472fc ("netlink: support all extack types in dumps")
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://patch.msgid.link/20241119224432.1713040-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:56 +01:00
Petr Oros d2063b54dd netlink: terminate outstanding dump on socket close
JIRA: https://issues.redhat.com/browse/RHEL-57756

CVE: CVE-2024-53140

Upstream commit(s):
commit 1904fb9ebf911441f90a68e96b22aa73e4410505
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Nov 5 17:52:34 2024 -0800

    netlink: terminate outstanding dump on socket close

    Netlink supports iterative dumping of data. It provides the families
    the following ops:
     - start - (optional) kicks off the dumping process
     - dump  - actual dump helper, keeps getting called until it returns 0
     - done  - (optional) pairs with .start, can be used for cleanup
    The whole process is asynchronous and the repeated calls to .dump
    don't actually happen in a tight loop, but rather are triggered
    in response to recvmsg() on the socket.

    This gives the user full control over the dump, but also means that
    the user can close the socket without getting to the end of the dump.
    To make sure .start is always paired with .done we check if there
    is an ongoing dump before freeing the socket, and if so call .done.

    The complication is that sockets can get freed from BH and .done
    is allowed to sleep. So we use a workqueue to defer the call, when
    needed.

    Unfortunately this does not work correctly. What we defer is not
    the cleanup but rather releasing a reference on the socket.
    We have no guarantee that we own the last reference, if someone
    else holds the socket they may release it in BH and we're back
    to square one.

    The whole dance, however, appears to be unnecessary. Only the user
    can interact with dumps, so we can clean up when socket is closed.
    And close always happens in process context. Some async code may
    still access the socket after close, queue notification skbs to it etc.
    but no dumps can start, end or otherwise make progress.

    Delete the workqueue and flush the dump state directly from the release
    handler. Note that further cleanup is possible in -next, for instance
    we now always call .done before releasing the main module reference,
    so dump doesn't have to take a reference of its own.

    Reported-by: syzkaller <syzkaller@googlegroups.com>
    Fixes: ed5d7788a9 ("netlink: Do not schedule work from sk_destruct")
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20241106015235.2458807-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:56 +01:00
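
Note on d2063b54dd: the start/dump/done pairing described in the message above can be sketched in user space roughly as follows. This is a minimal model, not the kernel implementation: `dump_ops`, `model_sock`, `model_recv()`, `model_close()` and the `ex_*` callbacks are hypothetical names invented for illustration. It only shows the idea that each read drives one `.dump` call and that close flushes an unfinished dump directly, so `.done` runs exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a netlink dump; none of this is kernel API. */
struct dump_ops {
    int  (*start)(void *ctx);  /* optional: kicks off the dump */
    int  (*dump)(void *ctx);   /* called repeatedly until it returns 0 */
    void (*done)(void *ctx);   /* optional: pairs with start, cleans up */
};

struct model_sock {
    const struct dump_ops *ops;
    void *ctx;
    bool dump_running;
};

/* Begin a dump: run the optional .start and mark the dump as outstanding. */
static int model_dump_start(struct model_sock *sk,
                            const struct dump_ops *ops, void *ctx)
{
    int err = ops->start ? ops->start(ctx) : 0;

    if (err)
        return err;
    sk->ops = ops;
    sk->ctx = ctx;
    sk->dump_running = true;
    return 0;
}

/* End a dump: run the optional .done at most once per started dump. */
static void model_dump_finish(struct model_sock *sk)
{
    if (!sk->dump_running)
        return;
    if (sk->ops->done)
        sk->ops->done(sk->ctx);
    sk->dump_running = false;
}

/* One read on the socket drives a single .dump call. */
static void model_recv(struct model_sock *sk)
{
    if (sk->dump_running && sk->ops->dump(sk->ctx) == 0)
        model_dump_finish(sk);
}

/* Close flushes any outstanding dump directly; no deferred work is needed. */
static void model_close(struct model_sock *sk)
{
    model_dump_finish(sk);
}

/* Example family: a few batches of data, counting start/done calls. */
struct counters { int started, finished, batches_left; };

static int ex_start(void *ctx)
{
    ((struct counters *)ctx)->started++;
    return 0;
}

static int ex_dump(void *ctx)
{
    struct counters *c = ctx;

    assert(c->batches_left > 0);
    return --c->batches_left;  /* 0 once the last batch has been emitted */
}

static void ex_done(void *ctx)
{
    ((struct counters *)ctx)->finished++;
}

static const struct dump_ops ex_ops = { ex_start, ex_dump, ex_done };

/* Read one batch, then close early; returns 1 if start and done stayed paired. */
static int demo_close_mid_dump(void)
{
    struct counters c = { 0, 0, 3 };
    struct model_sock sk = { NULL, NULL, false };

    if (model_dump_start(&sk, &ex_ops, &c) != 0)
        return 0;
    model_recv(&sk);   /* the user reads one batch ... */
    model_close(&sk);  /* ... and closes before the dump ends */
    model_close(&sk);  /* a repeated flush must not call .done again */
    return c.started == 1 && c.finished == 1;
}
```

Calling `demo_close_mid_dump()` from a small `main()` is enough to exercise the early-close path.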
Petr Oros 485d2c677f genetlink: hold RCU in genlmsg_mcast()
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 56440d7ec28d60f8da3bfa09062b3368ff9b16db
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Oct 11 17:12:17 2024 +0000

    genetlink: hold RCU in genlmsg_mcast()

    While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
    one lockdep splat [1].

    genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.

    Instead of letting all callers guard genlmsg_multicast_allns()
    with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().

    This also means the @flags parameter is useless, we need to always use
    GFP_ATOMIC.

    [1]
    [10882.424136] =============================
    [10882.424166] WARNING: suspicious RCU usage
    [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
    [10882.424400] -----------------------------
    [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
    [10882.424469]
    other info that might help us debug this:

    [10882.424500]
    rcu_scheduler_active = 2, debug_locks = 1
    [10882.424744] 2 locks held by ip/15677:
    [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
    [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
    [10882.426465]
    stack backtrace:
    [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
    [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
    [10882.427046] Call Trace:
    [10882.427131]  <TASK>
    [10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
    [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
    [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
    [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
    [10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
    [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
    [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
    [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
    [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
    [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
    [10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
    [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
    [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
    [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))

    Fixes: 33f72e6f0c ("l2tp : multicast notification to the registered listeners")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: James Chapman <jchapman@katalix.com>
    Cc: Tom Parkin <tparkin@katalix.com>
    Cc: Johannes Berg <johannes.berg@intel.com>
    Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:56 +01:00
Petr Oros 3b136d4a05 net: netlink: Remove the dump_cb_mutex field from struct netlink_sock
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 18aaa82bd36ae3d4eaa3f1d1d8cf643e39f151cd
Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Thu Aug 22 09:03:20 2024 +0200

    net: netlink: Remove the dump_cb_mutex field from struct netlink_sock

    Commit 5fbf57a937f4 ("net: netlink: remove the cb_mutex "injection" from
    netlink core") has removed the usage of the 'dump_cb_mutex' field from the
    struct netlink_sock.

    Remove the field itself now. It saves a few bytes in the structure.

    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:56 +01:00
Petr Oros 210b1b6518 net: netlink: remove the cb_mutex "injection" from netlink core
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 5fbf57a937f418fe204f9dbb7735e91984f4ee6a
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jun 6 12:29:06 2024 -0700

    net: netlink: remove the cb_mutex "injection" from netlink core

    Back in 2007, in commit af65bdfce9 ("[NETLINK]: Switch cb_lock spinlock
    to mutex and allow to override it") netlink core was extended to allow
    subsystems to replace the dump mutex lock with its own lock.

    The mechanism was used by rtnetlink to take rtnl_lock but it isn't
    sufficiently flexible for other users. Over the 17 years since
    it was added no other user appeared. Since rtnetlink needs conditional
    locking now, and doesn't use it either, axe this feature completely.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:55 +01:00
Petr Oros c976657153 rtnetlink: move rtnl_lock handling out of af_netlink
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 5380d64f8d766576ac5c0f627418b2d0e1d2641f
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Jun 6 12:29:05 2024 -0700

    rtnetlink: move rtnl_lock handling out of af_netlink

    Now that we have an intermediate layer of code for handling
    rtnl-level netlink dump quirks, we can move the rtnl_lock
    taking there.

    For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
    avoid taking rtnl_lock just to generate NLM_DONE, once again.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:55 +01:00
Petr Oros 79e50be9d3 netlink: support all extack types in dumps
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 8af4f60472fce1f22db5068107b37bcc1a65eabd
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Apr 19 19:35:41 2024 -0700

    netlink: support all extack types in dumps

    Note that when this commit message refers to netlink dump
    it only means the actual dumping part, the parsing / dump
    start is handled by the same code as "doit".

    Commit 4a19edb60d ("netlink: Pass extack to dump handlers")
    added support for returning extack messages from dump handlers,
    but left out other extack info, e.g. bad attribute.

    This used to be fine because until YNL we had little practical
    use for the machine readable attributes, and only messages were
    used in practice.

    YNL flips the preference 180 degrees, it's now much more useful
    to point to a bad attr with NL_SET_BAD_ATTR() than type
    an English message saying "attribute XYZ is $reason-why-bad".

    Support all of extack. The fact that extack only gets added if
    it fits remains unaddressed.

    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240420023543.3300306-4-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:53 +01:00
Petr Oros 8d84730795 netlink: move extack writing helpers
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 652332e3f1d6209dab372e0dfc7a5bbe209bf698
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Apr 19 19:35:40 2024 -0700

    netlink: move extack writing helpers

    Next change will need them in netlink_dump_done(), pure move.

    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240420023543.3300306-3-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:53 +01:00
Petr Oros 388833a034 netlink: create a new header for internal genetlink symbols
JIRA: https://issues.redhat.com/browse/RHEL-57756

Upstream commit(s):
commit 5bc63d3a6f466add504f283d9f743f20ca9ec334
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Mar 29 10:57:08 2024 -0700

    netlink: create a new header for internal genetlink symbols

    There are things in linux/genetlink.h which are only used
    under net/netlink/. Move them to a new local header.
    A new header with just 2 externs isn't great, but alternative
    would be to include af_netlink.h in genetlink.c which feels
    even worse.

    Link: https://lore.kernel.org/r/20240329175710.291749-2-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-12-10 10:37:53 +01:00
Petr Oros ebe2c68d7c net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit 8b6d307f4391c20cfd76bbb15f8b3784d36e0755
Author: Juntong Deng <juntong.deng@outlook.com>
Date:   Fri Mar 8 11:33:04 2024 +0000

    net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID

    Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID,
    and we are unable to get the value of NETLINK_LISTEN_ALL_NSID
    socket option through getsockopt.

    This patch adds getsockopt support for NETLINK_LISTEN_ALL_NSID.

    Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
    Link: https://lore.kernel.org/r/AM6PR03MB58482322B7B335308DA56FE599272@AM6PR03MB5848.eurprd03.prod.outlook.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:46 +01:00
Petr Oros a6919b30a1 genetlink: fit NLMSG_DONE into same read() as families
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit 87d381973e49404f658d6923a617932eeda9415f
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sat Mar 2 21:24:08 2024 -0800

    genetlink: fit NLMSG_DONE into same read() as families

    Make sure ctrl_fill_info() returns sensible error codes and
    propagate them out to netlink core. Let netlink core decide
    when to return skb->len and when to treat the exit as an
    error. Netlink core does a better job at it, if we always
    return skb->len the core doesn't know when we're done
    dumping and NLMSG_DONE ends up in a separate read().

    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:44 +01:00
Petr Oros a811b8aeb9 netlink: handle EMSGSIZE errors in the core
JIRA: https://issues.redhat.com/browse/RHEL-57755

Upstream commit(s):
commit b5a899154aa94cc573db3ae1f61dabe7bfe8b579
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sat Mar 2 21:24:06 2024 -0800

    netlink: handle EMSGSIZE errors in the core

    Eric points out that our current suggested way of handling
    EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will
    break if we didn't fit even a single object into the buffer
    provided by the user. This should not happen for well behaved
    applications, but we can fix that, and free netlink families
    from dealing with that completely by moving error handling
    into the core.

    Let's assume from now on that all EMSGSIZE errors in dumps are
    because we run out of skb space. Families can now propagate
    the error nla_put_*() etc generated and not worry about any
    return value magic. If some family really wants to send EMSGSIZE
    to user space, assuming it generates the same error on the next
    dump iteration the skb->len should be 0, and user space should
    still see the EMSGSIZE.

    This should simplify families and prevent mistakes in return
    values which lead to DONE being forced into a separate recv()
    call as discovered by Ido some time ago.

    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-11-20 10:13:44 +01:00
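
Note on a811b8aeb9: the return-value problem described above can be illustrated with a small user-space sketch. `family_style_ret()` is the idiom quoted in the commit message; `core_style_ret()` is a hypothetical model of making the same decision once in the core, written under the assumption that the core only converts EMSGSIZE into "more to come" when the buffer already holds data. Neither function is kernel code, and `skb_len` stands in for `skb->len`.

```c
#include <assert.h>
#include <errno.h>

/* The per-family idiom quoted in the commit message. */
static int family_style_ret(int err, int skb_len)
{
    return (err == -EMSGSIZE) ? skb_len : err;
}

/* Hypothetical model of the same decision made once, in the core. */
static int core_style_ret(int err, int skb_len)
{
    assert(skb_len >= 0);
    if (err == -EMSGSIZE && skb_len > 0)
        return skb_len;  /* buffer filled up: more to dump on the next read */
    return err;          /* includes -EMSGSIZE with an empty buffer */
}
```

With an empty buffer, `family_style_ret(-EMSGSIZE, 0)` yields 0, which is indistinguishable from a finished dump, while `core_style_ret(-EMSGSIZE, 0)` keeps the error visible.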
Rado Vrbovsky 88c4382a10 Merge: thermal/intel_hfi: update to upstream v6.12
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5328

JIRA: https://issues.redhat.com/browse/RHEL-20130
    
Signed-off-by: David Arcari <darcari@redhat.com>

Approved-by: Andrew Halaney <ahalaney@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-19 13:50:47 +00:00
Rado Vrbovsky d639f49798 Merge: CVE-2024-50024: net: Fix an unsafe loop on the list
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5544

JIRA: https://issues.redhat.com/browse/RHEL-63844
CVE: CVE-2024-50024

```
net: Fix an unsafe loop on the list

The kernel may crash when deleting a genetlink family if there are still
listeners for that family:

Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
  LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
  Call Trace:
__netlink_clear_multicast_users+0x74/0xc0
genl_unregister_family+0xd4/0x2d0

Change the unsafe loop on the list to a safe one, because inside the
loop there is an element removal from this list.

Fixes: b8273570f8 ("genetlink: fix netns vs. netlink table locking (2)")
Cc: stable@vger.kernel.org
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9)
```

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>

---

<small>Created 2024-10-22 12:45 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small>

Approved-by: Hangbin Liu <haliu@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-11 08:40:52 +00:00
David Arcari db4a2096ba genetlink: Add per family bind/unbind callbacks
JIRA: https://issues.redhat.com/browse/RHEL-20130

commit 3de21a8990d3c2cc507e9cc4ed00f36358d5b93e
Author: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Date:   Mon Feb 12 17:16:13 2024 +0100

    genetlink: Add per family bind/unbind callbacks

    Add genetlink family bind()/unbind() callbacks when adding/removing
    multicast group to/from netlink client socket via setsockopt() or
    bind() syscall.

    They can be used to track if consumers of netlink multicast messages
    emerge or disappear. Thus, a client implementing callbacks, can now
    send events only when there are active consumers, preventing unnecessary
    work when none exist.

    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240212161615.161935-2-stanislaw.gruszka@linux.intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: David Arcari <darcari@redhat.com>
2024-10-25 14:16:36 -04:00
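
Note on db4a2096ba: the use case described above (send events only while consumers exist) can be sketched as a listener counter driven by bind/unbind callbacks. This is a user-space illustration only; `model_family`, `model_bind()`, `model_unbind()` and `model_notify()` are hypothetical names and do not reproduce the genetlink callback signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a family that tracks its multicast consumers. */
struct model_family {
    int listeners;
    int events_sent;
};

/* Called when a client joins the multicast group. */
static int model_bind(struct model_family *f)
{
    f->listeners++;
    return 0;
}

/* Called when a client leaves the multicast group. */
static void model_unbind(struct model_family *f)
{
    assert(f->listeners > 0);
    f->listeners--;
}

/* Emit an event only if somebody is listening; returns true if it was sent. */
static bool model_notify(struct model_family *f)
{
    if (f->listeners == 0)
        return false;
    f->events_sent++;
    return true;
}

/* Returns how many of four notifications were actually sent. */
static int demo_events_sent(void)
{
    struct model_family f = { 0, 0 };

    model_notify(&f);  /* nobody listening: skipped */
    model_bind(&f);
    model_notify(&f);  /* sent */
    model_notify(&f);  /* sent */
    model_unbind(&f);
    model_notify(&f);  /* last listener gone: skipped */
    return f.events_sent;
}
```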
Ivan Vecera 8da88cd9da rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 386520e0ecc01004d3a29c70c5a77d4bbf8a8420
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 22 10:50:15 2024 +0000

    rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag

    Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
    allows dump operations registered via rtnl_register()
    or rtnl_register_module() to opt-out from RTNL protection.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:43 +02:00
Ivan Vecera 2240d90442 rtnetlink: change nlk->cb_mutex role
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit e39951d965bf58b5aba7f61dc1140dcb8271af22
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 22 10:50:14 2024 +0000

    rtnetlink: change nlk->cb_mutex role

    In commit af65bdfce9 ("[NETLINK]: Switch cb_lock spinlock
    to mutex and allow to override it"), Patrick McHardy used
    a common mutex to protect both nlk->cb and the dump() operations.

    The override is used for rtnl dumps, registered with
    rtnl_register() and rtnl_register_module().

    We want to be able to opt-out some dump() operations
    to not acquire RTNL, so we need to protect nlk->cb
    with a per socket mutex.

    This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex

    The optional pointer to the mutex used to protect dump()
    call is stored in nlk->dump_cb_mutex

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:43 +02:00
Ivan Vecera 25c3484119 netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit b5590270068c4324dac4a2b5a4a156e02e21339f
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 22 10:50:13 2024 +0000

    netlink: hold nlk->cb_mutex longer in __netlink_dump_start()

    __netlink_dump_start() releases nlk->cb_mutex right before
    calling netlink_dump() which grabs it again.

    This seems dangerous, even if KASAN did not bother yet.

    Add a @lock_taken parameter to netlink_dump() to let it
    grab the mutex if called from netlink_recvmsg() only.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:43 +02:00
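
Note on 25c3484119: the `@lock_taken` idea can be sketched with a toy mutex that asserts on misuse. All names here (`model_mutex`, `model_dump()`, `model_dump_start()`, `model_recvmsg()`) are hypothetical, and the assumption that the dump routine always releases the mutex before returning is mine, not something stated in the commit message. The point of the sketch is that the start path no longer has an unlock/relock window between setting up the callback and running the first dump.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mutex that only records ownership so misuse trips an assert. */
struct model_mutex { bool held; };

static void model_lock(struct model_mutex *m)
{
    assert(!m->held);
    m->held = true;
}

static void model_unlock(struct model_mutex *m)
{
    assert(m->held);
    m->held = false;
}

struct model_sock {
    struct model_mutex cb_mutex;
    int dumps;
};

/*
 * Run one dump step. The caller says whether it already holds cb_mutex;
 * in this sketch the mutex is always released before returning (assumption).
 */
static void model_dump(struct model_sock *sk, bool lock_taken)
{
    if (!lock_taken)
        model_lock(&sk->cb_mutex);
    sk->dumps++;  /* stands in for the real dump work */
    model_unlock(&sk->cb_mutex);
}

/* Start path: keep holding the mutex across setup and the first dump. */
static void model_dump_start(struct model_sock *sk)
{
    model_lock(&sk->cb_mutex);
    /* callback state would be set up here, still under the mutex */
    model_dump(sk, true);
}

/* Read path: the mutex is not held yet, so the dump takes it itself. */
static void model_recvmsg(struct model_sock *sk)
{
    model_dump(sk, false);
}

/* Returns the number of dump steps run; the mutex must end up released. */
static int demo_dump_count(void)
{
    struct model_sock sk = { { false }, 0 };

    model_dump_start(&sk);
    model_recvmsg(&sk);
    assert(!sk.cb_mutex.held);
    return sk.dumps;
}
```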
Ivan Vecera e4d9647420 netlink: fix netlink_diag_dump() return value
JIRA: https://issues.redhat.com/browse/RHEL-62123

commit 6647b338fc5c6741736fe51a25fc2c0bec6398b8
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 22 10:50:12 2024 +0000

    netlink: fix netlink_diag_dump() return value

    __netlink_diag_dump() returns 1 if the dump is not complete,
    zero if no error occurred.

    If err variable is zero, this means the dump is complete:
    We should not return skb->len in this case, but 0.

    This allows NLMSG_DONE to be appended to the skb.
    User space does not have to call us again only to get NLMSG_DONE.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-10-24 16:14:43 +02:00
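
Note on e4d9647420: why returning 0 on completion saves a read can be shown with a simplified model. The sketch assumes a core loop that calls the dump handler once per read and treats a return of 0 as "complete, append NLMSG_DONE to this message". `model_dump`, `model_dump_call()`, `reads_until_done()` and `OBJ_SIZE` are hypothetical; the real handler and core are more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define OBJ_SIZE 16  /* arbitrary per-object size for the model */

struct model_dump {
    int objects_left;
    bool return_len_when_complete;  /* true models the pre-fix behaviour */
};

/*
 * One handler call: everything left fits into the buffer, so the dump is
 * complete afterwards. The return value is what the core gets to see.
 */
static int model_dump_call(struct model_dump *d, int *skb_len)
{
    assert(d->objects_left >= 0);
    *skb_len = d->objects_left * OBJ_SIZE;
    d->objects_left = 0;
    return d->return_len_when_complete ? *skb_len : 0;
}

/* Count reads until the handler reports completion with a 0 return. */
static int reads_until_done(struct model_dump *d)
{
    int reads = 0;
    int skb_len;

    do {
        reads++;
    } while (model_dump_call(d, &skb_len) != 0);
    return reads;
}

/* Number of reads user space needs for a three-object dump. */
static int demo_reads(bool return_len_when_complete)
{
    struct model_dump d = { 3, return_len_when_complete };

    return reads_until_done(&d);
}
```

In this model `demo_reads(true)` needs a second read only to learn that the dump is over, while `demo_reads(false)` finishes in one.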
CKI Backport Bot 2bfc0a91e9 net: Fix an unsafe loop on the list
JIRA: https://issues.redhat.com/browse/RHEL-63844
CVE: CVE-2024-50024

commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9
Author: Anastasia Kovaleva <a.kovaleva@yadro.com>
Date:   Thu Oct 3 13:44:31 2024 +0300

    net: Fix an unsafe loop on the list

    The kernel may crash when deleting a genetlink family if there are still
    listeners for that family:

    Oops: Kernel access of bad area, sig: 11 [#1]
      ...
      NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
      LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
      Call Trace:
    __netlink_clear_multicast_users+0x74/0xc0
    genl_unregister_family+0xd4/0x2d0

    Change the unsafe loop on the list to a safe one, because inside the
    loop there is an element removal from this list.

    Fixes: b8273570f8 ("genetlink: fix netns vs. netlink table locking (2)")
    Cc: stable@vger.kernel.org
    Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
    Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-10-22 12:45:46 +00:00
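
Note on 2bfc0a91e9: the general pattern behind this fix, removing elements while walking a list, can be sketched in user space as below. The singly linked `node` list and the helpers `push()`, `remove_group()`, `list_len()` and `list_free()` are hypothetical and unrelated to the kernel's list types; the kernel offers `_safe` iterator variants such as `list_for_each_entry_safe()` for the same purpose. The key step is caching the next pointer before the current element may be freed.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int group;
    struct node *next;
};

/* Prepend a node; abort on allocation failure to keep the sketch short. */
static struct node *push(struct node *head, int group)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        abort();
    n->group = group;
    n->next = head;
    return n;
}

/*
 * Remove and free every node of the given group. Reading cur->next after
 * free(cur) would be a use-after-free, so next is cached first.
 */
static int remove_group(struct node **head, int group)
{
    struct node **link = head;
    struct node *cur, *next;
    int removed = 0;

    for (cur = *head; cur; cur = next) {
        next = cur->next;  /* cache before cur may go away */
        if (cur->group == group) {
            *link = next;
            free(cur);
            removed++;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}

static int list_len(const struct node *head)
{
    int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}

static void list_free(struct node *head)
{
    while (head) {
        struct node *next = head->next;

        free(head);
        head = next;
    }
}

/* Build 1,2,1,2,1, drop group 1, and return how many nodes remain. */
static int demo_remaining_after_removal(void)
{
    static const int groups[] = { 1, 2, 1, 2, 1 };
    struct node *head = NULL;
    int removed, remaining;
    size_t i;

    for (i = 0; i < sizeof(groups) / sizeof(groups[0]); i++)
        head = push(head, groups[i]);

    removed = remove_group(&head, 1);
    assert(removed == 3);
    remaining = list_len(head);
    list_free(head);
    return remaining;
}
```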
Petr Oros 14b592f444 netlink: use kvmalloc() in netlink_alloc_large_skb()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit f8cbf6bde4c8d5d32330bcceafa7b139fec89f97
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Feb 24 09:06:30 2024 +0000

    netlink: use kvmalloc() in netlink_alloc_large_skb()

    This is a followup of commit 234ec0b6034b ("netlink: fix potential
    sleeping issue in mqueue_flush_file"), because vfree_atomic()
    overhead is unfortunate for medium sized allocations.

    1) If the allocation is smaller than PAGE_SIZE, do not bother
       with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
       while NLMSG_GOODSIZE is smaller than 8KB.

    2) Use kvmalloc(), which might allocate one high order page
       instead of vmalloc if memory is not too fragmented.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com>
    Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-30 12:47:44 +02:00
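
Note on 14b592f444: point 1 of the message above amounts to a size check before choosing an allocator. The sketch below models only that decision; `alloc_path` and `choose_alloc_path()` are hypothetical names, the page size is passed in as a parameter, and treating a request of exactly one page as "small" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum alloc_path {
    ALLOC_PLAIN,     /* ordinary allocation, no vmalloc involvement */
    ALLOC_KVMALLOC   /* kvmalloc(): may fall back to vmalloc for large sizes */
};

/* Pick the allocation path for a buffer of the given size. */
static enum alloc_path choose_alloc_path(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return size <= page_size ? ALLOC_PLAIN : ALLOC_KVMALLOC;
}
```

For example, an 8 KB request on a 64 KB-page architecture takes the plain path, while a 32 KB request with 4 KB pages goes through the kvmalloc path.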
Petr Oros 7a9f656ed8 netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 661779e1fcafe1b74b3f3fe8e980c1e207fea1fd
Author: Ryosuke Yasuoka <ryasuoka@redhat.com>
Date:   Wed Feb 21 16:40:48 2024 +0900

    netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter

    syzbot reported the following uninit-value access issue [1]:

    netlink_to_full_skb() creates a new `skb` and puts the `skb->data`
    passed as a 1st arg of netlink_to_full_skb() onto new `skb`. The data
    size is specified as `len` and passed to skb_put_data(). This `len`
    is based on `skb->end` that is not data offset but buffer offset. The
    `skb->end` contains data and tailroom. Since the tailroom is not
    initialized when the new `skb` is created, KMSAN detects uninitialized
    memory area when copying the data.

    This patch resolves this issue by correcting the len from `skb->end` to
    `skb->len`, which is the actual data offset.

    BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
    BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline]
    BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline]
    BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
    BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline]
    BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
     instrument_copy_to_user include/linux/instrumented.h:114 [inline]
     copy_to_user_iter lib/iov_iter.c:24 [inline]
     iterate_ubuf include/linux/iov_iter.h:29 [inline]
     iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
     iterate_and_advance include/linux/iov_iter.h:271 [inline]
     _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
     copy_to_iter include/linux/uio.h:197 [inline]
     simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532
     __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:420
     skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
     skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline]
     packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482
     sock_recvmsg_nosec net/socket.c:1044 [inline]
     sock_recvmsg net/socket.c:1066 [inline]
     sock_read_iter+0x467/0x580 net/socket.c:1136
     call_read_iter include/linux/fs.h:2014 [inline]
     new_sync_read fs/read_write.c:389 [inline]
     vfs_read+0x8f6/0xe00 fs/read_write.c:470
     ksys_read+0x20f/0x4c0 fs/read_write.c:613
     __do_sys_read fs/read_write.c:623 [inline]
     __se_sys_read fs/read_write.c:621 [inline]
     __x64_sys_read+0x93/0xd0 fs/read_write.c:621
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x63/0x6b

    Uninit was stored to memory at:
     skb_put_data include/linux/skbuff.h:2622 [inline]
     netlink_to_full_skb net/netlink/af_netlink.c:181 [inline]
     __netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline]
     __netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325
     netlink_deliver_tap net/netlink/af_netlink.c:338 [inline]
     netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline]
     netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
     netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368
     netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910
     sock_sendmsg_nosec net/socket.c:730 [inline]
     __sock_sendmsg net/socket.c:745 [inline]
     ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
     ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
     __sys_sendmsg net/socket.c:2667 [inline]
     __do_sys_sendmsg net/socket.c:2676 [inline]
     __se_sys_sendmsg net/socket.c:2674 [inline]
     __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x63/0x6b

    Uninit was created at:
     free_pages_prepare mm/page_alloc.c:1087 [inline]
     free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347
     free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533
     release_pages+0x23d3/0x2410 mm/swap.c:1042
     free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316
     tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
     tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
     tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300
     tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392
     exit_mmap+0x49e/0xd30 mm/mmap.c:3321
     __mmput+0x13f/0x530 kernel/fork.c:1349
     mmput+0x8a/0xa0 kernel/fork.c:1371
     exit_mm+0x1b8/0x360 kernel/exit.c:567
     do_exit+0xd57/0x4080 kernel/exit.c:858
     do_group_exit+0x2fd/0x390 kernel/exit.c:1021
     __do_sys_exit_group kernel/exit.c:1032 [inline]
     __se_sys_exit_group kernel/exit.c:1030 [inline]
     __x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x63/0x6b

    Bytes 3852-3903 of 3904 are uninitialized
    Memory access of size 3904 starts at ffff88812ea1e000
    Data copied to user address 0000000020003280

    CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53ffe5 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023

    Fixes: 1853c94964 ("netlink, mmap: transform mmap skb into full skb on taps")
    Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1]
    Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
Petr Oros 991a5ee575 netlink: fix potential sleeping issue in mqueue_flush_file
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 234ec0b6034b16869d45128b8cd2dc6ffe596f04
Author: Zhengchao Shao <shaozhengchao@huawei.com>
Date:   Mon Jan 22 09:18:07 2024 +0800

    netlink: fix potential sleeping issue in mqueue_flush_file

    Consider the potential sleeping issue in the following sequence:
    Thread A                                Thread B
    ...                                     netlink_create  //ref = 1
    do_mq_notify                            ...
      sock = netlink_getsockbyfilp          ...     //ref = 2
      info->notify_sock = sock;             ...
    ...                                     netlink_sendmsg
    ...                                       skb = netlink_alloc_large_skb  //skb->head is vmalloced
    ...                                       netlink_unicast
    ...                                         sk = netlink_getsockbyportid //ref = 3
    ...                                         netlink_sendskb
    ...                                           __netlink_sendskb
    ...                                             skb_queue_tail //put skb to sk_receive_queue
    ...                                         sock_put //ref = 2
    ...                                     ...
    ...                                     netlink_release
    ...                                       deferred_put_nlk_sk //ref = 1
    mqueue_flush_file
      spin_lock
      remove_notification
        netlink_sendskb
          sock_put  //ref = 0
            sk_free
              ...
              __sk_destruct
                netlink_sock_destruct
                  skb_queue_purge  //get skb from sk_receive_queue
                    ...
                    __skb_queue_purge_reason
                      kfree_skb_reason
                        __kfree_skb
                        ...
                        skb_release_all
                          skb_release_head_state
                            netlink_skb_destructor
                              vfree(skb->head)  //sleeping while holding spinlock

    In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
    vmalloc and the skb is queued on sk_receive_queue without being freed,
    the sleeping bug occurs when the mqueue executes flush. Use
    vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.

    Fixes: c05cdb1b86 ("netlink: allow large data transfers from user-space")
    Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
    Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:11 +02:00
Petr Oros 91f18bfb6c genetlink: Use internal flags for multicast groups
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit cd4d7263d58ab98fd4dee876776e4da6c328faa3
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Dec 20 17:43:58 2023 +0200

    genetlink: Use internal flags for multicast groups

    As explained in commit e03781879a0d ("drop_monitor: Require
    'CAP_SYS_ADMIN' when joining "events" group"), the "flags" field in the
    multicast group structure reuses uAPI flags despite the field not being
    exposed to user space. This makes it impossible to extend its use
    without adding new uAPI flags, which is inappropriate for internal
    kernel checks.

    Solve this by adding internal flags (i.e., "GENL_MCAST_*") and convert
    the existing users to use them instead of the uAPI flags.

    Tested using the reproducers in commit 44ec98ea5ea9 ("psample: Require
    'CAP_NET_ADMIN' when joining "packets" group") and commit e03781879a0d
    ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group").

    No functional changes intended.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
Petr Oros d60d7ef2e0 netlink: introduce typedef for filter function
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 403863e985e8eba608d53b2907caaf37b6176290
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:29:58 2023 +0100

    netlink: introduce typedef for filter function

    Make the code using filter function a bit nicer by consolidating the
    filter function arguments using typedef.

    Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
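A minimal userspace sketch of the idea in the commit above: a single typedef names the filter callback signature so that callers do not have to spell out the function-pointer type. Every `demo_` name here is a hypothetical stand-in and not the kernel's actual definition.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's socket and buffer types. */
struct demo_sock { int portid; };
struct demo_skb { int len; };

/* One typedef names the whole callback signature. */
typedef int (*demo_filter_fn)(struct demo_sock *dsk, struct demo_skb *skb,
			      void *data);

/* Example filter: returns nonzero when the message should be skipped. */
static int demo_filter_by_portid(struct demo_sock *dsk, struct demo_skb *skb,
				 void *data)
{
	(void)skb;
	return dsk->portid != *(int *)data;
}

/* Callers take the typedef instead of a raw function-pointer declaration. */
static int demo_count_delivered(struct demo_sock *socks, size_t n,
				struct demo_skb *skb, demo_filter_fn filter,
				void *data)
{
	int delivered = 0;

	for (size_t i = 0; i < n; i++)
		if (!filter || !filter(&socks[i], skb, data))
			delivered++;
	return delivered;
}
```

Passing `NULL` as the filter delivers to every socket in this sketch; that convention is an assumption of the example.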
Petr Oros 54dcfdcba1 genetlink: introduce per-sock family private storage
JIRA: https://issues.redhat.com/browse/RHEL-30145

Conflicts:
- adjusted context conflict due to 4760ca4bff ("net: add reserved
  fields to genl_family")

Upstream commit(s):
commit a731132424adeda4d5383ef61afae2e804063fb7
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Dec 16 13:29:57 2023 +0100

    genetlink: introduce per-sock family private storage

    Introduce an xarray for a Generic netlink family to store per-socket
    private data. Initialize this xarray only if the family uses per-socket privs.

    Introduce genl_sk_priv_get() to get the socket priv pointer for a family
    and initialize it in case it does not exist.
    Introduce __genl_sk_priv_get() to obtain socket priv pointer for a
    family under RCU read lock.

    Allow family to specify the priv size, init() and destroy() callbacks.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
Petr Oros 147089bf66 rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit ac40916a3f7243efbe6e129ebf495b5c33a3adfe
Author: Li RongQing <lirongqing@baidu.com>
Date:   Wed Nov 15 20:01:08 2023 +0800

    rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink

    If a PF has 256 or more VFs, the ip link command needs an allocation of
    order 3 or higher, which may trigger the OOM killer because of memory
    fragmentation. The memory needed for the VFs is computed in
    rtnl_vfinfo_size.

    So introduce nlmsg_new_large, which calls netlink_alloc_large_skb, which
    in turn uses vmalloc for large allocations, to avoid allocation failures.

        ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|\
            __GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0
        CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P           OE
        Call Trace:
        dump_stack+0x57/0x6a
        dump_header+0x4a/0x210
        oom_kill_process+0xe4/0x140
        out_of_memory+0x3e8/0x790
        __alloc_pages_slowpath.constprop.116+0x953/0xc50
        __alloc_pages_nodemask+0x2af/0x310
        kmalloc_large_node+0x38/0xf0
        __kmalloc_node_track_caller+0x417/0x4d0
        __kmalloc_reserve.isra.61+0x2e/0x80
        __alloc_skb+0x82/0x1c0
        rtnl_getlink+0x24f/0x370
        rtnetlink_rcv_msg+0x12c/0x350
        netlink_rcv_skb+0x50/0x100
        netlink_unicast+0x1b2/0x280
        netlink_sendmsg+0x355/0x4a0
        sock_sendmsg+0x5b/0x60
        ____sys_sendmsg+0x1ea/0x250
        ___sys_sendmsg+0x88/0xd0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x33/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f95a65a5b70

    Cc: Yunsheng Lin <linyunsheng@huawei.com>
    Signed-off-by: Li RongQing <lirongqing@baidu.com>
    Link: https://lore.kernel.org/r/20231115120108.3711-1-lirongqing@baidu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:06 +02:00
Petr Oros d16936e50a netlink: fill in missing MODULE_DESCRIPTION()
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 016b9332a3346e97a6cacffea0f9dc10e1235a75
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Nov 1 21:57:24 2023 -0700

    netlink: fill in missing MODULE_DESCRIPTION()

    W=1 builds now warn if a module is built without
    a MODULE_DESCRIPTION(). Fill it in for sock_diag.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:05 +02:00
Petr Oros bedce06bfb genetlink: don't merge dumpit split op for different cmds into single iter
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit f862ed2d0bf0cf51c28c1a69e3c2a1558d5a2978
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Sat Oct 21 13:27:02 2023 +0200

    genetlink: don't merge dumpit split op for different cmds into single iter

    Currently, split ops of doit and dumpit are merged into a single iter
    item when they are subsequent. However, there is no guarantee that the
    dumpit op is for the same cmd as doit op.

    Fix this by checking if cmd is the same for both.
    This problem does not occur in existing families.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20231021112711.660606-2-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:03 +02:00
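The merge rule from the commit above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the kernel's iterator: the `demo_` types and flag values are hypothetical, and each op is assumed to carry exactly one of the two capability flags.

```c
#include <stddef.h>

#define DEMO_CAP_DO   0x1u	/* hypothetical flag values */
#define DEMO_CAP_DUMP 0x2u

struct demo_split_op { unsigned int cmd; unsigned int flags; };
struct demo_iter_item { unsigned int cmd; int has_doit; int has_dumpit; };

/* Walk the split ops and emit one iterator item per command. */
static size_t demo_build_iter(const struct demo_split_op *ops, size_t n,
			      struct demo_iter_item *out)
{
	size_t i = 0, n_out = 0;

	while (i < n) {
		struct demo_iter_item item = { ops[i].cmd, 0, 0 };

		if (ops[i].flags & DEMO_CAP_DO) {
			item.has_doit = 1;
			i++;
			/* Merge the next dump op only if it is for the same cmd. */
			if (i < n && (ops[i].flags & DEMO_CAP_DUMP) &&
			    ops[i].cmd == item.cmd) {
				item.has_dumpit = 1;
				i++;
			}
		} else {
			item.has_dumpit = 1;
			i++;
		}
		out[n_out++] = item;
	}
	return n_out;
}
```

Without the `ops[i].cmd == item.cmd` comparison, a doit op for one command followed by a dumpit op for another would collapse into a single item, which is the case the commit guards against.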
Petr Oros e942232c80 netlink: add variable-length / auto integers
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit 374d345d9b5e13380c66d7042f9533a6ac6d1195
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Oct 18 14:39:20 2023 -0700

    netlink: add variable-length / auto integers

    We currently push everyone to use padding to align 64b values
    in netlink. Un-padded nla_put_u64() doesn't even exist any more.

    The story behind this possibly starts with this thread:
    https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/
    where DaveM was concerned about the alignment of a structure
    containing 64b stats. If user space tries to access such struct
    directly:

            struct some_stats *stats = nla_data(attr);
            printf("A: %llu", stats->a);

    lack of alignment may become problematic for some architectures.
    These days we most often put every single member in a separate
    attribute, meaning that the code above would use a helper like
    nla_get_u64(), which can deal with alignment internally.
    Even for arches which don't have good unaligned access - access
    aligned to 4B should be pretty efficient.
    Kernel and well known libraries deal with unaligned input already.

    Padded 64b is quite space-inefficient (64b + pad means at worst 16B
    per attr vs 32b which takes 8B). It is also more typing:

        if (nla_put_u64_pad(rsp, NETDEV_A_SOMETHING_SOMETHING,
                            value, NETDEV_A_SOMETHING_PAD))

    Create a new attribute type which will use 32 bits at netlink
    level if value is small enough (probably most of the time?),
    and (4B-aligned) 64 bits otherwise. Kernel API is just:

        if (nla_put_uint(rsp, NETDEV_A_SOMETHING_SOMETHING, value))

    Calling this new type "just" sint / uint with no specific size
    will hopefully also make people more comfortable with using it.
    Currently telling people "don't use u8, you may need the bits,
    and netlink will round up to 4B, anyway" is the #1 comment
    we give to newcomers.

    In terms of netlink layout it looks like this:

             0       4       8       12      16
    32b:     [nlattr][ u32  ]
    64b:     [  pad ][nlattr][     u64      ]
    uint(32) [nlattr][ u32  ]
    uint(64) [nlattr][     u64      ]

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:03 +02:00
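The size arithmetic behind the layout diagram in the commit above can be checked with a small userspace sketch. It assumes a 4-byte attribute header and 4-byte attribute alignment, as in the diagram; the `demo_` helpers are hypothetical names and not kernel API.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NLA_HDRLEN ((size_t)4)	/* u16 length + u16 type */

/* Round up to the 4-byte netlink attribute alignment. */
static size_t demo_nla_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Bytes taken on the wire by one attribute with the given payload size. */
static size_t demo_attr_size(size_t payload)
{
	return demo_nla_align(DEMO_NLA_HDRLEN + payload);
}

/* uint: 32-bit payload when the value fits, 64-bit payload otherwise. */
static size_t demo_uint_attr_size(uint64_t value)
{
	uint32_t low = (uint32_t)value;

	return demo_attr_size(value == low ? sizeof(uint32_t) : sizeof(uint64_t));
}

/* Padded u64, worst case: an empty pad attribute plus the u64 attribute. */
static size_t demo_padded_u64_worst_case(void)
{
	return demo_attr_size(0) + demo_attr_size(sizeof(uint64_t));
}
```

Under these assumptions a small value costs 8 bytes, a value that needs 64 bits costs 12 bytes, and the padded form costs 16 bytes at worst, matching the figures quoted in the commit message.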
Petr Oros ea990c71f1 netlink: Annotate struct netlink_policy_dump_state with __counted_by
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit eaede99c3aeb38613c40a150f676f772faf2b42b
Author: Kees Cook <keescook@chromium.org>
Date:   Tue Oct 3 16:21:02 2023 -0700

    netlink: Annotate struct netlink_policy_dump_state with __counted_by

    Prepare for the coming implementation by GCC and Clang of the __counted_by
    attribute. Flexible array members annotated with __counted_by can have
    their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
    array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
    functions).

    As found with Coccinelle[1], add __counted_by for struct netlink_policy_dump_state.

    Additionally update the size of the usage array length before accessing
    it. This requires remembering the old size for the memset() and later
    assignments.

    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Cc: Johannes Berg <johannes.berg@intel.com>
    Cc: netdev@vger.kernel.org
    Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:00 +02:00
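The ordering requirement described in the commit above (update the counter before touching the newly covered elements, and remember the old size for the memset()) can be sketched in userspace C. The struct and function names are hypothetical, and the attribute is only applied when the compiler reports support for it.

```c
#include <stdlib.h>
#include <string.h>

/* Use the attribute only where the compiler advertises it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define DEMO_COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(member)
#endif

struct demo_state {
	unsigned int n_alloc;			/* number of valid usage[] slots */
	int usage[] DEMO_COUNTED_BY(n_alloc);
};

/*
 * Grow the array to n_alloc entries. On failure NULL is returned and the
 * original allocation is left untouched.
 */
static struct demo_state *demo_state_grow(struct demo_state *state,
					  unsigned int n_alloc)
{
	unsigned int old_n_alloc = state ? state->n_alloc : 0;
	struct demo_state *grown;

	if (n_alloc <= old_n_alloc)
		return state;

	grown = realloc(state, sizeof(*grown) + n_alloc * sizeof(grown->usage[0]));
	if (!grown)
		return NULL;

	/* Publish the new length first so bounds checks see the right size. */
	grown->n_alloc = n_alloc;
	memset(&grown->usage[old_n_alloc], 0,
	       (n_alloc - old_n_alloc) * sizeof(grown->usage[0]));
	return grown;
}
```

If the counter were updated after the memset(), a bounds-checked build would see writes past the old `n_alloc` and could flag them, which is why the old size is saved in a local first.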
Ivan Vecera 556894708a netlink: annotate data-races around sk->sk_err
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit d0f95894fda7d4f895b29c1097f92d7fee278cb2
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 3 18:34:55 2023 +0000

    netlink: annotate data-races around sk->sk_err

    syzbot caught another data-race in netlink when
    setting sk->sk_err.

    Annotate all of them for good measure.

    BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

    write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0:
    netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
    sock_recvmsg_nosec net/socket.c:1027 [inline]
    sock_recvmsg net/socket.c:1049 [inline]
    __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
    __do_sys_recvfrom net/socket.c:2247 [inline]
    __se_sys_recvfrom net/socket.c:2243 [inline]
    __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1:
    netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
    sock_recvmsg_nosec net/socket.c:1027 [inline]
    sock_recvmsg net/socket.c:1049 [inline]
    __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
    __do_sys_recvfrom net/socket.c:2247 [inline]
    __se_sys_recvfrom net/socket.c:2243 [inline]
    __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    value changed: 0x00000000 -> 0x00000016

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023

    Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:34 +02:00
Ivan Vecera 51411db015 genetlink: add a family pointer to struct genl_info
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 5c670a010de46687ed27553602d8131ce4d7a9fb
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Aug 14 14:47:19 2023 -0700

    genetlink: add a family pointer to struct genl_info

    Having family in struct genl_info is quite useful. It cuts
    down the number of arguments which need to be passed to
    helpers which already take struct genl_info.

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230814214723.2924989-7-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:30 +02:00
Ivan Vecera 821b40dbfa genetlink: use attrs from struct genl_info
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 7288dd2fd4888c85c687f8ded69c280938d1a7b6
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Aug 14 14:47:18 2023 -0700

    genetlink: use attrs from struct genl_info

    Since dumps carry struct genl_info now, use the attrs pointer
    from genl_info and remove the one in struct genl_dumpit_info.

    Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
    Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:30 +02:00
Ivan Vecera 5e510e57b4 genetlink: add struct genl_info to struct genl_dumpit_info
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 9272af109fe65d1a13f28c5c13777b62d3e97e8c
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Aug 14 14:47:17 2023 -0700

    genetlink: add struct genl_info to struct genl_dumpit_info

    Netlink GET implementations must currently juggle struct genl_info
    and struct netlink_callback, depending on whether they were called
    from doit or dumpit.

    Add genl_info to the dump state and populate the fields.
    This way implementations can simply pass struct genl_info around.

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230814214723.2924989-5-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:30 +02:00
Ivan Vecera 25a5e1ea3a genetlink: remove userhdr from struct genl_info
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit bffcc6882a1bb2be8c9420184966f4c2c822078e
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Aug 14 14:47:16 2023 -0700

    genetlink: remove userhdr from struct genl_info

    Only three families use info->userhdr today and going forward
    we discourage using fixed headers in new families.
    So having the pointer to user header in struct genl_info
    is overkill. Compute the header pointer at runtime.

    Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Aaron Conole <aconole@redhat.com>
    Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:30 +02:00
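Computing the user header pointer at runtime, as the commit above describes, amounts to stepping past the generic netlink header. A minimal sketch, assuming the family's fixed header directly follows a 4-byte generic netlink header; the `demo_` types mirror the layout for illustration and are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the generic netlink message header. */
struct demo_genlmsghdr {
	uint8_t cmd;
	uint8_t version;
	uint16_t reserved;
};

/* Header length rounded up to the 4-byte netlink alignment. */
#define DEMO_GENL_HDRLEN \
	((sizeof(struct demo_genlmsghdr) + 3) & ~(size_t)3)

/* Hypothetical info struct: it keeps the genl header, not the user header. */
struct demo_genl_info {
	struct demo_genlmsghdr *genlhdr;
};

/* Derive the user header on demand instead of caching a pointer. */
static void *demo_info_userhdr(const struct demo_genl_info *info)
{
	return (unsigned char *)info->genlhdr + DEMO_GENL_HDRLEN;
}
```

The trade is one pointer addition per lookup in exchange for one less field in a structure that every handler receives.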
Ivan Vecera 8a3425c399 genetlink: push conditional locking into dumpit/done
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 84817d8c6042e6261ea45c53fe8b5a0bd55c3993
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Aug 14 14:47:14 2023 -0700

    genetlink: push conditional locking into dumpit/done

    Add helpers which take/release the genl mutex based
    on family->parallel_ops. Remove the separation between
    handling of ops in locked and parallel families.

    Future patches would make the duplicated code grow even more.

    Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20230814214723.2924989-2-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:30 +02:00
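The conditional locking helpers from the commit above can be approximated in userspace with a pthread mutex. This is a sketch only: the `demo_` names are hypothetical, and `demo_mutex_held` exists purely so the example can observe whether the lock was taken.

```c
#include <pthread.h>
#include <stdbool.h>

struct demo_family { bool parallel_ops; };

static pthread_mutex_t demo_genl_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool demo_mutex_held;	/* demo-only bookkeeping, written under the mutex */

/* Take the global mutex unless the family handles its own locking. */
static void demo_op_lock(const struct demo_family *family)
{
	if (family->parallel_ops)
		return;
	pthread_mutex_lock(&demo_genl_mutex);
	demo_mutex_held = true;
}

static void demo_op_unlock(const struct demo_family *family)
{
	if (family->parallel_ops)
		return;
	demo_mutex_held = false;
	pthread_mutex_unlock(&demo_genl_mutex);
}

/* One code path serves both locked and parallel families. */
static int demo_run_dumpit(const struct demo_family *family,
			   int (*dumpit)(void *arg), void *arg)
{
	int rc;

	demo_op_lock(family);
	rc = dumpit(arg);
	demo_op_unlock(family);
	return rc;
}

/* Example callback: records whether the mutex was held during the call. */
static int demo_record_lock_state(void *arg)
{
	*(bool *)arg = demo_mutex_held;
	return 0;
}
```

Pushing the `parallel_ops` test into the helpers is what lets the caller stay a single function instead of two near-identical copies.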
Ivan Vecera 374f808933 netlink: convert nlk->flags to atomic flags
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 8fe08d70a2b61b35a0a1235c78cf321e7528351f
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Aug 11 07:22:26 2023 +0000

    netlink: convert nlk->flags to atomic flags

    sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt()
    and others use nlk->flags without correct locking.

    Use set_bit(), clear_bit(), test_bit(), assign_bit() to remove
    data-races.

    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:29 +02:00
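A rough userspace analogue of the conversion in the commit above, using C11 atomics in place of the kernel's set_bit(), clear_bit(), test_bit() and assign_bit(). The `demo_` helpers and flag names are hypothetical; the point is that each flag update becomes a single atomic read-modify-write, so concurrent updates of different bits cannot lose each other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers. */
enum { DEMO_F_FIRST, DEMO_F_SECOND };

struct demo_nlk { atomic_ulong flags; };

static void demo_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

static void demo_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

static bool demo_test_bit(int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}

/* Set or clear one bit according to value, atomically either way. */
static void demo_assign_bit(int nr, atomic_ulong *addr, bool value)
{
	if (value)
		demo_set_bit(nr, addr);
	else
		demo_clear_bit(nr, addr);
}
```

A plain `flags |= mask` is a separate load and store, so two threads touching different bits of the same word can overwrite each other's update; the atomic forms avoid that without taking a lock.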
Ivan Vecera c6218efc01 netlink: Add new netlink_release function
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit a4c9a56e6a2cdeeab7caef1f496b7bfefd95b50e
Author: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Date:   Wed Jul 19 13:18:17 2023 -0700

    netlink: Add new netlink_release function

    A new function netlink_release is added in netlink_sock to store the
    protocol's release function. This is called when the socket is deleted.
    This can be supplied by the protocol via the release function in
    netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR
    protocol, so it can free its data when the socket is deleted.

    Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:27 +02:00
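The callback flow described in the commit above (a release hook supplied through the configuration, stored on the socket, and invoked when the socket is deleted) can be sketched as follows. All `demo_` types are hypothetical and the callback signature is simplified for illustration.

```c
#include <stdlib.h>

struct demo_nl_sock;
typedef void (*demo_release_fn)(struct demo_nl_sock *sk);

/* Configuration handed in by the protocol when it creates its socket. */
struct demo_kernel_cfg { demo_release_fn release; };

struct demo_nl_sock {
	demo_release_fn release;	/* copied from the cfg at creation */
	void *proto_data;		/* protocol-private state */
};

static struct demo_nl_sock *demo_sock_create(const struct demo_kernel_cfg *cfg)
{
	struct demo_nl_sock *sk = calloc(1, sizeof(*sk));

	if (sk && cfg)
		sk->release = cfg->release;
	return sk;
}

/* Give the protocol a chance to free its data before the socket goes away. */
static void demo_sock_delete(struct demo_nl_sock *sk)
{
	if (!sk)
		return;
	if (sk->release)
		sk->release(sk);
	free(sk);
}

static int demo_release_calls;	/* demo-only counter */

/* Example protocol hook: frees the private data it attached earlier. */
static void demo_connector_release(struct demo_nl_sock *sk)
{
	free(sk->proto_data);
	sk->proto_data = NULL;
	demo_release_calls++;
}
```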
Ivan Vecera cf823aa591 genetlink: add explicit ordering break check for split ops
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 5766946ea5117e4edeb78c80cac367fb06854cc1
Author: Jiri Pirko <jiri@nvidia.com>
Date:   Thu Jul 20 13:13:54 2023 +0200

    genetlink: add explicit ordering break check for split ops

    Currently, if cmd in the split ops array is of lower value than the
    previous one, genl_validate_ops() continues to do the checks as if
    the values are equal. This may result in a non-obvious WARN_ON() hit in
    these checks.

    Instead, check the incorrect ordering explicitly and put a WARN_ON()
    in case it is broken.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20230720111354.562242-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:27 +02:00
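An explicit ordering check of the kind the commit above describes might look like this in userspace C. The sketch is simplified and its names are hypothetical: a negative return stands in for the WARN_ON() path, and the rule for equal commands (a "do" op may only be followed by a "dump" op) is an assumption of the example.

```c
#include <stddef.h>

#define DEMO_CAP_DO   0x1u	/* hypothetical flag values */
#define DEMO_CAP_DUMP 0x2u

struct demo_split_op { unsigned int cmd; unsigned int flags; };

/* Returns 0 when the split ops array is well ordered, -1 otherwise. */
static int demo_validate_split_ops(const struct demo_split_op *ops, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		const struct demo_split_op *a = &ops[i - 1];
		const struct demo_split_op *b = &ops[i];

		if (a->cmd < b->cmd)
			continue;	/* strictly increasing: fine */
		if (a->cmd > b->cmd)
			return -1;	/* ordering broken: report it directly */

		/* Same cmd: accept only "do" followed by "dump". */
		if ((a->flags & DEMO_CAP_DO) && (b->flags & DEMO_CAP_DUMP))
			continue;
		return -1;
	}
	return 0;
}
```

Checking `a->cmd > b->cmd` separately means a mis-sorted array fails at the ordering test itself instead of falling through into the equal-command checks.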
Ivan Vecera e9b850700c netlink: Make use of __assign_bit() API
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit b8e39b38487e68c6503419db6e4a851a0ef56de7
Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Mon Jul 10 13:08:30 2023 +0300

    netlink: Make use of __assign_bit() API

    We have had the __assign_bit() API for some time to replace open-coded

            if (foo)
                    __set_bit(n, bar);
            else
                    __clear_bit(n, bar);

    Use this API in the code. No functional change intended.

    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Message-ID: <20230710100830.89936-2-andriy.shevchenko@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:27 +02:00
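For readers unfamiliar with the helper named in the commit above, a userspace sketch of a non-atomic assign-bit operation is shown here. `demo_assign_bit_nonatomic` is a hypothetical name; it folds the open-coded if/else from the commit message into one call on a bitmap stored as an array of unsigned long.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr of the bitmap when value is true, clear it otherwise. */
static void demo_assign_bit_nonatomic(unsigned int nr, unsigned long *addr,
				      bool value)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	unsigned long *word = addr + nr / DEMO_BITS_PER_LONG;

	if (value)
		*word |= mask;
	else
		*word &= ~mask;
}
```

Being non-atomic, this form is only appropriate when the caller already serializes access to the bitmap.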
Ivan Vecera 764a373f7a netlink: Add __sock_i_ino() for __netlink_diag_dump().
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 25a9c8a4431c364f97f75558cb346d2ad3f53fbb
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Mon Jun 26 09:43:13 2023 -0700

    netlink: Add __sock_i_ino() for __netlink_diag_dump().

    syzbot reported a warning in __local_bh_enable_ip(). [0]

    Commit 8d61f926d420 ("netlink: fix potential deadlock in
    netlink_set_err()") converted read_lock(&nl_table_lock) to
    read_lock_irqsave() in __netlink_diag_dump() to prevent a deadlock.

    However, __netlink_diag_dump() calls sock_i_ino() that uses
    read_lock_bh() and read_unlock_bh().  If CONFIG_TRACE_IRQFLAGS=y,
    read_unlock_bh() finally enables IRQ even though it should stay
    disabled until the following read_unlock_irqrestore().

    Using read_lock() in sock_i_ino() would trigger a lockdep splat
    in another place that was fixed in commit f064af1e50 ("net: fix
    a lockdep splat"), so let's add __sock_i_ino() that would be safe
    to use under BH disabled.

    [0]:
    WARNING: CPU: 0 PID: 5012 at kernel/softirq.c:376 __local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376
    Modules linked in:
    CPU: 0 PID: 5012 Comm: syz-executor487 Not tainted 6.4.0-rc7-syzkaller-00202-g6f68fc395f49 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
    RIP: 0010:__local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376
    Code: 45 bf 01 00 00 00 e8 91 5b 0a 00 e8 3c 15 3d 00 fb 65 8b 05 ec e9 b5 7e 85 c0 74 58 5b 5d c3 65 8b 05 b2 b6 b4 7e 85 c0 75 a2 <0f> 0b eb 9e e8 89 15 3d 00 eb 9f 48 89 ef e8 6f 49 18 00 eb a8 0f
    RSP: 0018:ffffc90003a1f3d0 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000201 RCX: 1ffffffff1cf5996
    RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffff8805c6f3
    RBP: ffffffff8805c6f3 R08: 0000000000000001 R09: ffff8880152b03a3
    R10: ffffed1002a56074 R11: 0000000000000005 R12: 00000000000073e4
    R13: dffffc0000000000 R14: 0000000000000002 R15: 0000000000000000
    FS:  0000555556726300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000045ad50 CR3: 000000007c646000 CR4: 00000000003506f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     sock_i_ino+0x83/0xa0 net/core/sock.c:2559
     __netlink_diag_dump+0x45c/0x790 net/netlink/diag.c:171
     netlink_diag_dump+0xd6/0x230 net/netlink/diag.c:207
     netlink_dump+0x570/0xc50 net/netlink/af_netlink.c:2269
     __netlink_dump_start+0x64b/0x910 net/netlink/af_netlink.c:2374
     netlink_dump_start include/linux/netlink.h:329 [inline]
     netlink_diag_handler_dump+0x1ae/0x250 net/netlink/diag.c:238
     __sock_diag_cmd net/core/sock_diag.c:238 [inline]
     sock_diag_rcv_msg+0x31e/0x440 net/core/sock_diag.c:269
     netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2547
     sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
     netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
     netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
     netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914
     sock_sendmsg_nosec net/socket.c:724 [inline]
     sock_sendmsg+0xde/0x190 net/socket.c:747
     ____sys_sendmsg+0x71c/0x900 net/socket.c:2503
     ___sys_sendmsg+0x110/0x1b0 net/socket.c:2557
     __sys_sendmsg+0xf7/0x1c0 net/socket.c:2586
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    RIP: 0033:0x7f5303aaabb9
    Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007ffc7506e548 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5303aaabb9
    RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
    RBP: 00007f5303a6ed60 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5303a6edf0
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
     </TASK>

    Fixes: 8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()")
    Reported-by: syzbot+5da61cf6a9bc1902d422@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?extid=5da61cf6a9bc1902d422
    Suggested-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20230626164313.52528-1-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:27 +02:00
Ivan Vecera e385a6038e netlink: fix potential deadlock in netlink_set_err()
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 8d61f926d42045961e6b65191c09e3678d86a9cf
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Jun 21 15:43:37 2023 +0000

    netlink: fix potential deadlock in netlink_set_err()

    syzbot reported a possible deadlock in netlink_set_err() [1]

    A similar issue was fixed in commit 1d482e666b ("netlink: disable IRQs
    for netlink_lock_table()") in netlink_lock_table()

    This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
    which were not covered by cited commit.

    [1]

    WARNING: possible irq lock inversion dependency detected
    6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted

    syz-executor.2/23011 just changed the state of lock:
    ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
    but this lock was taken by another, SOFTIRQ-safe lock in the past:
     (&local->queue_stop_reason_lock){..-.}-{2:2}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
     Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(nl_table_lock);
                                   local_irq_disable();
                                   lock(&local->queue_stop_reason_lock);
                                   lock(nl_table_lock);
      <Interrupt>
        lock(&local->queue_stop_reason_lock);

     *** DEADLOCK ***

    Fixes: 1d482e666b ("netlink: disable IRQs for netlink_lock_table()")
    Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c
    Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Johannes Berg <johannes.berg@intel.com>
    Link: https://lore.kernel.org/r/20230621154337.1668594-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:26 +02:00
Ivan Vecera 587fa7af5a net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit f4e4534850a9d18c250a93f8d7fbb51310828110
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Mon May 29 12:33:35 2023 -0300

    net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report

    The current code for the length calculation wrongly truncates the reported
    length of the groups array, so the subscribed groups are under-reported.
    To fix this, use 'BITS_TO_BYTES()', which rounds up the division by 8.

    Fixes: b42be38b27 ("netlink: add API to retrieve all group memberships")
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230529153335.389815-1-pctammela@mojatatu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:22 +02:00
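
The rounding problem described in 587fa7af5a (NETLINK_LIST_MEMBERSHIPS length report) can be illustrated in userspace. This is a minimal sketch, not the kernel code: the macros are local stand-ins for the kernel's DIV_ROUND_UP() and BITS_TO_BYTES(), and the helper names groups_len_truncated() and groups_len_rounded() are hypothetical.

```c
#include <stddef.h>

/* Round-up division, in the spirit of the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Userspace stand-in for the kernel's BITS_TO_BYTES(). */
#define BITS_TO_BYTES(nr) DIV_ROUND_UP((nr), 8)

/* Truncating length: a trailing partial byte of group bits is dropped. */
static inline size_t groups_len_truncated(size_t ngroups)
{
    return ngroups / 8;
}

/* Rounded-up length: every group bit is covered by the reported length. */
static inline size_t groups_len_rounded(size_t ngroups)
{
    return BITS_TO_BYTES(ngroups);
}
```

With 33 groups, the truncating helper yields 4 bytes (32 bits, so the last group is lost), while the rounded helper yields 5 bytes. When the group count is a multiple of 8, both agree.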
Ivan Vecera 8fcbe46cde netlink: annotate accesses to nlk->cb_running
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit a939d14919b799e6fff8a9c80296ca229ba2f8a4
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue May 9 16:56:34 2023 +0000

    netlink: annotate accesses to nlk->cb_running

    Both netlink_recvmsg() and netlink_native_seq_show() read
    nlk->cb_running locklessly. Use READ_ONCE() there.

    Add corresponding WRITE_ONCE() to netlink_dump() and
    __netlink_dump_start()

    syzbot reported:
    BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg

    write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0:
    __netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399
    netlink_dump_start include/linux/netlink.h:308 [inline]
    rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130
    netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577
    rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192
    netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
    netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
    netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942
    sock_sendmsg_nosec net/socket.c:724 [inline]
    sock_sendmsg net/socket.c:747 [inline]
    sock_write_iter+0x1aa/0x230 net/socket.c:1138
    call_write_iter include/linux/fs.h:1851 [inline]
    new_sync_write fs/read_write.c:491 [inline]
    vfs_write+0x463/0x760 fs/read_write.c:584
    ksys_write+0xeb/0x1a0 fs/read_write.c:637
    __do_sys_write fs/read_write.c:649 [inline]
    __se_sys_write fs/read_write.c:646 [inline]
    __x64_sys_write+0x42/0x50 fs/read_write.c:646
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1:
    netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022
    sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017
    ____sys_recvmsg+0x2db/0x310 net/socket.c:2718
    ___sys_recvmsg net/socket.c:2762 [inline]
    do_recvmmsg+0x2e5/0x710 net/socket.c:2856
    __sys_recvmmsg net/socket.c:2935 [inline]
    __do_sys_recvmmsg net/socket.c:2958 [inline]
    __se_sys_recvmmsg net/socket.c:2951 [inline]
    __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    value changed: 0x00 -> 0x01

    Fixes: 16b304f340 ("netlink: Eliminate kmalloc in netlink dump operation.")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:21 +02:00
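
The annotation pattern from 8fcbe46cde (accesses to nlk->cb_running) can be sketched in userspace as follows. The macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() and rely on the GCC/Clang __typeof__ extension; struct demo_sock and the demo_* helpers are hypothetical and much smaller than the real struct netlink_sock and its dump paths.

```c
#include <stdbool.h>

/*
 * Simplified stand-ins for READ_ONCE()/WRITE_ONCE(): the volatile access
 * asks the compiler to perform the load or store once, without fusing it
 * with neighbouring accesses or re-reading the variable. They add no
 * memory ordering.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, cut-down socket state for illustration only. */
struct demo_sock {
    bool cb_running;
};

/* Writer side, in the role of a dump start path: set the flag. */
static inline void demo_dump_start(struct demo_sock *nlk)
{
    WRITE_ONCE(nlk->cb_running, true);
}

/* Writer side, in the role of a dump completion path: clear the flag. */
static inline void demo_dump_done(struct demo_sock *nlk)
{
    WRITE_ONCE(nlk->cb_running, false);
}

/* Lockless reader, in the role of a recvmsg path: sample the flag once. */
static inline bool demo_dump_pending(struct demo_sock *nlk)
{
    return READ_ONCE(nlk->cb_running);
}
```

The point of pairing the two macros is that every lockless reader and every writer of the same field is marked, which is what lets KCSAN treat the remaining race as intentional.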
Ivan Vecera 8a3f7c7be9 netlink: Use copy_to_user() for optval in netlink_getsockopt().
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit d913d32cc2707e9cd24fe6fa6d7d470e9c728980
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Fri Apr 21 11:52:55 2023 -0700

    netlink: Use copy_to_user() for optval in netlink_getsockopt().

    Brad Spencer provided a detailed report [0] that when calling getsockopt()
    for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such
    options require at least sizeof(int) as length.

    The options return a flag value that fits into 1 byte, but such behaviour
    confuses users who do not initialise the variable before calling
    getsockopt() and do not strictly check the returned value as char.

    Currently, netlink_getsockopt() uses put_user() to copy data to optlen and
    optval, but put_user() casts the data based on the pointer, char *optval.
    As a result, only 1 byte is set to optval.

    To avoid this behaviour, we need to use copy_to_user() or cast optval for
    put_user().

    Note that this changes the behaviour on big-endian systems, but we document
    that the size of optval is int in the man page.

      $ man 7 netlink
      ...
      Socket options
           To set or get a netlink socket option, call getsockopt(2) to read
           or setsockopt(2) to write the option with the option level argument
           set to SOL_NETLINK.  Unless otherwise noted, optval is a pointer to
           an int.

    Fixes: 9a4595bc7e ("[NETLINK]: Add set/getsockopt options to support more than 32 groups")
    Fixes: be0c22a46c ("netlink: add NETLINK_BROADCAST_ERROR socket option")
    Fixes: 38938bfe34 ("netlink: add NETLINK_NO_ENOBUFS socket flag")
    Fixes: 0a6a3a23ea ("netlink: add NETLINK_CAP_ACK socket option")
    Fixes: 2d4bc93368 ("netlink: extended ACK reporting")
    Fixes: 89d35528d1 ("netlink: Add new socket option to enable strict checking on dumps")
    Reported-by: Brad Spencer <bspencer@blackberry.com>
    Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
    Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:21 +02:00
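
The one-byte store described in 8a3f7c7be9 (netlink_getsockopt() optval) can be modelled in userspace. This is a sketch under stated assumptions: store_via_char_ptr() and store_whole_int() are hypothetical models of a put_user() through a char pointer and of a copy_to_user() of sizeof(int) bytes, and read_flag() fills the caller's int with 0xff bytes as a deterministic stand-in for an uninitialised variable.

```c
#include <string.h>

/* Model of a store whose size follows the pointer type: one byte only. */
static inline void store_via_char_ptr(char *optval, int val)
{
    *optval = (char)val;
}

/* Model of copying the whole int into the caller's buffer. */
static inline void store_whole_int(char *optval, int val)
{
    memcpy(optval, &val, sizeof(val));
}

/*
 * Caller that does not initialise its int before asking for the option.
 * The 0xff fill stands in for whatever garbage the variable would hold.
 */
static inline int read_flag(void (*store)(char *, int), int flag)
{
    int out;

    memset(&out, 0xff, sizeof(out));
    store((char *)&out, flag);
    return out;
}
```

Here read_flag(store_whole_int, 1) returns 1, while read_flag(store_via_char_ptr, 1) returns a value other than 1 because the remaining bytes of the int keep their old contents. Which byte is overwritten depends on endianness, which matches the commit's note about big-endian systems.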
Ivan Vecera bf93242e27 netlink: remove unused 'compare' function
JIRA: https://issues.redhat.com/browse/RHEL-30656

commit 6978052448f9eb19f7b03243ac0416104e5ee50d
Author: Florian Westphal <fw@strlen.de>
Date:   Wed Mar 8 15:20:06 2023 +0100

    netlink: remove unused 'compare' function

    No users in the tree.  Tested with allmodconfig build.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Link: https://lore.kernel.org/r/20230308142006.20879-1-fw@strlen.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:19:19 +02:00
Ivan Vecera 44d65dbd81 netlink: Reverse the patch which removed filtering
JIRA: https://issues.redhat.com/browse/RHEL-30344

commit a3377386b56420d78a4c0a931a40f9a25c3ca2bd
Author: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Date:   Wed Jul 19 13:18:16 2023 -0700

    netlink: Reverse the patch which removed filtering

    To use filtering at the connector & cn_proc layers, we need to enable
    filtering in the netlink layer. This reverses the patch which removed
    netlink filtering - commit ID for that patch:
    549017aa1bb7 (netlink: remove netlink_broadcast_filtered).

    Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-10 09:16:00 +02:00
Ivan Vecera 676be2326e netlink: annotate lockless accesses to nlk->max_recvmsg_len
JIRA: https://issues.redhat.com/browse/RHEL-30344

commit a1865f2e7d10dde00d35a2122b38d2e469ae67ed
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Apr 3 21:46:43 2023 +0000

    netlink: annotate lockless accesses to nlk->max_recvmsg_len

    syzbot reported a data-race in netlink_recvmsg() [1]

    Indeed, netlink_recvmsg() can be run concurrently,
    and netlink_dump() also needs protection.

    [1]
    BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

    read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
    netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
    sock_recvmsg_nosec net/socket.c:1017 [inline]
    sock_recvmsg net/socket.c:1038 [inline]
    __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
    __do_sys_recvfrom net/socket.c:2212 [inline]
    __se_sys_recvfrom net/socket.c:2208 [inline]
    __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
    netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
    sock_recvmsg_nosec net/socket.c:1017 [inline]
    sock_recvmsg net/socket.c:1038 [inline]
    ____sys_recvmsg+0x156/0x310 net/socket.c:2720
    ___sys_recvmsg net/socket.c:2762 [inline]
    do_recvmmsg+0x2e5/0x710 net/socket.c:2856
    __sys_recvmmsg net/socket.c:2935 [inline]
    __do_sys_recvmmsg net/socket.c:2958 [inline]
    __se_sys_recvmmsg net/socket.c:2951 [inline]
    __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    value changed: 0x0000000000000000 -> 0x0000000000001000

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023

    Fixes: 9063e21fb0 ("netlink: autosize skb lengthes")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-04-02 11:15:41 +02:00