Commit Graph

153 Commits

Author SHA1 Message Date
Jeff Moyer 192d6b9fca io_uring/net: harden multishot termination case for recv
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit c314094cb4cfa6fc5a17f4881ead2dfebfa717a7
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Sep 26 07:08:10 2024 -0600

    io_uring/net: harden multishot termination case for recv
    
    If the recv returns zero, or an error, then it doesn't matter if more
    data has already been received for this buffer. A condition like that
    should terminate the multishot receive. Rather than pass in the
    collected return value, pass in whether to terminate or keep the recv
    going separately.
    
    Note that this isn't a bug right now, as the only way to get there is
    via setting MSG_WAITALL with multishot receive. And if an application
    does that, then -EINVAL is returned anyway. But it seems like an easy
    bug to introduce, so let's make it a bit more explicit.
    
    Link: https://github.com/axboe/liburing/issues/1246
    Cc: stable@vger.kernel.org
    Fixes: b3fdea6ecb55 ("io_uring: multishot recv")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:14:53 -05:00
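The terminating condition above is what userspace observes via
IORING_CQE_F_MORE: while a multishot recv stays armed, every CQE carries
the flag; a zero-byte receive or an error posts a final CQE without it. A
minimal liburing sketch of that contract (sockfd and bgid are assumed to
be a connected socket and a registered provided-buffer group):

    #include <liburing.h>

    /* Arm a multishot recv against a provided-buffer group and reap CQEs
     * until the kernel terminates the multishot (IORING_CQE_F_MORE clear). */
    static void recv_until_terminated(struct io_uring *ring, int sockfd,
                                      unsigned short bgid)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        struct io_uring_cqe *cqe;

        io_uring_prep_recv_multishot(sqe, sockfd, NULL, 0, 0);
        sqe->flags |= IOSQE_BUFFER_SELECT;    /* kernel picks the buffer */
        sqe->buf_group = bgid;
        io_uring_submit(ring);

        while (!io_uring_wait_cqe(ring, &cqe)) {
            int more = cqe->flags & IORING_CQE_F_MORE;

            /* cqe->res == 0 (EOF) or < 0 (error) is a terminating CQE */
            io_uring_cqe_seen(ring, cqe);
            if (!more)
                break;                        /* re-arm here if desired */
        }
    }
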
Jeff Moyer c1986a7572 io_uring/net: don't pick multiple buffers for non-bundle send
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 8fe8ac24adcd76b12edbfdefa078567bfff117d4
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Aug 7 15:09:33 2024 -0600

    io_uring/net: don't pick multiple buffers for non-bundle send
    
    If a send is issued marked with IOSQE_BUFFER_SELECT for selecting a
    buffer, unless it's a bundle, it should not select multiple buffers.
    
    Cc: stable@vger.kernel.org
    Fixes: a05d1f625c7a ("io_uring/net: support bundles for send")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:14:52 -05:00
Jeff Moyer 5dfd23310f io_uring/net: ensure expanded bundle send gets marked for cleanup
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 70ed519ed59da3a92c3acedeb84a30e5a66051ce
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Aug 7 15:08:17 2024 -0600

    io_uring/net: ensure expanded bundle send gets marked for cleanup
    
    If the iovec inside the kmsg isn't already allocated AND one gets
    expanded beyond the fixed size, then the request may not already have
    been marked for cleanup. Ensure that it is.
    
    Cc: stable@vger.kernel.org
    Fixes: a05d1f625c7a ("io_uring/net: support bundles for send")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:14:52 -05:00
Jeff Moyer 7f6552b1f1 io_uring/net: ensure expanded bundle recv gets marked for cleanup
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 11893e144ed75be55d99349760513ca104781fc0
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Aug 7 15:06:45 2024 -0600

    io_uring/net: ensure expanded bundle recv gets marked for cleanup
    
    If the iovec inside the kmsg isn't already allocated AND one gets
    expanded beyond the fixed size, then the request may not already have
    been marked for cleanup. Ensure that it is.
    
    Cc: stable@vger.kernel.org
    Fixes: 2f9c9515bdfd ("io_uring/net: support bundles for recv")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:14:52 -05:00
Jeff Moyer c78661e6f0 io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 6e92c646f5a4230d939a0882f879fc50dfa116c5
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Jul 2 09:37:30 2024 -0600

    io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it
    
    For bundle receives to function properly, the previous iteration msg_inq
    value is needed to make a judgement call on how much data there is to
    receive. A previous fix ended up clearing it earlier, as an error case
    could otherwise errantly set IORING_CQE_F_SOCK_NONEMPTY if the request
    failed.
    
    Move the assignment to post assigning buffers for the receive, but
    ensure that it's cleared for the buffer selection error case. With that,
    buffer selection has the right msg_inq value and can correctly bundle
    receives as designed.
    
    Noticed while testing, where it was apparent that more than one buffer
    was never received. After the fix was in place, multiple buffers are
    correctly picked for receive. This provides a 10x speedup for the test
    case, as the buffer size used was 64 bytes.
    
    Fixes: 18414a4a2eab ("io_uring/net: assign kmsg inq/flags before buffer selection")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:12:48 -05:00
Jeff Moyer ade423a5e6 io_uring/net: assign kmsg inq/flags before buffer selection
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 18414a4a2eabb0281d12d374c92874327e0e3fe3
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu May 30 13:35:50 2024 -0600

    io_uring/net: assign kmsg inq/flags before buffer selection
    
    syzbot reports that recv is using an uninitialized value:
    
    =====================================================
    BUG: KMSAN: uninit-value in io_req_cqe_overflow io_uring/io_uring.c:810 [inline]
    BUG: KMSAN: uninit-value in io_req_complete_post io_uring/io_uring.c:937 [inline]
    BUG: KMSAN: uninit-value in io_issue_sqe+0x1f1b/0x22c0 io_uring/io_uring.c:1763
     io_req_cqe_overflow io_uring/io_uring.c:810 [inline]
     io_req_complete_post io_uring/io_uring.c:937 [inline]
     io_issue_sqe+0x1f1b/0x22c0 io_uring/io_uring.c:1763
     io_wq_submit_work+0xa17/0xeb0 io_uring/io_uring.c:1860
     io_worker_handle_work+0xc04/0x2000 io_uring/io-wq.c:597
     io_wq_worker+0x447/0x1410 io_uring/io-wq.c:651
     ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
    Uninit was stored to memory at:
     io_req_set_res io_uring/io_uring.h:215 [inline]
     io_recv_finish+0xf10/0x1560 io_uring/net.c:861
     io_recv+0x12ec/0x1ea0 io_uring/net.c:1175
     io_issue_sqe+0x429/0x22c0 io_uring/io_uring.c:1751
     io_wq_submit_work+0xa17/0xeb0 io_uring/io_uring.c:1860
     io_worker_handle_work+0xc04/0x2000 io_uring/io-wq.c:597
     io_wq_worker+0x447/0x1410 io_uring/io-wq.c:651
     ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
    Uninit was created at:
     slab_post_alloc_hook mm/slub.c:3877 [inline]
     slab_alloc_node mm/slub.c:3918 [inline]
     __do_kmalloc_node mm/slub.c:4038 [inline]
     __kmalloc+0x6e4/0x1060 mm/slub.c:4052
     kmalloc include/linux/slab.h:632 [inline]
     io_alloc_async_data+0xc0/0x220 io_uring/io_uring.c:1662
     io_msg_alloc_async io_uring/net.c:166 [inline]
     io_recvmsg_prep_setup io_uring/net.c:725 [inline]
     io_recvmsg_prep+0xbe8/0x1a20 io_uring/net.c:806
     io_init_req io_uring/io_uring.c:2135 [inline]
     io_submit_sqe io_uring/io_uring.c:2182 [inline]
     io_submit_sqes+0x1135/0x2f10 io_uring/io_uring.c:2335
     __do_sys_io_uring_enter io_uring/io_uring.c:3246 [inline]
     __se_sys_io_uring_enter+0x40f/0x3c80 io_uring/io_uring.c:3183
     __x64_sys_io_uring_enter+0x11f/0x1a0 io_uring/io_uring.c:3183
     x64_sys_call+0x2c0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:427
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    which appears to be io_recv_finish() reading kmsg->msg.msg_inq to decide
    if it needs to set IORING_CQE_F_SOCK_NONEMPTY or not. If the recv is
    entered with buffer selection, but no buffer is available, then we jump
    to the error path, which calls io_recv_finish() without having assigned
    kmsg->msg_inq. This might cause an errant setting of the NONEMPTY flag
    for a request that gets errored with -ENOBUFS.
    
    Reported-by: syzbot+b1647099e82b3b349fbf@syzkaller.appspotmail.com
    Fixes: 4a3223f7bfda ("io_uring/net: switch io_recv() to using io_async_msghdr")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:12:47 -05:00
Jeff Moyer 12e997c3e6 io_uring/net: wire up IORING_CQE_F_SOCK_NONEMPTY for accept
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit ac287da2e0ea5be2523222981efec86f0ca977cd
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu May 9 09:41:10 2024 -0600

    io_uring/net: wire up IORING_CQE_F_SOCK_NONEMPTY for accept
    
    If the given protocol supports passing back whether or not more
    connection requests are pending post this accept, pass this information
    back to userspace.
    This is done by setting IORING_CQE_F_SOCK_NONEMPTY in the CQE flags,
    just like we do for recv/recvmsg if there's more data available post
    a receive operation.
    
    We can also use this information to be smarter about multishot retry,
    as we don't need to do a pointless retry if we know for a fact that
    there aren't any more connections to accept.
    
    Suggested-by: Norman Maurer <norman_maurer@apple.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:12:47 -05:00
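From the application side, the new flag reads the same way it does on recv
CQEs; a short sketch (ring assumed initialized, listenfd an assumed
listening socket):

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    struct io_uring_cqe *cqe;

    io_uring_prep_accept(sqe, listenfd, NULL, NULL, 0);
    io_uring_submit(&ring);
    io_uring_wait_cqe(&ring, &cqe);

    if (cqe->res >= 0 && (cqe->flags & IORING_CQE_F_SOCK_NONEMPTY)) {
        /* more connections are already pending: queue another accept
         * right away instead of waiting for a poll trigger */
    }
    io_uring_cqe_seen(&ring, cqe);
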
Jeff Moyer 8ba710b1a7 net: have do_accept() take a struct proto_accept_arg argument
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL is missing commit 1ded5e5a5931 ("net: annotate
data-races around sock->ops"), which accounts for the differences in
ops structure dereferencing.

commit 0645fbe760afcc5332c858d1cbf416bf77ef3c29
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu May 9 09:31:05 2024 -0600

    net: have do_accept() take a struct proto_accept_arg argument
    
    In preparation for passing in more information via this API, change
    do_accept() to take a proto_accept_arg struct pointer rather than just
    the file flags separately.
    
    No functional changes in this patch.
    
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:12:47 -05:00
Jeff Moyer aa0b488ee2 io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit d3da8e98592693811c14c31f05380f378411fea1
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed May 8 08:17:50 2024 -0600

    io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
    
    Similarly to how polling first is supported for receive, it makes sense
    to provide the same for accept. An accept operation does a lot of
    expensive setup, like allocating an fd, a socket/inode, etc. If no
    connection request is already pending, this is wasted and will just be
    cleaned up and freed, only to retry via the usual poll trigger.
    
    Add IORING_ACCEPT_POLL_FIRST, which tells accept to only initiate the
    accept request if poll says we have something to accept.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:48:44 -05:00
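A sketch of opting in; per this patch the io_uring-specific accept flags
are carried in the sqe's ioprio field (ring and listenfd assumed as
above):

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    io_uring_prep_accept(sqe, listenfd, NULL, NULL, 0);
    /* don't pay the fd/socket/inode setup cost unless poll reports a
     * connection actually waiting to be accepted */
    sqe->ioprio |= IORING_ACCEPT_POLL_FIRST;
    io_uring_submit(&ring);
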
Jeff Moyer 2ba805570e io_uring/net: add IORING_ACCEPT_DONTWAIT flag
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 7dcc758cca432510f77b2fe1077be2314bc3785b
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue May 7 14:06:15 2024 -0600

    io_uring/net: add IORING_ACCEPT_DONTWAIT flag
    
    This allows the caller to perform a non-blocking attempt, similarly to
    how recvmsg has MSG_DONTWAIT. If set, and we get -EAGAIN on a connection
    attempt, propagate the result to userspace rather than arm poll and
    wait for a retry.
    
    Suggested-by: Norman Maurer <norman_maurer@apple.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:47:44 -05:00
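A sketch of the non-blocking attempt; unlike the default behavior, the
-EAGAIN lands in the CQE rather than arming poll (same assumed names as
above):

    io_uring_prep_accept(sqe, listenfd, NULL, NULL, 0);
    sqe->ioprio |= IORING_ACCEPT_DONTWAIT;   /* analogous to MSG_DONTWAIT */
    io_uring_submit(&ring);
    io_uring_wait_cqe(&ring, &cqe);

    if (cqe->res == -EAGAIN) {
        /* nothing to accept right now; the application decides when,
         * or whether, to retry */
    }
    io_uring_cqe_seen(&ring, cqe);
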
Jeff Moyer 6dd8158416 io_uring/net: support bundles for recv
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 2f9c9515bdfde9e4df1f35782284074d3625ff8a
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 5 16:22:04 2024 -0700

    io_uring/net: support bundles for recv
    
    If IORING_OP_RECV is used with provided buffers, the caller may also set
    IORING_RECVSEND_BUNDLE to turn it into a multi-buffer recv. This grabs
    available buffers and receives into them, posting a single completion
    for all of it.
    
    This can be used with multishot receive as well, or without it.
    
    Now that both send and receive support bundles, add a feature flag for
    it as well. If IORING_FEAT_RECVSEND_BUNDLE is set after registering the
    ring, then the kernel supports bundles for recv and send.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:37:44 -05:00
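A sketch of detecting and using the feature; sockfd and bgid are
assumptions as before, and a len of 0 is assumed to mean "no cap on the
bytes received":

    struct io_uring_params p = { };
    struct io_uring ring;

    io_uring_queue_init_params(64, &ring, &p);
    if (p.features & IORING_FEAT_RECVSEND_BUNDLE) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

        io_uring_prep_recv(sqe, sockfd, NULL, 0, 0);
        sqe->flags |= IOSQE_BUFFER_SELECT;
        sqe->buf_group = bgid;
        sqe->ioprio |= IORING_RECVSEND_BUNDLE; /* one CQE, many buffers */
        io_uring_submit(&ring);
        /* on completion, cqe->res is the total byte count and the buffer
         * id in cqe->flags names the first of the buffers consumed */
    }
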
Jeff Moyer 00d90d013d io_uring/net: support bundles for send
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit a05d1f625c7aa681d8816bc0f10089289ad07aad
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 5 13:10:04 2024 -0700

    io_uring/net: support bundles for send
    
    If IORING_OP_SEND is used with provided buffers, the caller may also
    set IORING_RECVSEND_BUNDLE to turn it into a multi-buffer send. The idea
    is that an application can fill outgoing buffers in a provided buffer
    group, and then arm a single send that will service them all. Once
    there are no more buffers to send, or if the requested length has
    been sent, the request posts a single completion for all the buffers.
    
    This only enables it for IORING_OP_SEND; IORING_OP_SENDMSG is coming
    in a separate patch. However, this patch does do a lot of the prep
    work that makes wiring up the sendmsg variant pretty trivial. They
    share the prep side.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:36:44 -05:00
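The send side applies the flag the same way. A sketch, assuming buffers
have already been queued into group bgid and that a len of 0 means "no
cap on how much of the group to send":

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    /* one send servicing however many buffers are queued in the group */
    io_uring_prep_send(sqe, sockfd, NULL, 0, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;
    sqe->buf_group = bgid;
    sqe->ioprio |= IORING_RECVSEND_BUNDLE;
    io_uring_submit(&ring);
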
Jeff Moyer 3309b0778b io_uring/net: add provided buffer support for IORING_OP_SEND
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit ac5f71a3d9d7eb540f6bf7e794eb4a3e4c3f11dd
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Feb 19 10:46:44 2024 -0700

    io_uring/net: add provided buffer support for IORING_OP_SEND
    
    It's pretty trivial to wire up provided buffer support for the send
    side, just like how it's done on the receive side. This enables setting up
    a buffer ring that an application can use to push pending sends to,
    and then have a send pick a buffer from that ring.
    
    One of the challenges with async IO and networking sends is that you
    can get into reordering conditions if you have more than one inflight
    at the same time. Consider the following scenario where everything is
    fine:
    
    1) App queues sendA for socket1
    2) App queues sendB for socket1
    3) App does io_uring_submit()
    4) sendA is issued, completes successfully, posts CQE
    5) sendB is issued, completes successfully, posts CQE
    
    All is fine. Requests are always issued in-order, and both complete
    inline as most sends do.
    
    However, if we're flooding socket1 with sends, the following could
    also result from the same sequence:
    
    1) App queues sendA for socket1
    2) App queues sendB for socket1
    3) App does io_uring_submit()
    4) sendA is issued, socket1 is full, poll is armed for retry
    5) Space frees up in socket1, this triggers sendA retry via task_work
    6) sendB is issued, completes successfully, posts CQE
    7) sendA is retried, completes successfully, posts CQE
    
    Now we've sent sendB before sendA, which can make things unhappy. If
    both sendA and sendB had been using provided buffers, then it would look
    as follows instead:
    
    1) App queues dataA for sendA, queues sendA for socket1
    2) App queues dataB for sendB, queues sendB for socket1
    3) App does io_uring_submit()
    4) sendA is issued, socket1 is full, poll is armed for retry
    5) Space frees up in socket1, this triggers sendA retry via task_work
    6) sendB is issued, picks first buffer (dataA), completes successfully,
       posts CQE (which says "I sent dataA")
    7) sendA is retried, picks first buffer (dataB), completes successfully,
       posts CQE (which says "I sent dataB")
    
    Now we've sent the data in order, and everybody is happy.
    
    It's worth noting that this also opens the door for supporting multishot
    sends, as provided buffers would be a prerequisite for that. Those can
    trigger either when new buffers are added to the outgoing ring, or (if
    stalled due to lack of space) when space frees up in the socket.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:34:44 -05:00
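A sketch of the buffer-group side of this: outgoing data is pushed into a
provided buffer ring in transmit order, and each send marked with
IOSQE_BUFFER_SELECT consumes from the head, which is what makes the retry
reordering above harmless (bgid, data, and data_len are assumed names):

    int err;
    struct io_uring_buf_ring *br;
    struct io_uring_sqe *sqe;

    br = io_uring_setup_buf_ring(&ring, 8, bgid, 0, &err);

    /* push a pending send into the group, in the order it must go out */
    io_uring_buf_ring_add(br, data, data_len, 0 /* bid */,
                          io_uring_buf_ring_mask(8), 0);
    io_uring_buf_ring_advance(br, 1);

    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_send(sqe, sockfd, NULL, 0, 0); /* no addr: buffer is picked */
    sqe->flags |= IOSQE_BUFFER_SELECT;
    sqe->buf_group = bgid;
    io_uring_submit(&ring);

Whichever send is issued first picks the oldest buffer, so data ordering
no longer depends on request completion order.
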
Jeff Moyer aa82d3d3c1 io_uring/net: add generic multishot retry helper
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 3e747dedd47b6250390abfc08dc0aa4817d3c052
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sun Feb 25 12:52:39 2024 -0700

    io_uring/net: add generic multishot retry helper
    
    This is just moving io_recv_prep_retry() higher up so it can get used
    for sends as well, and rename it to be generically useful for both
    sends and receives.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:33:44 -05:00
Jeff Moyer de812aa812 io_uring/net: set MSG_ZEROCOPY for sendzc in advance
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit d285da7dbd3b3cc9b4cf822039a87ca4e4106ecf
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Mon Apr 8 00:54:57 2024 +0100

    io_uring/net: set MSG_ZEROCOPY for sendzc in advance
    
    We can set MSG_ZEROCOPY at the preparation step, do it so we don't have
    to care about it later in the issue callback.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/c2c22aaa577624977f045979a6db2b9fb2e5648c.1712534031.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:19:44 -05:00
Jeff Moyer 44fb41cbaf io_uring/net: get rid of io_notif_complete_tw_ext
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 6b7f864bb70591b1ba8f538c13de2a8396bfec8a
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Mon Apr 8 00:54:56 2024 +0100

    io_uring/net: get rid of io_notif_complete_tw_ext
    
    io_notif_complete_tw_ext() can be removed and combined with
    io_notif_complete_tw to make it simpler without sacrificing
    anything.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/025a124a5e20e2474a57e2f04f16c422eb83063c.1712534031.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:18:44 -05:00
Jeff Moyer f1ebf01f03 io_uring/alloc_cache: switch to array based caching
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 414d0f45c316221acbf066658afdbae5b354a5cc
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Mar 20 15:19:44 2024 -0600

    io_uring/alloc_cache: switch to array based caching
    
    Currently lists are being used to manage this, but best practice is
    usually to have these in an array instead, as that is cheaper to manage.
    
    Outside of that detail, games are also played with KASAN as the list
    is inside the cached entry itself.
    
    Finally, all users of this need a struct io_cache_entry embedded in
    their struct, which is union'ized with something else in there that
    isn't used across the free -> realloc cycle.
    
    Get rid of all of that, and simply have it be an array. This will not
    change the memory used, as we're just trading an 8-byte member entry
    for the per-elem array size.
    
    This reduces the overhead of the recycled allocations, and it reduces
    the amount of code needed to support recycling to about half of
    what it currently is.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:56:44 -05:00
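The resulting shape is roughly the following; this is a simplified
illustration, not the kernel's exact definitions:

    struct alloc_cache {
        void    **entries;   /* array of recycled objects */
        unsigned  nr;        /* number currently cached */
        unsigned  max;       /* capacity */
    };

    static void *cache_get(struct alloc_cache *c)
    {
        return c->nr ? c->entries[--c->nr] : NULL;
    }

    static int cache_put(struct alloc_cache *c, void *obj)
    {
        if (c->nr >= c->max)
            return 0;        /* cache full: caller frees obj */
        c->entries[c->nr++] = obj;
        return 1;
    }

Nothing needs to be embedded in the cached struct; the array slot replaces
the list linkage, which is the 8-byte-per-element trade described above.
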
Jeff Moyer 8842cece02 io_uring/net: move connect to always using async data
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit e2ea5a7069133c01fe3dbda95d77af7f193a1a52
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Mar 18 20:37:22 2024 -0600

    io_uring/net: move connect to always using async data
    
    While doing that, get rid of io_async_connect and just use the generic
    io_async_msghdr. Both of them have a struct sockaddr_storage in there,
    and while io_async_msghdr is bigger, if the same type can be used then
    the netmsg_cache can get reused for connect as well.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:52:44 -05:00
Jeff Moyer e7d63496ff io_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit d80f940701302e84d1398ecb103083468b566a69
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Mar 18 13:52:42 2024 -0600

    io_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()
    
    Now that iovec recycling is being done, the iovec is no longer being
    freed in there. Hence the kmsg parameter is now useless.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:47:44 -05:00
Jeff Moyer 5853eedcc0 io_uring/net: add iovec recycling
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 75191341785eef51f87ff54b0ed9dfbd5a72e7c2
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sat Mar 16 15:33:53 2024 -0600

    io_uring/net: add iovec recycling
    
    Right now the io_async_msghdr is recycled to avoid the overhead of
    allocating+freeing it for every request. But the iovec is not included,
    hence that will be allocated and freed for each transfer regardless.
    This commit enables recycling of the iovec between io_async_msghdr
    recycles. This avoids alloc+free for each one if an iovec is used, and
    on top of that, it extends the cache hot nature of msg to the iovec as
    well.
    
    Also enables KASAN for the iovec entries, so that reuse can be detected
    even while they are in the cache.
    
    The io_async_msghdr also shrinks from 376 -> 288 bytes, an 88 byte
    saving (or ~23% smaller), as the fast_iovec entry is dropped from 8
    entries to a single entry. There's no point keeping a big fast iovec
    entry, if iovecs aren't being allocated and freed continually.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:46:44 -05:00
Jeff Moyer 1d1ea00e25 io_uring/net: remove (now) dead code in io_netmsg_recycle()
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 9f8539fe299c250af42325eccff66e8b8d1f15da
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Mar 20 19:09:50 2024 -0600

    io_uring/net: remove (now) dead code in io_netmsg_recycle()
    
    All net commands have async data at this point, there's no reason to
    check if this is the case or not.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:45:44 -05:00
Jeff Moyer ed7e692cfe io_uring: kill io_msg_alloc_async_prep()
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 6498c5c97ce73770ed227eb52b14d21c8343fd5b
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Mar 18 10:07:37 2024 -0600

    io_uring: kill io_msg_alloc_async_prep()
    
    We now ONLY call io_msg_alloc_async() from inside prep handling, which
    is always locked. No need for this helper anymore, or the check in
    io_msg_alloc_async() on whether the ring is locked or not.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:44:44 -05:00
Jeff Moyer 34fd898903 io_uring/net: get rid of ->prep_async() for send side
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 50220d6ac8ff31eb065fba818e960f549fb89d4d
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Mar 18 08:09:47 2024 -0600

    io_uring/net: get rid of ->prep_async() for send side
    
    Move the io_async_msghdr out of the issue path and into prep handling,
    since it's now done unconditionally and hence does not need to be part
    of the issue path. This means any usage of io_sendrecv_prep_async() and
    io_sendmsg_prep_async() is dropped, and hence the forced async setup
    path is now unified with the normal prep setup.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:43:44 -05:00
Jeff Moyer 8ffe89e94d io_uring/net: get rid of ->prep_async() for receive side
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit c6f32c7d9e09bf1368447e9a29e869193ecbb756
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Mar 18 07:36:03 2024 -0600

    io_uring/net: get rid of ->prep_async() for receive side
    
    Move the io_async_msghdr out of the issue path and into prep handling,
    since it's now done unconditionally and hence does not need to be part
    of the issue path. This reduces the footprint of the multishot fast
    path of multiple invocations of ->issue() per prep, and also means that
    using ->prep_async() can be dropped for recvmsg, as this is now done via
    setup on the prep side.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:42:44 -05:00
Jeff Moyer 83f2bab14c io_uring/net: always set kmsg->msg.msg_control_user before issue
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 3ba8345aec886a3a01331e944a6a8568bf94bd10
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Apr 12 12:39:54 2024 -0600

    io_uring/net: always set kmsg->msg.msg_control_user before issue
    
    We currently set this separately for async/sync entry, but let's just
    move it to a generic pre-issue spot and eliminate the difference
    between the two.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:41:44 -05:00
Jeff Moyer d2e47f2afe io_uring/net: always setup an io_async_msghdr
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 790b68b32a678b65b161861f83b2b782b6b9246b
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sat Mar 16 17:26:09 2024 -0600

    io_uring/net: always setup an io_async_msghdr
    
    Rather than use an on-stack one and then need to allocate and copy if
    async execution is required, always grab one upfront. This should be
    very cheap, and potentially even have cache hotness benefits for
    back-to-back send/recv requests.
    
    For any recv type of request, this is probably a good choice in general,
    as it's expected that no data is available initially. For send this is
    not necessarily the case, as space in the socket buffer is expected to
    be available. However, getting a cached io_async_msghdr is very cheap,
    and as it should be cache hot, the difference here is probably negligible,
    if any.
    
    A nice side benefit is that io_setup_async_msg can get killed
    completely, which has some nasty iovec manipulation code.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:40:44 -05:00
Jeff Moyer 2c1ec5357c io_uring/net: unify cleanup handling
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit f5b00ab2221a26202da7d10542a98203075bfdf8
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 12 09:38:08 2024 -0600

    io_uring/net: unify cleanup handling
    
    Now that recv/recvmsg both do the same cleanup, put it in the retry and
    finish handlers.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:39:44 -05:00
Jeff Moyer 8620a3c4b1 io_uring/net: switch io_recv() to using io_async_msghdr
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 4a3223f7bfda14c532856152b12aace525cf8079
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 5 15:39:16 2024 -0700

    io_uring/net: switch io_recv() to using io_async_msghdr
    
    No functional changes in this patch, just in preparation for carrying
    more state than what is available now, if necessary.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:38:44 -05:00
Jeff Moyer ef97076384 io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 54cdcca05abde32acc3233950ddc79d8be25515f
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 5 09:34:21 2024 -0700

    io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr
    
    No functional changes in this patch, just in preparation for carrying
    more state than what is being done now, if necessary. While unifying
    some of this code, add a generic send setup prep handler that they can
    both use.
    
    This gets rid of some manual msghdr and sockaddr on the stack, and makes
    it look a bit more like the sendmsg/recvmsg variants. Going forward, more
    can get unified on top.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:37:44 -05:00
Jeff Moyer 74280b745d io_uring: refactor io_fill_cqe_req_aux
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit e5c12945be5016d681ff305ea7306fef5902219d
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Mon Mar 18 22:00:31 2024 +0000

    io_uring: refactor io_fill_cqe_req_aux
    
    The restriction on multishot execution context disallowing io-wq is
    driven by rules of io_fill_cqe_req_aux(), it should only be called in
    the master task context, either from the syscall path or in task_work.
    Since task_work now always takes the ctx lock implying
    IO_URING_F_COMPLETE_DEFER, we can just assume that the function is
    always called with its defer argument set to true.
    
    Kill the argument. Also rename the function for more consistency, as
    "fill" in CQE-related functions was usually meant for raw interfaces
    that only copy data into the CQ without any locking, waking the user,
    or the other accounting that "post" functions take care of.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Tested-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:29:44 -05:00
Jeff Moyer cb7f7e4460 RHEL-only: convert READ/WRITE to ITER_DEST/ITER_SOURCE
JIRA: https://issues.redhat.com/browse/RHEL-64867
Upstream status: RHEL-only

Commit de4eda9de2d9 ("use less confusing names for iov_iter direction
initializers") was backported to RHEL, but only in part.  Make
modifications to io_uring/ so that patches apply cleanly.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:25:44 -05:00
Jeff Moyer 54c4403058 io_uring/net: dedup io_recv_finish req completion
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 1af04699c59713a7693cc63d80b29152579e61c3
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Fri Mar 8 13:55:58 2024 +0000

    io_uring/net: dedup io_recv_finish req completion
    
    There are two blocks in io_recv_finish() completing the request, which
    we can combine to remove the jumping.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/0e338dcb33c88de83809fda021cba9e7c9681620.1709905727.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:56:44 -05:00
Jeff Moyer 6a4cf656b5 io_uring/net: add io_req_msg_cleanup() helper
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit d9b441889c3595aa18f89ee42c6d22bb62234343
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Mar 6 07:57:57 2024 -0700

    io_uring/net: add io_req_msg_cleanup() helper
    
    For the fast inline path, we manually recycle the io_async_msghdr and
    free the iovec, and then clear the REQ_F_NEED_CLEANUP flag so that it
    doesn't need doing in the slower path. We already do that in two spots;
    in preparation for adding more, add a helper and use it.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:55:44 -05:00
Jeff Moyer 96b449a012 io_uring/net: simplify msghd->msg_inq checking
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit fb6328bc2ab58dcf2998bd173f1ef0f3eb7be19a
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Mar 6 10:57:33 2024 -0700

    io_uring/net: simplify msghd->msg_inq checking
    
    Just check for larger than zero rather than check for non-zero and
    not -1. This is easier to read, and also protects against any errant
    < 0 values that aren't -1.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:54:44 -05:00
Jeff Moyer 542681cc14 io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 186daf2385295acf19ecf48f4d5214cc2d925933
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Mar 7 12:53:24 2024 -0700

    io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
    
    We only use the flag for this purpose, so rename it accordingly. This
    further prevents various other use cases of it, keeping it clean and
    consistent. Then we can also check it in one spot, when recycling is
    attempted, and remove some dead code in io_kbuf_recycle_ring().
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:53:44 -05:00
Jeff Moyer 71307ae6fc io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit b5311dbc2c2eefac00f12888dcd15e90238d1828
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Mar 7 13:19:46 2024 -0700

    io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler
    
    This flag should not be persistent across retries, so ensure we clear
    it before potentially attempting a retry.
    
    Fixes: c3f9109dbc9e ("io_uring/kbuf: flag request if buffer pool is empty after buffer pick")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:52:44 -05:00
Jeff Moyer 2fb19021ca io_uring/net: improve the usercopy for sendmsg/recvmsg
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 792060de8b3e9ca11fab4afc0c3c5927186152a2
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Feb 26 16:43:01 2024 -0700

    io_uring/net: improve the usercopy for sendmsg/recvmsg
    
    We're spending a considerable amount of the sendmsg/recvmsg time just
    copying in the message header. And for provided buffers, the known
    single entry iovec.
    
    Be a bit smarter about it and enable/disable user access around our
    copying. In a test case that does both sendmsg and recvmsg, the
    runtime before this change (averaged over multiple runs, very stable
    times however):
    
    Kernel          Time            Diff
    ====================================
    -git            4720 usec
    -git+commit     4311 usec       -8.7%
    
    and looking at a profile diff, we see the following:
    
    0.25%     +9.33%  [kernel.kallsyms]     [k] _copy_from_user
    4.47%     -3.32%  [kernel.kallsyms]     [k] __io_msg_copy_hdr.constprop.0
    
    where we drop more than 9% of _copy_from_user() time, and consequently
    add time to __io_msg_copy_hdr() where the copies are now attributed to,
    but with a net win of 6%.
    
    In comparison, the same test case with send/recv runs in 3745 usec, which
    is (expectedly) still quite a bit faster. But at least sendmsg/recvmsg is
    now only ~13% slower, where it was ~21% slower before.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:49:44 -05:00
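The pattern being referred to is the kernel's user-access bracketing:
enable access once (stac/clac on x86 with SMAP), do several unsafe
fetches, then disable. A simplified sketch, not the actual io_uring
helper:

    static int copy_hdr_fields(struct user_msghdr __user *umsg,
                               size_t *iovlen, size_t *controllen)
    {
        /* one access-enable covers all the field copies below */
        if (!user_access_begin(umsg, sizeof(*umsg)))
            return -EFAULT;
        unsafe_get_user(*iovlen, &umsg->msg_iovlen, efault);
        unsafe_get_user(*controllen, &umsg->msg_controllen, efault);
        user_access_end();
        return 0;
    efault:
        user_access_end();
        return -EFAULT;
    }
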
Jeff Moyer 11fadb0b78 io_uring/net: ensure async prep handlers always initialize ->done_io
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit f3a640cca951ef9715597e68f5363afc0f452a88
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Mar 15 16:36:23 2024 -0600

    io_uring/net: ensure async prep handlers always initialize ->done_io
    
    If we get a request with IOSQE_ASYNC set, then we first run the prep
    async handlers. But if we then fail setting it up and want to post
    a CQE with -EINVAL, we use ->done_io. This was previously guarded with
    REQ_F_PARTIAL_IO, and the normal setup handlers do set it up before any
    potential errors, but we need to cover the async setup too.
    
    Fixes: 9817ad85899f ("io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:42 -04:00
Jeff Moyer f0de2829e0 io_uring/net: correctly handle multishot recvmsg retry setup
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit deaef31bc1ec7966698a427da8c161930830e1cf
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Mar 7 17:48:03 2024 -0700

    io_uring/net: correctly handle multishot recvmsg retry setup
    
    If we loop for multishot receive on the initial attempt, and then abort
    later on to wait for more, we miss a case where we should be copying the
    io_async_msghdr from the stack to stable storage. This leads to the next
    retry potentially failing, if the application had the msghdr on the
    stack.
    
    Cc: stable@vger.kernel.org
    Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:41 -04:00
Jeff Moyer 3103004c63 io_uring/net: correct the type of variable
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 86bcacc957fc2d0403aa0e652757eec59a5fd7ca
Author: Muhammad Usama Anjum <usama.anjum@collabora.com>
Date:   Fri Mar 1 19:43:48 2024 +0500

    io_uring/net: correct the type of variable
    
    The namelen is of type int. It shouldn't be made size_t, which is
    unsigned. The signed number is needed for error checking before use.
    
    Fixes: c55978024d12 ("io_uring/net: move receive multishot out of the generic msghdr path")
    Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Link: https://lore.kernel.org/r/20240301144349.2807544-1-usama.anjum@collabora.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:41 -04:00
Jeff Moyer 7ba923ccba io_uring/net: fix overflow check in io_recvmsg_mshot_prep()
JIRA: https://issues.redhat.com/browse/RHEL-27755
JIRA: https://issues.redhat.com/browse/RHEL-36928
CVE: CVE-2024-35827

commit 8ede3db5061bb1fe28e2c9683329aafa89d2b1b4
Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Fri Mar 1 18:29:39 2024 +0300

    io_uring/net: fix overflow check in io_recvmsg_mshot_prep()
    
    The "controllen" variable is type size_t (unsigned long).  Casting it
    to int could lead to an integer underflow.
    
    The check_add_overflow() function considers the type of the destination
    which is type int.  If we add two positive values and the result cannot
    fit in an integer then that's counted as an overflow.
    
    However, if we cast "controllen" to an int and it turns negative, then
    negative values *can* fit into an int type so there is no overflow.
    
    Good: 100 + (unsigned long)-4 = 96  <-- overflow
     Bad: 100 + (int)-4 = 96 <-- no overflow
    
    I deleted the cast of the sizeof() as well.  That's not a bug but the
    cast is unnecessary.
    
    Fixes: 9b0fc3c054ff ("io_uring: fix types in io_recvmsg_multishot_overflow")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://lore.kernel.org/r/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:41 -04:00
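The two arithmetic lines above can be reproduced in userspace with the
compiler builtin that the kernel's check_add_overflow() wraps:

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
        size_t controllen = (size_t)-4;  /* a huge unsigned value */
        int out;

        /* unsigned argument: the true sum cannot fit in 'out', caught */
        printf("%d\n", __builtin_add_overflow(100, controllen, &out));      /* 1 */

        /* cast to int first: 100 + (-4) = 96 fits, NOT caught */
        printf("%d\n", __builtin_add_overflow(100, (int)controllen, &out)); /* 0 */
        return 0;
    }
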
Jeff Moyer 280da3b275 io_uring/net: move receive multishot out of the generic msghdr path
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: RHEL does not have commit de4eda9de2d9 ("use less confusing
  names for iov_iter direction initializers"), so change ITER_DEST to
  READ and ITER_SOURCE to WRITE.

commit c55978024d123d43808ab393a0a4ce3ce8568150
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Feb 27 11:09:20 2024 -0700

    io_uring/net: move receive multishot out of the generic msghdr path
    
    Move the actual user_msghdr / compat_msghdr into the send and receive
    sides, respectively, so we can move the uaddr receive handling into its
    own handler, and ditto the multishot with buffer selection logic.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:40 -04:00
Jeff Moyer 9d8ea9667d io_uring/net: unify how recvmsg and sendmsg copy in the msghdr
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: RHEL does not have commit de4eda9de2d9 ("use less confusing
  names for iov_iter direction initializers"), so change ITER_DEST to
  READ and ITER_SOURCE to WRITE.

commit 52307ac4f2b507f60bae6df5be938d35e199c688
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Feb 19 14:16:47 2024 -0700

    io_uring/net: unify how recvmsg and sendmsg copy in the msghdr
    
    For recvmsg, we roll our own since we support buffer selections. This
    isn't the case for sendmsg right now, but in preparation for doing so,
    make the recvmsg copy helpers generic so we can call them from the
    sendmsg side as well.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:40 -04:00
Jeff Moyer 0c351cce91 io_uring/net: restore msg_control on sendzc retry
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 4fe82aedeb8a8cb09bfa60f55ab57b5c10a74ac4
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Mon Apr 8 18:11:09 2024 +0100

    io_uring/net: restore msg_control on sendzc retry
    
    cac9e4418f4cb ("io_uring/net: save msghdr->msg_control for retries")
    reinstates msg_control before every __sys_sendmsg_sock() call, since
    the function can overwrite the value in the msghdr. We need to do the
    same for zerocopy sendmsg.
    
    Cc: stable@vger.kernel.org
    Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg")
    Link: https://github.com/axboe/liburing/issues/1067
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/cc1d5d9df0576fa66ddad4420d240a98a020b267.1712596179.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:40 -04:00
Jeff Moyer 587556069d io_uring: refactor DEFER_TASKRUN multishot checks
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit e0e4ab52d17096d96c21a6805ccd424b283c3c6d
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Fri Mar 8 13:55:57 2024 +0000

    io_uring: refactor DEFER_TASKRUN multishot checks
    
    We disallow DEFER_TASKRUN multishots from running by io-wq, which is
    checked by individual opcodes in the issue path. We can consolidate all
    it in io_wq_submit_work() at the same time moving the checks out of the
    hot path.
    
    Suggested-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/e492f0f11588bb5aa11d7d24e6f53b7c7628afdb.1709905727.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:39 -04:00
Jeff Moyer 4a9eb8b2ef io_uring/net: move recv/recvmsg flags out of retry loop
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit eb18c29dd2a3d49cf220ee34411ff0fe60b36bf2
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sun Feb 25 12:59:05 2024 -0700

    io_uring/net: move recv/recvmsg flags out of retry loop
    
    The flags don't change, so just initialize them once rather than on
    every loop iteration for multishot.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:39 -04:00
Jeff Moyer 9986201013 io_uring: fix mshot io-wq checks
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 3a96378e22cc46c7c49b5911f6c8631527a133a9
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Fri Mar 8 13:55:56 2024 +0000

    io_uring: fix mshot io-wq checks
    
    When checking for concurrent CQE posting, we're not only interested in
    requests running from the poll handler, but also in stray requests that
    ended up in normal io-wq execution. We're disallowing multishots from
    io-wq in general, not only when they came in a certain way.
    
    Cc: stable@vger.kernel.org
    Fixes: 17add5cea2bba ("io_uring: force multishot CQEs into task context")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/d8c5b36a39258036f93301cd60d3cd295e40653d.1709905727.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:39 -04:00
Jeff Moyer aa20967022 io_uring/net: fix sendzc lazy wake polling
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit ef42b85a5609cd822ca0a68dd2bef2b12b5d1ca3
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Tue Apr 30 16:42:30 2024 +0100

    io_uring/net: fix sendzc lazy wake polling
    
    SEND[MSG]_ZC produces multiple CQEs via notifications, which LAZY_WAKE
    doesn't handle, so disable LAZY_WAKE for sendzc polling. It should be
    fine; sends are not likely to be polled in the first place.
    
    Fixes: 6ce4a93dbb5b ("io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/5b360fb352d91e3aec751d75c87dfb4753a084ee.1714488419.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:39 -04:00
Jeff Moyer 8ff4bf8979 io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 9817ad85899fb695f875610fb743cb18cf087582
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Mar 7 12:43:22 2024 -0700

    io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io
    
    Ensure that prep handlers always initialize sr->done_io before any
    potential failure conditions, and with that, we know it's always been
    set even for the failure case.
    
    With that, we don't need to use the REQ_F_PARTIAL_IO flag to gate on that.
    Additionally, we should not overwrite req->cqe.res unless sr->done_io is
    actually positive.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:38 -04:00
Jeff Moyer 9b5e69fcdd io_uring/net: fix multishot accept overflow handling
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit a37ee9e117ef73bbc2f5c0b31911afd52d229861
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Feb 14 08:23:05 2024 -0700

    io_uring/net: fix multishot accept overflow handling
    
    If we hit CQ ring overflow when attempting to post a multishot accept
    completion, we don't properly save the result or return code. This
    results in losing the accepted fd value.
    
    Instead, we return the result from the poll operation that triggered
    the accept retry. This is generally POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
    which is 0xc3, or 195, which looks like a valid file descriptor, but it
    really has no connection to that.
    
    Handle this like we do for other multishot completions - assign the
    result, and return IOU_STOP_MULTISHOT to cancel any further completions
    from this request when overflow is hit. This preserves the result, as we
    should, and tells the application that the request needs to be re-armed.
    
    Cc: stable@vger.kernel.org
    Fixes: 515e26961295 ("io_uring: revert "io_uring fix multishot accept ordering"")
    Link: https://github.com/axboe/liburing/issues/1062
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:38 -04:00
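From the application side, the behavior this fixes looks as follows; a
sketch with the same assumed ring and listenfd:

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    struct io_uring_cqe *cqe;

    io_uring_prep_multishot_accept(sqe, listenfd, NULL, NULL, 0);
    io_uring_submit(&ring);

    while (!io_uring_wait_cqe(&ring, &cqe)) {
        if (cqe->res >= 0) {
            /* cqe->res is the accepted fd; with this fix it is preserved
             * even when the CQ ring overflowed */
        }
        if (!(cqe->flags & IORING_CQE_F_MORE)) {
            /* multishot stopped (e.g. on overflow): re-arm the accept */
            io_uring_cqe_seen(&ring, cqe);
            break;
        }
        io_uring_cqe_seen(&ring, cqe);
    }
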