Commit Graph

52 Commits

Author SHA1 Message Date
Jeff Moyer 5663e99772 io_uring/kbuf: return correct iovec count from classic buffer peek
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit f274495aea7b15225b3d83837121b22ef96e560c
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Aug 30 10:45:54 2024 -0600

    io_uring/kbuf: return correct iovec count from classic buffer peek
    
    io_provided_buffers_select() returns 0 to indicate success, but it should
    be returning 1 to indicate that 1 vec was mapped. This causes peeking
    to fail with classic provided buffers, and while that's not a setup
    anyone should be using, it should still work correctly.
    
    The end result is that no buffer will be selected, and hence a completion
    with '0' as the result will be posted, without a buffer attached.
    
    Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:14:53 -05:00
Jeff Moyer 64604cf7d5 io_uring/kbuf: sanitize peek buffer setup
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit e0ee967630c8ee67bb47a5b38d235cd5a8789c48
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Aug 20 18:31:58 2024 -0600

    io_uring/kbuf: sanitize peek buffer setup
    
    Harden the buffer peeking a bit, by adding a sanity check for it having
    a valid size. Outside of that, arg->max_len is a size_t, though it's
    only ever set to a 32-bit value (as it's governed by MAX_RW_COUNT).
    Bump our needed check to a size_t so we know it fits. Finally, cap the
    calculated needed iov value to the PEEK_MAX_IMPORT, which is the
    maximum number of segments that should be peeked.
    
    Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:14:53 -05:00
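
Illustration: a hedged sketch of the hardening described above. PEEK_MAX_IMPORT
and max_len come from the commit text; the surrounding variable names and
arithmetic are assumptions.

    /* sanity check: a zero-sized buffer can't satisfy anything */
    if (unlikely(!buf->len))
            return -ENOBUFS;

    /* max_len is a size_t, bounded in practice by MAX_RW_COUNT */
    size_t needed = (arg->max_len + buf->len - 1) / buf->len;

    /* never peek more segments than the import cap */
    needed = min(needed, (size_t)PEEK_MAX_IMPORT);
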
Jeff Moyer 95babcb3db io_uring: fix error pbuf checking
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit bcc87d978b834c298bbdd9c52454c5d0a946e97e
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Thu Jul 18 20:00:53 2024 +0100

    io_uring: fix error pbuf checking
    
    Syz reports a problem, which boils down to NULL vs IS_ERR inconsistent
    error handling in io_alloc_pbuf_ring().
    
    KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
    RIP: 0010:__io_remove_buffers+0xac/0x700 io_uring/kbuf.c:341
    Call Trace:
     <TASK>
     io_put_bl io_uring/kbuf.c:378 [inline]
     io_destroy_buffers+0x14e/0x490 io_uring/kbuf.c:392
     io_ring_ctx_free+0xa00/0x1070 io_uring/io_uring.c:2613
     io_ring_exit_work+0x80f/0x8a0 io_uring/io_uring.c:2844
     process_one_work kernel/workqueue.c:3231 [inline]
     process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
     worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
     kthread+0x2f0/0x390 kernel/kthread.c:389
     ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    
    Cc: stable@vger.kernel.org
    Reported-by: syzbot+2074b1a3d447915c6f1c@syzkaller.appspotmail.com
    Fixes: 87585b05757dc ("io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/c5f9df20560bd9830401e8e48abc029e7cfd9f5e.1721329239.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-12-02 11:14:52 -05:00
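
Illustration: the NULL vs IS_ERR() inconsistency in miniature. The allocation
helper here returns ERR_PTR() values on failure, never NULL, so the caller's
check must match; a minimal sketch:

    ptr = io_mem_alloc(size);
    if (!ptr)                       /* broken: never true for ERR_PTR() */
            return -ENOMEM;

    if (IS_ERR(ptr))                /* correct: error-pointer check */
            return PTR_ERR(ptr);
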
Jeff Moyer 471326c22a io_uring/kbuf: add helpers for getting/peeking multiple buffers
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 35c8711c8fc4c16ad2749b8314da5829a493e28e
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 5 07:31:52 2024 -0700

    io_uring/kbuf: add helpers for getting/peeking multiple buffers
    
    Our provided buffer interface only allows selection of a single buffer.
    Add an API that allows getting/peeking multiple buffers at the same time.
    
    This is only implemented for the ring provided buffers. It could be added
    for the legacy provided buffers as well, but since it's strongly
    encouraged to use the new interface, let's keep it simpler and just
    provide it for the new API. The legacy interface will always just select
    a single buffer.
    
    There are two new main functions:
    
    io_buffers_select(), which selects as many buffers as it can. The
    caller supplies the iovec array, and io_buffers_select() may allocate a
    bigger array if the 'out_len' being passed in is non-zero and bigger
    than what fits in the provided iovec. Buffers grabbed with this helper
    are permanently assigned.
    
    io_buffers_peek(), which works like io_buffers_select(), except the
    buffers it grabs can be recycled, if needed. Callers using either of
    these functions should
    call io_put_kbufs() rather than io_put_kbuf() at completion time. The
    peek interface must be called with the ctx locked from peek to
    completion.
    
    This adds a bit of state for the request:
    
    - REQ_F_BUFFERS_COMMIT, which means that the buffers have been
      peeked and should be committed to the buffer ring head when they are
      put as part of completion. Prior to this, req->buf_list was cleared to
      NULL when committed.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:35:44 -05:00
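
Illustration: a hedged caller-side sketch of the select/peek contract described
above. The struct name, fields, and argument lists are assumptions drawn from
the commit text, not the exact kernel definitions.

    struct iovec iovs[8];
    struct buf_sel_arg arg = {
            .iovs = iovs,
            .nr_iovs = 8,
            .max_len = len,         /* bounded by MAX_RW_COUNT */
    };

    /* peeked buffers may be recycled; ctx stays locked until completion */
    int nr = io_buffers_peek(req, &arg);
    if (nr > 0) {
            /* ... issue I/O over iovs[0..nr-1] ... */
            io_put_kbufs(req, nr, issue_flags);     /* not io_put_kbuf() */
    }
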
Jeff Moyer be6cb2cb5b io_uring/kbuf: remove dead define
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 285207f67c9bcad1d9168993f175d6d88ce310f1
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Mar 29 17:22:27 2024 -0600

    io_uring/kbuf: remove dead define
    
    We no longer use IO_BUFFER_LIST_BUF_PER_PAGE, kill it.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:11:44 -05:00
Jeff Moyer d45024afd9 io_uring: move mapping/allocation helpers to a separate file
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL does not have commit 5e0a760b4441 ("mm, treewide:
rename MAX_ORDER to MAX_PAGE_ORDER").

commit f15ed8b4d0ce2c0831232ff85117418740f0c529
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Mar 27 14:59:09 2024 -0600

    io_uring: move mapping/allocation helpers to a separate file
    
    Move the related code from io_uring.c into memmap.c. No functional
    changes in this patch, just cleaning it up a bit now that the full
    transition is done.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:09:44 -05:00
Jeff Moyer 5c0593218e io_uring: use unpin_user_pages() where appropriate
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 18595c0a58ae29ac6a996c5b664610119b73182d
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Mar 13 15:01:03 2024 -0600

    io_uring: use unpin_user_pages() where appropriate
    
    There are a few cases of open-coded loops around unpin_user_page();
    use the generic helper instead.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:08:44 -05:00
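
Illustration: the conversion in miniature; unpin_user_pages() is the generic
helper from <linux/mm.h>.

    /* before: open-coded loop */
    for (i = 0; i < nr_pages; i++)
            unpin_user_page(pages[i]);

    /* after: one call over the whole array */
    unpin_user_pages(pages, nr_pages);
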
Jeff Moyer f5d5b2e624 io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 87585b05757dc70545efb434669708d276125559
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 12 20:24:21 2024 -0600

    io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring
    
    Rather than use remap_pfn_range() for this and manually free later,
    switch to using vm_insert_page() and have it Just Work.
    
    This requires a bit of effort on the mmap lookup side, as the ctx
    uring_lock isn't held, which otherwise protects buffer_lists from
    being torn down, and it's not safe to grab it from mmap context, as
    that would introduce an ABBA deadlock between the mmap lock and the
    ctx uring_lock. Instead, look up the buffer_list under RCU, as the
    list is RCU freed
    already. Use the existing reference count to determine whether it's
    possible to safely grab a reference to it (eg if it's not zero already),
    and drop that reference when done with the mapping. If the mmap
    reference is the last one, the buffer_list and the associated memory can
    go away, since the vma insertion has references to the inserted pages at
    that point.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:07:44 -05:00
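
Illustration: a hedged sketch of the lockless mmap-side lookup described
above; the xarray and field names are assumptions.

    rcu_read_lock();
    bl = xa_load(&ctx->io_bl_xa, bgid);
    /* take a reference only if the list hasn't already been torn down */
    if (bl && !atomic_inc_not_zero(&bl->refs))
            bl = NULL;
    rcu_read_unlock();

    /* ... set up the vma with vm_insert_pages() ... */

    /* drop the reference when done; the last put frees list and memory */
    if (bl && atomic_dec_and_test(&bl->refs))
            io_free_bl(ctx, bl);    /* hypothetical release helper */
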
Jeff Moyer 0ea0781425 io_uring/kbuf: vmap pinned buffer ring
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit e270bfd22a2a10d1cfbaddf23e79b6d0b405d21e
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 12 10:42:27 2024 -0600

    io_uring/kbuf: vmap pinned buffer ring
    
    This avoids needing to care about HIGHMEM, and it makes the buffer
    indexing easier as both ring provided buffer methods are now virtually
    mapped in a contiguous fashion.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 17:06:44 -05:00
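
Illustration: a hedged sketch of the vmap step, pinning the user pages and
mapping them into one contiguous kernel range; io_pin_pages() is assumed.

    /* pin the application's pages backing the ring */
    pages = io_pin_pages(reg->ring_addr, ring_size, &nr_pages);
    if (IS_ERR(pages))
            return PTR_ERR(pages);

    /* one virtually contiguous mapping, HIGHMEM or not */
    br = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
    if (!br)
            return -ENOMEM;
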
Jeff Moyer 1a1299241b io_uring/kbuf: rename is_mapped
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 9219e4a9d4ad57323837f7c3562964e61840b17a
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Wed Mar 13 15:52:41 2024 +0000

    io_uring/kbuf: rename is_mapped
    
    In buffer lists we have ->is_mapped as well as ->is_mmap; it's
    pretty hard to stay sane double-checking which one means what,
    and in the long run there is a high chance of an eventual bug.
    Rename ->is_mapped into ->is_buf_ring.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/c4838f4d8ad506ad6373f1c305aee2d2c1a89786.1710343154.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:00:44 -05:00
Jeff Moyer 542681cc14 io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 186daf2385295acf19ecf48f4d5214cc2d925933
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Mar 7 12:53:24 2024 -0700

    io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
    
    We only use the flag for this purpose, so rename it accordingly. This
    further prevents other use cases from creeping in, keeping it clean
    and consistent. Then we can also check it in one spot, when recycling
    is attempted, and remove some dead code in io_kbuf_recycle_ring().
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:53:44 -05:00
Jeff Moyer b08260df8c io_uring/kbuf: flag request if buffer pool is empty after buffer pick
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit c3f9109dbc9e2cd0b2c3ba0536431eef282783e9
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Feb 19 21:38:59 2024 -0700

    io_uring/kbuf: flag request if buffer pool is empty after buffer pick
    
    Normally we do an extra roundtrip for retries even if the buffer pool
    has been depleted, as we don't check that upfront. Rather than add
    this check, have
    the buffer selection methods mark the request with REQ_F_BL_EMPTY if the
    used buffer group is out of buffers after this selection. This is very
    cheap to do once we're all the way inside there anyway, and it gives the
    caller a chance to make better decisions on how to proceed.
    
    For example, recv/recvmsg multishot could check this flag when it
    decides whether to keep receiving or not.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:50:44 -05:00
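
Illustration: a hedged sketch of how a multishot consumer might act on the
new flag; the helper shown is hypothetical.

    /* after a successful buffer pick */
    if (req->flags & REQ_F_BL_EMPTY) {
            /* the group is drained; stop instead of re-arming */
            return io_finish_multishot(req, ret);   /* hypothetical */
    }
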
Jeff Moyer 5f7cb23a14 io_uring/kbuf: cleanup passing back cflags
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 8435c6f380d622639d8acbc0af585d941396fa57
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Jan 29 20:59:18 2024 -0700

    io_uring/kbuf: cleanup passing back cflags
    
    We have various functions calculating the CQE cflags we need to pass
    back, but it's all the same everywhere. Make a number of the putting
    functions void, and just have the two main helpers for this,
    io_put_kbuf() and io_put_kbuf_comp(), calculate the actual mask and
    pass it back.
    
    While at it, cleanup how we put REQ_F_BUFFER_RING buffers. Before
    this change, we would call into __io_put_kbuf() only to go right back
    into the header-defined functions. As clearing this type of buffer
    is just re-assigning the buf_index and incrementing the head, this
    is very wasteful.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 15:24:44 -05:00
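
Illustration: the resulting calling convention, sketched; the completion-side
call is an assumption.

    /* the helper computes and returns the CQE flags to post */
    cflags = io_put_kbuf(req, issue_flags);
    io_req_set_res(req, res, cflags);
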
Jeff Moyer b4549713dd io_uring/kbuf: hold io_buffer_list reference over mmap
JIRA: https://issues.redhat.com/browse/RHEL-27755
JIRA: https://issues.redhat.com/browse/RHEL-37250
CVE: CVE-2024-35880

commit 561e4f9451d65fc2f7eef564e0064373e3019793
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Apr 2 16:16:03 2024 -0600

    io_uring/kbuf: hold io_buffer_list reference over mmap
    
    If we look up the kbuf, ensure that it doesn't get unregistered until
    after we're done with it. Since we're inside mmap, we cannot safely use
    the io_uring lock. Rely on the fact that we can lookup the buffer list
    under RCU now and grab a reference to it, preventing it from being
    unregistered until we're done with it. The lookup returns the
    io_buffer_list directly with it referenced.
    
    Cc: stable@vger.kernel.org # v6.4+
    Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:40 -04:00
Jeff Moyer 1a61278d76 io_uring/kbuf: protect io_buffer_list teardown with a reference
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 6b69c4ab4f685327d9e10caf0d84217ba23a8c4b
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Mar 15 16:12:51 2024 -0600

    io_uring/kbuf: protect io_buffer_list teardown with a reference
    
    No functional changes in this patch, just in preparation for being able
    to keep the buffer list alive outside of the ctx->uring_lock.
    
    Cc: stable@vger.kernel.org # v6.4+
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:39 -04:00
Jeff Moyer 8344e815eb io_uring/kbuf: get rid of bl->is_ready
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 3b80cff5a4d117c53d38ce805823084eaeffbde6
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Mar 14 10:46:40 2024 -0600

    io_uring/kbuf: get rid of bl->is_ready
    
    Now that xarray is being exclusively used for the buffer_list lookup,
    this check is no longer needed. Get rid of it and the is_ready member.
    
    Cc: stable@vger.kernel.org # v6.4+
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:39 -04:00
Jeff Moyer ed8c655b9b io_uring/kbuf: get rid of lower BGID lists
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 09ab7eff38202159271534d2f5ad45526168f2a5
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Mar 14 10:45:07 2024 -0600

    io_uring/kbuf: get rid of lower BGID lists
    
    Just rely on the xarray for any kind of bgid. This simplifies things, and
    it really doesn't bring us much, if anything.
    
    Cc: stable@vger.kernel.org # v6.4+
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:39 -04:00
Jeff Moyer fddfa72c02 io_uring: add io_file_can_poll() helper
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: Context differences as we don't have commit 521223d7c229
  ("io_uring/cancel: don't default to setting req->work.cancel_seq").

commit 95041b93e90a06bb613ec4bef9cd4d61570f68e4
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sun Jan 28 20:08:24 2024 -0700

    io_uring: add io_file_can_poll() helper
    
    This adds a flag to avoid dereferencing file and then f_op to
    figure out if the file has a poll handler defined or not. We generally
    call this at least twice for networked workloads, and if using ring
    provided buffers, we do it on every buffer selection. The latter in
    particular is troublesome, as it's otherwise a very fast operation.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:38 -04:00
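
Illustration: a hedged sketch of the caching idea, testing f_op->poll once
and remembering the answer in a request flag; the flag name is an assumption.

    static inline bool io_file_can_poll(struct io_kiocb *req)
    {
            if (req->flags & REQ_F_CAN_POLL)
                    return true;
            if (file_can_poll(req->file)) {
                    req->flags |= REQ_F_CAN_POLL;   /* cache the lookup */
                    return true;
            }
            return false;
    }
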
Jeff Moyer 05e57d750f io_uring/kbuf: add method for returning provided buffer ring head
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit d293b1a89694fc4918d9a4330a71ba2458f9d581
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Dec 21 09:02:57 2023 -0700

    io_uring/kbuf: add method for returning provided buffer ring head
    
    The tail of the provided ring buffer is shared between the kernel and
    the application, but the head is private to the kernel as the
    application doesn't need to see it. However, this also prevents the
    application from knowing how many buffers the kernel has consumed.
    Usually this is fine, as the information is inherently racy in that
    the kernel could be consuming buffers continually, but for cleanup
    purposes it may be relevant to know how many buffers are still left
    in the ring.
    
    Add IORING_REGISTER_PBUF_STATUS which will return status for a given
    provided buffer ring. Right now it just returns the head, but space
    is reserved for more information later on, if needed.
    
    Link: https://github.com/axboe/liburing/discussions/1020
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 14:33:36 -04:00
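
Illustration: a hedged userspace sketch of querying the head via the new
opcode, using the raw register syscall; error handling is elided.

    #include <linux/io_uring.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    struct io_uring_buf_status st = { .buf_group = bgid };

    /* on success, st.head holds the kernel's head for that buffer ring */
    int ret = syscall(__NR_io_uring_register, ring_fd,
                      IORING_REGISTER_PBUF_STATUS, &st, 1);
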
Jeff Moyer 7b3b4276a9 io_uring: indicate if io_kbuf_recycle did recycle anything
JIRA: https://issues.redhat.com/browse/RHEL-27755

commit 89d528ba2f8281de61163c6b62e598b64d832175
Author: Dylan Yudaken <dyudaken@gmail.com>
Date:   Mon Nov 6 20:39:07 2023 +0000

    io_uring: indicate if io_kbuf_recycle did recycle anything
    
    It can be useful to know if io_kbuf_recycle did actually recycle the
    buffer on the request, or if it left the request alone.
    
    Signed-off-by: Dylan Yudaken <dyudaken@gmail.com>
    Link: https://lore.kernel.org/r/20231106203909.197089-2-dyudaken@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-07-02 10:08:34 -04:00
Jeff Moyer 1101708ff8 io_uring/kbuf: Fix an NULL vs IS_ERR() bug in io_alloc_pbuf_ring()
JIRA: https://issues.redhat.com/browse/RHEL-21391
JIRA: https://issues.redhat.com/browse/RHEL-19169
CVE: CVE-2024-0582

commit e53f7b54b1fdecae897f25002ff0cff04faab228
Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Tue Dec 5 15:37:17 2023 +0300

    io_uring/kbuf: Fix an NULL vs IS_ERR() bug in io_alloc_pbuf_ring()
    
    The io_mem_alloc() function returns error pointers, not NULL.  Update
    the check accordingly.
    
    Fixes: b10b73c102a2 ("io_uring/kbuf: recycle freed mapped buffer ring entries")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://lore.kernel.org/r/5ed268d3-a997-4f64-bd71-47faa92101ab@moroto.mountain
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-05 16:34:25 -05:00
Jeff Moyer 94a5e0442f io_uring/kbuf: recycle freed mapped buffer ring entries
JIRA: https://issues.redhat.com/browse/RHEL-21391
JIRA: https://issues.redhat.com/browse/RHEL-19169
CVE: CVE-2024-0582

commit b10b73c102a2eab91e1cd62a03d6446f1dfecc64
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Nov 28 11:17:25 2023 -0700

    io_uring/kbuf: recycle freed mapped buffer ring entries
    
    Right now we stash any potentially mmap'ed provided ring buffer range
    for freeing at release time, regardless of when they get unregistered.
    Since we're keeping track of these ranges anyway, keep track of their
    registration state as well, and use that to recycle ranges when
    appropriate rather than always allocate new ones.
    
    The lookup is a basic scan of entries, checking for the best matching
    free entry.
    
    Fixes: c392cbecd8ec ("io_uring/kbuf: defer release of mapped buffer rings")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-05 16:34:25 -05:00
Jeff Moyer d920a9c912 io_uring/kbuf: check for buffer list readiness after NULL check
JIRA: https://issues.redhat.com/browse/RHEL-21391
JIRA: https://issues.redhat.com/browse/RHEL-19169
CVE: CVE-2024-0582

commit 9865346b7e8374b57f1c3ccacdc77846c6352ff4
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Dec 5 07:02:13 2023 -0700

    io_uring/kbuf: check for buffer list readiness after NULL check
    
    Move the buffer list 'is_ready' check below the validity check for
    the buffer list for a given group.
    
    Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU")
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-05 16:34:24 -05:00
Jeff Moyer 7a768a78b8 io_uring: free io_buffer_list entries via RCU
JIRA: https://issues.redhat.com/browse/RHEL-21391
JIRA: https://issues.redhat.com/browse/RHEL-19169
CVE: CVE-2024-0582

commit 5cf4f52e6d8aa2d3b7728f568abbf9d42a3af252
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Nov 27 17:54:40 2023 -0700

    io_uring: free io_buffer_list entries via RCU
    
    mmap_lock nests under uring_lock out of necessity, as we may be doing
    user copies with uring_lock held. However, for mmap of provided buffer
    rings, we attempt to grab uring_lock with mmap_lock already held from
    do_mmap(). This makes lockdep, rightfully, complain:
    
    WARNING: possible circular locking dependency detected
    6.7.0-rc1-00009-gff3337ebaf94-dirty #4438 Not tainted
    ------------------------------------------------------
    buf-ring.t/442 is trying to acquire lock:
    ffff00020e1480a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_uring_validate_mmap_request.isra.0+0x4c/0x140
    
    but task is already holding lock:
    ffff0000dc226190 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x124/0x264
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 (&mm->mmap_lock){++++}-{3:3}:
           __might_fault+0x90/0xbc
           io_register_pbuf_ring+0x94/0x488
           __arm64_sys_io_uring_register+0x8dc/0x1318
           invoke_syscall+0x5c/0x17c
           el0_svc_common.constprop.0+0x108/0x130
           do_el0_svc+0x2c/0x38
           el0_svc+0x4c/0x94
           el0t_64_sync_handler+0x118/0x124
           el0t_64_sync+0x168/0x16c
    
    -> #0 (&ctx->uring_lock){+.+.}-{3:3}:
           __lock_acquire+0x19a0/0x2d14
           lock_acquire+0x2e0/0x44c
           __mutex_lock+0x118/0x564
           mutex_lock_nested+0x20/0x28
           io_uring_validate_mmap_request.isra.0+0x4c/0x140
           io_uring_mmu_get_unmapped_area+0x3c/0x98
           get_unmapped_area+0xa4/0x158
           do_mmap+0xec/0x5b4
           vm_mmap_pgoff+0x158/0x264
           ksys_mmap_pgoff+0x1d4/0x254
           __arm64_sys_mmap+0x80/0x9c
           invoke_syscall+0x5c/0x17c
           el0_svc_common.constprop.0+0x108/0x130
           do_el0_svc+0x2c/0x38
           el0_svc+0x4c/0x94
           el0t_64_sync_handler+0x118/0x124
           el0t_64_sync+0x168/0x16c
    
    From that mmap(2) path, we really just need to ensure that the buffer
    list doesn't go away from underneath us. For the lower indexed entries,
    they never go away until the ring is freed and we can always sanely
    reference those as long as the caller has a file reference. For the
    higher indexed ones in our xarray, we just need to ensure that the
    buffer list remains valid while we return the address of it.
    
    Free the higher indexed io_buffer_list entries via RCU. With that we can
    avoid needing ->uring_lock inside mmap(2), and simply hold the RCU read
    lock around the buffer list lookup and address check.
    
    To ensure that the arrayed lookup either returns a valid fully formulated
    entry via RCU lookup, add an 'is_ready' flag that we access with store
    and release memory ordering. This isn't needed for the xarray lookups,
    but doesn't hurt either. Since this isn't a fast path, retain it across
    both types. Similarly, for the allocated array inside the ctx, ensure
    we use the proper load/acquire as setup could in theory be running in
    parallel with mmap.
    
    While in there, add a few lockdep checks for documentation purposes.
    
    Cc: stable@vger.kernel.org
    Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-05 16:34:24 -05:00
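
Illustration: a hedged sketch of the resulting pattern, with the lookup and
address check under the RCU read lock and the free deferred past any readers;
field names are assumptions.

    /* mmap(2) side: no uring_lock taken */
    rcu_read_lock();
    bl = xa_load(&ctx->io_bl_xa, bgid);
    if (bl && smp_load_acquire(&bl->is_ready))
            ptr = bl->buf_ring;
    rcu_read_unlock();

    /* unregister side: defer the free until readers are done */
    kfree_rcu(bl, rcu);
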
Jeff Moyer 6e94a18d35 io_uring/kbuf: defer release of mapped buffer rings
JIRA: https://issues.redhat.com/browse/RHEL-21391
JIRA: https://issues.redhat.com/browse/RHEL-19169
CVE: CVE-2024-0582

commit c392cbecd8eca4c53f2bf508731257d9d0a21c2d
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Nov 27 16:47:04 2023 -0700

    io_uring/kbuf: defer release of mapped buffer rings
    
    If a provided buffer ring is setup with IOU_PBUF_RING_MMAP, then the
    kernel allocates the memory for it and the application is expected to
    mmap(2) this memory. However, io_uring uses remap_pfn_range() for this
    operation, so we cannot rely on normal munmap/release on freeing them
    for us.
    
    Stash an io_buf_free entry away for each of these, if any, and provide
    a helper to free them post ->release().
    
    Cc: stable@vger.kernel.org
    Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
    Reported-by: Jann Horn <jannh@google.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-05 16:33:39 -05:00
Jeff Moyer 66a462cdd4 io_uring/kbuf: prune deferred locked cache when tearing down
JIRA: https://issues.redhat.com/browse/RHEL-21391

commit 07d6063d3d3beb3168d3ac9fdef7bca81254d983
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Nov 27 17:02:48 2023 -0700

    io_uring/kbuf: prune deferred locked cache when tearing down
    
    We used to just use our page list for final teardown, which would ensure
    that we got all the buffers, even the ones that were not on the normal
    cached list. But while moving to slab for the io_buffers, we now only
    prune this list, not the deferred locked list that we have. This can
    cause a memory leak if the workload ends up using the intermediate
    locked list.
    
    Fix this by always pruning both lists when tearing down.
    
    Fixes: b3a4dbc89d40 ("io_uring/kbuf: Use slab for struct io_buffer objects")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-01 12:02:14 -05:00
Jeff Moyer d71b8efd78 io_uring/kbuf: Use slab for struct io_buffer objects
JIRA: https://issues.redhat.com/browse/RHEL-21391

commit b3a4dbc89d4021b3f90ff6a13537111a004f9d07
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date:   Wed Oct 4 20:05:31 2023 -0400

    io_uring/kbuf: Use slab for struct io_buffer objects
    
    The allocation of struct io_buffer for metadata of provided buffers is
    done through a custom allocator that directly gets pages and
    fragments them.  But, slab would do just fine, as this is not a hot path
    (in fact, it is a deprecated feature) and, by keeping a custom
    allocator implementation, we lose benefits like tracking, poisoning,
    and sanitizers. Finally, the custom code is more complex and requires
    keeping the list of pages in struct ctx for no good reason.  This patch
    cleans this path up and just uses slab.
    
    I microbenchmarked it by forcing the allocation of a large number of
    objects with the least number of io_uring commands possible (keeping
    nbufs=USHRT_MAX), with and without the patch.  There is a slight
    increase in time spent in the allocation with slab, of course, but even
    when allocating to system resources exhaustion, which is not very
    realistic and happened around 1/2 billion provided buffers for me, it
    wasn't a significant hit in system time.  Especially if we think of a
    real-world scenario, an application doing register/unregister of
    provided buffers will hit ctx->io_buffers_cache more often than actually
    going to slab.
    
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20231005000531.30800-4-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-01 12:01:14 -05:00
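
Illustration: the switch in miniature, with a dedicated slab cache replacing
the hand-rolled page fragmenter (a sketch; cache name and flags are
assumptions).

    static struct kmem_cache *io_buf_cachep;

    io_buf_cachep = kmem_cache_create("io_buffer", sizeof(struct io_buffer),
                                      0, SLAB_ACCOUNT, NULL);

    buf = kmem_cache_alloc(io_buf_cachep, GFP_KERNEL);
    /* ... */
    kmem_cache_free(io_buf_cachep, buf);
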
Jeff Moyer 8b51365edd io_uring/kbuf: Allow the full buffer id space for provided buffers
JIRA: https://issues.redhat.com/browse/RHEL-21391

commit f74c746e476b9dad51448b9a9421aae72b60e25f
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date:   Wed Oct 4 20:05:30 2023 -0400

    io_uring/kbuf: Allow the full buffer id space for provided buffers
    
    nbufs tracks the number of buffers and not the last bgid. In 16-bit, we
    have 2^16 valid buffers, but the check mistakenly rejects the last
    bid. Let's fix it to make the interface consistent with the
    documentation.
    
    Fixes: ddf0322db7 ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20231005000531.30800-3-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-01 12:00:14 -05:00
Jeff Moyer a212c3be64 io_uring/kbuf: Fix check of BID wrapping in provided buffers
JIRA: https://issues.redhat.com/browse/RHEL-21391

commit ab69838e7c75b0edb699c1a8f42752b30333c46f
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date:   Wed Oct 4 20:05:29 2023 -0400

    io_uring/kbuf: Fix check of BID wrapping in provided buffers
    
    Commit 3851d25c75ed0 ("io_uring: check for rollover of buffer ID when
    providing buffers") introduced a check to prevent wrapping the BID
    counter when sqe->off is provided, but it's off-by-one too
    restrictive, rejecting the last possible BID (65534).
    
    i.e., the following fails with -EINVAL.
    
         io_uring_prep_provide_buffers(sqe, addr, size, 0xFFFF, 0, 0);
    
    Fixes: 3851d25c75ed ("io_uring: check for rollover of buffer ID when providing buffers")
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20231005000531.30800-2-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-02-01 11:59:14 -05:00
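
Illustration: the off-by-one in miniature. With 2^16 valid IDs, a batch that
lands exactly on 65536 does not wrap; a hedged sketch of the corrected bound
(variable names assumed, macro name from the upstream fix):

    #define MAX_BIDS_PER_BGID (1 << 16)     /* bids 0..65535 */

    /* broken: rejects the last possible BID */
    if (off + nbufs >= USHRT_MAX)
            return -EINVAL;

    /* fixed: reject only a genuine wrap of the 16-bit ID space */
    if (off + nbufs > MAX_BIDS_PER_BGID)
            return -EINVAL;
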
Jeff Moyer e1171b191d io_uring/kbuf: don't allow registered buffer rings on highmem pages
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit f8024f1f36a30a082b0457d5779c8847cea57f57
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Oct 2 18:14:08 2023 -0600

    io_uring/kbuf: don't allow registered buffer rings on highmem pages
    
    syzbot reports that registering a mapped buffer ring on arm32 can
    trigger an OOPS. Registered buffer rings have two modes, one of them
    is the application passing in the memory that the buffer ring should
    reside in. Once those pages are mapped, we use page_address() to get
    a virtual address. This will obviously fail on highmem pages, which
    aren't mapped.
    
    Add a check if we have any highmem pages after mapping, and fail the
    attempt to register a provided buffer ring if we do. This will return
    the same error as kernels that don't support provided buffer rings to
    begin with.
    
    Link: https://lore.kernel.org/io-uring/000000000000af635c0606bcb889@google.com/
    Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
    Cc: stable@vger.kernel.org
    Reported-by: syzbot+2113e61b8848fa7951d8@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 17:26:28 -04:00
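
Illustration: the added guard in sketch form, refusing, after mapping, any
page that page_address() cannot reach (a minimal sketch):

    for (i = 0; i < nr_pages; i++) {
            if (PageHighMem(pages[i])) {
                    ret = -EINVAL;  /* matches kernels without pbuf rings */
                    goto out_unpin;
            }
    }
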
Jeff Moyer 993071873a io_uring: stop calling free_compound_page()
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit 99a9e0b83ab9955e604397717b82267feb021df3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Aug 16 16:11:49 2023 +0100

    io_uring: stop calling free_compound_page()
    
    Patch series "Remove _folio_dtor and _folio_order", v2.
    
    
    This patch (of 13):
    
    folio_put() is the standard way to write this, and it's not appreciably
    slower.  This is an enabling patch for removing free_compound_page()
    entirely.
    
    Link: https://lkml.kernel.org/r/20230816151201.3655946-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20230816151201.3655946-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Yanteng Si <siyanteng@loongson.cn>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 17:26:26 -04:00
Jeff Moyer b8537362f0 io_uring/kbuf: remove extra ->buf_ring null check
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit ceac766a5581e4e671ec8e5236b8fdaed8e4c8c9
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Tue Apr 11 12:06:02 2023 +0100

    io_uring/kbuf: remove extra ->buf_ring null check
    
    The kernel test robot complains about __io_remove_buffers().
    
    io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced
    before check 'bl->buf_ring' (see line 219)
    
    That check is not needed as ->buf_ring will always be set, so we can
    remove it and silence the warning.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/9a632bbf749d9d911e605255652ce08d18e7d2c6.1681210788.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:32 -04:00
Jeff Moyer 3c040e8ae5 io_uring/kbuf: disallow mapping a badly aligned provided ring buffer
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit fcb46c0ccc7c07af54f818fd498e461353ea50e7
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Mar 17 10:42:08 2023 -0600

    io_uring/kbuf: disallow mapping a badly aligned provided ring buffer
    
    On at least parisc, we have strict requirements on how we virtually map
    an address that is shared between the application and the kernel. On
    these platforms, IOU_PBUF_RING_MMAP should be used when setting up a
    shared ring buffer for provided buffers. If the application is mapping
    these pages and asking the kernel to pin+map them as well, then we have
    no control over what virtual address we get in the kernel.
    
    For that case, do a sanity check if SHM_COLOUR is defined, and disallow
    the mapping request. The application must fall back to using
    IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently
    with the set of helpers that it has.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:28 -04:00
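
Illustration: a hedged sketch of the SHM_COLOUR guard described above; the
exact condition in the kernel may differ.

    #ifdef SHM_COLOUR
            /* user-supplied ring memory can't be sanely dual-mapped on
             * these platforms; require IOU_PBUF_RING_MMAP instead */
            if (reg->ring_addr)
                    return -EINVAL;
    #endif
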
Jeff Moyer a454ee1f2f io_uring: add support for user mapped provided buffer ring
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit c56e022c0a27142b7b59ae6bdf45f86bf4b298a1
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 14 11:07:19 2023 -0600

    io_uring: add support for user mapped provided buffer ring
    
    The ring mapped provided buffer rings rely on the application allocating
    the memory for the ring, and then the kernel will map it. This generally
    works fine, but runs into issues on some architectures where we need
    to be able to ensure that the kernel and application virtual address for
    the ring play nicely together. This at least impacts architectures that
    set SHM_COLOUR, but potentially also anyone setting SHMLBA.
    
    To use this variant of ring provided buffers, the application need not
    allocate any memory for the ring. Instead the kernel will do so, and
    the application must subsequently call mmap(2) on the ring with the
    offset set to:
    
            IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)
    
    to get a virtual address for the buffer ring. Normally the application
    would allocate a suitable piece of memory (and correctly aligned) and
    simply pass that in via io_uring_buf_reg.ring_addr and the kernel would
    map it.
    
    Outside of the setup differences, the kernel allocate + user mapped
    provided buffer ring works exactly the same.
    
    Acked-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:28 -04:00
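
Illustration: a hedged userspace sketch of the mmap step quoted above; the
offset constants are from the io_uring uapi, and the registration that
precedes this (IOU_PBUF_RING_MMAP with ring_addr == 0) is assumed to have
succeeded.

    #include <linux/io_uring.h>
    #include <sys/mman.h>

    off_t off = IORING_OFF_PBUF_RING |
                ((off_t)bgid << IORING_OFF_PBUF_SHIFT);
    size_t size = ring_entries * sizeof(struct io_uring_buf);

    struct io_uring_buf_ring *br =
            mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_POPULATE, ring_fd, off);
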
Jeff Moyer bd566e11e9 io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to 'flags'
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit 81cf17cd3ab3e5441e876a8e9e9c38ae9920cecb
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 14 11:01:45 2023 -0600

    io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to 'flags'
    
    In preparation for allowing flags to be set for registration, rename
    the padding and use it for that.
    
    Acked-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:28 -04:00
Jeff Moyer b03ab8a7fa io_uring/kbuf: add buffer_list->is_mapped member
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit 25a2c188a0a00b3d9f2057798aa86fe6b04377bf
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 14 10:59:46 2023 -0600

    io_uring/kbuf: add buffer_list->is_mapped member
    
    Rather than rely on checking buffer_list->buf_pages or ->buf_nr_pages,
    add a separate member that tracks if this is a ring mapped provided
    buffer list or not.
    
    Acked-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:28 -04:00
Jeff Moyer 18656a7b07 io_uring/kbuf: move pinning of provided buffer ring into helper
JIRA: https://issues.redhat.com/browse/RHEL-12076

commit ba56b63242d12df088ed9a701cad320e6b306dfe
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Mar 14 10:55:50 2023 -0600

    io_uring/kbuf: move pinning of provided buffer ring into helper
    
    In preparation for allowing the kernel to allocate the provided buffer
    rings and have the application mmap it instead, abstract out the
    current method of pinning and mapping the user allocated ring.
    
    No functional changes intended in this patch.
    
    Acked-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-11-02 15:31:28 -04:00
Jeff Moyer ab058a7254 io_uring: fix size calculation when registering buf ring
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 48ba08374e779421ca34bd14b4834aae19fc3e6a
Author: Wojciech Lukowicz <wlukowicz01@gmail.com>
Date:   Sat Feb 18 18:41:41 2023 +0000

    io_uring: fix size calculation when registering buf ring
    
    Using struct_size() to calculate the size of io_uring_buf_ring will sum
    the size of the struct and of the bufs array. However, the struct's fields
    are overlaid with the array making the calculated size larger than it
    should be.
    
    When registering a ring with N * PAGE_SIZE / sizeof(struct io_uring_buf)
    entries, i.e. with fully filled pages, the calculated size will span one
    more page than it should and io_uring will try to pin the following page.
    Depending on how the application allocated the ring, it might succeed
    using an unrelated page or fail returning EFAULT.
    
    The size of the ring should be the product of ring_entries and the size
    of io_uring_buf, i.e. the size of the bufs array only.
    
    Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
    Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
    Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20230218184141.70891-1-wlukowicz01@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-05-05 15:26:33 -04:00
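
Illustration: the miscalculation in miniature. struct io_uring_buf_ring
overlays its header fields with the bufs[] array, so struct_size() counts
the header on top of the entries; sizing the array alone is correct. A
sketch using the <linux/overflow.h> helpers:

    /* broken: header + N entries, spilling into one extra page when the
     * entries exactly fill whole pages */
    ring_size = struct_size(br, bufs, ring_entries);

    /* fixed: the ring is exactly N entries of struct io_uring_buf */
    ring_size = flex_array_size(br, bufs, ring_entries);
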
Jeff Moyer d05b30e52e io_uring: fix memory leak when removing provided buffers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit b4a72c0589fdea6259720375426179888969d6a2
Author: Wojciech Lukowicz <wlukowicz01@gmail.com>
Date:   Sat Apr 1 20:50:39 2023 +0100

    io_uring: fix memory leak when removing provided buffers
    
    When removing provided buffers, io_buffer structs are not being disposed
    of, leading to a memory leak. They can't be freed individually, because
    they are allocated in page-sized groups. They need to be added to some
    free list instead, such as io_buffers_cache. All callers already hold
    the lock protecting it, apart from when destroying buffers, so the
    lock had to be extended there.
    
    Fixes: cc3cec8367cb ("io_uring: speedup provided buffer handling")
    Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
    Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-05-05 15:26:32 -04:00
Jeff Moyer 518e4e218f io_uring: fix return value when removing provided buffers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit c0921e51dab767ef5adf6175c4a0ba3c6e1074a3
Author: Wojciech Lukowicz <wlukowicz01@gmail.com>
Date:   Sat Apr 1 20:50:38 2023 +0100

    io_uring: fix return value when removing provided buffers
    
    When a request to remove buffers is submitted, and the given number to be
    removed is larger than available in the specified buffer group, the
    resulting CQE result will be the number of removed buffers + 1, which is
    1 more than it should be.
    
    Previously, the head was part of the list and it got removed after the
    loop, so the increment was needed. Now, the head is not an element of
    the list, so the increment shouldn't be there anymore.
    
    Fixes: dbc7d452e7cf ("io_uring: manage provided buffers strictly ordered")
    Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
    Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-05-05 15:26:32 -04:00
Jeff Moyer 2456279ace io_uring: don't use complete_post in kbuf
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit c3b490930dbe6a6c98d3820f445757ddec1efb08
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Thu Nov 24 19:46:40 2022 +0000

    io_uring: don't use complete_post in kbuf
    
    Now we're handling IOPOLL completions more generically, get rid of
    uses of _post() and send requests through the normal path. It may
    have some extra merits performance-wise, but we don't care much as
    there is a better interface for selected buffers.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/4deded706587f55b006dc33adf0c13cfc3b2319f.1669310258.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-05-05 15:25:15 -04:00
Jeff Moyer e20f6303b5 io_uring: iopoll protect complete_post
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 1bec951c3809051f64a6957fe86d1b4786cc0313
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Wed Nov 23 11:33:41 2022 +0000

    io_uring: iopoll protect complete_post
    
    io_req_complete_post() may be used by iopoll enabled rings, grab locks
    in this case. That requires to pass issue_flags to propagate the locking
    state.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/cc6d854065c57c838ca8e8806f707a226b70fd2d.1669203009.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-05-05 15:25:11 -04:00
Jeff Moyer e6d23ec935 io_uring: check for rollover of buffer ID when providing buffers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 3851d25c75ed03117268a8feb34adca5a843a126
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Nov 10 10:50:55 2022 -0700

    io_uring: check for rollover of buffer ID when providing buffers
    
    We already check if the chosen starting offset for the buffer IDs fit
    within an unsigned short, as 65535 is the maximum value for a provided
    buffer. But if the caller asks to add N buffers at offset M, and M + N
    would exceed the size of the unsigned short, we simply add buffers
    that wrap around the ID space.
    
    This is not necessarily a bug and could in fact be a valid use case, but
    it seems confusing and inconsistent with the initial check for starting
    offset. Let's check for wrap consistently, and error the addition if we
    do need to wrap.
    
    Reported-by: Olivier Langlois <olivier@trillion01.com>
    Link: https://github.com/axboe/liburing/issues/726
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-05-05 15:24:16 -04:00
Jeff Moyer 95438cbb8c io_uring: make io_kiocb_to_cmd() typesafe
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit f2ccb5aed7bce1d8b3ed5b3385759a5509663028
Author: Stefan Metzmacher <metze@samba.org>
Date:   Thu Aug 11 09:11:15 2022 +0200

    io_uring: make io_kiocb_to_cmd() typesafe
    
    We need to make sure (at build time) that struct io_cmd_data is not
    cast to a structure that's larger.
    
    Signed-off-by: Stefan Metzmacher <metze@samba.org>
    Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 08:43:02 -04:00
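
Illustration: a hedged sketch of the build-time guard, where the cast helper
refuses, at compile time, any target type larger than the io_cmd_data union.

    static inline void io_kiocb_cmd_sz_check(size_t cmd_sz)
    {
            BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
    }

    #define io_kiocb_to_cmd(req, cmd_type) ( \
            io_kiocb_cmd_sz_check(sizeof(cmd_type)), \
            ((cmd_type *)&(req)->cmd) \
    )
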
Jeff Moyer c9e0a76b49 io_uring: mem-account pbuf buckets
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit cc18cc5e82033d406f54144ad6f8092206004684
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Thu Aug 4 15:13:46 2022 +0100

    io_uring: mem-account pbuf buckets
    
    Potentially, someone may create as many pbuf buckets as there are indexes
    in an xarray without any other restrictions bounding our memory usage,
    put memory needed for the buckets under memory accounting.
    
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/d34c452e45793e978d26e2606211ec9070d329ea.1659622312.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 08:38:02 -04:00
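
Illustration: the one-flag change in sketch form; allocating the bucket with
GFP_KERNEL_ACCOUNT charges it to the creating task's memory cgroup.

    bl = kzalloc(sizeof(*bl), GFP_KERNEL_ACCOUNT);  /* was GFP_KERNEL */
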
Jeff Moyer fa421cdc94 io_uring: allow 0 length for buffer select
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit b8c015598c8ef9195b8a2a5089e275c4f64ca999
Author: Dylan Yudaken <dylany@fb.com>
Date:   Thu Jun 30 02:12:20 2022 -0700

    io_uring: allow 0 length for buffer select
    
    If the user gives 0 for the length, we can set it from the available buffer size.
    
    Signed-off-by: Dylan Yudaken <dylany@fb.com>
    Link: https://lore.kernel.org/r/20220630091231.1456789-2-dylany@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 07:09:02 -04:00
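
Illustration: a hedged sketch of the selection-time default the commit adds;
a zero (or oversized) requested length falls back to the chosen buffer's size.

    if (*len == 0 || *len > kbuf->len)
            *len = kbuf->len;       /* 0 means "whole provided buffer" */
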
Jeff Moyer f09931d57c io_uring: kbuf: inline io_kbuf_recycle_ring()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 795bbbc8a9a1bbbafce762c706bfb5733c9d0426
Author: Hao Xu <howeyxu@tencent.com>
Date:   Thu Jun 23 21:01:26 2022 +0800

    io_uring: kbuf: inline io_kbuf_recycle_ring()
    
    Make io_kbuf_recycle_ring() inline since it is on the fast path of
    provided buffers.
    
    Signed-off-by: Hao Xu <howeyxu@tencent.com>
    Link: https://lore.kernel.org/r/20220623130126.179232-1-hao.xu@linux.dev
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 06:58:02 -04:00
Jeff Moyer 3eca8ba9ef io_uring: kbuf: kill __io_kbuf_recycle()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 024b8fde3320ea34d7a5a3fc9dbc47ec736cd8eb
Author: Hao Xu <howeyxu@tencent.com>
Date:   Wed Jun 22 13:55:51 2022 +0800

    io_uring: kbuf: kill __io_kbuf_recycle()
    
    __io_kbuf_recycle() is only called in io_kbuf_recycle(). Kill it and
    tweak the code so that the legacy pbuf and ring pbuf code become
    clearer.
    
    Signed-off-by: Hao Xu <howeyxu@tencent.com>
    Link: https://lore.kernel.org/r/20220622055551.642370-1-hao.xu@linux.dev
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 06:48:02 -04:00
Jeff Moyer 8034d812f4 io_uring: kill extra io_uring_types.h includes
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 27a9d66fec77cff0e32d2ecd5d0ac7ef878a7bb0
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Thu Jun 16 13:57:18 2022 +0100

    io_uring: kill extra io_uring_types.h includes
    
    io_uring/io_uring.h already includes io_uring_types.h, no need to
    include it every time. Kill it in a bunch of places, it prepares us for
    following patches.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 06:19:02 -04:00
Jeff Moyer e6b710507b io_uring: kbuf: add comments for some tricky code
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit f09c8643f0fad0e287b9f737955276000fd76a5d
Author: Hao Xu <howeyxu@tencent.com>
Date:   Fri Jun 17 13:04:29 2022 +0800

    io_uring: kbuf: add comments for some tricky code
    
    Add comments to explain why it is always under uring lock when
    incrementing head in __io_kbuf_recycle. And rectify one comment about
    kbuf consumption in the iowq case.
    
    Signed-off-by: Hao Xu <howeyxu@tencent.com>
    Link: https://lore.kernel.org/r/20220617050429.94293-1-hao.xu@linux.dev
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 06:12:02 -04:00