JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 8f7033aa4089fbaf7a33995f0f2ee6c9d7b9ca1b
Author: Jens Axboe <axboe@kernel.dk>
Date: Thu Oct 17 08:31:56 2024 -0600
io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work
When the sqpoll thread is exiting and cancels pending work items, it may
need to run task_work. If this happens from within
io_uring_cancel_generic(), then it may be waiting on the io_uring_task
waitqueue. This results in the below splat from the scheduler, as the
ring mutex may be grabbed while in a TASK_INTERRUPTIBLE state.
Ensure that the task state is set appropriately for that, just like what
is done for the other cases in io_run_task_work().
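A minimal sketch of the idea, assuming the state reset sits at the top of
the task_work runner (the exact placement in the upstream diff may differ):

    /*
     * task_work may grab ctx->uring_lock and other sleeping locks, which
     * must not happen in TASK_INTERRUPTIBLE; set TASK_RUNNING first,
     * mirroring the other cases in io_run_task_work().
     */
    __set_current_state(TASK_RUNNING);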
do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000029387fd2>] prepare_to_wait+0x88/0x2fc
WARNING: CPU: 6 PID: 59939 at kernel/sched/core.c:8561 __might_sleep+0xf4/0x140
Modules linked in:
CPU: 6 UID: 0 PID: 59939 Comm: iou-sqp-59938 Not tainted 6.12.0-rc3-00113-g8d020023b155 #7456
Hardware name: linux,dummy-virt (DT)
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __might_sleep+0xf4/0x140
lr : __might_sleep+0xf4/0x140
sp : ffff80008c5e7830
x29: ffff80008c5e7830 x28: ffff0000d93088c0 x27: ffff60001c2d7230
x26: dfff800000000000 x25: ffff0000e16b9180 x24: ffff80008c5e7a50
x23: 1ffff000118bcf4a x22: ffff0000e16b9180 x21: ffff0000e16b9180
x20: 000000000000011b x19: ffff80008310fac0 x18: 1ffff000118bcd90
x17: 30303c5b20746120 x16: 74657320313d6574 x15: 0720072007200720
x14: 0720072007200720 x13: 0720072007200720 x12: ffff600036c64f0b
x11: 1fffe00036c64f0a x10: ffff600036c64f0a x9 : dfff800000000000
x8 : 00009fffc939b0f6 x7 : ffff0001b6327853 x6 : 0000000000000001
x5 : ffff0001b6327850 x4 : ffff600036c64f0b x3 : ffff8000803c35bc
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000e16b9180
Call trace:
__might_sleep+0xf4/0x140
mutex_lock+0x84/0x124
io_handle_tw_list+0xf4/0x260
tctx_task_work_run+0x94/0x340
io_run_task_work+0x1ec/0x3c0
io_uring_cancel_generic+0x364/0x524
io_sq_thread+0x820/0x124c
ret_from_fork+0x10/0x20
Cc: stable@vger.kernel.org
Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 28aabffae6be54284869a91cd8bccd3720041129
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue Oct 15 08:58:25 2024 -0600
io_uring/sqpoll: close race on waiting for sqring entries
When an application uses SQPOLL, it must wait for the SQPOLL thread to
consume SQE entries, if it fails to get an sqe when calling
io_uring_get_sqe(). It can do so by calling io_uring_enter(2) with the
flag value of IORING_ENTER_SQ_WAIT. In liburing, this is generally done
with io_uring_sqring_wait(). There's a natural expectation that once
this call returns, a new SQE entry can be retrieved, filled out, and
submitted. However, the kernel uses the cached sq head to determine if
the SQRING is full or not. If the SQPOLL thread is currently in the
process of submitting SQE entries, it may have updated the cached sq
head, but not yet committed it to the SQ ring. Hence the kernel may find
that there are SQE entries ready to be consumed, and return successfully
to the application. If the SQPOLL thread hasn't yet committed the SQ
ring entries by the time the application returns to userspace and
attempts to get a new SQE, that attempt will fail.
Fix this by having io_sqring_full() always use the user visible SQ ring
head entry, rather than the internally cached one.
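A hedged sketch of the corrected check, modeled on the upstream helper
(treat the exact field spellings as approximate):

    static inline bool io_sqring_full(struct io_ring_ctx *ctx)
    {
        struct io_rings *r = ctx->rings;

        /*
         * Compare against the user visible sq head rather than
         * ctx->cached_sq_head, which the SQPOLL thread may have advanced
         * without having committed the entries to the ring yet.
         */
        return READ_ONCE(r->sq.tail) - READ_ONCE(r->sq.head) == ctx->sq_entries;
    }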
Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/discussions/1267
Reported-by: Benedek Thaler <thaler@thaler.hu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 039a2e800bcd5beb89909d1a488abf3d647642cf
Author: Jens Axboe <axboe@kernel.dk>
Date: Thu Apr 25 09:04:32 2024 -0600
io_uring/rw: reinstate thread check for retries
Allowing retries for everything is arguably the right thing to do, now
that every command type is async read from the start. But it has exposed
a few issues around a missing retry check (which cca6571381a0 made
visible), and the fixup commit for that isn't necessarily 100% sound in
terms of iov_iter state.
For now, just revert these two commits. This unfortunately re-opens the
issue of -EAGAIN getting bubbled up to userspace for some cases where
the kernel could sanely just retry them. But until we have all the
conditions covered around that, we cannot safely enable it.
This reverts commit df604d2ad480fcf7b39767280c9093e13b1de952.
This reverts commit cca6571381a0bdc88021a1f7a4c2349df21279f7.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit df604d2ad480fcf7b39767280c9093e13b1de952
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Apr 17 09:23:55 2024 -0600
io_uring/rw: ensure retry condition isn't lost
A previous commit removed the checking on whether or not it was possible
to retry a request, since it's now possible to retry any of them. This
would previously have caused the request to have been ended with an error,
but now the retry condition can simply get lost instead.
Clean up the retry handling and always just punt it to task_work, which
will queue it with io-wq appropriately.
Reported-by: Changhui Zhong <czhong@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Fixes: cca6571381a0 ("io_uring/rw: cleanup retry path")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL does not have commit 5e0a760b4441 ("mm, treewide:
rename MAX_ORDER to MAX_PAGE_ORDER").
commit f15ed8b4d0ce2c0831232ff85117418740f0c529
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Mar 27 14:59:09 2024 -0600
io_uring: move mapping/allocation helpers to a separate file
Move the related code from io_uring.c into memmap.c. No functional
changes in this patch, just cleaning it up a bit now that the full
transition is done.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 87585b05757dc70545efb434669708d276125559
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue Mar 12 20:24:21 2024 -0600
io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring
Rather than use remap_pfn_range() for this and manually free later,
switch to using vm_insert_page() and have it Just Work.
This requires a bit of effort on the mmap lookup side, as the ctx
uring_lock isn't held there. That lock otherwise protects buffer_lists
from being torn down, and it's not safe to grab it from mmap context, as
that would introduce an ABBA deadlock between the mmap lock and the ctx
uring_lock. Instead, look up the buffer_list under RCU, as the list is RCU freed
already. Use the existing reference count to determine whether it's
possible to safely grab a reference to it (eg if it's not zero already),
and drop that reference when done with the mapping. If the mmap
reference is the last one, the buffer_list and the associated memory can
go away, since the vma insertion has references to the inserted pages at
that point.
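A hedged sketch of that lookup (io_bl_xa and bl->refs follow the io_uring
sources of this era, but treat the names as approximate):

    /* mmap context: no ctx->uring_lock, so pin the buffer_list via RCU */
    rcu_read_lock();
    bl = xa_load(&ctx->io_bl_xa, bgid);
    /* a zero refcount means the list is already being torn down */
    if (bl && !atomic_inc_not_zero(&bl->refs))
        bl = NULL;
    rcu_read_unlock();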
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL does not have commit 5e0a760b4441 ("mm, treewide:
rename MAX_ORDER to MAX_PAGE_ORDER").
commit 3ab1db3c6039e02a9deb9d5091d28d559917a645
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Mar 13 09:56:14 2024 -0600
io_uring: get rid of remap_pfn_range() for mapping rings/sqes
Rather than use remap_pfn_range() for this and manually free later,
switch to using vm_insert_pages() and have it Just Work.
If possible, allocate a single compound page that covers the range that
is needed. If that works, then we can just use page_address() on that
page. If we fail to get a compound page, allocate single pages and use
vmap() to map them into the kernel virtual address space.
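A hedged sketch of that allocation strategy (the helper name is
hypothetical, and the single-page fallback is elided):

    static void *io_ring_mem_alloc(size_t size)    /* hypothetical name */
    {
        gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
        struct page *page;

        /* try a single compound page covering the whole range */
        page = alloc_pages(gfp | __GFP_COMP, get_order(size));
        if (page)
            return page_address(page);

        /* else allocate order-0 pages and map them with vmap() */
        return NULL;    /* single-page + vmap() path elided */
    }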
This just covers the rings/sqes; the one remaining user of the mmap
remap_pfn_range() path will be converted separately. Once that is done,
we can kill the old alloc/free code.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit e10677a8f6980dbae2e866b8320d90bae07e87ee
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon Mar 18 20:48:38 2024 -0600
io_uring: drop ->prep_async()
It's now unused, drop the code related to it. This includes the
io_issue_defs->manual_alloc field.
While in there, and since ->async_size is now being used a bit more
frequently and in the issue path, move it to io_issue_defs[].
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit c133b3b06b0653036b0c07675c1db0c89467ccdb
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Mar 18 22:00:35 2024 +0000
io_uring: clean up io_lockdep_assert_cq_locked
Move CONFIG_PROVE_LOCKING checks inside of io_lockdep_assert_cq_locked()
and kill the else branch.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/bbf33c429c9f6d7207a8fe66d1a5866ec2c99850.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 0667db14e1f029d56243aa2509ebc5f944388200
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Mar 18 22:00:34 2024 +0000
io_uring: refactor io_req_complete_post()
Make io_req_complete_post() push all IORING_SETUP_IOPOLL requests to
task_work; it's much cleaner and is how it should normally happen. We couldn't
do it before because there was a possibility of looping in
complete_post() -> tw -> complete_post() -> ...
Also, unexport the function and inline __io_req_complete_post().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/ea19c032ace3e0dd96ac4d991a063b0188037014.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 902ce82c2aa130bea5e3feca2d4ae62781865da7
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Mar 18 22:00:32 2024 +0000
io_uring: get rid of intermediate aux cqe caches
io_post_aux_cqe(), which is used for multishot requests, delays
completions by putting CQEs into a temporary array for the purpose of
completion lock/flush batching.
DEFER_TASKRUN doesn't need any locking, so for it we can put completions
directly into the CQ and defer post completion handling with a flag.
That leaves !DEFER_TASKRUN, which is not that interesting / hot for
multishot requests, so have conditional locking with deferred flush
for them.
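A hedged sketch of the resulting shape (the deferred-flush flag name is
an assumption):

    if (ctx->lockless_cq) {        /* DEFER_TASKRUN: no locking needed */
        posted = io_fill_cqe_aux(ctx, user_data, res, cflags);
        ctx->submit_state.cq_flush = true;    /* flag name assumed */
    } else {
        io_cq_lock(ctx);
        posted = io_fill_cqe_aux(ctx, user_data, res, cflags);
        io_cq_unlock_post(ctx);
    }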
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/b1d05a81fd27aaa2a07f9860af13059e7ad7a890.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit e5c12945be5016d681ff305ea7306fef5902219d
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Mar 18 22:00:31 2024 +0000
io_uring: refactor io_fill_cqe_req_aux
The restriction on multishot execution context disallowing io-wq is
driven by the rules of io_fill_cqe_req_aux(): it should only be called in
the master task context, either from the syscall path or in task_work.
Since task_work now always takes the ctx lock implying
IO_URING_F_COMPLETE_DEFER, we can just assume that the function is
always called with its defer argument set to true.
Kill the argument. Also rename the function for more consistency: "fill"
in CQE related functions was usually meant for raw interfaces that only
copy data into the CQ, without any of the locking, user wakeups and
other accounting that "post" functions take care of.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 8e5b3b89ecaf6d9295e561c225b35c574a5e0fe7
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Mar 18 22:00:30 2024 +0000
io_uring: remove struct io_tw_state::locked
ctx is always locked for task_work now, so get rid of struct
io_tw_state::locked. Note I'm stopping one step before removing
io_tw_state altogether, which is now empty, because it still serves the
purpose of indicating which function is a tw callback and forcing users
not to invoke them carelessly out of a wrong context. The removal can
always be done later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 6e6b8c62120a22acd8cb759304e4cd2e3215d488
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Mar 18 22:00:28 2024 +0000
io_uring/rw: avoid punting to io-wq directly
kiocb_done() shouldn't have to care about specifically redirecting
requests to io-wq. Remove the hop to tw that then queues io-wq; return
-EAGAIN and let the core io_uring code handle the offloading.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 428f13826855e3eea44bf13cedbf33f382ef8794
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Feb 14 12:59:36 2024 -0700
io_uring/napi: ensure napi polling is aborted when work is available
While testing io_uring NAPI with DEFER_TASKRUN, I ran into slowdowns and
stalls in packet delivery. Turns out that while
io_napi_busy_loop_should_end() aborts appropriately on regular
task_work, it does not abort if we have local task_work pending.
Move io_has_work() into the private io_uring.h header, and gate whether
we should continue polling on that as well. This makes NAPI polling on
send/receive work as designed with IORING_SETUP_DEFER_TASKRUN as well.
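A hedged reconstruction of the loop-end check (shape follows
io_uring/napi.c, details approximate):

    static bool io_napi_busy_loop_should_end(void *data,
                                             unsigned long start_time)
    {
        struct io_wait_queue *iowq = data;

        if (signal_pending(current))
            return true;
        /* also abort on pending local (DEFER_TASKRUN) task_work */
        if (io_should_wake(iowq) || io_has_work(iowq->ctx))
            return true;
        return io_napi_busy_loop_timeout(start_time,
                                         iowq->napi_busy_poll_to);
    }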
Fixes: 8d0c12a80cde ("io-uring: add napi busy poll support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 8d0c12a80cdeb80d5e0510e96d38fe551ed8e9b5
Author: Stefan Roesch <shr@devkernel.io>
Date: Thu Jun 8 09:38:36 2023 -0700
io-uring: add napi busy poll support
This adds the napi busy polling support in io_uring.c. It adds a new
napi_list to the io_ring_ctx structure. This list contains the list of
napi_id's that are currently enabled for busy polling. The list is
synchronized by the new napi_lock spin lock. The current default napi
busy polling time is stored in napi_busy_poll_to. If napi busy polling
is not enabled, the value is 0.
In addition there is also a hash table. The hash table stores the napi
id and a pointer to the corresponding list node. The hash table is used
to speed up lookups of the list elements. It is synchronized with RCU.
The NAPI_TIMEOUT is stored as a timeout to make sure that the time a
napi entry is stored in the napi list is limited.
The busy poll timeout is also stored as part of the io_wait_queue. This
is necessary as for sq polling the poll interval needs to be adjusted
and the napi callback allows passing in only one value.
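A hedged sketch of the bookkeeping described above (modeled on
io_napi_entry; fields approximate):

    struct io_napi_entry {
        unsigned int        napi_id;
        struct list_head    list;       /* ctx->napi_list linkage */
        unsigned long       timeout;    /* NAPI_TIMEOUT based expiry */
        struct hlist_node   node;       /* hash table linkage */
        struct rcu_head     rcu;        /* deferred freeing under RCU */
    };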
This has been tested with two simple programs from the liburing library
repository: the napi client and the napi server program. The client
sends a request, which has a timestamp in its payload and the server
replies with the same payload. The client calculates the roundtrip time
and stores it to calculate the results.
The client is running on host 1 and the server is running on host 2 (in
the same rack). The measured times below are roundtrip times. They are
average times over 5 runs each. Each run measures 1 million roundtrips.
                                           no rx coal   rx coal: frames=88,usecs=33
Default                                    57us         56us
client_poll=100us                          47us         46us
server_poll=100us                          51us         46us
client_poll=100us+                         40us         40us
server_poll=100us
client_poll=100us+                         41us         39us
server_poll=100us+
prefer napi busy poll on client
client_poll=100us+                         41us         39us
server_poll=100us+
prefer napi busy poll on server
client_poll=100us+                         41us         39us
server_poll=100us+
prefer napi busy poll on client + server
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Suggested-by: Olivier Langlois <olivier@trillion01.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230608163839.2891748-5-shr@devkernel.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 405b4dc14b10c5bdb3e9a6c3b9596c1597f7974d
Author: Stefan Roesch <shr@devkernel.io>
Date: Thu Jun 8 09:38:35 2023 -0700
io-uring: move io_wait_queue definition to header file
This moves the definition of the io_wait_queue structure to the header
file so it can be also used from other files.
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Link: https://lore.kernel.org/r/20230608163839.2891748-4-shr@devkernel.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit af5d68f8892f8ee8f137648b79ceb2abc153a19b
Author: Jens Axboe <axboe@kernel.dk>
Date: Fri Feb 2 10:20:05 2024 -0700
io_uring/sqpoll: manage task_work privately
Decouple from task_work running, and cap the number of entries we process
at a time. If we exceed that number, push remaining entries to a retry
list that we'll process first next time.
We cap the number of entries to process at 8, which is fairly random.
We just want to get enough per-ctx batching here, while not processing
endlessly.
Since we manually run PF_IO_WORKER related task_work anyway as the task
never exits to userspace, with this we no longer need to add an actual
task_work item to the per-process list.
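A hedged sketch of the capped run with a retry list (close to the
upstream helper, but treat it as illustrative):

    static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
    {
        struct io_uring_task *tctx = current->io_uring;
        unsigned int count = 0;

        /* finish what was left over from the last iteration first */
        if (*retry_list) {
            *retry_list = io_handle_tw_list(*retry_list, &count, max_entries);
            if (count >= max_entries)
                return count;
            max_entries -= count;
        }
        /* run new task_work, capped; the overflow goes to retry_list */
        *retry_list = tctx_task_work_run(tctx, max_entries, &count);
        return count;
    }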
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit bfe30bfde279529011161a60e5a7ca4be83de422
Author: Jens Axboe <axboe@kernel.dk>
Date: Sun Jan 28 20:32:52 2024 -0700
io_uring: mark the need to lock/unlock the ring as unlikely
Any of the fast paths will already have this locked; this helper only
exists to deal with io-wq invoking request issue, where we do not have
the ctx->uring_lock held already. This means that any common or fast
path will already have it locked, so mark taking the lock as unlikely.
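A hedged sketch of what the annotation amounts to (helper shape
approximate):

    static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
                                           unsigned issue_flags)
    {
        /* only io-wq driven issue arrives without the lock held */
        if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
            mutex_lock(&ctx->uring_lock);
        lockdep_assert_held(&ctx->uring_lock);
    }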
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56837
commit da12d9ab5889b87429d9375748dcd1485b6241f3
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Mar 18 22:00:23 2024 +0000
io_uring/cmd: move io_uring_try_cancel_uring_cmd()
io_uring_try_cancel_uring_cmd() is a part of the cmd handling so let's
move it closer to all the other cmd bits, into uring_cmd.c
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/43a3937af4933655f0fd9362c381802f804f43de.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
commit 22537c9f79417fed70b352d54d01d2586fee9521
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon Mar 25 18:53:33 2024 -0600
io_uring: use the right type for work_llist empty check
io_task_work_pending() uses wq_list_empty() on ctx->work_llist, but it's
not an io_wq_work_list, it's a struct llist_head. They both have
->first as head-of-list, and it turns out the checks are identical. But
be proper and use the right helper.
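Roughly, the corrected helper (a sketch, not a verbatim copy):

    static inline int io_task_work_pending(struct io_ring_ctx *ctx)
    {
        /* ctx->work_llist is a struct llist_head, so use llist_empty() */
        return task_work_pending(current) || !llist_empty(&ctx->work_llist);
    }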
Fixes: dac6a0eae793 ("io_uring: ensure iopoll runs local task work as well")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
Conflicts: Context differences as we don't have commit 521223d7c229
("io_uring/cancel: don't default to setting req->work.cancel_seq").
commit 95041b93e90a06bb613ec4bef9cd4d61570f68e4
Author: Jens Axboe <axboe@kernel.dk>
Date: Sun Jan 28 20:08:24 2024 -0700
io_uring: add io_file_can_poll() helper
This adds a flag to avoid having to dereference file and then f_op to
figure out if the file has a poll handler defined or not. We generally
call this at least twice for networked workloads, and if using ring
provided buffers, we do it on every buffer selection. Particularly the
latter is troublesome, as it's otherwise a very fast operation.
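A hedged sketch of the helper (the flag name follows the patch, if
memory serves):

    static inline bool io_file_can_poll(struct io_kiocb *req)
    {
        if (req->flags & REQ_F_CAN_POLL)
            return true;
        if (file_can_poll(req->file)) {
            /* cache the result so the file/f_op chase happens only once */
            req->flags |= REQ_F_CAN_POLL;
            return true;
        }
        return false;
    }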
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
commit 704ea888d646cb9d715662944cf389c823252ee0
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon Jan 29 11:57:11 2024 -0700
io_uring/poll: add requeue return code from poll multishot handling
Since our poll handling is edge triggered, multishot handlers retry
internally until they know that no more data is available. In
preparation for limiting these retries, add an internal return code,
IOU_REQUEUE, which can be used to inform the poll backend about the
handler wanting to retry, but that this should happen through a normal
task_work requeue rather than keep hammering on the issue side for this
one request.
No functional changes in this patch, nobody is using this return code
just yet.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
commit c43203154d8ac579537aa0c7802b77d463b1f53a
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue Dec 19 08:54:20 2023 -0700
io_uring/register: move io_uring_register(2) related code to register.c
Most of this code is basically self contained, move it out of the core
io_uring file to bring a bit more separation to the registration related
bits. This moves another ~10% of the code into register.c.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27755
commit 6b04a3737057ddfed396c954f9e4be4fe6d53c62
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Fri Dec 1 00:57:36 2023 +0000
io_uring/cmd: inline io_uring_cmd_do_in_task_lazy
Now that we can easily include io_uring_types.h, move IOU_F_TWQ_LAZY_WAKE
and inline io_uring_cmd_do_in_task_lazy().
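A hedged sketch of the inlined wrapper:

    static inline void
    io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
                                 void (*task_work_cb)(struct io_uring_cmd *, unsigned))
    {
        /* IOU_F_TWQ_LAZY_WAKE batches wakeups instead of waking per item */
        __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
    }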
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/2ec9fb31dd192d1c5cf26d0a2dec5657d88a8e48.1701391955.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36366
CVE: CVE-2023-52656
Conflicts: Contextual differences in io_uring.h.
commit a4104821ad651d8a0b374f0b2474c345bbb42f82
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue Dec 19 12:30:43 2023 -0700
io_uring/unix: drop usage of io_uring socket
Since we no longer allow sending io_uring fds over SCM_RIGHTS, move to
using io_is_uring_fops() to detect whether this is an io_uring fd or not.
With that done, kill off io_uring_get_socket() as nobody calls it
anymore.
This is in preparation to yanking out the rest of the core related to
unix gc with io_uring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-21391
commit edecf1689768452ba1a64b7aaf3a47a817da651a
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon Nov 27 20:53:52 2023 -0700
io_uring: enable io_mem_alloc/free to be used in other parts
In preparation for using these helpers, make them non-static and add
them to our internal header.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-21391
commit b3a4dbc89d4021b3f90ff6a13537111a004f9d07
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date: Wed Oct 4 20:05:31 2023 -0400
io_uring/kbuf: Use slab for struct io_buffer objects
The allocation of struct io_buffer for metadata of provided buffers is
done through a custom allocator that directly gets pages and
fragments them. But, slab would do just fine, as this is not a hot path
(in fact, it is a deprecated feature) and, by keeping a custom allocator
implementation, we lose benefits like tracking, poisoning, and
sanitizers. Finally, the custom code is more complex and requires
keeping the list of pages in struct ctx for no good reason. This patch
cleans this path up and just uses slab.
I microbenchmarked it by forcing the allocation of a large number of
objects with the least number of io_uring commands possible (keeping
nbufs=USHRT_MAX), with and without the patch. There is a slight
increase in time spent in the allocation with slab, of course, but even
when allocating to the point of system resource exhaustion, which is not
very realistic and happened at around 1/2 billion provided buffers for me,
it wasn't a significant hit in system time. Especially if we think of a
real-world scenario, an application doing register/unregister of
provided buffers will hit ctx->io_buffers_cache more often than actually
going to slab.
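A hedged sketch of the conversion (the cache name follows the patch, if
memory serves):

    /* dedicated slab cache for provided-buffer metadata */
    static struct kmem_cache *io_buf_cachep;

    /* at init time, replacing the custom page fragmenter */
    io_buf_cachep = KMEM_CACHE(io_buffer, SLAB_ACCOUNT);

    /* per-buffer metadata then comes straight from slab */
    struct io_buffer *buf = kmem_cache_alloc(io_buf_cachep, GFP_KERNEL);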
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231005000531.30800-4-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 1658633c04653578429ff5dfc62fdc159203a8f2
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon Oct 2 19:51:38 2023 -0600
io_uring: ensure io_lockdep_assert_cq_locked() handles disabled rings
io_lockdep_assert_cq_locked() checks that locking is correctly done when
a CQE is posted. If the ring is setup in a disabled state with
IORING_SETUP_R_DISABLED, then ctx->submitter_task isn't assigned until
the ring is later enabled. We generally don't post CQEs in this state,
as no SQEs can be submitted. However it is possible to generate a CQE
if tagged resources are being updated. If this happens and PROVE_LOCKING
is enabled, then the locking check helper will dereference
ctx->submitter_task, which hasn't been set yet.
Fix up io_lockdep_assert_cq_locked() to handle this case correctly. While
at it, convert it to a static inline as well, so that generated line
offsets will actually reflect which condition failed, rather than just
the line offset for io_lockdep_assert_cq_locked() itself.
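A hedged reconstruction of the resulting inline (conditions approximate):

    static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
    {
    #if defined(CONFIG_PROVE_LOCKING)
        lockdep_assert(in_task());

        if (ctx->flags & IORING_SETUP_IOPOLL) {
            lockdep_assert_held(&ctx->uring_lock);
        } else if (!ctx->task_complete) {
            lockdep_assert_held(&ctx->completion_lock);
        } else if (ctx->submitter_task) {
            /* ->submitter_task may be NULL for R_DISABLED rings */
            if (ctx->submitter_task->flags & PF_EXITING)
                lockdep_assert(current_work());
            else
                lockdep_assert(current == ctx->submitter_task);
        }
    #endif
    }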
Reported-and-tested-by: syzbot+efc45d4e7ba6ab4ef1eb@syzkaller.appspotmail.com
Fixes: f26cc9593581 ("io_uring: lockdep annotate CQ locking")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 093a650b757210bc856ca7f5349fb5a4bb9d4bd6
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Aug 24 23:53:30 2023 +0100
io_uring: force inline io_fill_cqe_req
There are only 2 callers of io_fill_cqe_req left, and one of them is
extremely hot. Force inline the function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ffce4fc5e3521966def848a4d930586dfe33ae11.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit ec26c225f06f5993f8891fa6c79fab3c92981181
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Aug 24 23:53:29 2023 +0100
io_uring: merge iopoll and normal completion paths
io_do_iopoll() and io_submit_flush_completions() are pretty similar,
both filling CQEs and then freeing a list of requests. Don't duplicate it
and make iopoll use __io_submit_flush_completions(), which also helps
with inlining and other optimisations.
For that, we need to first find all completed iopoll requests and splice
them from the iopoll list and then pass it down. This adds one extra
list traversal, which should be fine as requests will stay hot in cache.
CQ locking is already conditional, introduce ->lockless_cq and skip
locking for IOPOLL as it's protected by ->uring_lock.
We also add a wakeup optimisation for IOPOLL to __io_cq_unlock_post(),
so it works just like io_cqring_ev_posted_iopoll().
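A hedged sketch of the conditional CQ locking:

    static inline void __io_cq_lock(struct io_ring_ctx *ctx)
    {
        /* IOPOLL completions are already serialized by ->uring_lock */
        if (!ctx->lockless_cq)
            spin_lock(&ctx->completion_lock);
    }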
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3840473f5e8a960de35b77292026691880f6bdbc.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 59fbc409e71649f558fb4578cdbfac67acb824dc
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Aug 24 23:53:27 2023 +0100
io_uring: optimise extra io_get_cqe null check
If the cached cqe check passes in io_get_cqe*() it already means that
the cqe we return is valid and non-zero; however, the compiler is unable
to optimise null checks like in io_fill_cqe_req().
Do a bit of trickery, return success/fail boolean from io_get_cqe*()
and store cqe in the cqe parameter. That makes it do the right thing,
erasing the check together with the introduced indirection.
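The caller-side effect, roughly (a sketch under the above description):

    struct io_uring_cqe *cqe;

    /* success/failure is the return value, the cqe arrives via *cqe */
    if (unlikely(!io_get_cqe(ctx, &cqe)))
        return false;
    /* no further NULL check needed; the compiler can see cqe is valid */
    memcpy(cqe, &req->cqe, sizeof(*cqe));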
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/322ea4d3377d3d4efd8ae90ab8ed28a99f518210.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 20d6b633870495fda1d92d283ebf890d80f68ecd
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Aug 24 23:53:26 2023 +0100
io_uring: refactor __io_get_cqe()
Make __io_get_cqe simpler by not grabbing the cqe from the refilled cache,
but letting io_get_cqe() do it for us. That's cleaner and removes some
duplication.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/74dc8fdf2657e438b2e05e1d478a3596924604e9.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit b24c5d752962fa0970cd7e3d74b1cd0e843358de
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Aug 24 23:53:25 2023 +0100
io_uring: simplify big_cqe handling
Don't keep big_cqe bits of req in a union with hash_node, find a
separate space for it. It's bit safer, but also if we keep it always
initialised, we can get rid of ugly REQ_F_CQE32_INIT handling.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/447aa1b2968978c99e655ba88db536e903df0fe9.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit a0727c738309a06ef5579c1742f8f0def63aa883
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Aug 24 23:53:23 2023 +0100
io_uring: improve cqe !tracing hot path
While looking at io_fill_cqe_req()'s asm I stumbled on our trace points
turning into the chunk below:
trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
req->cqe.res, req->cqe.flags,
req->extra1, req->extra2);
io_uring/io_uring.c:898: trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
movq 232(%rbx), %rdi # req_44(D)->big_cqe.extra2, _5
movq 224(%rbx), %rdx # req_44(D)->big_cqe.extra1, _6
movl 84(%rbx), %r9d # req_44(D)->cqe.D.81184.flags, _7
movl 80(%rbx), %r8d # req_44(D)->cqe.res, _8
movq 72(%rbx), %rcx # req_44(D)->cqe.user_data, _9
movq 88(%rbx), %rsi # req_44(D)->ctx, _10
./arch/x86/include/asm/jump_label.h:27: asm_volatile_goto("1:"
1:jmp .L1772 # objtool NOPs this #
...
It does a jump_label for actual tracing, but those 6 moves will stay
there in the hottest io_uring path. As an optimisation, add a
trace_io_uring_complete_enabled() check, which also uses jump_labels;
it tricks the compiler into behaving. It removes the junk without
changing anything else in the hot path.
Note: apparently, it's not only me noticing it, and people are also
working around it. We should remove the check when it's solved
generically or rework tracing.
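The guard amounts to something like this (tracepoints generate a
matching *_enabled() helper, also jump_label backed):

    /* keeps the six argument loads off the hot path when tracing is off */
    if (trace_io_uring_complete_enabled())
        trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
                                req->cqe.res, req->cqe.flags,
                                req->extra1, req->extra2);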
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/555d8312644b3776f4be7e23f9b92943875c4bc7.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit b6b2bb58a75407660f638a68e6e34a07036146d0
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Fri Aug 11 13:53:45 2023 +0100
io_uring: never overflow io_aux_cqe
Now all callers of io_aux_cqe() set allow_overflow to false, so remove
the parameter and don't allow overflowing auxiliary multishot cqes.
When the CQ is full, the function callers, and all multishot requests in
general, are expected to complete the request. That prevents indefinite
in-background growth of the overflow list and lets userspace handle the
backlog at its own pace.
Resubmitting a request should also be faster than accounting a bunch of
overflows, so it should be better for perf when it happens, but a well
behaving userspace should be trying to avoid overflows in any case.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bb20d14d708ea174721e58bb53786b0521e4dd6d.1691757663.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 17bc28374cd06b7d2d3f1e88470ef89f9cd3a497
Author: Jens Axboe <axboe@kernel.dk>
Date: Fri Jul 7 11:14:40 2023 -0600
io_uring: have io_file_put() take an io_kiocb rather than the file
No functional changes in this patch, just a prep patch for needing the
request in io_file_put().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 569f5308e54352a12181cc0185f848024c5443e8
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed Aug 9 13:22:16 2023 +0100
io_uring: fix false positive KASAN warnings
io_req_local_work_add() peeks into the work list, which can be executed
concurrently. It's completely fine without KASAN as we're in an RCU
read section and it's SLAB_TYPESAFE_BY_RCU. With KASAN though it may
trigger a false positive warning because internal io_uring caches are
sanitised.
Remove sanitisation from the io_uring request cache for now.
Cc: stable@vger.kernel.org
Fixes: 8751d15426a31 ("io_uring: reduce scheduling due to tw")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c6fbf7a82a341e66a0007c76eefd9d57f2d3ba51.1691541473.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 0fdb9a196c6728b51e0e7a4f6fa292d9fd5793de
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Fri Jun 23 12:23:30 2023 +0100
io_uring: make io_cq_unlock_post static
io_cq_unlock_post() is exclusively used in io_uring/io_uring.c, mark it
static and don't expose to other files.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3dc8127dda4514e1dd24bb32035faac887c5fa37.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 91c7884ac9a92ffbf78af7fc89603daf24f448a9
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Fri Jun 23 12:23:26 2023 +0100
io_uring: remove IOU_F_TWQ_FORCE_NORMAL
Extract a function for non-local task_work_add, and use it directly from
io_move_task_work_from_local(). Now we don't use IOU_F_TWQ_FORCE_NORMAL
and it can be killed.
As a small positive side effect we don't grab task->io_uring in
io_req_normal_work_add anymore, which is not needed for
io_req_local_work_add().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e55571e8ff2927ae3cc12da606d204e2485525b.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 3beed235d1a1d0a4ab093ab67ea6b2841e9d4fa2
Author: Christoph Hellwig <hch@lst.de>
Date: Tue Jun 20 13:32:31 2023 +0200
io_uring: remove io_req_ffs_set
Just checking the flag directly makes it a lot more obvious what is
going on here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit d86eaed185e9c6052d1ee2ca538f1936ff255887
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Jun 7 14:41:20 2023 -0600
io_uring: cleanup io_aux_cqe() API
Everybody is passing in the request, so get rid of the io_ring_ctx and
explicit user_data pass-in. Both the ctx and user_data can be deduced
from the request at hand.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 6e76ac595855db27bbdaef337173294a6fd6eb2c
Author: Josh Triplett <josh@joshtriplett.org>
Date: Sat Apr 29 01:40:30 2023 +0900
io_uring: Add io_uring_setup flag to pre-register ring fd and never install it
With IORING_REGISTER_USE_REGISTERED_RING, an application can register
the ring fd and use it via registered index rather than installed fd.
This allows using a registered ring for everything *except* the initial
mmap.
With IORING_SETUP_NO_MMAP, io_uring_setup uses buffers allocated by the
user, rather than requiring a subsequent mmap.
The combination of the two allows a user to operate *entirely* via a
registered ring fd, making it unnecessary to ever install the fd in the
first place. So, add a flag IORING_SETUP_REGISTERED_FD_ONLY to make
io_uring_setup register the fd and return a registered index, without
installing the fd.
This allows an application to avoid touching the fd table at all, and
allows a library to never even momentarily install a file descriptor.
This splits out an io_ring_add_registered_file helper from
io_ring_add_registered_fd, for use by io_uring_setup.
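A hedged userspace usage sketch (raw syscall interface; the NO_MMAP ring
memory is passed in via the params structure):

    struct io_uring_params p = { 0 };
    int ret;

    /* user-allocated rings plus registered-only fd: nothing to mmap,
     * and no fd is ever installed into the fd table */
    p.flags = IORING_SETUP_NO_MMAP | IORING_SETUP_REGISTERED_FD_ONLY;
    /* ... fill in the user-provided ring memory for NO_MMAP ... */
    ret = io_uring_setup(entries, &p);
    /* on success, ret is a registered ring index, not an installed fd */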
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/bc8f431bada371c183b95a83399628b605e978a3.1682699803.git.josh@joshtriplett.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 96c7d4f81db0fea05c0792f7563ae0cb4ad5f022
Author: Breno Leitao <leitao@debian.org>
Date: Thu May 4 05:18:54 2023 -0700
io_uring: Create a helper to return the SQE size
Create a simple helper that returns the size of the SQE. The SQE can
have two sizes, depending on the flags.
If the IORING_SETUP_SQE128 flag is set, then return a double SQE size,
otherwise return sizeof(struct io_uring_sqe) (64 bytes).
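The helper amounts to (hedged sketch; upstream calls it uring_sqe_size(),
if memory serves):

    static inline size_t uring_sqe_size(struct io_ring_ctx *ctx)
    {
        /* SQE128 doubles every entry, e.g. for pass-through commands */
        if (ctx->flags & IORING_SETUP_SQE128)
            return 2 * sizeof(struct io_uring_sqe);
        return sizeof(struct io_uring_sqe);
    }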
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20230504121856.904491-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
Conflicts: We backported the sysctl patch out of order, which caused the
patch to not apply cleanly.
commit 8751d15426a31baaf40f7570263c27c3e5d1dc44
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 6 14:20:12 2023 +0100
io_uring: reduce scheduling due to tw
Every task_work will try to wake the task to be executed, which causes
excessive scheduling and additional overhead. For some tw it's
justified, but others won't do much but post a single CQE.
When a task waits for multiple cqes, every such task_work will wake it
up. Instead, the task may give a hint about how many cqes it waits for,
io_req_local_work_add() will compare against it and skip wake ups
if #cqes + #tw is not enough to satisfy the waiting condition. Task_work
that uses the optimisation should be simple enough and never post more
than one CQE. It's also ignored for non DEFER_TASKRUN rings.
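A hedged sketch of the gating in io_req_local_work_add() (the cq_wait_nr
field follows the patch; treat details as approximate):

    /* #cqes the waiter still needs, as hinted via the wait setup */
    nr_wait = atomic_read(&ctx->cq_wait_nr);
    /* not enough CQEs plus pending tw to satisfy the waiter: skip the wake */
    if (nr_tw < nr_wait)
        return;
    wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);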
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>