JIRA: https://issues.redhat.com/browse/RHEL-64867
commit a23800f08a60787dfbf2b87b2e6ed411cb629859
Author: Chenliang Li <cliang01.li@samsung.com>
Date: Wed Jun 19 14:38:19 2024 +0800
io_uring/rsrc: fix incorrect assignment of iter->nr_segs in io_import_fixed
In io_import_fixed when advancing the iter within the first bvec, the
iter->nr_segs is set to bvec->bv_len. nr_segs should be the number of
bvecs, plus we don't need to adjust it here, so just remove it.
Fixes: b000ae0ec2d7 ("io_uring/rsrc: optimise single entry advance")
Signed-off-by: Chenliang Li <cliang01.li@samsung.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20240619063819.2445-1-cliang01.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
Conflicts: RHEL does not have commit 5e0a760b4441 ("mm, treewide:
rename MAX_ORDER to MAX_PAGE_ORDER").
commit f15ed8b4d0ce2c0831232ff85117418740f0c529
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Mar 27 14:59:09 2024 -0600
io_uring: move mapping/allocation helpers to a separate file
Move the related code from io_uring.c into memmap.c. No functional
changes in this patch, just cleaning it up a bit now that the full
transition is done.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 1943f96b3816e0f0d3d6686374d6e1d617c8b42c
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Mar 13 14:58:14 2024 -0600
io_uring: unify io_pin_pages()
Move it into io_uring.c where it belongs, and use it in there as well
rather than have two implementations of this.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 922a2c78f13611e2c08fc48f615c0cd367dcb6da
Author: Jens Axboe <axboe@kernel.dk>
Date: Mon Oct 2 18:25:23 2023 -0600
io_uring/rsrc: cleanup io_pin_pages()
This function is overly convoluted with a goto error path, and checks
under the mmap_read_lock() that don't need to be at all. Rearrange it
a bit so the checks and errors fall out naturally, rather than needing
to jump around for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-64867
commit 414d0f45c316221acbf066658afdbae5b354a5cc
Author: Jens Axboe <axboe@kernel.dk>
Date: Wed Mar 20 15:19:44 2024 -0600
io_uring/alloc_cache: switch to array based caching
Currently lists are being used to manage this, but best practice is
usually to have these in an array instead as that it cheaper to manage.
Outside of that detail, games are also played with KASAN as the list
is inside the cached entry itself.
Finally, all users of this need a struct io_cache_entry embedded in
their struct, which is union'ized with something else in there that
isn't used across the free -> realloc cycle.
Get rid of all of that, and simply have it be an array. This will not
change the memory used, as we're just trading an 8-byte member entry
for the per-elem array size.
This reduces the overhead of the recycled allocations, and it reduces
the amount of code code needed to support recycling to about half of
what it currently is.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit 4c630f307455c06f99bdeca7f7a1ab5318604fe0
Author: Lorenzo Stoakes <lstoakes@gmail.com>
Date: Wed May 17 20:25:45 2023 +0100
mm/gup: remove vmas parameter from pin_user_pages()
We are now in a position where no caller of pin_user_pages() requires the
vmas parameter at all, so eliminate this parameter from the function and
all callers.
This clears the way to removing the vmas parameter from GUP altogether.
Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit 34ed8d0dcd692378f1155fe27648f54f99adbfbf
Author: Lorenzo Stoakes <lstoakes@gmail.com>
Date: Wed May 17 20:25:42 2023 +0100
io_uring: rsrc: delegate VMA file-backed check to GUP
Now that the GUP explicitly checks FOLL_LONGTERM pin_user_pages() for
broken file-backed mappings in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast
writing to file-backed mappings", there is no need to explicitly check VMAs
for this condition, so simply remove this logic from io_uring altogether.
Link: https://lkml.kernel.org/r/e4a4efbda9cd12df71e0ed81796dc630231a1ef2.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-47830
CVE: CVE-2024-40922
commit 54559642b96116b45e4b5ca7fd9f7835b8561272
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed Jun 12 13:56:38 2024 +0100
io_uring/rsrc: don't lock while !TASK_RUNNING
There is a report of io_rsrc_ref_quiesce() locking a mutex while not
TASK_RUNNING, which is due to forgetting restoring the state back after
io_run_task_work_sig() and attempts to break out of the waiting loop.
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff815d2494>] prepare_to_wait+0xa4/0x380
kernel/sched/wait.c:237
WARNING: CPU: 2 PID: 397056 at kernel/sched/core.c:10099
__might_sleep+0x114/0x160 kernel/sched/core.c:10099
RIP: 0010:__might_sleep+0x114/0x160 kernel/sched/core.c:10099
Call Trace:
<TASK>
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0xb4/0x940 kernel/locking/mutex.c:752
io_rsrc_ref_quiesce+0x590/0x940 io_uring/rsrc.c:253
io_sqe_buffers_unregister+0xa2/0x340 io_uring/rsrc.c:799
__io_uring_register io_uring/register.c:424 [inline]
__do_sys_io_uring_register+0x5b9/0x2400 io_uring/register.c:613
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd8/0x270 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6f/0x77
Reported-by: Li Shi <sl1589472800@gmail.com>
Fixes: 4ea15b56f0810 ("io_uring/rsrc: use wq for quiescing")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/77966bc104e25b0534995d5dbb152332bc8f31c0.1718196953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36366
CVE: CVE-2023-52656
Conflicts: We backported commit 4f0b9194bc11 ("fs: Rename
anon_inode_getfile_secure() and anon_inode_getfd_secure()"), which
obviously causes a conflict in io_uring_get_file().
commit 6e5e6d274956305f1fc0340522b38f5f5be74bdb
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue Dec 19 12:36:34 2023 -0700
io_uring: drop any code related to SCM_RIGHTS
This is dead code after we dropped support for passing io_uring fds
over SCM_RIGHTS, get rid of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-19874
commit d6fef34ee4d102be448146f24caf96d7b4a05401
Author: Keith Busch <kbusch@kernel.org>
Date: Mon Nov 20 14:18:31 2023 -0800
io_uring: fix off-by one bvec index
If the offset equals the bv_len of the first registered bvec, then the
request does not include any of that first bvec. Skip it so that drivers
don't have to deal with a zero length bvec, which was observed to break
NVMe's PRP list creation.
Cc: stable@vger.kernel.org
Fixes: bd11b3a391 ("io_uring: don't use iov_iter_advance() for fixed buffers")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20231120221831.2646460-1-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 19a63c4021702e389a559726b16fcbf07a8a05f9
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Fri Aug 11 13:53:46 2023 +0100
io_uring/rsrc: keep one global dummy_ubuf
We set empty registered buffers to dummy_ubuf as an optimisation.
Currently, we allocate the dummy entry for each ring, whenever we can
simply have one global instance.
We're casting out const on assignment, it's fine as we're not going to
change the content of the dummy, the constness gives us an extra layer
of protection if sth ever goes wrong.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e4a96dda35ab755914bc43f6781bba0df97ac489.1691757663.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 4bfb0c9af832a182a54e549123a634e0070c8d4f
Author: Christoph Hellwig <hch@lst.de>
Date: Tue Jun 20 13:32:35 2023 +0200
io_uring: add helpers to decode the fixed file file_ptr
Remove all the open coded magic on slot->file_ptr by introducing two
helpers that return the file pointer and the flags instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 776617db78c6d208780e7c69d4d68d1fa82913de
Author: Tobias Holl <tobias@tholl.xyz>
Date: Wed May 3 08:59:50 2023 -0600
io_uring/rsrc: check for nonconsecutive pages
Pages that are from the same folio do not necessarily need to be
consecutive. In that case, we cannot consolidate them into a single bvec
entry. Before applying the huge page optimization from commit 57bebf807e2a
("io_uring/rsrc: optimise registered huge pages"), check that the memory
is actually consecutive.
Cc: stable@vger.kernel.org
Fixes: 57bebf807e2a ("io_uring/rsrc: optimise registered huge pages")
Signed-off-by: Tobias Holl <tobias@tholl.xyz>
[axboe: formatting]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit fc7f3a8d3a78503c4f3e108155fb9a233dc307a4
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 18 14:06:40 2023 +0100
io_uring/rsrc: devirtualise rsrc put callbacks
We only have two rsrc types, buffers and files, replace virtual
callbacks for putting resources down with a switch..case.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/02ca727bf8e5f7f820c2f404e95ae88c8f472930.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 29b26c556e7439b1370ac6a59fce83a9d1521de1
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 18 14:06:39 2023 +0100
io_uring/rsrc: pass node to io_rsrc_put_work()
Instead of passing rsrc_data and a resource to io_rsrc_put_work() just
forward node, that's all the function needs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/791e8edd28d78797240b74d34e99facbaad62f3b.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 26147da37f3e52041d9deba189d39f27ce78a84f
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 18 14:06:37 2023 +0100
io_uring/rsrc: add empty flag in rsrc_node
Unless a node was flushed by io_rsrc_ref_quiesce(), it'll carry a
resource. Replace ->inline_items with an empty flag, which is
initialised to false and only raised in io_rsrc_ref_quiesce().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/75d384c9d2252e12af73b9cf8a44e1699106aeb1.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit c376644fb915fbdea8c4a04f859d032a8be352fd
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 18 14:06:36 2023 +0100
io_uring/rsrc: merge nodes and io_rsrc_put
struct io_rsrc_node carries a number of resources represented by struct
io_rsrc_put. That was handy before for sync overhead ammortisation, but
all complexity is gone and nodes are simple and lightweight. Let's
allocate a separate node for each resource.
Nodes and io_rsrc_put and not much different in size, and former are
cached, so node allocation should work better. That also removes some
overhead for nested iteration in io_rsrc_node_ref_zero() /
__io_rsrc_put_work().
Another reason for the patch is that it greatly reduces complexity
by moving io_rsrc_node_switch[_start]() inside io_queue_rsrc_removal(),
so users don't have to care about it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c7d3a45b30cc14cd93700a710dd112edc703db98.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 63fea89027ff4fd4f350b471ad5b9220d373eec5
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 18 14:06:35 2023 +0100
io_uring/rsrc: infer node from ctx on io_queue_rsrc_removal
For io_queue_rsrc_removal() we should always use the current active rsrc
node, don't pass it directly but let the function grab it from the
context.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d15939b4afea730978b4925685c2577538b823bb.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit c899a5d7d0eca054546b63e95c94b1e609516f84
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 13 15:28:14 2023 +0100
io_uring/rsrc: refactor io_queue_rsrc_removal
We can queue up a rsrc into a list in io_queue_rsrc_removal() while
allocating io_rsrc_put and so simplify the function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/36bd708ee25c0e2e7992dc19b17db166eea9ac40.1681395792.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 2f2af35f8e5a1ed552ed02e47277d50092a2b9f6
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 13 15:28:11 2023 +0100
io_uring/rsrc: inline switch_start fast path
Inline the part of io_rsrc_node_switch_start() that checks whether the
cache is empty or not, as most of the times it will have some number of
entries in there.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9619c1717a0e01f22c5fce2f1ba2735f804da0f2.1681395792.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 0b222eeb6514ba6c3457b667fa4f3645032e1fc9
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 13 15:28:10 2023 +0100
io_uring/rsrc: remove rsrc_data refs
Instead of waiting for rsrc_data->refs to be downed to zero, check
whether there are rsrc nodes queued for completion, that's easier then
maintaining references.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8e33fd143d83e11af3e386aea28eb6d6c6a1be10.1681395792.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 7d481e0356334eb2de254414769b4bed4b2a8827
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 13 15:28:09 2023 +0100
io_uring/rsrc: fix DEFER_TASKRUN rsrc quiesce
For io_rsrc_ref_quiesce() to progress it should execute all task_work
items, including deferred ones. However, currently nobody would wake us,
and so let's set ctx->cq_wait_nr, so io_req_local_work_add() would wake
us up.
Fixes: c0e0d6ba25f18 ("io_uring: add IORING_SETUP_DEFER_TASKRUN")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f1a90d1bc5ebf096475b018fed52e54f3b89d4af.1681395792.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 4ea15b56f0810f0d8795d475db1bb74b3a7c1b2f
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 13 15:28:08 2023 +0100
io_uring/rsrc: use wq for quiescing
Replace completions with waitqueues for rsrc data quiesce, the main
wakeup condition is when data refs hit zero. Note that data refs are
only changes under ->uring_lock, so we prepare before mutex_unlock()
reacquire it after taking the lock back. This change will be needed
in the next patch.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1d0dbc74b3b4fd67c8f01819e680c5e0da252956.1681395792.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit eef81fcaa61e1bc6b7735be65f41bbf1a8efd133
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Apr 13 15:28:07 2023 +0100
io_uring/rsrc: refactor io_rsrc_ref_quiesce
Refactor io_rsrc_ref_quiesce() by moving the first mutex_unlock(),
so we don't have to have a second mutex_unlock() further in the loop.
It prepares us to the next patch.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/65bc876271fb16bf550a53a4c76c91aacd94e52e.1681395792.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit d581076b6a85c6f8308a4ba2bdcd82651f5183df
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 11 12:06:08 2023 +0100
io_uring/rsrc: extract SCM file put helper
SCM file accounting is a slow path and is only used for UNIX files.
Extract a helper out of io_rsrc_file_put() that does the SCM
unaccounting.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/58cc7bffc2ee96bec8c2b89274a51febcbfa5556.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 2933ae6eaa05e8db6ad33a3ca12af18d2a25358c
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 11 12:06:07 2023 +0100
io_uring/rsrc: refactor io_rsrc_node_switch
We use io_rsrc_node_switch() coupled with io_rsrc_node_switch_start()
for a bunch of cases including initialising ctx->rsrc_node, i.e. by
passing NULL instead of rsrc_data. Leave it to only deal with actual
node changing.
For that, first remove it from io_uring_create() and add a function
allocating the first node. Then also remove all calls to
io_rsrc_node_switch() from files/buffers register as we already have a
node installed and it does essentially nothing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d146fe306ff98b1a5a60c997c252534f03d423d7.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 13c223962eac16f161cf9b6355209774c609af28
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 11 12:06:06 2023 +0100
io_uring/rsrc: zero node's rsrc data on alloc
struct io_rsrc_node::rsrc_data field is initialised on rsrc removal and
shouldn't be used before that, still let's play safe and zero the field
on alloc.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/09bd03cedc8da8a7974c5e6e4bf0489fd16593ab.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 757ef4682b6aa29fdf752ad47f0d63eb48b261cf
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:56 2023 +0100
io_uring/rsrc: optimise io_rsrc_data refcounting
Every struct io_rsrc_node takes a struct io_rsrc_data reference, which
means all rsrc updates do 2 extra atomics. Replace atomics refcounting
with a int as it's all done under ->uring_lock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e73c3d6820cf679532696d790b5b8fae23537213.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 9eae8655f9cd2eeed99fb7a0d2bb22816c17e497
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:54 2023 +0100
io_uring/rsrc: cache struct io_rsrc_node
Add allocation cache for struct io_rsrc_node, it's always allocated and
put under ->uring_lock, so it doesn't need any extra synchronisation
around caches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/252a9d9ef9654e6467af30fdc02f57c0118fb76e.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 36b9818a5a84cb7c977fb723babca1c8d74f288f
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:53 2023 +0100
io_uring/rsrc: don't offload node free
struct delayed_work rsrc_put_work was previously used to offload node
freeing because io_rsrc_node_ref_zero() was previously called by RCU in
the IRQ context. Now, as percpu refcounting is gone, we can do it
eagerly at the spot without pushing it to a worker.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/13fb1aac1e8d068ad8fd4a0c6d0d157ab61b90c0.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit ff7c75ecaa9e6b251f76c24e289d4bfe413ffe31
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:52 2023 +0100
io_uring/rsrc: optimise io_rsrc_put allocation
Every io_rsrc_node keeps a list of items to put, and all entries are
kmalloc()'ed. However, it's quite often to queue up only one entry per
node, so let's add an inline entry there to avoid extra allocations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c482c1c652c45c85ac52e67c974bc758a50fed5f.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit c824986c113f15e2ef2c00da9a226c09ecaac74c
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:51 2023 +0100
io_uring/rsrc: rename rsrc_list
We have too many "rsrc" around which makes the name of struct
io_rsrc_node::rsrc_list confusing. The field is responsible for keeping
a list of files or buffers, so call it item_list and add comments
around.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3e34d4dfc1fdbb6b520f904ee6187c2ccf680efe.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 0a4813b1abdf06e44ce60cdebfd374cfd27c46bf
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:50 2023 +0100
io_uring/rsrc: kill rsrc_ref_lock
We use ->rsrc_ref_lock spinlock to protect ->rsrc_ref_list in
io_rsrc_node_ref_zero(). Now we removed pcpu refcounting, which means
io_rsrc_node_ref_zero() is not executed from the irq context as an RCU
callback anymore, and we also put it under ->uring_lock.
io_rsrc_node_switch(), which queues up nodes into the list, is also
protected by ->uring_lock, so we can safely get rid of ->rsrc_ref_lock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6b60af883c263551190b526a55ff2c9d5ae07141.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit ef8ae64ffa9578c12e44de42604004c2cc3e9c27
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:49 2023 +0100
io_uring/rsrc: protect node refs with uring_lock
Currently, for nodes we have an atomic counter and some cached
(non-atomic) refs protected by uring_lock. Let's put all ref
manipulations under uring_lock and get rid of the atomic part.
It's free as in all cases we care about we already hold the lock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/25b142feed7d831008257d90c8b17c0115d4fc15.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 8e15c0e71b8ae64fb7163532860f8d608165281f
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:46 2023 +0100
io_uring/rsrc: keep cached refs per node
We cache refs of the current node (i.e. ctx->rsrc_node) in
ctx->rsrc_cached_refs. We'll be moving away from atomics, so move the
cached refs in struct io_rsrc_node for now. It's a prep patch and
shouldn't change anything in practise.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9edc3669c1d71b06c2dca78b2b2b8bb9292738b9.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit b8fb5b4fdd67f9d18109c5d21d44a8bd4ddb608b
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Tue Apr 4 13:39:45 2023 +0100
io_uring/rsrc: use non-pcpu refcounts for nodes
One problem with the current rsrc infra is that often updates will
generates lots of rsrc nodes, each carry pcpu refs. That takes quite a
lot of memory, especially if there is a stall, and takes lots of CPU
cycles. Only pcpu allocations takes >50 of CPU with a naive benchmark
updating files in a loop.
Replace pcpu refs with normal refcounting. There is already a hot path
avoiding atomics / refs, but following patches will further improve it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e9ed8a9457b331a26555ff9443afc64cdaab7247.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit d2acf789088bb562cea342b6a24e646df4d47839
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Mar 16 15:26:05 2023 +0000
io_uring/rsrc: fix folio accounting
| BUG: Bad page state in process kworker/u8:0 pfn:5c001
| page:00000000bfda61c8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20001 pfn:0x5c001
| head:0000000011409842 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:1
| anon flags: 0x3fffc00000b0004(uptodate|head|mappedtodisk|swapbacked|node=0|zone=0|lastcpupid=0xffff)
| raw: 03fffc0000000000 fffffc0000700001 ffffffff00700903 0000000100000000
| raw: 0000000000000200 0000000000000000 00000000ffffffff 0000000000000000
| head: 03fffc00000b0004 dead000000000100 dead000000000122 ffff00000a809dc1
| head: 0000000000020000 0000000000000000 00000000ffffffff 0000000000000000
| page dumped because: nonzero pincount
| CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc2-00001-gc6811bf0cd87 #1
| Hardware name: linux,dummy-virt (DT)
| Workqueue: events_unbound io_ring_exit_work
| Call trace:
| dump_backtrace+0x13c/0x208
| show_stack+0x34/0x58
| dump_stack_lvl+0x150/0x1a8
| dump_stack+0x20/0x30
| bad_page+0xec/0x238
| free_tail_pages_check+0x280/0x350
| free_pcp_prepare+0x60c/0x830
| free_unref_page+0x50/0x498
| free_compound_page+0xcc/0x100
| free_transhuge_page+0x1f0/0x2b8
| destroy_large_folio+0x80/0xc8
| __folio_put+0xc4/0xf8
| gup_put_folio+0xd0/0x250
| unpin_user_page+0xcc/0x128
| io_buffer_unmap+0xec/0x2c0
| __io_sqe_buffers_unregister+0xa4/0x1e0
| io_ring_exit_work+0x68c/0x1188
| process_one_work+0x91c/0x1a58
| worker_thread+0x48c/0xe30
| kthread+0x278/0x2f0
| ret_from_fork+0x10/0x20
Mark reports an issue with the recent patches coalescing compound pages
while registering them in io_uring. The reason is that we try to drop
excessive references with folio_put_refs(), but pages were acquired
with pin_user_pages(), which has extra accounting and so should be put
down with matching unpin_user_pages() or at least gup_put_folio().
As a fix unpin_user_pages() all but first page instead, and let's figure
out a better API after.
Fixes: 57bebf807e2abcf8 ("io_uring/rsrc: optimise registered huge pages")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/10efd5507d6d1f05ea0f3c601830e08767e189bd.1678980230.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 6acd352dfee558194643adbed7e849fe80fd1b93
Author: Li zeming <zeming@nfschina.com>
Date: Sat Mar 18 02:25:38 2023 +0800
io_uring: rsrc: Optimize return value variable 'ret'
The initialization assignment of the variable ret is changed to 0, only
in 'goto fail;' Use the ret variable as the function return value.
Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20230317182538.3027-1-zeming@nfschina.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 977bc87356107fb946fb4ff24f1e4c241b5043ec
Author: Jens Axboe <axboe@kernel.dk>
Date: Fri Feb 24 09:54:57 2023 -0700
io_uring/rsrc: always initialize 'folio' to NULL
Smatch complains that:
smatch warnings:
io_uring/rsrc.c:1262 io_sqe_buffer_register() error: uninitialized symbol 'folio'.
'folio' may be used uninitialized, which can happen if we end up with a
single page mapped. Ensure that we clear folio to NULL at the top so
it's always set.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202302241432.YML1CD5C-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 57bebf807e2abcf87d96b9de1266104ee2d8fc2f
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed Feb 22 14:36:51 2023 +0000
io_uring/rsrc: optimise registered huge pages
When registering huge pages, internally io_uring will split them into
many PAGE_SIZE bvec entries. That's bad for performance as drivers need
to eventually dma-map the data and will do it individually for each bvec
entry. Coalesce huge pages into one large bvec.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit b000ae0ec2d709046ac1a3c5722fea417f8a067e
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed Feb 22 14:36:50 2023 +0000
io_uring/rsrc: optimise single entry advance
Iterating within the first bvec entry should be essentially free, but we
use iov_iter_advance() for that, which shows up in benchmark profiles
taking up to 0.5% of CPU. Replace it with a hand coded version.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit 6bf65a1b3668b04bb6c8126494d00303104eb9e5
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Mon Feb 20 14:13:52 2023 +0000
io_uring/rsrc: fix a comment in io_import_fixed()
io_import_fixed() supports offsets, but "may not" means the opposite.
Replace it with "might not" so the comments rather speaks about
possible cases.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/5b5f79958456caa6dc532f6205f75f224b232c81.1676902343.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12076
commit cc342a21930f0e3862c5fd0871cd5a65c5b59e27
Author: Christoph Hellwig <hch@lst.de>
Date: Fri Feb 3 16:06:29 2023 +0100
io_uring: use bvec_set_page to initialize a bvec
Use the bvec_set_page helper to initialize a bvec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>