Commit Graph

87 Commits

Author SHA1 Message Date
Rado Vrbovsky f55e4a4e81 Merge: CNB96: page_pool: update to v6.12
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5432

JIRA: https://issues.redhat.com/browse/RHEL-57765

Updating page_pool to upstream v6.12 where necessary to enable driver
updates.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-27 11:19:28 +00:00
Paolo Abeni ba209d7616 net: page_pool: fix warning code
JIRA: https://issues.redhat.com/browse/RHEL-62849
Tested: LNST, Tier1

Upstream commit:
commit 946b6c48cca48591fb495508c5dbfade767173d0
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Fri Jul 5 13:42:06 2024 +0200

    net: page_pool: fix warning code

    WARN_ON_ONCE("string") doesn't really do what appears to
    be intended, so fix that.

    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Fixes: 90de47f020db ("page_pool: fragment API support for 32-bit arch with 64-bit DMA")
    Link: https://patch.msgid.link/20240705134221.2f4de205caa1.I28496dc0f2ced580282d1fb892048017c4491e21@changeid
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-15 09:21:34 +01:00
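The fix above targets a common C pitfall: WARN_ON_ONCE() fires when its argument is truthy, and a string literal is a non-NULL pointer, so WARN_ON_ONCE("message") warns unconditionally instead of printing a message on a bad condition. A minimal userspace sketch of the pitfall, with simplified stand-ins for the kernel macros:

```c
#include <stdio.h>
#include <stdbool.h>

static int warn_count; /* how many warnings actually fired */

/* Simplified stand-in for the kernel macro: warn once, return the condition. */
#define WARN_ON_ONCE(cond) ({                        \
    static bool __warned;                            \
    bool __ret = !!(cond);                           \
    if (__ret && !__warned) {                        \
        __warned = true;                             \
        warn_count++;                                \
        fprintf(stderr, "WARNING: %s\n", #cond);     \
    }                                                \
    __ret;                                           \
})

/* Simplified: the real WARN_ONCE() also prints the format string. */
#define WARN_ONCE(cond, fmt) ((void)(fmt), WARN_ON_ONCE(cond))

/* Wrong: the string literal is a non-NULL pointer, i.e. always true,
 * so this warns unconditionally and the "message" is never a message. */
static int buggy_check(long dma_addr_ok)
{
    (void)dma_addr_ok;
    return WARN_ON_ONCE("DMA address overflows the field");
}

/* Right: warn on a real condition, with the text as the message. */
static int fixed_check(long dma_addr_ok)
{
    return WARN_ONCE(!dma_addr_ok, "DMA address overflows the field");
}
```

The buggy form returns true (and warns) regardless of the input, while the fixed form only trips when the condition is actually violated.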
Felix Maurer ab82ee54a3 page_pool: export page_pool_disable_direct_recycling()
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- net/core/page_pool.c: Context difference due to missing 4a96a4e807c3
  ("page_pool: check for PP direct cache locality later")

commit d7f39aee79f04eeaa42085728423501b33ac5be5
Author: David Wei <dw@davidwei.uk>
Date:   Wed Jun 26 20:01:59 2024 -0700

    page_pool: export page_pool_disable_direct_recycling()

    56ef27e3 unexported page_pool_unlink_napi() and renamed it to
    page_pool_disable_direct_recycling(). This is because there was no
    in-tree user of page_pool_unlink_napi().

    Since then, the Rx queue API and an implementation in bnxt got
    merged. The bnxt implementation broadly follows these steps:
    allocate new queue memory + page pool, stop the old rx queue,
    swap, then destroy the old queue memory + page pool.

    The existing NAPI instance is re-used, so when the old page pool,
    which is no longer used but still linked to this shared NAPI
    instance, is destroyed, it triggers warnings.

    In my initial patches I unlinked a page pool from a NAPI instance
    directly. Instead, export page_pool_disable_direct_recycling() and call
    that instead to avoid having a driver touch a core struct.

    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David Wei <dw@davidwei.uk>
    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
Felix Maurer d4192ad886 page_pool: check for DMA sync shortcut earlier
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- net/core/page_pool.c: upstream ef9226cd56b7 ("page_pool: constify some
  read-only function arguments") and this commit happened in parallel
  leading to conflicts in which args of
  {,__}page_pool_dma_sync_for_device() were const; this was resolved in
  daa121128a2d ("Merge tag 'dma-mapping-6.10-2024-05-20' of
  git://git.infradead.org/users/hch/dma-mapping"). Fixing accordingly in
  the backport: page and page_pool args should be const.

commit 4321de4497b24fbf22389331f4ecd4039a451aa9
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Tue May 7 13:20:25 2024 +0200

    page_pool: check for DMA sync shortcut earlier

    We can save a couple more function calls in the Page Pool code if we
    check for dma_need_sync() earlier, just when we test pp->p.dma_sync.
    Move both these checks into an inline wrapper and call the PP wrapper
    over the generic DMA sync function only when both are true.
    You can't cache the result of dma_need_sync() in &page_pool, as it
    may change at any time if a SWIOTLB buffer is allocated or mapped.

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
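The idea of the commit above can be sketched in userspace: fold the cheap pp->p.dma_sync flag test and dma_need_sync() into one inline wrapper, so the out-of-line sync routine is only called when both are true. All names and the dma_need_sync() criterion below are simplified stand-ins, not the kernel implementation:

```c
#include <stdbool.h>

static int sync_calls; /* counts how often the expensive path runs */

struct page_pool {
    bool dma_sync; /* driver requested DMA syncing (pp->p.dma_sync) */
};

/* Stand-in for dma_need_sync(): false e.g. for coherent mappings.
 * The odd/even test is a fake criterion for this sketch only. */
static bool dma_need_sync(unsigned long long dma_addr)
{
    return (dma_addr & 1) != 0;
}

/* The expensive, out-of-line sync; in the kernel this ends up calling
 * the generic DMA sync machinery. */
static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
                                            unsigned long long dma_addr,
                                            unsigned int dma_sync_size)
{
    (void)pool; (void)dma_addr; (void)dma_sync_size;
    sync_calls++;
}

/* The inline wrapper: test both conditions before paying for the call.
 * Note the const pool argument, matching the backport conflict note. */
static inline void page_pool_dma_sync_for_device(const struct page_pool *pool,
                                                 unsigned long long dma_addr,
                                                 unsigned int dma_sync_size)
{
    if (pool->dma_sync && dma_need_sync(dma_addr))
        __page_pool_dma_sync_for_device(pool, dma_addr, dma_sync_size);
}
```

With the flag off, or with a mapping that never needs syncing, the function call is skipped entirely.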
Felix Maurer ea81551570 page_pool: don't use driver-set flags field directly
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 403f11ac9ab72fc3bee0b8c80c16e33212ea8cd9
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Tue May 7 13:20:24 2024 +0200

    page_pool: don't use driver-set flags field directly
    
    page_pool::p is driver-defined params, copied directly from the
    structure passed to page_pool_create(). The structure isn't meant
    to be modified by the Page Pool core code and this even might look
    confusing[0][1].
    In order to be able to alter some flags, let's define our own, internal
    fields the same way as the already existing one (::has_init_callback).
    They are defined as bits in the driver-set params; leave them as
    bits here as well, so as not to waste a byte per bit. Almost 30
    bits are still free for future extensions.
    We could've defined only new flags here or only the ones we may need
    to alter, but checking some flags in one place while others in another
    doesn't sound convenient or intuitive. ::flags passed by the driver can
    now go to the "slow" PP params.
    
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Link[0]: https://lore.kernel.org/netdev/20230703133207.4f0c54ce@kernel.org
    Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
    Link[1]: https://lore.kernel.org/netdev/CAKgT0UfZCGnWgOH96E4GV3ZP6LLbROHM7SHE8NKwq+exX+Gk_Q@mail.gmail.com
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
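The separation described above can be illustrated with a simplified userspace sketch: the driver-set params are copied at create time and treated as read-only, while the bits the core may flip at runtime live in internal one-bit fields, like ::has_init_callback already did. The flag names mirror the kernel's; the struct layout is illustrative:

```c
#include <stdbool.h>

#define PP_FLAG_DMA_MAP      (1U << 0)
#define PP_FLAG_DMA_SYNC_DEV (1U << 1)

struct page_pool_params {
    unsigned int flags; /* driver-set, never modified by the core */
};

struct page_pool {
    /* internal copies, alterable by the core without touching params */
    bool dma_map:1;
    bool dma_sync:1;
    struct page_pool_params slow; /* driver-set params, read-only */
};

/* Copy the driver's flag bits into the pool-owned fields at init. */
static void page_pool_init(struct page_pool *pool,
                           const struct page_pool_params *params)
{
    pool->slow = *params;
    pool->dma_map = !!(params->flags & PP_FLAG_DMA_MAP);
    pool->dma_sync = !!(params->flags & PP_FLAG_DMA_SYNC_DEV);
}
```

The core can now clear e.g. pool->dma_sync at runtime while the driver's original params stay byte-for-byte intact.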
Felix Maurer e3fdce3bee page_pool: make sure frag API fields don't span between cachelines
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 1f20a5769446a1acae67ac9e63d07a594829a789
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Tue May 7 13:20:23 2024 +0200

    page_pool: make sure frag API fields don't span between cachelines
    
    After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
    into fast and slow") that made &page_pool contain only "hot" params at
    the start, cacheline boundary chops frag API fields group in the middle
    again.
    To not bother with this each time fast params get expanded or shrunk,
    let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
    their actual size (2 longs + 1 int). This ensures 16-byte alignment for
    the 32-bit architectures and 32-byte alignment for the 64-bit ones,
    excluding unnecessary false-sharing.
    ::page_state_hold_cnt is used quite intensively on hotpath no matter if
    frag API is used, so move it to the newly created hole in the first
    cacheline.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
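The alignment trick above can be shown with an illustrative (not the real) struct: group the frag API fields and align the group to 4 * sizeof(long), the closest power of two above their actual size (2 longs + 1 int), so the group can never straddle a cacheline on either 32-bit (16B) or 64-bit (32B):

```c
#include <stdalign.h>
#include <stddef.h>

/* Illustrative stand-in for the frag API field group. */
struct pp_frag_group {
    long frag_users;
    long frag_page;   /* stands in for a struct page pointer */
    unsigned int frag_offset;
};

/* Illustrative pool layout: fast-path params up front, the hold
 * counter moved into the hole the alignment creates, then the
 * aligned frag group. */
struct page_pool_sketch {
    unsigned int fast_param;
    unsigned int page_state_hold_cnt;
    alignas(4 * sizeof(long)) struct pp_frag_group frag;
};
```

Since 4 * sizeof(long) divides the cacheline size on common systems, any group whose offset is a multiple of it and whose size fits within it stays on one cacheline.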
Felix Maurer e46011f644 page_pool: constify some read-only function arguments
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit ef9226cd56b718c79184a3466d32984a51cb449c
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Thu Apr 18 13:36:11 2024 +0200

    page_pool: constify some read-only function arguments
    
    There are several functions taking pointers to data they don't modify.
    This includes statistics fetching, page and page_pool parameters, etc.
    Constify the pointers, so that call sites will be able to pass const
    pointers as well.
    No functional changes, no visible changes in functions sizes.
    
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
Felix Maurer 48b3e0a401 page_pool: try direct bulk recycling
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 39806b96c89ae5d52092c8f86393ecbfaae26697
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Fri Mar 29 17:55:07 2024 +0100

    page_pool: try direct bulk recycling
    
    Now that the checks for direct recycling possibility live inside the
    Page Pool core, reuse them when performing bulk recycling.
    page_pool_put_page_bulk() can be called from process context as
    well; page_pool_napi_local() takes care of this at the very
    beginning.
    Under high .ndo_xdp_xmit() traffic load, the win is 2-3% Pps assuming
    the sending driver uses xdp_return_frame_bulk() on Tx completion.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/20240329165507.3240110-3-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
Felix Maurer 8643c21aa1 page_pool: check for PP direct cache locality later
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- Context differences (missing skb_cow_data_for_xdp) due to missing
  e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic
  mode")
- net/core/skbuff.c: context difference (condition moved to function) due
  to missing 8cfa2dee325f ("skbuff: Add a function to check if a page
  belongs to page_pool") with no functional changes
- net/core/skbuff.c: context difference (missing skb_kfree_head) due to
  missing bf9f1baa279f ("net: add dedicated kmem_cache for typical/small
  skb->head"); this can appear in Revumatic as if skb_free_head was
  moved, but that isn't true; the hunks are just reordered (check the
  line numbers)

commit 4a96a4e807c390a9d91b450ebe04eeb2e0ecc076
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Fri Mar 29 17:55:06 2024 +0100

    page_pool: check for PP direct cache locality later

    Since we have pool->p.napi (Jakub) and pool->cpuid (Lorenzo) to check
    whether it's safe to use direct recycling, we can use both globally for
    each page instead of relying solely on @allow_direct argument.
    Let's assume that @allow_direct means "I'm sure it's local, don't waste
    time rechecking this" and when it's false, try the mentioned params to
    still recycle the page directly. If neither is true, we'll lose some
    CPU cycles, but then it surely won't be hotpath. On the other hand,
    paths where it's possible to use direct cache, but not possible to
    safely set @allow_direct, will benefit from this move.
    The whole propagation of @napi_safe through a dozen of skb freeing
    functions can now go away, which saves us some stack space.

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/20240329165507.3240110-2-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:18:24 +01:00
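The decision logic described above can be sketched as follows: @allow_direct means "caller is sure recycling is local"; when it is false, the core now rechecks locality itself (pool->cpuid for percpu pools, or the CPU the pool's NAPI instance runs on) instead of always taking the slow path. Field and function names here are loose stand-ins for the kernel's:

```c
#include <stdbool.h>

struct page_pool {
    int cpuid;    /* owning CPU for percpu pools, else -1 */
    int napi_cpu; /* CPU of the associated NAPI instance, else -1 */
};

/* Is the current CPU the one allowed to touch the per-CPU cache? */
static bool page_pool_napi_local(const struct page_pool *pool, int cur_cpu)
{
    return cur_cpu == pool->cpuid || cur_cpu == pool->napi_cpu;
}

/* Returns true when the page may go into the lockless per-CPU cache;
 * false means the page falls back to the locked ptr_ring. */
static bool recycle_direct(const struct page_pool *pool, bool allow_direct,
                           int cur_cpu)
{
    if (!allow_direct)
        allow_direct = page_pool_napi_local(pool, cur_cpu);
    return allow_direct;
}
```

Callers that can't prove locality pass false and still get direct recycling when the check happens to succeed; only genuinely remote frees pay for the slow path.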
Felix Maurer e8f1b5267f net: page_pool: factor out page_pool recycle check
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 46f40172b68154106cae660c90c7801b61080892
Author: Mina Almasry <almasrymina@google.com>
Date:   Fri Mar 8 12:44:58 2024 -0800

    net: page_pool: factor out page_pool recycle check
    
    The check is duplicated in 2 places, factor it out into a common helper.
    
    Signed-off-by: Mina Almasry <almasrymina@google.com>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20240308204500.1112858-1-almasrymina@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-10-21 16:37:42 +02:00
Felix Maurer 2e7d822903 net: page_pool: fix recycle stats for system page_pool allocator
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- net/core/page_pool.c: context difference due to missing aaf153aecef1
  ("page_pool: halve BIAS_MAX for multiple user references of a fragment")

commit f853fa5c54e7a0364a52125074dedeaf2c7ddace
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Fri Feb 16 10:25:43 2024 +0100

    net: page_pool: fix recycle stats for system page_pool allocator

    Use global percpu page_pool_recycle_stats counter for system page_pool
    allocator instead of allocating a separate percpu variable for each
    (also percpu) page pool instance.

    Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/87f572425e98faea3da45f76c3c68815c01a20ee.1708075412.git.lorenzo@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-10-21 16:37:42 +02:00
Felix Maurer edee1c1e12 page_pool: disable direct recycling based on pool->cpuid on destroy
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 56ef27e3abe6d6453b1f4f6127041f3a65d7cbc9
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Thu Feb 15 12:39:05 2024 +0100

    page_pool: disable direct recycling based on pool->cpuid on destroy
    
    Now that direct recycling is performed based on pool->cpuid when set,
    memory leaks are possible:
    
    1. A pool is destroyed.
    2. Alloc cache is emptied (it's done only once).
    3. pool->cpuid is still set.
    4. napi_pp_put_page() does direct recycling based on pool->cpuid.
    5. Now alloc cache is not empty, but it won't ever be freed.
    
    In order to avoid that, rewrite pool->cpuid to -1 when unlinking NAPI to
    make sure no direct recycling will be possible after emptying the cache.
    This involves a bit of overhead as pool->cpuid now must be accessed
    via READ_ONCE() to avoid partial reads.
    Rename page_pool_unlink_napi() -> page_pool_disable_direct_recycling()
    to reflect what it actually does and unexport it.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/20240215113905.96817-1-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-10-21 16:37:42 +02:00
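The leak scenario and fix above can be sketched in userspace: on destroy, pool->cpuid is rewritten to -1 before the alloc cache is drained for the last time, so a concurrent napi_pp_put_page() can no longer refill it; cpuid is then read via READ_ONCE() to avoid torn reads (a volatile access stands in for it here). Everything below is a simplified illustration, not the kernel code:

```c
#include <stdbool.h>

struct page_pool {
    int cpuid;
    int cached; /* pages in the alloc cache, which is drained only once */
};

#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Renamed from page_pool_unlink_napi(); also clears cpuid so direct
 * recycling can never happen after the final cache drain. */
static void page_pool_disable_direct_recycling(struct page_pool *pool)
{
    pool->cpuid = -1; /* WRITE_ONCE() in the kernel */
}

/* Direct recycling path: refills the per-CPU cache only while cpuid
 * still matches; otherwise the page goes to the (drained) ptr_ring. */
static bool napi_pp_put_page(struct page_pool *pool, int cur_cpu)
{
    if (READ_ONCE(pool->cpuid) == cur_cpu) {
        pool->cached++;
        return true;
    }
    return false;
}
```

Once cpuid is -1, the alloc cache can no longer grow after its one-time drain, which is exactly the leak the commit closes.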
Wander Lairson Costa de7a3b7b85 net: add generic percpu page_pool allocator
JIRA: https://issues.redhat.com/browse/RHEL-9145

Conflicts: we already have 490a79faf95e ("net: introduce include/net/rps.h")

commit 2b0cfa6e49566c8fa6759734cf821aa6e8271a9e
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Mon Feb 12 10:50:54 2024 +0100

    net: add generic percpu page_pool allocator

    Introduce a generic percpu page_pool allocator.
    Moreover, add page_pool_create_percpu() and a cpuid field in the
    page_pool struct in order to recycle the page in the page_pool
    "hot" cache if napi_pp_put_page() is running on the same cpu.
    This is a preliminary patch to add xdp multi-buff support for xdp running
    in generic mode.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/80bc4285228b6f4220cd03de1999d86e46e3fcbd.1707729884.git.lorenzo@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
2024-09-16 16:04:27 -03:00
Petr Oros 77a2f42c86 page_pool: transition to reference count management after page draining
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 0a149ab78ee220c75eef797abea7a29f4490e226
Author: Liang Chen <liangchen.linux@gmail.com>
Date:   Tue Dec 12 12:46:11 2023 +0800

    page_pool: transition to reference count management after page draining

    To support multiple users referencing the same fragment,
    'pp_frag_count' is renamed to 'pp_ref_count', transitioning pp pages
    from fragment management to reference count management after draining
    based on the suggestion from [1].

    The idea is that the concept of fragmenting exists before the page is
    drained, and all related functions retain their current names.
    However, once the page is drained, its management shifts to being
    governed by 'pp_ref_count'. Therefore, all functions associated with
    that lifecycle stage of a pp page are renamed.

    [1]
    http://lore.kernel.org/netdev/f71d9448-70c8-8793-dc9a-0eb48a570300@huawei.com

    Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Link: https://lore.kernel.org/r/20231212044614.42733-2-liangchen.linux@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:56 +02:00
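The renamed lifecycle above can be sketched simply: while a page is being fragmented the counter is managed by the frag API, and once drained it behaves as a plain reference count, so the post-drain helpers carry *_ref_* names. The sketch below uses plain longs where the kernel uses atomics:

```c
/* Userspace stand-in for the relevant part of struct page;
 * the field was renamed from pp_frag_count. */
struct page {
    long pp_ref_count;
};

/* Frag-stage API (name unchanged): set up the initial count. */
static void page_pool_fragment_page(struct page *page, long nr)
{
    page->pp_ref_count = nr; /* atomic_long_set() in the kernel */
}

/* Post-drain API (renamed): drop nr references and return how many
 * remain; 0 means the caller held the last one. */
static long page_pool_unref_page(struct page *page, long nr)
{
    page->pp_ref_count -= nr; /* atomic_long_sub_return() in the kernel */
    return page->pp_ref_count;
}
```

Multiple users can now each hold a reference on the same fragment and release independently; the last one to reach zero returns the page.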
Petr Oros e777596229 net: page_pool: factor out releasing DMA from releasing the page
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit c3f687d8dfeb33cffbb8f47c30002babfc4895d2
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Dec 7 16:52:32 2023 -0800

    net: page_pool: factor out releasing DMA from releasing the page

    Releasing the DMA mapping will be useful for other types
    of pages, so factor it out. Make sure compiler inlines it,
    to avoid any regressions.

    Signed-off-by: Mina Almasry <almasrymina@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:56 +02:00
Petr Oros a58931e73e net: page_pool: mute the periodic warning for visible page pools
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit be0096676e230b43730b8936ac393d155b4e3262
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:39 2023 -0800

    net: page_pool: mute the periodic warning for visible page pools

    Mute the periodic "stalled pool shutdown" warning if the page pool
    is visible to user space. Rolling out a driver using page pools
    to just a few hundred hosts at Meta surfaces applications which
    fail to reap their broken sockets. Obviously it's best if the
    applications are fixed, but we don't generally print warnings
    for application resource leaks. Admins can now depend on the
    netlink interface for getting page pool info to detect buggy
    apps.

    While at it throw in the ID of the pool into the message,
    in rare cases (pools from destroyed netns) this will make
    finding the pool with a debugger easier.

    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:56 +02:00
Petr Oros 3d7a175988 net: page_pool: expose page pool stats via netlink
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit d49010adae737638447369a4eff8f1aab736b076
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:38 2023 -0800

    net: page_pool: expose page pool stats via netlink

    Dump the stats into netlink. More clever approaches
    like dumping the stats per-CPU for each CPU individually
    to see where the packets get consumed can be implemented
    in the future.

    A trimmed example from a real (but recently booted) system:

    $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
               --dump page-pool-stats-get
    [{'info': {'id': 19, 'ifindex': 2},
      'alloc-empty': 48,
      'alloc-fast': 3024,
      'alloc-refill': 0,
      'alloc-slow': 48,
      'alloc-slow-high-order': 0,
      'alloc-waive': 0,
      'recycle-cache-full': 0,
      'recycle-cached': 0,
      'recycle-released-refcnt': 0,
      'recycle-ring': 0,
      'recycle-ring-full': 0},
     {'info': {'id': 18, 'ifindex': 2},
      'alloc-empty': 66,
      'alloc-fast': 11811,
      'alloc-refill': 35,
      'alloc-slow': 66,
      'alloc-slow-high-order': 0,
      'alloc-waive': 0,
      'recycle-cache-full': 1145,
      'recycle-cached': 6541,
      'recycle-released-refcnt': 0,
      'recycle-ring': 1275,
      'recycle-ring-full': 0},
     {'info': {'id': 17, 'ifindex': 2},
      'alloc-empty': 73,
      'alloc-fast': 62099,
      'alloc-refill': 413,
    ...

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 3c3422e0e9 net: page_pool: report when page pool was destroyed
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 69cb4952b6f6a226c1c0a7ca400398aaa8f75cf2
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:37 2023 -0800

    net: page_pool: report when page pool was destroyed

    Report when page pool was destroyed. Together with the inflight
    / memory use reporting this can serve as a replacement for the
    warning about leaked page pools we currently print to dmesg.

    Example output for a fake leaked page pool using some hacks
    in netdevsim (one "live" pool, and one "leaked" on the same dev):

    $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
               --dump page-pool-get
    [{'id': 2, 'ifindex': 3},
     {'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}]

    Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros acb72c024e net: page_pool: report amount of memory held by page pools
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 7aee8429eedd0970d8add2fb5b856bfc5f5f1fc1
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:36 2023 -0800

    net: page_pool: report amount of memory held by page pools

    Advanced deployments need the ability to check memory use
    of various system components. It makes it possible to make informed
    decisions about memory allocation and to find regressions and leaks.

    Report memory use of page pools. Report both number of references
    and bytes held.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros a4e8ab078c net: page_pool: id the page pools
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit f17c69649c698e4df3cfe0010b7bbf142dec3e40
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:29 2023 -0800

    net: page_pool: id the page pools

    To give ourselves the flexibility of creating netlink commands
    and ability to refer to page pool instances in uAPIs create
    IDs for page pools.

    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 843f234bb8 net: page_pool: factor out uninit
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 23cfaf67ba5d2f013d2576b8a9173c45a4a7f895
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:28 2023 -0800

    net: page_pool: factor out uninit

    We'll soon (next change in the series) need a fuller unwind path
    in page_pool_create() so create the inverse of page_pool_init().

    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 9221d8eca7 net: page_pool: avoid touching slow on the fastpath
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 2da0cac1e9494f34c5a3438e5c4c7e662e1b7445
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Nov 20 16:00:35 2023 -0800

    net: page_pool: avoid touching slow on the fastpath

    To fully benefit from the previous commit, add one byte of state
    in the first cache line recording whether we need to look at
    the slow part.

    The packing isn't all that impressive right now; we create
    a 7B hole. I'm expecting Olek's rework will reshuffle this,
    anyway.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Link: https://lore.kernel.org/r/20231121000048.789613-3-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 9fda55f7c3 net: page_pool: split the page_pool_params into fast and slow
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 5027ec19f1049a07df5b0a37b1f462514cf2724b
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Nov 20 16:00:34 2023 -0800

    net: page_pool: split the page_pool_params into fast and slow

    struct page_pool is rather performance critical and we use
    16B of the first cache line to store 2 pointers used only
    by test code. Future patches will add more informational
    (non-fast path) attributes.

    It's convenient for the user of the API to not have to worry
    which fields are fast and which are slow path. Use struct
    groups to split the params into the two categories internally.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20231121000048.789613-2-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
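The fast/slow split above relies on the kernel's struct_group() trick, which a userspace copy can demonstrate: each member exists both as a direct field and inside a named group, so fast-path code keeps writing pool->p.order while management code can treat "fast" and "slow" as single objects. The member selection below is illustrative, not the real params layout:

```c
/* Userspace copy of the kernel's struct_group() idea: a union of an
 * anonymous struct and an identically laid out named struct. */
#define struct_group(NAME, ...) \
    union { struct { __VA_ARGS__ }; struct { __VA_ARGS__ } NAME; }

struct page_pool_params_sketch {
    struct_group(fast,
        unsigned int order;
        unsigned int pool_size;
        int nid;
    );
    struct_group(slow,
        void *netdev; /* informational only, not read on the fast path */
        void *napi;
    );
};
```

Because both structs in the union have identical layout, p.order and p.fast.order alias the same storage; copying params in page_pool_init() can handle the two groups separately without the API user ever caring which field is which.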
Petr Oros 60c4332e7f page_pool: introduce page_pool_alloc() API
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit de97502e16fc406a74edee8359612e518986cf59
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 20 17:59:50 2023 +0800

    page_pool: introduce page_pool_alloc() API

    Currently page pool supports the below use cases:
    use case 1: allocate page without page splitting using the
                page_pool_alloc_pages() API if the driver knows
                that the memory it needs is always bigger than
                half of the page allocated from the page pool.
    use case 2: allocate page frag with page splitting using the
                page_pool_alloc_frag() API if the driver knows
                that the memory it needs is always smaller than
                or equal to half of the page allocated from the
                page pool.

    There are emerging use cases [1] & [2] that are a mix of the
    above two: the driver doesn't know the size of the memory it
    needs beforehand, so the driver may use something like below to
    allocate memory with the least memory utilization and performance
    penalty:

    if (size << 1 > max_size)
            page = page_pool_alloc_pages();
    else
            page = page_pool_alloc_frag();

    To avoid the driver doing something like above, add the
    page_pool_alloc() API to support the above use case, and update
    the true size of memory that is actually allocated by updating
    '*size' back to the driver in order to avoid exacerbating
    truesize underestimate problem.

    Rename page_pool_free() which is used in the destroy process to
    __page_pool_destroy() to avoid confusion with the newly added
    API.

    1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
    2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231020095952.11055-4-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
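The combined helper described above can be sketched like this: pick whole-page vs frag allocation based on the requested size, and write the true size back so the caller's truesize accounting stays honest. The constants and return type are simplifications for the sketch, not the kernel API:

```c
#define PAGE_SIZE_SK 4096u /* stand-in for PAGE_SIZE */

enum alloc_kind { ALLOC_PAGE, ALLOC_FRAG };

/* On return, *size holds the true size of the memory handed out. */
static enum alloc_kind page_pool_alloc(unsigned int *size,
                                       unsigned int *offset)
{
    unsigned int max_size = PAGE_SIZE_SK;

    if ((*size << 1) > max_size) {
        /* More than half a page requested: splitting can't help, so
         * hand out a whole page and report the full truesize. */
        *size = max_size;
        *offset = 0;
        return ALLOC_PAGE;
    }
    /* Otherwise take a fragment; a real pool would also round the
     * frag up when the remainder couldn't serve another request. */
    *offset = 0;
    return ALLOC_FRAG;
}
```

The driver no longer needs the open-coded `size << 1 > max_size` branch from the commit message; it just calls the one helper and uses the updated *size.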
Petr Oros 6c5988280c page_pool: remove PP_FLAG_PAGE_FRAG
JIRA: https://issues.redhat.com/browse/RHEL-31941

Conflicts:
- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: chunk skipped due to
  missing 93188e9642c3ce ("net: hns3: support skb's frag page recycling
  based on page pool")
- drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c chunk skipped
  due to missing b2e3406a38f0f4 ("octeontx2-pf: Add support for page
  pool")

Upstream commit(s):
commit 09d96ee5674a0eaa800c664353756ecc45c4a87f
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 20 17:59:49 2023 +0800

    page_pool: remove PP_FLAG_PAGE_FRAG

    PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count
    handling is unified and page_pool_alloc_frag() is supported
    in 32-bit arch with 64-bit DMA, so remove it.

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231020095952.11055-3-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 62b3753c2d page_pool: unify frag_count handling in page_pool_is_last_frag()
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 58d53d8f7da63dd13903bec0a40b3009a841b61b
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 20 17:59:48 2023 +0800

    page_pool: unify frag_count handling in page_pool_is_last_frag()

    Currently when page_pool_create() is called with the
    PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
    allowed to be called under the below constraints:
    1. page_pool_fragment_page() needs to be called to set up
       page->pp_frag_count immediately.
    2. page_pool_defrag_page() often needs to be called to drain
       page->pp_frag_count when no user will be holding on to
       that page any more.

    Those constraints exist in order to support a page to be
    split into multi fragments.

    And those constraints have some overhead because of the
    cache line dirtying/bouncing and atomic update.

    Those constraints are unavoidable when we need a page to be
    split into more than one fragment, but there are also cases
    where we want to avoid the above constraints and their
    overhead when a page can't be split, as it can only hold one
    fragment as requested by the user, depending on the different
    use cases:
    use case 1: allocate page without page splitting.
    use case 2: allocate page with page splitting.
    use case 3: allocate page with or without page splitting
                depending on the fragment size.

    Currently page pool only provides the page_pool_alloc_pages()
    and page_pool_alloc_frag() APIs to enable cases 1 & 2
    separately, so we cannot use a combination of them to enable
    case 3; it is not possible yet because of the per-page_pool
    flag PP_FLAG_PAGE_FRAG.

    So in order to allow allocating an unsplit page without the
    overhead of a split page, while still allowing allocation of
    split pages, we need to remove the per-page_pool flag in
    page_pool_is_last_frag(). As best as I can think of, there
    are two methods to do so:
    1. Add a per-page flag/bit to indicate whether a page is
       split or not, which means we might need to update that
       flag/bit every time the page is recycled, dirtying the
       cache line of 'struct page' for use case 1.
    2. Unify the page->pp_frag_count handling for both split and
       unsplit pages by assuming all pages in the page pool are
       split into one big fragment initially.

    As page pool already supports use case 1 without dirtying the
    cache line of 'struct page' whenever a page is recyclable, we
    need to support the above use case 3 with minimal overhead,
    especially without adding any noticeable overhead for use
    case 1. Since we are already doing an optimization by not
    updating pp_frag_count in page_pool_defrag_page() for the
    last fragment user, this patch chooses to unify the
    pp_frag_count handling to support the above use case 3.

    Micro-benchmark testing in [1] shows no noticeable performance
    degradation with this patch applied, and provides some
    justification for unifying the frag_count handling.

    1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
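The unified handling described in the commit message can be sketched in user space. This is plain Python with made-up names (`FakePage`, `fragment_page`, etc.); the real kernel code operates atomically on page->pp_frag_count, but the control flow is the same:

```python
# Sketch of the unified pp_frag_count scheme: every page is assumed to
# be split into one big fragment at allocation time, and the last
# fragment user skips the counter update (the existing optimization).

class FakePage:
    def __init__(self):
        self.pp_frag_count = 0

def fragment_page(page, nr):
    page.pp_frag_count = nr

def defrag_page(page, nr):
    # Last-user optimization: a caller holding all remaining fragments
    # needs no counter update at all.
    if page.pp_frag_count == nr:
        return 0
    page.pp_frag_count -= nr
    return page.pp_frag_count

def is_last_frag(page):
    # An unsplit page simply carries pp_frag_count == 1 and takes the
    # same path as the last user of a split page; no per-pool flag.
    return defrag_page(page, 1) == 0

unsplit = FakePage()
fragment_page(unsplit, 1)          # use case 1: no page splitting
assert is_last_frag(unsplit)

split = FakePage()
fragment_page(split, 3)            # use case 2: page split three ways
assert not is_last_frag(split)
assert not is_last_frag(split)
assert is_last_frag(split)
```

Note how use case 3 falls out for free: the allocator can hand out either an unsplit page (count 1) or a split page (count N) and the recycle path stays identical.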

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 1eb93c611d page_pool: fragment API support for 32-bit arch with 64-bit DMA
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 90de47f020db086f7929e09f64efd0cf627d6869
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 13 14:48:21 2023 +0800

    page_pool: fragment API support for 32-bit arch with 64-bit DMA

    Currently page_pool_alloc_frag() is not supported on 32-bit
    arches with 64-bit DMA because of the overlap between
    pp_frag_count and dma_addr_upper in 'struct page' on those
    arches. This combination seems to be quite common, see [1],
    which means drivers may need to handle it when using the
    fragment API.

    It is assumed that the combination of the above arches with an
    address space >16TB does not exist: all those arches have a
    64-bit equivalent, and it seems logical to use the 64-bit
    version on a system with a large address space. It is also
    assumed that the dma address is page aligned when we are dma
    mapping a page aligned buffer, see [2].

    That means we're storing 12 zero bits at the lower end of a
    dma address, so we can reuse those bits on the above arches to
    support 32b+12b, which is 16TB of memory.

    If we have made a wrong assumption, a warning is emitted so
    that users can report it to us.

    1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
    2. https://lore.kernel.org/all/20230818145145.4b357c89@kernel.org/
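The 32b+12b trick can be illustrated with a small user-space sketch. This is plain Python with hypothetical helper names; the `PAGE_SHIFT` of 12 corresponds to 4K pages, and the kernel patch warns rather than raises when the assumption is violated:

```python
# A page-aligned DMA address has PAGE_SHIFT low zero bits, so an
# address of up to 32 + 12 = 44 bits (16TB) fits in a 32-bit field.
PAGE_SHIFT = 12

def pack_dma_addr(addr):
    """Store a page-aligned DMA address in 32 bits, or fail loudly."""
    assert addr & ((1 << PAGE_SHIFT) - 1) == 0, "not page aligned"
    packed = addr >> PAGE_SHIFT
    if packed > 0xFFFFFFFF:
        # the kernel patch emits a warning here instead of raising
        raise OverflowError("DMA address beyond 32b+12b (16TB)")
    return packed

def unpack_dma_addr(packed):
    return packed << PAGE_SHIFT

addr = 0x0000_0ABC_DEF0_1000        # a 44-bit, page-aligned address
assert unpack_dma_addr(pack_dma_addr(addr)) == addr
```

The round trip is lossless precisely because the low 12 bits were guaranteed to be zero in the first place.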

    Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231013064827.61135-2-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 29bd06e0f7 page_pool: add a lockdep check for recycling in hardirq
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit ff4e538c8c3e675a15e1e49509c55951832e0451
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Aug 4 20:05:28 2023 +0200

    page_pool: add a lockdep check for recycling in hardirq

    Page pool use in hardirq is prohibited; add debug checks
    to catch misuses. IIRC we previously discussed using
    DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns
    that people will have DEBUG_NET enabled in perf testing.
    I don't think anyone enables lockdep in perf testing,
    so use lockdep to avoid pushback and arguing :)

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Link: https://lore.kernel.org/r/20230804180529.2483231-6-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 163398a3e1 net: page_pool: merge page_pool_release_page() with page_pool_return_page()
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 07e0c7d3179da5d06132f3d71b740aa91bde52aa
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Jul 19 18:04:09 2023 -0700

    net: page_pool: merge page_pool_release_page() with page_pool_return_page()

    Now that page_pool_release_page() is not exported we can
    merge it with page_pool_return_page(). I believe that
    the "Do not replace this with page_pool_return_page()"
    comment was there in case page_pool_return_page() was
    not inlined, to avoid two function calls.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20230720010409.1967072-5-kuba@kernel.org
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Petr Oros 59ed484266 net: page_pool: hide page_pool_release_page()
JIRA: https://issues.redhat.com/browse/RHEL-31941

Conflicts:
- adjusted conflicts due to already applied 82e896d992fa63 ("docs:
  net: page_pool: use kdoc to avoid duplicating the information")
  and a9ca9f9ceff382 ("page_pool: split types and declarations from
  page_pool.h")

Upstream commit(s):
commit 535b9c61bdef6017228c708128b7849a476f8da5
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Jul 19 18:04:08 2023 -0700

    net: page_pool: hide page_pool_release_page()

    There seem to be no users calling page_pool_release_page()
    for legitimate reasons; all of them simply haven't been
    converted to skb-based recycling yet. Previous changes
    converted them. Update the docs, and unexport the function.

    Link: https://lore.kernel.org/r/20230720010409.1967072-4-kuba@kernel.org
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Petr Oros 4534a6bc69 page_pool: add DMA_ATTR_WEAK_ORDERING on all mappings
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 8e4c62c7d980eaf0f64c1c0ef0c80f5685af0fb6
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Apr 17 08:28:05 2023 -0700

    page_pool: add DMA_ATTR_WEAK_ORDERING on all mappings

    Commit c519fe9a4f ("bnxt: add dma mapping attributes") added
    DMA_ATTR_WEAK_ORDERING to DMA attrs on bnxt. It has since spread
    to a few more drivers (possibly as a copy'n'paste).

    DMA_ATTR_WEAK_ORDERING only seems to matter on Sparc and PowerPC/cell,
    the rarity of these platforms is likely why we never bothered adding
    the attribute in the page pool, even though it should be safe to add.

    To make the page pool migration in drivers which set this flag less
    of a risk (of regressing the precious sparc database workloads or
    whatever needed this) let's add DMA_ATTR_WEAK_ORDERING on all
    page pool DMA mappings.

    We could make this a driver opt-in but frankly I don't think it's
    worth complicating the API. I can't think of a reason why device
    accesses to packet memory would have to be ordered.

    Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/r/20230417152805.331865-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Sabrina Dubroca 06fe287412 net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>
JIRA: https://issues.redhat.com/browse/RHEL-31751

Conflicts: context around #include in net/core/skbuff.c

commit 75eaf63ea7afeafd026ffef03bdc69e31f10829b
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Fri Aug 4 20:05:25 2023 +0200

    net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>

    Currently, touching <net/page_pool/types.h> triggers a rebuild of more
    than half of the kernel. That's because it's included in
    <linux/skbuff.h>. And each new include to page_pool/types.h adds more
    [useless] data for the toolchain to process per each source file from
    that pile.

    In commit 6a5bcd84e8 ("page_pool: Allow drivers to hint on SKB
    recycling"), Matteo included it to be able to call a couple of functions
    defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info as
    long as page pool owns the page") one of the calls was removed, so only
    one was left. It's the call to page_pool_return_skb_page() in
    napi_frag_unref(). The function is external and doesn't have any
    dependencies. Having the very niche page_pool_types.h included only
    for that looks like overkill.

    As %PP_SIGNATURE is not local to page_pool.c (was only in the
    early submissions), nothing holds this function there. Teleport
    page_pool_return_skb_page() to skbuff.c, just next to the main consumer,
    skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't
    work with skbs at all and the former name tells nothing. The #if guards
    here are only to not compile and have it in the vmlinux when not needed
    -- both call sites are already guarded.
    Now, touching page_pool_types.h only triggers rebuilding of the drivers
    using it and a couple of core networking files.

    Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
    Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Link: https://lore.kernel.org/r/20230804180529.2483231-3-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
2024-04-11 10:04:27 +02:00
Scott Weaver 2a096a138f Merge: bpf: backport fixes from upstream (phase 1)
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3337

JIRA: https://issues.redhat.com/browse/RHEL-15913

Backporting relevant fixes from upstream.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Approved-by: Artem Savkov <asavkov@redhat.com>
Approved-by: Toke Høiland-Jørgensen <toke@redhat.com>

Signed-off-by: Scott Weaver <scweaver@redhat.com>
2024-01-22 12:01:20 -05:00
Petr Oros 8333fb8ac8 page_pool: split types and declarations from page_pool.h
JIRA: https://issues.redhat.com/browse/RHEL-16983

Conflicts:
- net/core/skbuff.c:
   adjusted context conflict due to missing 78476d315e1905 ("mctp: Add flow
   extension to skb")
- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h:
   adjusted context conflict due to missing 87a9b2fd9288c5 ("net: hns3: add
   support for TX push mode")
- drivers/net/ethernet/mediatek/mtk_eth_soc.[c|h]
   Chunks omitted due to lack of page_pool support in the driver. Missing
   upstream commit 23233e577ef973 ("net: ethernet: mtk_eth_soc: rely on
   page_pool for single page buffers")
- drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
   adjusted context conflict due to missing 67f245c2ec0af1 ("mlx5:
   bpf_xdp_metadata_rx_hash add xdp rss hash type")
- drivers/net/ethernet/microsoft/mana/mana_en.c
   adjusted context conflict due to missing 92272ec4107ef4 ("eth: add
   missing xdp.h includes in drivers")
- drivers/net/veth.c
   Chunks omitted due to missing 0ebab78cbcbfd6 ("net: veth: add page_pool
   for page recycling")
- Unmerged paths (missing in RHEL):
   drivers/net/ethernet/engleder/tsnep_main.c,
   drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c,
   drivers/net/ethernet/microchip/lan966x/lan966x_main.h,
   drivers/net/ethernet/wangxun/libwx/wx_lib.c

Upstream commit(s):
commit a9ca9f9ceff382b58b488248f0c0da9e157f5d06
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Aug 4 20:05:24 2023 +0200

    page_pool: split types and declarations from page_pool.h

    Split types and pure function declarations from page_pool.h
    and add them in page_pool/types.h, so that C sources can
    include page_pool.h and headers should generally only include
    page_pool/types.h, as suggested by Jakub.
    Rename page_pool.h to page_pool/helpers.h to have both in
    one place.

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com
    [Jakub: change microsoft/mana, fix kdoc paths in Documentation]
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-11-30 19:11:24 +01:00
Petr Oros 4dc9eab8bd docs: net: page_pool: use kdoc to avoid duplicating the information
JIRA: https://issues.redhat.com/browse/RHEL-16983

Conflicts:
- adjusted conflict in Documentation/networking/page_pool.rst due to
  missing 535b9c61bdef60 net: page_pool: hide page_pool_release_page()

Upstream commit(s):
commit 82e896d992fa631cda1f63239fd47b3ab781ffa6
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Aug 2 09:18:21 2023 -0700

    docs: net: page_pool: use kdoc to avoid duplicating the information

    All struct members of the driver-facing APIs are documented twice,
    in the code and under Documentation. This is a bit tedious.

    I also get the feeling that a lot of developers will read the header
    when coding, rather than the doc. Bring the two a little closer
    together by using kdoc for structs and functions.

    Using kdoc also gives us links (mentioning a function or struct
    in the text gets replaced by a link to its doc).

    Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
    Tested-by: Randy Dunlap <rdunlap@infradead.org>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Link: https://lore.kernel.org/r/20230802161821.3621985-3-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-11-30 14:37:59 +01:00
Felix Maurer 9c2bc8bcad net: page_pool: add missing free_percpu when page_pool_init fail
JIRA: https://issues.redhat.com/browse/RHEL-15913

commit 8ffbd1669ed1d58939d6e878dffaa2f60bf961a4
Author: Jian Shen <shenjian15@huawei.com>
Date:   Mon Oct 30 17:12:56 2023 +0800

    net: page_pool: add missing free_percpu when page_pool_init fail
    
    When ptr_ring_init() returns failure in page_pool_init(), free_percpu()
    is not called to free pool->recycle_stats, which may cause memory
    leak.
    
    Fixes: ad6fa1e1ab1b ("page_pool: Add recycle stats")
    Signed-off-by: Jian Shen <shenjian15@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20231030091256.2915394-1-shaojijie@huawei.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-11-07 15:32:02 +01:00
Ivan Vecera 734b571259 page_pool: unlink from napi during destroy
JIRA: https://issues.redhat.com/browse/RHEL-12613

commit dd64b232deb8d48812a2ea739d1fedaeaffb59ed
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Apr 19 11:20:06 2023 -0700

    page_pool: unlink from napi during destroy

    Jesper points out that we must prevent recycling into cache
    after page_pool_destroy() is called, because page_pool_destroy()
    is not synchronized with recycling (some pages may still be
    outstanding when destroy() gets called).

    I assumed this will not happen because NAPI can't be scheduled
    if its page pool is being destroyed. But I missed the fact that
    NAPI may get reused. For instance when user changes ring configuration
    driver may allocate a new page pool, stop NAPI, swap, start NAPI,
    and then destroy the old pool. The NAPI is running so old page
    pool will think it can recycle to the cache, but the consumer
    at that point is the destroy() path, not NAPI.

    To avoid extra synchronization let the drivers do "unlinking"
    during the "swap" stage while NAPI is indeed disabled.

    Fixes: 8c48eea3adf3 ("page_pool: allow caching from safely localized NAPI")
    Reported-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
    Link: https://lore.kernel.org/all/e8df2654-6a5b-3c92-489d-2fe5e444135f@redhat.com/
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/r/20230419182006.719923-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-31 15:09:26 +01:00
Ivan Vecera d80ce17d20 page_pool: allow caching from safely localized NAPI
JIRA: https://issues.redhat.com/browse/RHEL-12613

Conflicts:
- simple context conflict in net/core/dev.c due to absence of commit
  8b43fd3d1d7d8 ("net: optimize ____napi_schedule() to avoid extra
  NET_RX_SOFTIRQ") that is out of scope of this series

commit 8c48eea3adf3119e0a3fc57bd31f6966f26ee784
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Apr 12 21:26:04 2023 -0700

    page_pool: allow caching from safely localized NAPI

    Recent patches to mlx5 mentioned a regression when moving from
    driver local page pool to only using the generic page pool code.
    Page pool has two recycling paths (1) direct one, which runs in
    safe NAPI context (basically consumer context, so producing
    can be lockless); and (2) via a ptr_ring, which takes a spin
    lock because the freeing can happen from any CPU; producer
    and consumer may run concurrently.

    Since the page pool code was added, Eric introduced a revised version
    of deferred skb freeing. TCP skbs are now usually returned to the CPU
    which allocated them, and freed in softirq context. This places the
    freeing (producing of pages back to the pool) enticingly close to
    the allocation (consumer).

    If we can prove that we're freeing in the same softirq context in which
    the consumer NAPI will run - lockless use of the cache is perfectly fine,
    no need for the lock.

    Let drivers link the page pool to a NAPI instance. If the NAPI instance
    is scheduled on the same CPU on which we're freeing - place the pages
    in the direct cache.

    With that and patched bnxt (XDP enabled to engage the page pool, sigh,
    bnxt really needs page pool work :() I see a 2.6% perf boost with
    a TCP stream test (app on a different physical core than softirq).

    The CPU use of relevant functions decreases as expected:

      page_pool_refill_alloc_cache   1.17% -> 0%
      _raw_spin_lock                 2.41% -> 0.98%

    Only consider lockless path to be safe when NAPI is scheduled
    - in practice this should cover majority if not all of steady state
    workloads. It's usually the NAPI kicking in that causes the skb flush.

    The main case we'll miss out on is when application runs on the same
    CPU as NAPI. In that case we don't use the deferred skb free path.
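The recycling decision described above boils down to a small predicate. A minimal sketch (plain Python, hypothetical names; the kernel tracks this through the pool's linked NAPI instance and its scheduling state):

```python
# Direct, lockless cache use is only safe when the freeing happens on
# the CPU where the pool's NAPI is currently scheduled; otherwise the
# page must take the spin-locked ptr_ring path.

def can_recycle_direct(napi, current_cpu, allow_direct):
    # Caller already known to be in the pool's NAPI context.
    if allow_direct:
        return True
    # Otherwise, only recycle directly when the pool's NAPI is
    # scheduled on the CPU doing the freeing.
    return (napi is not None
            and napi["scheduled"]
            and napi["cpu"] == current_cpu)

napi = {"scheduled": True, "cpu": 2}
assert can_recycle_direct(napi, 2, False)      # same CPU: lockless cache
assert not can_recycle_direct(napi, 3, False)  # remote CPU: ptr_ring
napi["scheduled"] = False
assert not can_recycle_direct(napi, 2, False)  # NAPI idle: not safe
```

Requiring NAPI to be scheduled is what makes the check conservative: it covers the steady-state case where the NAPI kick triggers the skb flush, at the cost of missing the app-on-same-CPU case noted above.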

    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-31 15:09:26 +01:00
Felix Maurer ce4cb58f61 page_pool: fix inconsistency for page_pool_ring_[un]lock()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218483
commit 368d3cb406cdd074d1df2ad9ec06d1bfcb664882
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Mon May 22 11:17:14 2023 +0800

    page_pool: fix inconsistency for page_pool_ring_[un]lock()
    
    page_pool_ring_[un]lock() use in_softirq() to decide which
    spin lock variant to use, and when they are called in the
    context with in_softirq() being false, spin_lock_bh() is
    called in page_pool_ring_lock() while spin_unlock() is
    called in page_pool_ring_unlock(), because spin_lock_bh()
    has disabled the softirq in page_pool_ring_lock(), which
    causes inconsistency for spin lock pair calling.
    
    This patch fixes it by returning in_softirq state from
    page_pool_producer_lock(), and use it to decide which
    spin lock variant to use in page_pool_producer_unlock().
    
    As pool->ring has both a producer and a consumer lock,
    rename the helpers to page_pool_producer_[un]lock() to
    reflect the actual usage. Also move them to page_pool.c,
    as they are only used there, and remove the 'inline', as
    the compiler may have a better idea of whether to inline
    or not.
    
    Fixes: 7886244736 ("net: page_pool: Add bulk support for ptr_ring")
    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-29 12:37:10 +02:00
Felix Maurer a9ac84a3b0 net: page_pool: use in_softirq() instead
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930

commit 542bcea4be866b14b3a5c8e90773329066656c43
Author: Qingfang DENG <qingfang.deng@siflower.com.cn>
Date:   Fri Feb 3 09:16:11 2023 +0800

    net: page_pool: use in_softirq() instead

    We use BH context only for synchronization, so we don't care if it's
    actually serving softirq or not.

    As a side note, in case of threaded NAPI, in_serving_softirq() will
    return false because it's in process context with BH off, making
    page_pool_recycle_in_cache() unreachable.

    Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
    Tested-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-13 22:45:48 +02:00
Chris von Recklinghausen 15416c6b7e mm/swap: convert __put_page() to __folio_put()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 8d29c7036f5ff360ea1f51b9fed5d909be7c8094
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 18:50:13 2022 +0100

    mm/swap: convert __put_page() to __folio_put()

    Saves 11 bytes of text by removing a check of PageTail.

    Link: https://lkml.kernel.org/r/20220617175020.717127-16-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:20 -04:00
Felix Maurer 1cc26eafb4 net: page_pool: optimize page pool page allocation in NUMA scenario
Bugzilla: https://bugzilla.redhat.com/2137876

commit d810d367ec40a1031173a447bd0146cf48e98733
Author: Jie Wang <wangjie125@huawei.com>
Date:   Tue Jul 5 19:35:15 2022 +0800

    net: page_pool: optimize page pool page allocation in NUMA scenario
    
    Currently, NIC packet receiving performance based on page pool
    deteriorates occasionally. To analyze the causes of this problem,
    page allocation stats were collected. Here are the stats when NIC
    rx performance deteriorates:
    
    bandwidth(Gbits/s)		16.8		6.91
    rx_pp_alloc_fast		13794308	21141869
    rx_pp_alloc_slow		108625		166481
    rx_pp_alloc_slow_h		0		0
    rx_pp_alloc_empty		8192		8192
    rx_pp_alloc_refill		0		0
    rx_pp_alloc_waive		100433		158289
    rx_pp_recycle_cached		0		0
    rx_pp_recycle_cache_full	0		0
    rx_pp_recycle_ring		362400		420281
    rx_pp_recycle_ring_full		6064893		9709724
    rx_pp_recycle_released_ref	0		0
    
    The rx_pp_alloc_waive count indicates that a large number of pages'
    NUMA nodes are inconsistent with the NIC device's NUMA node.
    Therefore these pages can't be reused by the page pool. As a result,
    many new pages would be allocated by __page_pool_alloc_pages_slow,
    which is time consuming. This causes the NIC rx performance
    fluctuations.
    
    The main reason for the large number of NUMA-mismatched pages in the
    page pool is that page pool uses alloc_pages_bulk_array to allocate
    the original pages. This function is not suitable for page
    allocation in NUMA scenarios. So this patch uses
    alloc_pages_bulk_array_node, which has a NUMA id input parameter, to
    ensure NUMA consistency between the NIC device and the allocated
    pages.
    
    Repeated NIC rx performance tests were performed 40 times. NIC rx
    bandwidth is higher and more stable compared to the data above. Here
    are three test stats; the rx_pp_alloc_waive count is zero and
    rx_pp_alloc_slow, which indicates pages allocated from the slow
    path, is relatively low.
    
    bandwidth(Gbits/s)		93		93.9		93.8
    rx_pp_alloc_fast		60066264	61266386	60938254
    rx_pp_alloc_slow		16512		16517		16539
    rx_pp_alloc_slow_ho		0		0		0
    rx_pp_alloc_empty		16512		16517		16539
    rx_pp_alloc_refill		473841		481910		481585
    rx_pp_alloc_waive		0		0		0
    rx_pp_recycle_cached		0		0		0
    rx_pp_recycle_cache_full	0		0		0
    rx_pp_recycle_ring		29754145	30358243	30194023
    rx_pp_recycle_ring_full		0		0		0
    rx_pp_recycle_released_ref	0		0		0
    
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20220705113515.54342-1-huangguangbin2@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-01-05 15:46:51 +01:00
Felix Maurer 88509dacf2 net: page_pool: add page allocation stats for two fast page allocate path
Bugzilla: https://bugzilla.redhat.com/2120968

commit 0f6deac3a07958195173119627502350925dce78
Author: Jie Wang <wangjie125@huawei.com>
Date:   Thu May 12 14:56:31 2022 +0800

    net: page_pool: add page allocation stats for two fast page allocate path
    
    Currently, page pool allocation stats can be used to analyze RX
    performance degradation problems. These stats only count pages
    allocated from page_pool_alloc_pages. But NIC drivers such as hns3
    use page_pool_dev_alloc_frag to allocate pages, so page stats in
    this API should also be counted.
    
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-11-30 12:47:10 +02:00
Felix Maurer e4c2252001 net: page_pool: introduce ethtool stats
Bugzilla: https://bugzilla.redhat.com/2120968

commit f3c5264f452a5b0ac1de1f2f657efbabdea3c76a
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Tue Apr 12 18:31:58 2022 +0200

    net: page_pool: introduce ethtool stats
    
    Introduce page_pool APIs to report stats through ethtool and reduce
    duplicated code in each driver.
    
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-11-30 12:47:09 +02:00
Yauheni Kaliuta 3c9b8c39bc page_pool: Add recycle stats to page_pool_put_page_bulk
Bugzilla: https://bugzilla.redhat.com/2120968

commit 590032a4d2133ecc10d3078a8db1d85a4842f12c
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Mon Apr 11 16:05:26 2022 +0200

    page_pool: Add recycle stats to page_pool_put_page_bulk
    
    Add missing recycle stats to page_pool_put_page_bulk routine.
    
    Reviewed-by: Joe Damato <jdamato@fastly.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/3712178b51c007cfaed910ea80e68f00c916b1fa.1649685634.git.lorenzo@kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-28 16:48:59 +02:00
Jiri Benc b52705c258 page_pool: Add function to batch and return stats
Bugzilla: https://bugzilla.redhat.com/2120966

commit 6b95e3388b1ea0ca63500c5a6e39162dbf828433
Author: Joe Damato <jdamato@fastly.com>
Date:   Tue Mar 1 23:55:49 2022 -0800

    page_pool: Add function to batch and return stats

    Adds a function page_pool_get_stats which can be used by drivers to obtain
    stats for a specified page_pool.

    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Jiri Benc 249dfc0fd8 page_pool: Add recycle stats
Bugzilla: https://bugzilla.redhat.com/2120966

commit ad6fa1e1ab1b8164f1ba296b1b4dc556a483bcad
Author: Joe Damato <jdamato@fastly.com>
Date:   Tue Mar 1 23:55:48 2022 -0800

    page_pool: Add recycle stats

    Add per-cpu stats tracking page pool recycling events:
    	- cached: recycling placed page in the page pool cache
    	- cache_full: page pool cache was full
    	- ring: page placed into the ptr ring
    	- ring_full: page released from page pool because the ptr ring was full
    	- released_refcnt: page released (and not recycled) because refcnt > 1

    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Jiri Benc ff85690598 page_pool: Add allocation stats
Bugzilla: https://bugzilla.redhat.com/2120966

commit 8610037e8106b48c79cfe0afb92b2b2466e51c3d
Author: Joe Damato <jdamato@fastly.com>
Date:   Tue Mar 1 23:55:47 2022 -0800

    page_pool: Add allocation stats

    Add per-pool statistics counters for the allocation path of a page pool.
    These stats are incremented in softirq context, so no locking or per-cpu
    variables are needed.

    This code is disabled by default and a kernel config option is provided for
    users who wish to enable them.

    The statistics added are:
    	- fast: successful fast path allocations
    	- slow: slow path order-0 allocations
    	- slow_high_order: slow path high order allocations
    	- empty: ptr ring is empty, so a slow path allocation was forced.
    	- refill: an allocation which triggered a refill of the cache
    	- waive: pages obtained from the ptr ring that cannot be added to
    	  the cache due to a NUMA mismatch.
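How these counters relate to the allocation path can be sketched in user space. This is plain Python with an illustrative pool layout (the `cache`/`ring`/`nid` dict fields are assumptions, not the kernel's structures):

```python
# Toy allocation path mirroring which counter is bumped where:
# fast   - cache hit; refill - cache refilled from the ptr ring;
# waive  - ring page dropped on NUMA mismatch;
# empty/slow - nothing usable, slow path allocation forced.

def alloc(pool, stats):
    # fast path: serve straight from the per-pool cache
    if pool["cache"]:
        stats["fast"] += 1
        return pool["cache"].pop()
    # try to refill the cache from the ptr ring
    if pool["ring"]:
        stats["refill"] += 1
        while pool["ring"]:
            page = pool["ring"].pop()
            if page["nid"] != pool["nid"]:
                stats["waive"] += 1    # NUMA mismatch: can't cache it
                continue
            pool["cache"].append(page)
        if pool["cache"]:
            return pool["cache"].pop()
    # nothing usable: a slow path allocation is forced
    stats["empty"] += 1
    stats["slow"] += 1
    return {"nid": pool["nid"]}

pool = {"nid": 0, "cache": [], "ring": [{"nid": 1}, {"nid": 0}]}
stats = dict.fromkeys(("fast", "slow", "empty", "refill", "waive"), 0)

page = alloc(pool, stats)   # refill from the ring, waiving the remote page
pool["cache"].append(page)  # pretend the page was recycled into the cache
alloc(pool, stats)          # now a fast path hit
alloc(pool, stats)          # cache and ring empty: slow path forced
assert stats == {"fast": 1, "slow": 1, "empty": 1, "refill": 1, "waive": 1}
```

Since all of this runs in softirq context in the kernel, plain (non-atomic) counters suffice, which is why the stats can be compiled out at no cost.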

    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Jiri Benc 0700a9a4b5 page_pool: Refactor page_pool to enable fragmenting after allocation
Bugzilla: https://bugzilla.redhat.com/2120966

commit 52cc6ffc0ab2c61a76127b9347567fc97c15582f
Author: Alexander Duyck <alexanderduyck@fb.com>
Date:   Mon Jan 31 08:40:01 2022 -0800

    page_pool: Refactor page_pool to enable fragmenting after allocation

    This change is meant to permit a driver to perform "fragmenting" of the
    page from within the driver instead of the current model which requires
    pre-partitioning the page. The main motivation behind this is to support
    use cases where the page will be split up by the driver after DMA instead
    of before.

    With this change it becomes possible to start using page pool to replace
    some of the existing use cases where multiple references were being used
    for a single page, but the number needed was unknown as the size could be
    dynamic.

    For example, with this code it would be possible to do something like
    the following to handle allocation:
      page = page_pool_alloc_pages();
      if (!page)
        return NULL;
      page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX);
      rx_buf->page = page;
      rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX;

    Then we would process a received buffer by handling it with:
      rx_buf->pagecnt_bias--;

    Once the page has been fully consumed we could then flush the remaining
    instances with:
      if (page_pool_defrag_page(page, rx_buf->pagecnt_bias))
        continue;
      page_pool_put_defragged_page(pool, page, -1, !!budget);

    The general idea is that we want to have the ability to allocate a page
    with excess fragment count and then trim off the unneeded fragments.
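
The bias scheme above can be exercised in a self-contained user-space sketch. `struct fake_page`, `fragment_page`, and `defrag_page` below are illustrative stand-ins for the page_pool helpers; following the kernel convention, the defrag helper returns the remaining fragment count, so a nonzero return means other users still hold fragments and the page cannot be released yet.

```c
#include <assert.h>

/* Sketch of post-allocation fragmenting: a page starts with an
 * oversized fragment count, buffers are handed out by decrementing a
 * driver-side bias, and leftover fragments are trimmed at flush time. */
struct fake_page {
	long pp_frag_count; /* stand-in for the per-page fragment counter */
};

#define BIAS_MAX 1024L /* illustrative DRIVER_PAGECNT_BIAS_MAX */

static void fragment_page(struct fake_page *p, long nr)
{
	/* mirrors page_pool_fragment_page(): arm the excess count */
	p->pp_frag_count = nr;
}

static long defrag_page(struct fake_page *p, long nr)
{
	/* trim nr fragments; remaining count of 0 means releasable */
	p->pp_frag_count -= nr;
	return p->pp_frag_count;
}
```

For example, after handing three buffers to the stack (three decrements of the driver's `pagecnt_bias`), trimming the leftover bias leaves a fragment count of 3, one per outstanding buffer, and the page is only releasable once those are returned.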

    Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:54 +02:00
Felix Maurer 559e95e23d page_pool: remove spinlock in page_pool_refill_alloc_cache()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620

commit 07b17f0f7485bcbc7902cf6f56a89f5b716344bd
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Jan 7 17:00:42 2022 +0800

    page_pool: remove spinlock in page_pool_refill_alloc_cache()

    page_pool_refill_alloc_cache() is only called by
    __page_pool_get_cached(), which assumes non-concurrent access (as
    the comment in __page_pool_get_cached() notes), and ptr_ring
    already allows concurrent access between consumer and producer,
    so remove the spinlock in page_pool_refill_alloc_cache().
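
The property being relied on can be shown with a minimal single-producer/single-consumer ring. This is an illustrative sketch, not the kernel's ptr_ring (which additionally handles memory barriers and multi-producer locking): when the producer only writes `head` and the consumer only writes `tail`, a consumer that is already serialized, as the softirq-context caller here is, needs no extra spinlock.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SZ 8u /* power of two */

/* Minimal SPSC ring: one index per side, no shared locks. */
struct spsc_ring {
	void *slot[RING_SZ];
	unsigned int head; /* written only by the producer */
	unsigned int tail; /* written only by the consumer */
};

static int ring_produce(struct spsc_ring *r, void *p)
{
	if (r->head - r->tail == RING_SZ)
		return -1; /* full */
	r->slot[r->head % RING_SZ] = p;
	r->head++; /* the kernel orders this store with barriers */
	return 0;
}

static void *ring_consume(struct spsc_ring *r)
{
	void *p;

	if (r->head == r->tail)
		return NULL; /* empty */
	p = r->slot[r->tail % RING_SZ];
	r->tail++;
	return p;
}
```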

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/r/20220107090042.13605-1-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-08-24 12:53:57 +02:00