Commit Graph

784 Commits

Author SHA1 Message Date
Rado Vrbovsky 4da7c39b53 Merge: io_uring: Update to upstream v6.10 + fixes 2025-01-13 18:58:47 +00:00
Rafael Aquini c8c9c0b259 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
  * arch/*/Kconfig: all hunks dropped as there were only text blurbs and comments
     being changed with no functional changes whatsoever, and RHEL9 is missing
     several (unrelated) commits to these arches that transform the text blurbs in
     the way these non-functional hunks were expecting;
  * drivers/accel/qaic/qaic_data.c: hunk dropped due to RHEL-only commit
     083c0cdce2 ("Merge DRM changes from upstream v6.8..v6.9");
  * drivers/gpu/drm/i915/gem/selftests/huge_pages.c: hunk dropped due to RHEL-only
     commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
  * drivers/gpu/drm/ttm/tests/ttm_pool_test.c: all hunks dropped due to RHEL-only
     commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
  * drivers/video/fbdev/vermilion/vermilion.c: hunk dropped as RHEL9 misses
     commit dbe7e429fe ("vmlfb: framebuffer driver for Intel Vermilion Range");
  * include/linux/pageblock-flags.h: differences due to out-of-order backport
    of upstream commits 72801513b2bf ("mm: set pageblock_order to HPAGE_PMD_ORDER
    in case with !CONFIG_HUGETLB_PAGE but THP enabled"), and 3a7e02c040b1
    ("minmax: avoid overly complicated constant expressions in VM code");
  * mm/mm_init.c: differences in the 3rd and 4th hunks are due to RHEL
     backport commit 1845b92dcf ("mm: move most of core MM initialization to
     mm/mm_init.c") ignoring the out-of-order backport of commit 3f6dac0fd1b8
     ("mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks")
     thus partially reverting the changes introduced by the latter;

This patch is a backport of the following upstream commit:
commit 5e0a760b44417f7cadd79de2204d6247109558a0
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Dec 28 17:47:04 2023 +0300

    mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER

    commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
    changed the definition of MAX_ORDER to be inclusive.  This has caused
    issues with code that was not yet upstream and depended on the previous
    definition.

    To draw attention to the altered meaning of the define, rename MAX_ORDER
    to MAX_PAGE_ORDER.

    Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:17 -05:00
Jeff Moyer 34644cd1d2 kasan: rename and document kasan_(un)poison_object_data
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 1ce9a0523938f87dd8505233cc3445f8e2d8dcee
Author: Andrey Konovalov <andreyknvl@gmail.com>
Date:   Tue Dec 19 23:29:03 2023 +0100

    kasan: rename and document kasan_(un)poison_object_data
    
    Rename kasan_unpoison_object_data to kasan_unpoison_new_object and add a
    documentation comment.  Do the same for kasan_poison_object_data.
    
    The new names and the comments should suggest to users that these hooks
    are intended for internal use by the slab allocator.
    
    The following patch will remove non-slab-internal uses of these hooks.
    
    No functional changes.
    
    [andreyknvl@google.com: update references to renamed functions in comments]
      Link: https://lkml.kernel.org/r/20231221180637.105098-1-andrey.konovalov@linux.dev
    Link: https://lkml.kernel.org/r/eab156ebbd635f9635ef67d1a4271f716994e628.1703024586.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
    Reviewed-by: Marco Elver <elver@google.com>
    Cc: Alexander Lobakin <alobakin@pm.me>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Breno Leitao <leitao@debian.org>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Evgenii Stepanov <eugenis@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:22:44 -05:00
Rafael Aquini a9278b8510 Randomized slab caches for kmalloc()
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
  * minor extra RHEL-only hunk to create the required CONFIG_RANDOM_KMALLOC_CACHES
    file under rhel's config database.

This patch is a backport of the following upstream commit:
commit 3c6152940584290668b35fa0800026f6a1ae05fe
Author: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Date:   Fri Jul 14 14:44:22 2023 +0800

    Randomized slab caches for kmalloc()

    When exploiting memory vulnerabilities, "heap spraying" is a common
    technique targeting those related to dynamic memory allocation (i.e. the
    "heap"), and it plays an important role in a successful exploitation.
    Basically, it overwrites the memory area of a vulnerable object by
    triggering allocations in other subsystems or modules and thereby
    getting a reference to the targeted memory location. It's usable on
    various types of vulnerability, including use-after-free (UAF) and
    heap out-of-bounds write.

    There are (at least) two reasons why the heap can be sprayed: 1) generic
    slab caches are shared among different subsystems and modules, and
    2) dedicated slab caches could be merged with the generic ones.
    Currently these two factors cannot be prevented at a low cost: the first
    one is a widely used memory allocation mechanism, and shutting down slab
    merging completely via `slub_nomerge` would be overkill.

    To efficiently prevent heap spraying, we propose the following approach:
    to create multiple copies of generic slab caches that will never be
    merged, and a random one of them will be used at allocation. The random
    selection is based on the address of code that calls `kmalloc()`, which
    means it is static at runtime (rather than dynamically determined at
    each time of allocation, which could be bypassed by repeatedly spraying
    in brute force). In other words, the randomness of cache selection will
    be with respect to the code address rather than time, i.e. allocations
    in different code paths would most likely pick different caches,
    although kmalloc() at each place would use the same cache copy whenever
    it is executed. In this way, the vulnerable object and memory allocated
    in other subsystems and modules will (most probably) be on different
    slab caches, which prevents the object from being sprayed.

    Meanwhile, the static random selection is further enhanced with a
    per-boot random seed, which prevents the attacker from finding a usable
    kmalloc that happens to pick the same cache as the vulnerable
    subsystem/module by analyzing the open source code. In other words, with
    the per-boot seed, the random selection is fixed for the lifetime of each
    boot, but not across different system startups.

    The overhead of performance has been tested on a 40-core x86 server by
    comparing the results of `perf bench all` between the kernels with and
    without this patch based on the latest linux-next kernel, which shows
    minor difference. A subset of benchmarks are listed below:

                    sched/  sched/  syscall/       mem/       mem/
                 messaging    pipe     basic     memcpy     memset
                     (sec)   (sec)     (sec)   (GB/sec)   (GB/sec)

    control1         0.019   5.459     0.733  15.258789  51.398026
    control2         0.019   5.439     0.730  16.009221  48.828125
    control3         0.019   5.282     0.735  16.009221  48.828125
    control_avg      0.019   5.393     0.733  15.759077  49.684759

    experiment1      0.019   5.374     0.741  15.500992  46.502976
    experiment2      0.019   5.440     0.746  16.276042  51.398026
    experiment3      0.019   5.242     0.752  15.258789  51.398026
    experiment_avg   0.019   5.352     0.746  15.678608  49.766343

    The overhead of memory usage was measured by executing `free` after boot
    on a QEMU VM with 1GB total memory, and as expected, it's positively
    correlated with # of cache copies:

               control  4 copies  8 copies  16 copies

    total       969.8M    968.2M    968.2M     968.2M
    used         20.0M     21.9M     24.1M      26.7M
    free        936.9M    933.6M    931.4M     928.6M
    available   932.2M    928.8M    926.6M     923.9M

    Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
    Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
    Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: Dennis Zhou <dennis@kernel.org> # percpu
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:19:49 -04:00
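To illustrate the static selection described in the commit above, here is a minimal userspace sketch (not the kernel implementation): the cache copy is chosen by hashing the caller's code address together with a per-boot seed, so each call site sticks to one copy for the lifetime of a boot. The hash, the copy count, and the demo_ helper names are illustrative assumptions.

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NR_CACHE_COPIES 16              /* illustrative; the kernel count is Kconfig-driven */

static uint64_t boot_seed;              /* stands in for the per-boot random seed */

/* Pick a cache copy from the caller's code address; fixed per call site per boot. */
static unsigned int pick_cache_copy(uintptr_t caller_ip)
{
	uint64_t h = caller_ip ^ boot_seed;

	/* cheap mix, standing in for the kernel's hash of (caller, seed) */
	h ^= h >> 33;
	h *= 0xff51afd7ed558ccdULL;
	h ^= h >> 33;
	return (unsigned int)(h % NR_CACHE_COPIES);
}

static void *demo_kmalloc(size_t size)
{
	unsigned int idx = pick_cache_copy((uintptr_t)__builtin_return_address(0));

	printf("size %zu -> kmalloc cache copy %u\n", size, idx);
	return malloc(size);            /* real code would allocate from copy 'idx' */
}

int main(void)
{
	boot_seed = (uint64_t)time(NULL);   /* per-"boot" seed */
	free(demo_kmalloc(64));             /* same call site -> same copy for this whole "boot" */
	free(demo_kmalloc(64));
	free(demo_kmalloc(128));            /* different call site -> likely a different copy */
	return 0;
}
```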
Rafael Aquini 5b0f4beec7 mm/slab: correct return values in comment for _kmem_cache_create()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 444f20c29e8b41a5aef5c34e3eab84e8d1cc4511
Author: zhaoxinchao <chrisxinchao@outlook.com>
Date:   Tue Apr 18 10:05:23 2023 +0800

    mm/slab: correct return values in comment for _kmem_cache_create()

    __kmem_cache_create() returns 0 on success and non-zero on failure.
    The comment is wrong in two instances, so fix the first one and remove
    the second one. Also make the comment non-doc, because it doesn't
    describe an API function, but a SLAB-specific implementation.

    Signed-off-by: zhaoxinchao <chrisxinchao@outlook.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:07 -04:00
Rafael Aquini f97b54a816 mm/slab: Replace invocation of weak PRNG
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit f7e466e951a15bc7cec496f22f6276b854d3c310
Author: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il>
Date:   Sun Apr 16 20:22:42 2023 +0300

    mm/slab: Replace invocation of weak PRNG

    The slab allocator randomization uses the prandom_u32 PRNG, which was
    added to prevent attackers from obtaining information on the heap state
    by randomizing the freelist state.

    However, this PRNG turned out to be weak, as noted in commit c51f8f88d7.
    To fix it, we changed the invocation of prandom_u32_state to get_random_u32
    to ensure the PRNG is strong. Since a modulo operation is applied right
    after that, we used get_random_u32_below to achieve uniformity.

    In addition, we changed the freelist_init_state union to a struct, since
    the rnd_state inside it, which was used to store the state of prandom_u32,
    is not needed anymore: get_random_u32 maintains its own state.

    Signed-off-by: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:06 -04:00
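As a rough userspace illustration of the change above: freelist randomization is a Fisher-Yates shuffle driven by a "uniform random below N" helper. The helper here merely wraps rand() and stands in for get_random_u32_below(), which keeps its own state so no seeded rnd_state is needed.

```
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for get_random_u32_below(): value in [0, ceil). Illustration only. */
static unsigned int random_below(unsigned int ceil)
{
	return (unsigned int)(rand() % ceil);
}

/* Fisher-Yates shuffle of a freelist index array, as slab freelist randomization does. */
static void shuffle_freelist(unsigned int *list, unsigned int count)
{
	for (unsigned int i = count - 1; i > 0; i--) {
		unsigned int j = random_below(i + 1);
		unsigned int tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}

int main(void)
{
	unsigned int freelist[8];

	for (unsigned int i = 0; i < 8; i++)
		freelist[i] = i;
	shuffle_freelist(freelist, 8);
	for (unsigned int i = 0; i < 8; i++)
		printf("%u ", freelist[i]);
	printf("\n");
	return 0;
}
```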
Nico Pache 5415c73d87 mm/slab: Finish struct page to struct slab conversion
commit dd35f71a1d98b87e0e3ee3d87fff1bc7004cf626
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Nov 2 13:26:56 2021 +0100

    mm/slab: Finish struct page to struct slab conversion

    Change cache_free_alien() to use slab_nid(virt_to_slab()). Otherwise
    just update of comments and some remaining variable names.

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Roman Gushchin <guro@fb.com>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:22 -06:00
Chris von Recklinghausen 41d58c77af mm: vmscan: refactor updating current->reclaim_state
Conflicts: mm/slob.c - We already have
	6630e950d532 ("mm/slob: remove slob.c")
	so the file is gone.

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit c7b23b68e2aa93f86a206222d23ccd9a21f5982a
Author: Yosry Ahmed <yosryahmed@google.com>
Date:   Thu Apr 13 10:40:34 2023 +0000

    mm: vmscan: refactor updating current->reclaim_state

    During reclaim, we keep track of pages reclaimed from other means than
    LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab,
    which we stash a pointer to in current task_struct.

    However, we keep track of more than just reclaimed slab pages through
    this.  We also use it for clean file pages dropped through pruned inodes,
    and xfs buffer pages freed.  Rename reclaimed_slab to reclaimed, and add a
    helper function that wraps updating it through current, so that future
    changes to this logic are contained within include/linux/swap.h.

    Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.com
    Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:59 -04:00
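A simplified userspace model of the helper pattern described above (the structure, field, and helper names here are illustrative, not the upstream ones): callers report freed pages through a small wrapper that updates the reclaim state stashed in current.

```
#include <stddef.h>
#include <stdio.h>

/* Simplified model of the task-local reclaim bookkeeping described above. */
struct reclaim_state {
	unsigned long reclaimed;        /* pages reclaimed outside LRU-based reclaim */
};

struct task {
	struct reclaim_state *reclaim_state;
};

static struct task current_task;        /* stands in for the kernel's 'current' */
#define current (&current_task)

/*
 * Helper that wraps the update through current, so callers (slab shrinkers,
 * inode pruning, buffer freeing, ...) never touch the field directly.
 */
static void report_freed_pages(unsigned long nr_pages)
{
	if (current->reclaim_state)
		current->reclaim_state->reclaimed += nr_pages;
}

int main(void)
{
	struct reclaim_state rs = { .reclaimed = 0 };

	current->reclaim_state = &rs;   /* set up by the reclaim path */
	report_freed_pages(3);          /* e.g. slab pages freed by a shrinker */
	report_freed_pages(1);          /* e.g. a clean file page dropped with an inode */
	current->reclaim_state = NULL;

	printf("reclaimed %lu pages outside LRU reclaim\n", rs.reclaimed);
	return 0;
}
```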
Aristeu Rozanski 32202ced15 mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 66a1c22b709178e7b823d44465d0c2e5ed7492fb
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Tue Mar 21 09:30:59 2023 +0100

    mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP

    sh/migor_defconfig:

        mm/slab.c: In function ‘slab_memory_callback’:
        mm/slab.c:1127:23: error: implicit declaration of function ‘init_cache_node_node’; did you mean ‘drain_cache_node_node’? [-Werror=implicit-function-declaration]
         1127 |                 ret = init_cache_node_node(nid);
              |                       ^~~~~~~~~~~~~~~~~~~~
              |                       drain_cache_node_node

    The #ifdef condition protecting the definition of init_cache_node_node()
    no longer matches the conditions protecting the (multiple) users.

    Fix this by syncing the conditions.

    Fixes: 76af6a054da40553 ("mm/migrate: add CPU hotplug to demotion #ifdef")
    Reported-by: Randy Dunlap <rdunlap@infradead.org>
    Link: https://lore.kernel.org/r/b5bdea22-ed2f-3187-6efe-0c72330270a4@infradead.org
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
    Acked-by: Randy Dunlap <rdunlap@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:25 -04:00
Aristeu Rozanski 672578399d mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit f5451547b8310868f5b5acff7cd4aa7c0267edb3
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Feb 7 15:16:53 2023 +0100

    mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early

    The memory allocators are available during early boot even in the phase
    where interrupts are disabled and scheduling is not yet possible.

    The setup is so that GFP_KERNEL allocations work in this phase without
    causing might_alloc() splats to be emitted because the system state is
    SYSTEM_BOOTING at that point, which prevents the warnings from triggering.

    Most allocation/free functions use local_irq_save()/restore() or a lock
    variant of that. But kmem_cache_alloc_bulk() and kmem_cache_free_bulk() use
    local_[lock]_irq_disable()/enable(), which leads to a lockdep warning when
    interrupts are enabled during the early boot phase.

    This went unnoticed so far as there are no early users of these
    interfaces. The upcoming conversion of the interrupt descriptor store from
    radix_tree to maple_tree triggered this warning as maple_tree uses the bulk
    interface.

    Cure this by moving the kmem_cache_alloc/free() bulk variants of SLUB and
    SLAB to local[_lock]_irq_save()/restore().

    There is obviously no reclaim possible and required at this point so there
    is no need to expand this coverage further.

    No functional change.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:13 -04:00
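A toy model of why the save/restore variants matter here: with interrupts still disabled during early boot, a plain disable/enable pair would re-enable them unconditionally, while save/restore puts the flag back exactly as it was found. This only models the control flow; the kernel primitives themselves are not reproduced.

```
#include <stdbool.h>
#include <stdio.h>

static bool irqs_enabled;                       /* models the CPU interrupt-enable flag */

static void local_irq_save_model(bool *flags)   /* models local_irq_save() */
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

static void local_irq_restore_model(bool flags) /* models local_irq_restore() */
{
	irqs_enabled = flags;
}

static void bulk_alloc_model(void)
{
	bool flags;

	local_irq_save_model(&flags);
	/* ... allocate objects from per-cpu caches ... */
	local_irq_restore_model(flags);             /* prior state comes back unchanged */
}

int main(void)
{
	irqs_enabled = false;                       /* early boot: interrupts not enabled yet */
	bulk_alloc_model();
	printf("after early-boot call: irqs %s\n", irqs_enabled ? "on (wrong)" : "off (correct)");

	irqs_enabled = true;                        /* normal runtime */
	bulk_alloc_model();
	printf("after runtime call:    irqs %s\n", irqs_enabled ? "on (correct)" : "off (wrong)");
	return 0;
}
```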
Aristeu Rozanski fa5f95f92f mm: introduce folio_is_pfmemalloc
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 02d65d6fb1aae151570c8bfd1bd77a8153d2e607
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 6 15:52:51 2023 -0600

    mm: introduce folio_is_pfmemalloc

    Add a folio equivalent for page_is_pfmemalloc. This removes two instances
    of page_is_pfmemalloc(folio_page(folio, 0)) so the folio can be used
    directly.

    Link: https://lkml.kernel.org/r/20230106215251.599222-1-sidhartha.kumar@oracle.com
    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:03 -04:00
Aristeu Rozanski 5dc5236643 mm/slab.c: cleanup is_debug_pagealloc_cache()
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 81ce2ebd194cf32027854ce1c703b7fd129c86b8
Author: lvqian <lvqian@nfschina.com>
Date:   Wed Jan 11 17:27:44 2023 +0800

    mm/slab.c: cleanup is_debug_pagealloc_cache()

    Remove the if statement to increase code readability.
    Also make the function inline, per David.

    Signed-off-by: lvqian <lvqian@nfschina.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:32:59 -04:00
Aristeu Rozanski 120c032c5d mm/sl{a,u}b: fix wrong usages of folio_page() for getting head pages
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit c034c6a45c977fdf33de5974d7def75bda9dcadc
Author: SeongJae Park <sj@kernel.org>
Date:   Tue Jan 10 00:51:24 2023 +0000

    mm/sl{a,u}b: fix wrong usages of folio_page() for getting head pages

    The standard idiom for getting head page of a given folio is
    '&folio->page', but some are wrongly using 'folio_page(folio, 0)' for
    the purpose.  Fix those to use the idiom.

    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:32:59 -04:00
Aristeu Rozanski e24d457b8d mm/slab: remove unused slab_early_init
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 35e3c36d438e05fcd4f846c76cf22cbda9b63abb
Author: Gou Hao <gouhao@uniontech.com>
Date:   Sun Dec 18 20:31:27 2022 +0800

    mm/slab: remove unused slab_early_init

    'slab_early_init' was introduced by commit e0a4272679
    ("[PATCH] mm/slab.c: fix early init assumption"); this
    flag was used to prevent off-slab caches from being
    created too early during bootup.

    The only user of 'slab_early_init' was removed in commit
    3217fd9bdf ("mm/slab: make criteria for off slab
    determination robust and simple").

    Signed-off-by: Gou Hao <gouhao@uniontech.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:32:58 -04:00
Audra Mitchell 086ffa2949 mm, slab: periodically resched in drain_freelist()
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit cc2e9d2b26c86c1dd8687f6916e5f621bcacd6f7
Author: David Rientjes <rientjes@google.com>
Date:   Tue Dec 27 22:05:48 2022 -0800

    mm, slab: periodically resched in drain_freelist()

    drain_freelist() can be called with a very large number of slabs to free,
    such as for kmem_cache_shrink(), or depending on various settings of the
    slab cache when doing periodic reaping.

    If there is a potentially long list of slabs to drain, periodically
    schedule to ensure we aren't saturating the cpu for too long.

    Signed-off-by: David Rientjes <rientjes@google.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:43:02 -04:00
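Schematically, the change amounts to yielding inside the long freeing loop; the userspace sketch below uses sched_yield() as a stand-in for cond_resched() and invented helper names.

```
#include <sched.h>
#include <stdio.h>

/* Schematic stand-in for freeing one slab's worth of objects. */
static void free_one_slab(int i)
{
	(void)i;        /* real code would unlink the slab and free its pages */
}

/*
 * Drain a potentially very long list of free slabs, yielding the CPU
 * periodically (the kernel patch adds cond_resched() inside the loop).
 */
static void drain_freelist_model(int nr_slabs)
{
	for (int i = 0; i < nr_slabs; i++) {
		free_one_slab(i);
		sched_yield();  /* userspace stand-in for cond_resched() */
	}
}

int main(void)
{
	drain_freelist_model(100000);
	printf("drained without monopolizing the CPU between yields\n");
	return 0;
}
```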
Audra Mitchell efb0626ae7 mm/migrate: make isolate_movable_page() skip slab pages
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 8b8817630ae80032e80b2eaf334de756ac1ff6a3
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Fri Nov 4 15:57:26 2022 +0100

    mm/migrate: make isolate_movable_page() skip slab pages

    In the next commit we want to rearrange struct slab fields to allow a larger
    rcu_head. Afterwards, the page->mapping field will overlap with SLUB's "struct
    list_head slab_list", where the value of prev pointer can become LIST_POISON2,
    which is 0x122 + POISON_POINTER_DELTA.  Unfortunately, bit 1 being set can
    make PageMovable() return a false positive and cause a GPF, as reported by
    lkp [1].

    To fix this, make isolate_movable_page() skip pages with the PageSlab flag set.
    This is a bit tricky as we need to add memory barriers to SLAB and SLUB's page
    allocation and freeing, and their counterparts to isolate_movable_page().

    Based on my RFC from [2]. Added a comment update from Matthew's variant in [3]
    and, as done there, moved the PageSlab checks to happen before trying to take
    the page lock.

    [1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/
    [2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/
    [3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/

    Reported-by: kernel test robot <yujie.liu@intel.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:53 -04:00
Audra Mitchell 1b22e98c99 mm/slab: move and adjust kernel-doc for kmem_cache_alloc
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 838de63b101147fc7d8af828465cf6d1d30232a8
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Thu Nov 10 09:10:30 2022 +0100

    mm/slab: move and adjust kernel-doc for kmem_cache_alloc

    Alexander reports an issue with the kmem_cache_alloc() comment in
    mm/slab.c:

    > The current comment mentioned that the flags only matters if the
    > cache has no available objects. It's different for the __GFP_ZERO
    > flag which will ensure that the returned object is always zeroed
    > in any case.

    > I have the feeling I run into this question already two times if
    > the user need to zero the object or not, but the user does not need
    > to zero the object afterwards. However another use of __GFP_ZERO
    > and only zero the object if the cache has no available objects would
    > also make no sense.

    and suggests thus mentioning __GFP_ZERO as the exception. But on closer
    inspection, the part about flags being only relevant if cache has no
    available objects is misleading. The slab user has no reliable way to
    determine if there are available objects, and e.g. the might_sleep()
    debug check can be performed even if objects are available, so passing
    correct flags given the allocation context always matters.

    Thus remove that sentence completely, and while at it, move the comment
    from the SLAB-specific mm/slab.c to the common include/linux/slab.h.
    The comment otherwise refers to the flags description for kmalloc(), so
    add a __GFP_ZERO comment there and remove a very misleading GFP_HIGHUSER
    (not applicable to slab) description from there. Mention the kzalloc()
    and kmem_cache_zalloc() shortcuts.

    Reported-by: Alexander Aring <aahringo@redhat.com>
    Link: https://lore.kernel.org/all/20221011145413.8025-1-aahringo@redhat.com/
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:52 -04:00
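For context, a small kernel-style sketch of the zeroing shortcuts the relocated comment points at; the struct and cache below are made up for illustration, only the allocator calls are real API names, and this is not code from the patch.

```
#include <linux/slab.h>

struct foo {
	int a;
	int b;
};

/* Illustrative only: each of the three calls below returns zeroed memory. */
static struct foo *alloc_zeroed_foo(struct kmem_cache *foo_cache)
{
	struct foo *p;

	p = kmalloc(sizeof(*p), GFP_KERNEL | __GFP_ZERO);   /* explicit flag */
	kfree(p);

	p = kzalloc(sizeof(*p), GFP_KERNEL);                 /* kmalloc + __GFP_ZERO shortcut */
	kfree(p);

	return kmem_cache_zalloc(foo_cache, GFP_KERNEL);     /* zeroed object from a dedicated cache */
}
```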
Audra Mitchell 65dfaa7487 mm/slab: Annotate kmem_cache_node->list_lock as raw
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit b539ce9f1a31c442098c3f351cb4d03ba27c2720
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Fri Oct 21 21:18:12 2022 +0200

    mm/slab: Annotate kmem_cache_node->list_lock as raw

    The list_lock can be taken in hardirq context when do_drain() is being
    called via IPI on all cores, and therefore lockdep complains about it,
    because it can't be preempted on PREEMPT_RT.

    That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but
    we still want to get rid of the warning on non-PREEMPT_RT builds.

    Annotate it therefore as a raw lock in order to get rid of the lockdep
    warning below.

             =============================
             [ BUG: Invalid wait context ]
             6.1.0-rc1-00134-ge35184f32151 #4 Not tainted
             -----------------------------
             swapper/3/0 is trying to lock:
             ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0
             other info that might help us debug this:
             context-{2:2}
             no locks held by swapper/3/0.
             stack backtrace:
             CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4
             Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
             Call Trace:
              <IRQ>
              dump_stack_lvl+0x6b/0x9d
              __lock_acquire+0x1519/0x1730
              ? build_sched_domains+0x4bd/0x1590
              ? __lock_acquire+0xad2/0x1730
              lock_acquire+0x294/0x340
              ? do_drain+0x57/0xb0
              ? sched_clock_tick+0x41/0x60
              _raw_spin_lock+0x2c/0x40
              ? do_drain+0x57/0xb0
              do_drain+0x57/0xb0
              __flush_smp_call_function_queue+0x138/0x220
              __sysvec_call_function+0x4f/0x210
              sysvec_call_function+0x4b/0x90
              </IRQ>
              <TASK>
              asm_sysvec_call_function+0x16/0x20
             RIP: 0010:mwait_idle+0x5e/0x80
             Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df
             RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246
             RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
             RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a
             RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001
             R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff
             R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000
              ? default_idle_call+0x1a/0xa0
              default_idle_call+0x4b/0xa0
              do_idle+0x1f1/0x2c0
              ? _raw_spin_unlock_irqrestore+0x56/0x70
              cpu_startup_entry+0x19/0x20
              start_secondary+0x122/0x150
              secondary_startup_64_no_verify+0xce/0xdb
              </TASK>

    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:49 -04:00
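Roughly what the annotation looks like at the source level, sketched with an invented structure rather than the actual kmem_cache_node diff: the lock type and its lock/unlock calls move to the raw_ variants, which remain true spinlocks even on PREEMPT_RT.

```
#include <linux/spinlock.h>
#include <linux/list.h>

/* Sketch of a per-node structure with its list lock annotated as raw. */
struct demo_node {
	raw_spinlock_t list_lock;       /* was: spinlock_t list_lock; */
	struct list_head partial;
};

static void demo_drain(struct demo_node *n)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&n->list_lock, flags);    /* was: spin_lock_irqsave(...) */
	/* ... walk and drain n->partial ... */
	raw_spin_unlock_irqrestore(&n->list_lock, flags);
}
```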
Jan Stancek 78042596b6 Merge: iommu: IOMMU and DMA-mapping API Updates for 9.4
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3180

# Merge Request Required Information

```
Bugzilla: https://bugzilla.redhat.com/2223717
JIRA: https://issues.redhat.com/browse/RHEL-10007
JIRA: https://issues.redhat.com/browse/RHEL-10026
JIRA: https://issues.redhat.com/browse/RHEL-10042
JIRA: https://issues.redhat.com/browse/RHEL-10094
JIRA: https://issues.redhat.com/browse/RHEL-3655
JIRA: https://issues.redhat.com/browse/RHEL-800

Depends: !3244
Depends: !3245

Omitted-fix: c7bd8a1f45ba ("iommu/apple-dart: Handle DMA_FQ domains in attach_dev()")
             - Apple Dart not supported

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Testing: A mix of fio jobs, and various stress-ng io stressors (--hdd, --readahead, --aio, --aiol, --seek,
         --sync_file) run with strict and lazy translation modes on amd, intel, and arm systems. pgtbl_v2
         tested on AMD Genoa host

Conflicts: Should be noted in individual commits. In particular one upstream merge in 6.4, 58390c8ce1bd, had a rather
           messy merge conflict resolution set, so a number of commits have those cleanups added in here.
```

## Summary of Changes

```
        Rebase through v6.5 with a good portion of v6.6 as well (minus the
	dynamic swiotlb mempool support, per numa dma cma support, and arm
	+ mm tlb invalidate changes). For iommufd changes there are
	backports of the underlying functionality in iommufd, but I have left
	the vfio commits that will eventually make use of it for Alex.

Highlights
	* AMD GA Log Overflow refactor and PPR Log support
	* AMD v2 page table support
	* AMD v2 5 level guest page table support
	* Various cleanups and fixes
	* Sync ipmmu-vmsa in preparation for Renesas support  (config not enabled)
	* Continuation of swiotlb rework
	* Continuation of the refactor of core iommu code as part of SVA, iommufd, and pasid support work
	* Continuation of the iommufd prep work (config still not enabled)
	* Support for bounce buffer usage with non cache-line aligned kmallocs on arm64
	* Clean up of in-kernel pasid use for vt-d
	* More cleanup of BUG_ON and warning use in vt-d

        This is based on top of MR !2843 and !3158.
```

Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>

## Approved Development Ticket
All submissions to CentOS Stream must reference an approved ticket in [Red Hat Jira](https://issues.redhat.com/). Please follow the CentOS Stream [contribution documentation](https://docs.centos.org/en-US/stream-contrib/quickstart/) for how to file this ticket and have it approved.

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Mika Penttilä <mpenttil@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Eric Auger <eric.auger@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-11-19 15:53:34 +01:00
Paolo Bonzini 538bf6f332 mm, treewide: redefine MAX_ORDER sanely
JIRA: https://issues.redhat.com/browse/RHEL-10059

MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.

This definition is counter-intuitive and has led to a number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 23baf831a32c04f9a968812511540b1b3e648bf5)

[RHEL: Fix conflicts by changing MAX_ORDER - 1 to MAX_ORDER,
       ">= MAX_ORDER" to "> MAX_ORDER", etc.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-30 09:12:37 +01:00
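The practical effect of the redefinition, and of the conflict fixups noted in the RHEL remark above, in a small self-contained sketch (the MAX_ORDER value is illustrative):

```
#include <stdio.h>

#define MAX_ORDER 10    /* illustrative value; the point is the bound, not the number */

int main(void)
{
	int order;

	/* Old convention: MAX_ORDER was exclusive, valid orders were 0 .. MAX_ORDER - 1. */
	for (order = 0; order < MAX_ORDER; order++)
		;
	printf("old convention: highest valid order = %d\n", order - 1);

	/* New convention: MAX_ORDER is inclusive, valid orders are 0 .. MAX_ORDER. */
	for (order = 0; order <= MAX_ORDER; order++)
		;
	printf("new convention: highest valid order = %d\n", order - 1);

	/* Bounds checks flip accordingly: "order >= MAX_ORDER" becomes "order > MAX_ORDER". */
	return 0;
}
```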
Jerry Snitselaar cca24b2885 mm/slab: simplify create_kmalloc_cache() args and make it static
JIRA: https://issues.redhat.com/browse/RHEL-10094
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Conflicts: Context diff at beginning of new_kmalloc_cache().

commit 0c474d31a6378f20cbe83f62d4177ebdc099c7fc
Author: Catalin Marinas <catalin.marinas@arm.com>
Date:   Mon Jun 12 16:31:47 2023 +0100

    mm/slab: simplify create_kmalloc_cache() args and make it static

    In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead
    of initialising the kmalloc_caches array directly.  With this,
    create_kmalloc_cache() is now only called from new_kmalloc_cache() in the
    same file, so make it static.  In addition, the useroffset argument is
    always 0 while usersize is the same as size.  Remove them.

    Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
    Cc: Alasdair Kergon <agk@redhat.com>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jerry Snitselaar <jsnitsel@redhat.com>
    Cc: Joerg Roedel <joro@8bytes.org>
    Cc: Jonathan Cameron <jic23@kernel.org>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Lars-Peter Clausen <lars@metafoo.de>
    Cc: Logan Gunthorpe <logang@deltatee.com>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Mike Snitzer <snitzer@kernel.org>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: Saravana Kannan <saravanak@google.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 0c474d31a6378f20cbe83f62d4177ebdc099c7fc)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
2023-10-27 01:26:58 -07:00
Chris von Recklinghausen 8cc9c44a1f mm/slub: only zero requested size of buffer for kzalloc when debug enabled
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 9ce67395f5a0cdec6ce152d26bfda13b98b25c01
Author: Feng Tang <feng.tang@intel.com>
Date:   Fri Oct 21 11:24:03 2022 +0800

    mm/slub: only zero requested size of buffer for kzalloc when debug enabled

    kzalloc/kmalloc will round up the request size to a fixed size
    (mostly power of 2), so the allocated memory could be more than
    requested. Currently kzalloc family APIs will zero all the
    allocated memory.

    To detect out-of-bounds usage of the extra allocated memory, only
    zero the requested part, so that a redzone sanity check can be
    added to the extra space later.

    For kzalloc users who will call ksize() later and utilize this
    extra space, please be aware that the space is not zeroed any
    more when debug is enabled. (Thanks to Kees Cook's effort to
    sanitize all ksize() user cases [1], this won't be a big issue).

    [1]. https://lore.kernel.org/all/20220922031013.2150682-1-keescook@chromium.org/#r

    Signed-off-by: Feng Tang <feng.tang@intel.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:14 -04:00
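A userspace sketch of the behaviour described above, with an invented round-up helper: the allocation bucket is larger than the request, and with debugging enabled only the requested bytes are zeroed so the tail can carry a redzone.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Round up to the next power of two, like kmalloc bucket sizing (simplified). */
static size_t roundup_pow2(size_t n)
{
	size_t s = 8;

	while (s < n)
		s <<= 1;
	return s;
}

/* Model of kzalloc with debugging on: zero only what was asked for. */
static void *demo_kzalloc_debug(size_t requested)
{
	size_t bucket = roundup_pow2(requested);
	unsigned char *p = malloc(bucket);

	if (!p)
		return NULL;
	memset(p, 0xcc, bucket);        /* pretend the whole bucket is redzone-patterned */
	memset(p, 0, requested);        /* zero only the requested part */
	printf("requested %zu, bucket %zu, bytes %zu..%zu left for redzone checks\n",
	       requested, bucket, requested, bucket - 1);
	return p;
}

int main(void)
{
	free(demo_kzalloc_debug(100)); /* bucket is 128; bytes 100..127 can carry a redzone */
	return 0;
}
```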
Chris von Recklinghausen 1f619343f6 treewide: use get_random_u32() when possible
Conflicts:
	drivers/gpu/drm/tests/drm_buddy_test.c
	drivers/gpu/drm/tests/drm_mm_test.c - We already have
		ce28ab1380e8 ("drm/tests: Add back seed value information")
		so keep calls to kunit_info.
	drop changes to drivers/misc/habanalabs/gaudi2/gaudi2.c
		fs/ntfs3/fslog.c - files not in CS9
	net/sunrpc/auth_gss/gss_krb5_wrap.c - We already have
		7f675ca7757b ("SUNRPC: Improve Kerberos confounder generation")
		so code to change is gone.
	drivers/gpu/drm/i915/i915_gem_gtt.c
	drivers/gpu/drm/i915/selftests/i915_selftest.c
	drivers/gpu/drm/tests/drm_buddy_test.c
	drivers/gpu/drm/tests/drm_mm_test.c
		change added under
		4cb818386e ("Merge DRM changes from upstream v6.0.8..v6.1")

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit a251c17aa558d8e3128a528af5cf8b9d7caae4fd
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Wed Oct 5 17:43:22 2022 +0200

    treewide: use get_random_u32() when possible

    The prandom_u32() function has been a deprecated inline wrapper around
    get_random_u32() for several releases now, and compiles down to the
    exact same code. Replace the deprecated wrapper with a direct call to
    the real function. The same also applies to get_random_int(), which is
    just a wrapper around get_random_u32(). This was done as a basic find
    and replace.

    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Yury Norov <yury.norov@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
    Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
    Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
    Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
    Acked-by: Helge Deller <deller@gmx.de> # for parisc
    Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:03 -04:00
Chris von Recklinghausen 7ef6d47fef mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit e36ce448a08d43de69e7449eb225805a7a8addf8
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Sat Oct 15 13:34:29 2022 +0900

    mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation

    After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than
    order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
    requests to buddy like SLUB does.

    SLAB has been using kmalloc caches to allocate freelist_idx_t array for
    off slab caches. But after the commit, freelist_size can be bigger than
    KMALLOC_MAX_CACHE_SIZE.

    Instead of using pointer to kmalloc cache, use kmalloc_node() and only
    check if the kmalloc cache is off slab during calculate_slab_order().
    If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens,
    as the freelist_idx_t array is allocated directly from the buddy allocator.

    Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
    Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
    Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:02 -04:00
Chris von Recklinghausen a54b2a2fb0 mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 2c1d697fb8ba6d2d44f914d4268ae1ccdf025f1b
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:24 2022 +0900

    mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using

    Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
    using TRACE_EVENT() macro.

    And then this patch does:
       - Do not pass pointer to struct kmem_cache to trace_kmalloc.
         gfp flag is enough to know if it's accounted or not.
       - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event.
       - Avoid dereferencing s->name in when not using kmem_cache_free event.
       - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB

    Cc: Vasily Averin <vasily.averin@linux.dev>
    Suggested-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:19 -04:00
Chris von Recklinghausen 3af5982dd6 mm/slab_common: unify NUMA and UMA version of tracepoints
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 11e9734bcb6a7361943f993eba4e97f5812120d8
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:23 2022 +0900

    mm/slab_common: unify NUMA and UMA version of tracepoints

    Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
    remove _node postfix for NUMA version of tracepoints.

    This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
    but at this point maintaining both kmem_alloc and kmem_alloc_node
    event classes does not make sense at all.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:18 -04:00
Chris von Recklinghausen b228dc7f49 mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 26a40990ba052e6f553256f9d0f112452b992a38
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:22 2022 +0900

    mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()

    Despite its name, kmem_cache_alloc[_node]_trace() is hook for inlined
    kmalloc. So rename it to kmalloc[_node]_trace().

    Move its implementation to slab_common.c by using
    __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n variants to save a
    function call when CONFIG_TRACING=n.

    Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of
    __assume_slab_alignment. Generally kmalloc has larger alignment
    requirements.

    Suggested-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:18 -04:00
Chris von Recklinghausen 7a80abf490 mm/sl[au]b: generalize kmalloc subsystem
Conflicts: We already have
	05a940656e1e ("slab: Introduce kmalloc_size_roundup()")
	so there is a difference in deleted code (comments).

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit b14051352465a24b3c9ceaccac4e39b3521bb370
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:21 2022 +0900

    mm/sl[au]b: generalize kmalloc subsystem

    Now everything in kmalloc subsystem can be generalized.
    Let's do it!

    Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
    kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
    to slab_common.c.

    In the meantime, rename kmalloc_large_node_notrace()
    to __kmalloc_large_node() and make it static as it's now only called in
    slab_common.c.

    [ feng.tang@intel.com: adjust kfence skip list to include
      __kmem_cache_free so that kfence kunit tests do not fail ]

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:18 -04:00
Chris von Recklinghausen 816794f3cb mm/sl[au]b: introduce common alloc/free functions without tracepoint
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit ed4cd17eb26d7f0c6a762608a3f30870929fbcdd
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:20 2022 +0900

    mm/sl[au]b: introduce common alloc/free functions without tracepoint

    To unify kmalloc functions in a later patch, introduce common alloc/free
    functions that do not have tracepoints.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:17 -04:00
Chris von Recklinghausen 6b205383b1 mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit d6a71648dbc0ca5520cba16a8fdce8d37ae74218
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:19 2022 +0900

    mm/slab: kmalloc: pass requests larger than order-1 page to page allocator

    There is not much benefit to serving large objects in kmalloc().
    Let's pass large requests to the page allocator, like SLUB does, for
    better maintenance of common code.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:17 -04:00
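A minimal sketch of the resulting dispatch, using malloc() as a stand-in for both backends: requests above an order-1 page bypass the kmalloc caches and go to the page allocator. The threshold constant and helper names are illustrative.

```
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL                /* illustrative */

/*
 * Model of the dispatch described above: small requests are served from
 * kmalloc slab buckets, anything larger than an order-1 page (2 pages)
 * goes straight to the page allocator.
 */
static void *demo_kmalloc(size_t size)
{
	if (size > 2 * PAGE_SIZE) {
		printf("size %zu: handed to the page allocator\n", size);
		return malloc(size);    /* stands in for an alloc_pages()-backed allocation */
	}
	printf("size %zu: served from a kmalloc slab cache\n", size);
	return malloc(size);            /* stands in for a slab bucket allocation */
}

int main(void)
{
	free(demo_kmalloc(512));
	free(demo_kmalloc(64 * 1024));
	return 0;
}
```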
Chris von Recklinghausen fe40a3e4cd mm/sl[au]b: factor out __do_kmalloc_node()
Conflicts: mm/slub.c - We already have
	5373b8a09d6e ("kasan: call kasan_malloc() from __kmalloc_*track_caller()")
	so there is a difference in deleted code

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 0f853b2e6dd9580103484a098e9c973a67d127ac
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:14 2022 +0900

    mm/sl[au]b: factor out __do_kmalloc_node()

    __kmalloc(), __kmalloc_node(), __kmalloc_node_track_caller()
    mostly do same job. Factor out common code into __do_kmalloc_node().

    Note that this patch also fixes missing kasan_kmalloc() in SLUB's
    __kmalloc_node_track_caller().

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:15 -04:00
Chris von Recklinghausen d40ac277c8 mm/slab_common: cleanup kmalloc_track_caller()
Conflicts: mm/slub.c - We already have
	5373b8a09d6e ("kasan: call kasan_malloc() from __kmalloc_*track_caller()")
	so there is a difference in deleted code.

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit c45248db04f8e3aca4798d67a394fb9cc2168118
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:13 2022 +0900

    mm/slab_common: cleanup kmalloc_track_caller()

    Make kmalloc_track_caller() a wrapper of kmalloc_node_track_caller().

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:15 -04:00
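The shape of that cleanup, modelled in userspace with invented demo_ names and simplified signatures: the plain variant becomes a thin wrapper that forwards NUMA_NO_NODE to the node-aware variant.

```
#include <stdio.h>
#include <stdlib.h>

#define NUMA_NO_NODE (-1)

/* Node-aware variant: the single underlying implementation. */
static void *demo_kmalloc_node_track_caller(size_t size, int node, const void *caller)
{
	printf("alloc %zu bytes on node %d for caller %p\n", size, node, caller);
	return malloc(size);
}

/* The plain variant becomes a thin wrapper that just passes NUMA_NO_NODE. */
#define demo_kmalloc_track_caller(size) \
	demo_kmalloc_node_track_caller(size, NUMA_NO_NODE, __builtin_return_address(0))

int main(void)
{
	free(demo_kmalloc_track_caller(64));
	return 0;
}
```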
Chris von Recklinghausen 22099a7033 mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit f78a03f6e28be0283f73d3c18b54837b638a8ccf
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:12 2022 +0900

    mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions

    Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n,
    remove CONFIG_NUMA ifdefs for common kmalloc functions.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:15 -04:00
Chris von Recklinghausen 425c969bfe mm/slab: cleanup slab_alloc() and slab_alloc_node()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 07588d726f8d320215dcf6c79a28fe6b1bab6255
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:11 2022 +0900

    mm/slab: cleanup slab_alloc() and slab_alloc_node()

    Make slab_alloc_node() available even when CONFIG_NUMA=n and
    make slab_alloc() a wrapper of slab_alloc_node().

    This is necessary for further cleanup.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:14 -04:00
Chris von Recklinghausen 2a274f2047 mm/slab: move NUMA-related code to __do_cache_alloc()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit c31a910c74ed558461dc7eecf6168ccf805775ec
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:10 2022 +0900

    mm/slab: move NUMA-related code to __do_cache_alloc()

    To implement slab_alloc_node() independent of NUMA configuration,
    move NUMA fallback/alternate allocation code into __do_cache_alloc().

    One functional change here is that node availability is no longer
    checked when allocating from the local node.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:14 -04:00
Chris von Recklinghausen b15982fc79 mm/sl[au]b: use own bulk free function when bulk alloc failed
Bugzilla: https://bugzilla.redhat.com/2160210

commit 2055e67bb6a8fbb6aabdb9536443688ef52456c4
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Jun 15 00:26:34 2022 +0900

    mm/sl[au]b: use own bulk free function when bulk alloc failed

    There is no benefit to calling the generic bulk free function when
    kmem_cache_alloc_bulk() fails. Use the allocator's own
    kmem_cache_free_bulk() instead of the generic function.

    Note that if kmem_cache_alloc_bulk() fails to allocate the first object
    in SLUB, size is zero. So allow passing size == 0 to
    kmem_cache_free_bulk(), like SLAB's version does.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:27 -04:00
Chris von Recklinghausen b7497cd088 mm: slab: optimize memcg_slab_free_hook()
Bugzilla: https://bugzilla.redhat.com/2160210

commit b77d5b1b83e3e14870224de7c63f115a2dc44e9a
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Fri Apr 29 20:30:44 2022 +0800

    mm: slab: optimize memcg_slab_free_hook()

    Most callers of memcg_slab_free_hook() already know the slab, which could
    be passed to memcg_slab_free_hook() directly to reduce the overhead of
    another call of virt_to_slab().  For bulk freeing of objects, the call of
    slab_objcgs() in the loop in memcg_slab_free_hook() is redundant as well.
    Rework memcg_slab_free_hook() and build_detached_freelist() to reduce
    that unnecessary overhead and make memcg_slab_free_hook() able to handle
    bulk freeing in slab_free().

    Move the calling site of memcg_slab_free_hook() from do_slab_free() to
    slab_free() for slub to make the code clearer since the logic is weird
    (e.g. the caller needs to judge whether it needs to call
    memcg_slab_free_hook()). It is easy to make mistakes, like missing a call
    to memcg_slab_free_hook(), as in these fixes:

      commit d1b2cf6cb8 ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()")
      commit ae085d7f9365 ("mm: kfence: fix missing objcg housekeeping for SLAB")

    This optimization is mainly for bulk object freeing.  The following
    numbers are shown for 16-object freeing.

                               before      after
      kmem_cache_free_bulk:   ~430 ns     ~400 ns

    The overhead is reduced by about 7% for 16-object freeing.

    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Link: https://lore.kernel.org/r/20220429123044.37885-1-songmuchun@bytedance.com
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:21 -04:00
Chris von Recklinghausen a664e283b8 mm/tracing: add 'accounted' entry into output of allocation tracepoints
Bugzilla: https://bugzilla.redhat.com/2160210

commit b347aa7b57477f71c740e2bbc6d1078a7109ba23
Author: Vasily Averin <vvs@openvz.org>
Date:   Fri Jun 3 06:21:49 2022 +0300

    mm/tracing: add 'accounted' entry into output of allocation tracepoints

    Slab caches marked with SLAB_ACCOUNT force accounting for every
    allocation from this cache even if __GFP_ACCOUNT flag is not passed.
    Unfortunately, at the moment this flag is not visible in ftrace output,
    and this makes it difficult to analyze the accounted allocations.

    This patch adds a boolean "accounted" entry into the trace output,
    and sets it to 'true' for calls that used the __GFP_ACCOUNT flag and
    for allocations from caches marked with SLAB_ACCOUNT.
    It is set to 'false' if accounting is disabled in the config.

    Signed-off-by: Vasily Averin <vvs@openvz.org>
    Acked-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Acked-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:21 -04:00
Chris von Recklinghausen 038177e235 mm, slab: fix bad alignments
Bugzilla: https://bugzilla.redhat.com/2160210

commit d1ca263d0d518b4918473768aee0cfb2770014bc
Author: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Date:   Thu Jun 9 12:01:32 2022 +0800

    mm, slab: fix bad alignments

    As reported by coccicheck:

    ./mm/slab.c:3253:2-59: code aligned with following code on line 3255.

    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:21 -04:00
Chris von Recklinghausen 4004d229b5 mm/slab: delete cache_alloc_debugcheck_before()
Bugzilla: https://bugzilla.redhat.com/2160210

commit a3967244430eb91698ac8dca7db8bd0871251305
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun Jun 5 17:25:38 2022 +0200

    mm/slab: delete cache_alloc_debugcheck_before()

    It only does a might_sleep_if(GFP_RECLAIM) check, which is already covered
    by the might_alloc() in slab_pre_alloc_hook().  And all callers of
    cache_alloc_debugcheck_before() call that beforehand already.

    Link: https://lkml.kernel.org/r/20220605152539.3196045-2-daniel.vetter@ffwll.ch
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:14 -04:00
Chris von Recklinghausen 83549ef6f8 mm/slab.c: fix comments
Bugzilla: https://bugzilla.redhat.com/2160210

commit a8f23dd166651dcda2c02f16e524f56a4bd49084
Author: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Date:   Thu Apr 7 16:09:58 2022 +0800

    mm/slab.c: fix comments

    While reading the source code,
    I noticed some language errors in the comments, so I fixed them.

    Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Link: https://lore.kernel.org/r/20220407080958.3667-1-caoyixuan2019@email.szu.edu.cn

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:50 -04:00
Chris von Recklinghausen 5d60d4d004 mm/slab: remove some unused functions
Bugzilla: https://bugzilla.redhat.com/2160210

commit 1e703d0548e0a2766e198c64797737d50349f46e
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Tue Mar 22 17:14:21 2022 +0800

    mm/slab: remove some unused functions

    alternate_node_alloc and ____cache_alloc_node are only called when
    CONFIG_NUMA is enabled, so we can remove the unused !CONFIG_NUMA variant.
    The forward declaration for alternate_node_alloc is also unnecessary;
    remove it too.

    [ vbabka@suse.cz: move ____cache_alloc_node() declaration closer to
      its callers ]

    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Link: https://lore.kernel.org/r/20220322091421.25285-1-linmiaohe@huawei.com

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:49 -04:00
Mark Salter 53b03ecaba mm: make minimum slab alignment a runtime property
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2122232

commit d949a8155d139aa890795b802004a196b7f00598
Author: Peter Collingbourne <pcc@google.com>
Date:   Mon May 9 18:20:53 2022 -0700

    mm: make minimum slab alignment a runtime property

    When CONFIG_KASAN_HW_TAGS is enabled we currently increase the minimum
    slab alignment to 16.  This happens even if MTE is not supported in
    hardware or disabled via kasan=off, which creates an unnecessary memory
    overhead in those cases.  Eliminate this overhead by making the minimum
    slab alignment a runtime property and only aligning to 16 if KASAN is
    enabled at runtime.
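
    A minimal sketch of the approach (simplified; the actual patch wires this
    through ARCH_SLAB_MINALIGN and an arch override): the architecture reports
    the minimum at runtime instead of a compile-time constant, defaulting to
    the natural alignment:

        #ifndef arch_slab_minalign
        static inline unsigned int arch_slab_minalign(void)
        {
                /* default: no KASAN tag granule requirement */
                return __alignof__(unsigned long long);
        }
        #endif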

    On a DragonBoard 845c (non-MTE hardware) with a kernel built with
    CONFIG_KASAN_HW_TAGS, waiting for quiescence after a full Android boot I
    see the following Slab measurements in /proc/meminfo (median of 3
    reboots):

    Before: 169020 kB
    After:  167304 kB

    [akpm@linux-foundation.org: make slab alignment type `unsigned int' to avoid casting]
    Link: https://linux-review.googlesource.com/id/I752e725179b43b144153f4b6f584ceb646473ead
    Link: https://lkml.kernel.org/r/20220427195820.1716975-2-pcc@google.com
    Signed-off-by: Peter Collingbourne <pcc@google.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Kees Cook <keescook@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Mark Salter <msalter@redhat.com>
2023-01-28 11:34:57 -05:00
Michal Schmidt de2f4dee96 slab: Introduce kmalloc_size_roundup()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143368

commit 05a940656e1eb2026d9ee31019d5b47e9545124d
Author: Kees Cook <keescook@chromium.org>
Date:   Fri Sep 23 13:28:08 2022 -0700

    slab: Introduce kmalloc_size_roundup()

    In the effort to help the compiler reason about buffer sizes, the
    __alloc_size attribute was added to allocators. This improves the scope
    of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
    future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
    as the vast majority of callers are not expecting to use more memory
    than what they asked for.

    There is, however, one common exception to this: anticipatory resizing
    of kmalloc allocations. These cases all use ksize() to determine the
    actual bucket size of a given allocation (e.g. 128 when 126 was asked
    for). This comes in two styles in the kernel:

    1) An allocation has been determined to be too small, and needs to be
       resized. Instead of the caller choosing its own next best size, it
       wants to minimize the number of calls to krealloc(), so it just uses
       ksize() plus some additional bytes, forcing the realloc into the next
       bucket size, from which it can learn how large it is now. For example:

            data = krealloc(data, ksize(data) + 1, gfp);
            data_len = ksize(data);

    2) The minimum size of an allocation is calculated, but since it may
       grow in the future, just use all the space available in the chosen
       bucket immediately, to avoid needing to reallocate later. A good
       example of this is skbuff's allocators:

            data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
            ...
            /* kmalloc(size) might give us more room than requested.
             * Put skb_shared_info exactly at the end of allocated zone,
             * to allow max possible filling before reallocation.
             */
            osize = ksize(data);
            size = SKB_WITH_OVERHEAD(osize);

    In both cases, the "how much was actually allocated?" question is answered
    _after_ the allocation, where the compiler hint is no longer in a position
    to make the association.  This mismatch between the compiler's view of the
    buffer length and the amount of memory the code actually intends to use has
    already caused problems[1].  It is possible to fix this by reordering the
    use of the "actual size" information.

    We can serve the needs of users of ksize() and still have accurate buffer
    length hinting for the compiler by doing the bucket size calculation
    _before_ the allocation. Code can instead ask "how large an allocation
    would I get for a given size?".

    Introduce kmalloc_size_roundup() to serve this purpose, so we can start
    replacing the "anticipatory resizing" uses of ksize().
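
    As a sketch of how style 1 above could then be rewritten (illustrative,
    not a specific upstream caller):

        size_t bucket = kmalloc_size_roundup(data_len + 1);

        data = krealloc(data, bucket, gfp);
        if (data)
                data_len = bucket;   /* size is known before the allocation */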

    [1] https://github.com/ClangBuiltLinux/linux/issues/1599
        https://github.com/KSPP/linux/issues/183

    [ vbabka@suse.cz: add SLOB version ]

    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: linux-mm@kvack.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2022-11-22 16:08:59 +01:00
Chris von Recklinghausen 9258ee6d65 mm, kfence: support kmem_dump_obj() for KFENCE objects
Bugzilla: https://bugzilla.redhat.com/2120352

commit 2dfe63e61cc31ee59ce951672b0850b5229cd5b0
Author: Marco Elver <elver@google.com>
Date:   Thu Apr 14 19:13:40 2022 -0700

    mm, kfence: support kmem_dump_obj() for KFENCE objects

    Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been
    producing garbage data due to the object not actually being maintained
    by SLAB or SLUB.

    Fix this by implementing __kfence_obj_info() that copies relevant
    information to struct kmem_obj_info when the object was allocated by
    KFENCE; this is called by a common kmem_obj_info(), which also calls the
    slab/slub/slob specific variant now called __kmem_obj_info().
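
    A simplified sketch of the dispatch described above (not the verbatim
    patch):

        void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
        {
                /* KFENCE-managed objects are filled in by the KFENCE helper */
                if (__kfence_obj_info(kpp, object, slab))
                        return;
                /* otherwise fall back to the SLAB/SLUB/SLOB specific variant */
                __kmem_obj_info(kpp, object, slab);
        }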

    For completeness, kmem_dump_obj() now displays if the object was
    allocated by KFENCE.

    Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/
    Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com
    Fixes: b89fb5ef0c ("mm, kfence: insert KFENCE hooks for SLUB")
    Fixes: d3fb45f370 ("mm, kfence: insert KFENCE hooks for SLAB")
    Signed-off-by: Marco Elver <elver@google.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>      [slab]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:06 -04:00
Chris von Recklinghausen 241d3da5fe mm: kfence: fix missing objcg housekeeping for SLAB
Bugzilla: https://bugzilla.redhat.com/2120352

commit ae085d7f9365de7da27ab5c0d16b12d51ea7fca9
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Sun Mar 27 13:18:52 2022 +0800

    mm: kfence: fix missing objcg housekeeping for SLAB

    The objcg is not cleared and put for a kfence object when it is freed,
    which could lead to a memory leak of struct obj_cgroup and wrong
    statistics for NR_SLAB_RECLAIMABLE_B or NR_SLAB_UNRECLAIMABLE_B.

    Since the last freed object's objcg is not cleared,
    mem_cgroup_from_obj() could return the wrong memcg when this kfence
    object, which is not charged to any objcgs, is reallocated to other
    users.

    A real-world issue [1] is caused by this bug.

    Link: https://lore.kernel.org/all/000000000000cabcb505dae9e577@google.com/ [1]
    Reported-by: syzbot+f8c45ccc7d5d45fc5965@syzkaller.appspotmail.com
    Fixes: d3fb45f370 ("mm, kfence: insert KFENCE hooks for SLAB")
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Marco Elver <elver@google.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:03 -04:00
Chris von Recklinghausen 9d61cca226 mm/kasan: Convert to struct folio and struct slab
Bugzilla: https://bugzilla.redhat.com/2120352

commit 6e48a966dfd18987fec9385566a67d36e2b5fc11
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Oct 4 14:46:46 2021 +0100

    mm/kasan: Convert to struct folio and struct slab

    KASAN accesses some slab-related struct page fields, so we need to
    convert it to struct slab.  Some places are a bit simplified thanks to
    kasan_addr_to_slab() encapsulating the PageSlab flag check through
    virt_to_slab().  When resolving an object address to either a real slab or
    a large kmalloc allocation, use struct folio as the intermediate type for
    testing the slab flag to avoid an unnecessary implicit compound_head().
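
    A simplified sketch of the folio-based resolution described above:

        struct folio *folio = virt_to_folio(addr);

        if (folio_test_slab(folio)) {
                /* object belongs to a slab cache */
                struct slab *slab = folio_slab(folio);
                /* ... report via slab metadata ... */
        } else {
                /* large kmalloc: the folio itself is the allocation */
        }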

    [ vbabka@suse.cz: use struct folio, adjust to differences in previous
      patches ]

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: <kasan-dev@googlegroups.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:35 -04:00
Chris von Recklinghausen c6690c8c61 mm: Convert struct page to struct slab in functions used by other subsystems
Bugzilla: https://bugzilla.redhat.com/2120352

commit 40f3bf0cb04c91d33531b1b95788ad2f0e4062cf
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Nov 2 15:42:04 2021 +0100

    mm: Convert struct page to struct slab in functions used by other subsystems

    KASAN, KFENCE and memcg interact with SLAB or SLUB internals through
    functions nearest_obj(), obj_to_index() and objs_per_slab() that take
    struct page as a parameter.  This patch converts them to struct slab,
    including all callers, through a coccinelle semantic patch.

    // Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c
    // Note: needs coccinelle 1.1.1 to avoid breaking whitespace

    @@
    @@

    -objs_per_slab_page(
    +objs_per_slab(
     ...
     )
     { ... }

    @@
    @@

    -objs_per_slab_page(
    +objs_per_slab(
     ...
     )

    @@
    identifier fn =~ "obj_to_index|objs_per_slab";
    @@

     fn(...,
    -   const struct page *page
    +   const struct slab *slab
        ,...)
     {
    <...
    (
    - page_address(page)
    + slab_address(slab)
    |
    - page
    + slab
    )
    ...>
     }

    @@
    identifier fn =~ "nearest_obj";
    @@

     fn(...,
    -   struct page *page
    +   const struct slab *slab
        ,...)
     {
    <...
    (
    - page_address(page)
    + slab_address(slab)
    |
    - page
    + slab
    )
    ...>
     }

    @@
    identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab";
    expression E;
    @@

     fn(...,
    (
    - slab_page(E)
    + E
    |
    - virt_to_page(E)
    + virt_to_slab(E)
    |
    - virt_to_head_page(E)
    + virt_to_slab(E)
    |
    - page
    + page_slab(page)
    )
      ,...)
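
    The net effect at a call site looks roughly like this (hypothetical
    caller, shown only to illustrate the rules above):

        /* before */ void *nearby = nearest_obj(cache, virt_to_head_page(x), x);
        /* after  */ void *nearby = nearest_obj(cache, virt_to_slab(x), x);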

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Julia Lawall <julia.lawall@inria.fr>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Marco Elver <elver@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: <kasan-dev@googlegroups.com>
    Cc: <cgroups@vger.kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:35 -04:00
Chris von Recklinghausen a9b569137b mm/slab: Convert most struct page to struct slab by spatch
Bugzilla: https://bugzilla.redhat.com/2120352

commit 7981e67efb85908d9c4924c8e6669c5d5fe365b7
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Nov 2 13:23:10 2021 +0100

    mm/slab: Convert most struct page to struct slab by spatch

    The majority of the conversion from struct page to struct slab in SLAB
    internals can be delegated to a coccinelle semantic patch.  This includes
    renaming variables with 'page' in their name to 'slab', and similar.

    Big thanks to Julia Lawall and Luis Chamberlain for help with
    coccinelle.

    // Options: --include-headers --no-includes --smpl-spacing mm/slab.c
    // Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the
    // embedded script

    // build list of functions for applying the next rule
    @initialize:ocaml@
    @@

    let ok_function p =
      not (List.mem (List.hd p).current_element ["kmem_getpages";"kmem_freepages"])

    // convert the type in selected functions
    @@
    position p : script:ocaml() { ok_function p };
    @@

    - struct page@p
    + struct slab

    @@
    @@

    -PageSlabPfmemalloc(page)
    +slab_test_pfmemalloc(slab)

    @@
    @@

    -ClearPageSlabPfmemalloc(page)
    +slab_clear_pfmemalloc(slab)

    @@
    @@

    obj_to_index(
     ...,
    - page
    + slab_page(slab)
    ,...)

    // for all functions, change any "struct slab *page" parameter to "struct slab
    // *slab" in the signature, and generally all occurrences of "page" to "slab" in
    // the body - with some special cases.
    @@
    identifier fn;
    expression E;
    @@

     fn(...,
    -   struct slab *page
    +   struct slab *slab
        ,...)
     {
    <...
    (
    - int page_node;
    + int slab_node;
    |
    - page_node
    + slab_node
    |
    - page_slab(page)
    + slab
    |
    - page_address(page)
    + slab_address(slab)
    |
    - page_size(page)
    + slab_size(slab)
    |
    - page_to_nid(page)
    + slab_nid(slab)
    |
    - virt_to_head_page(E)
    + virt_to_slab(E)
    |
    - page
    + slab
    )
    ...>
     }

    // rename a function parameter
    @@
    identifier fn;
    expression E;
    @@

     fn(...,
    -   int page_node
    +   int slab_node
        ,...)
     {
    <...
    - page_node
    + slab_node
    ...>
     }

    // functions converted by previous rules that were temporarily called using
    // slab_page(E) so we want to remove the wrapper now that they accept struct
    // slab ptr directly
    @@
    identifier fn =~ "index_to_obj";
    expression E;
    @@

     fn(...,
    - slab_page(E)
    + E
     ,...)

    // functions that were returning struct page ptr and now will return struct
    // slab ptr, including slab_page() wrapper removal
    @@
    identifier fn =~ "cache_grow_begin|get_valid_first_slab|get_first_slab";
    expression E;
    @@

     fn(...)
     {
    <...
    - slab_page(E)
    + E
    ...>
     }

    // rename any former struct page * declarations
    @@
    @@

    struct slab *
    -page
    +slab
    ;

    // all functions (with exceptions) with a local "struct slab *page" variable
    // that will be renamed to "struct slab *slab"
    @@
    identifier fn !~ "kmem_getpages|kmem_freepages";
    expression E;
    @@

     fn(...)
     {
    <...
    (
    - page_slab(page)
    + slab
    |
    - page_to_nid(page)
    + slab_nid(slab)
    |
    - kasan_poison_slab(page)
    + kasan_poison_slab(slab_page(slab))
    |
    - page_address(page)
    + slab_address(slab)
    |
    - page_size(page)
    + slab_size(slab)
    |
    - page->pages
    + slab->slabs
    |
    - page = virt_to_head_page(E)
    + slab = virt_to_slab(E)
    |
    - virt_to_head_page(E)
    + virt_to_slab(E)
    |
    - page
    + slab
    )
    ...>
     }

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Cc: Julia Lawall <julia.lawall@inria.fr>
    Cc: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:35 -04:00
Chris von Recklinghausen 4a1e6707a9 mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
Bugzilla: https://bugzilla.redhat.com/2120352

commit 42c0faac3192352867f6e6ba815b28ed58bf7388
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Fri Oct 29 17:54:55 2021 +0200

    mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab

    These functions sit at the boundary to the page allocator.  Also use
    struct folio internally to avoid an extra compound_head() when dealing
    with page flags.
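
    A heavily simplified sketch of the direction (helper names and details
    approximate, not the full patch): the boundary function returns a struct
    slab built from a freshly allocated folio:

        static struct slab *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
        {
                struct folio *folio;

                folio = (struct folio *)__alloc_pages_node(nodeid, flags, cachep->gfporder);
                if (!folio)
                        return NULL;

                /* mark the folio as a slab and hand back the slab view of it */
                __folio_set_slab(folio);
                return folio_slab(folio);
        }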

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:34 -04:00