Commit Graph

784 Commits

Author SHA1 Message Date
Rado Vrbovsky 4da7c39b53 Merge: io_uring: Update to upstream v6.10 + fixes 2025-01-13 18:58:47 +00:00
Rafael Aquini c8c9c0b259 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
  * arch/*/Kconfig: all hunks dropped as there were only text blurbs and comments
     being changed with no functional changes whatsoever, and RHEL9 is missing
     several (unrelated) commits to these arches that transform the text blurbs in
     the way these non-functional hunks were expecting;
  * drivers/accel/qaic/qaic_data.c: hunk dropped due to RHEL-only commit
     083c0cdce2 ("Merge DRM changes from upstream v6.8..v6.9");
  * drivers/gpu/drm/i915/gem/selftests/huge_pages.c: hunk dropped due to RHEL-only
     commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
  * drivers/gpu/drm/ttm/tests/ttm_pool_test.c: all hunks dropped due to RHEL-only
     commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
  * drivers/video/fbdev/vermilion/vermilion.c: hunk dropped as RHEL9 misses
     commit dbe7e429fe ("vmlfb: framebuffer driver for Intel Vermilion Range");
  * include/linux/pageblock-flags.h: differences due to out-of-order backport
    of upstream commits 72801513b2bf ("mm: set pageblock_order to HPAGE_PMD_ORDER
    in case with !CONFIG_HUGETLB_PAGE but THP enabled"), and 3a7e02c040b1
    ("minmax: avoid overly complicated constant expressions in VM code");
  * mm/mm_init.c: differences in the 3rd and 4th hunks are due to RHEL
     backport commit 1845b92dcf ("mm: move most of core MM initialization to
     mm/mm_init.c") ignoring the out-of-order backport of commit 3f6dac0fd1b8
     ("mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks")
     thus partially reverting the changes introduced by the latter;

This patch is a backport of the following upstream commit:
commit 5e0a760b44417f7cadd79de2204d6247109558a0
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Dec 28 17:47:04 2023 +0300

    mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER

    commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
    changed the definition of MAX_ORDER to be inclusive.  This has caused
    issues with code that was not yet upstream and depended on the previous
    definition.

    To draw attention to the altered meaning of the define, rename MAX_ORDER
    to MAX_PAGE_ORDER.

    Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:17 -05:00
Jeff Moyer 34644cd1d2 kasan: rename and document kasan_(un)poison_object_data
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 1ce9a0523938f87dd8505233cc3445f8e2d8dcee
Author: Andrey Konovalov <andreyknvl@gmail.com>
Date:   Tue Dec 19 23:29:03 2023 +0100

    kasan: rename and document kasan_(un)poison_object_data
    
    Rename kasan_unpoison_object_data to kasan_unpoison_new_object and add a
    documentation comment.  Do the same for kasan_poison_object_data.
    
    The new names and the comments should suggest to users that these hooks
    are intended for internal use by the slab allocator.
    
    The following patch will remove non-slab-internal uses of these hooks.
    
    No functional changes.
    
    [andreyknvl@google.com: update references to renamed functions in comments]
      Link: https://lkml.kernel.org/r/20231221180637.105098-1-andrey.konovalov@linux.dev
    Link: https://lkml.kernel.org/r/eab156ebbd635f9635ef67d1a4271f716994e628.1703024586.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
    Reviewed-by: Marco Elver <elver@google.com>
    Cc: Alexander Lobakin <alobakin@pm.me>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Breno Leitao <leitao@debian.org>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Evgenii Stepanov <eugenis@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:22:44 -05:00
Rafael Aquini a9278b8510 Randomized slab caches for kmalloc()
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
  * minor extra RHEL-only hunk to create the required CONFIG_RANDOM_KMALLOC_CACHES
    file under rhel's config database.

This patch is a backport of the following upstream commit:
commit 3c6152940584290668b35fa0800026f6a1ae05fe
Author: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Date:   Fri Jul 14 14:44:22 2023 +0800

    Randomized slab caches for kmalloc()

    When exploiting memory vulnerabilities, "heap spraying" is a common
    technique targeting those related to dynamic memory allocation (i.e. the
    "heap"), and it plays an important role in a successful exploitation.
    Basically, it overwrites the memory area of a vulnerable object by
    triggering allocations in other subsystems or modules and thereby
    getting a reference to the targeted memory location. It's usable on
    various types of vulnerability, including use-after-free (UAF) and
    heap out-of-bounds write.

    There are (at least) two reasons why the heap can be sprayed: 1) generic
    slab caches are shared among different subsystems and modules, and
    2) dedicated slab caches could be merged with the generic ones.
    Currently these two factors cannot be prevented at a low cost: the first
    one is a widely used memory allocation mechanism, and shutting down slab
    merging completely via `slub_nomerge` would be overkill.

    To efficiently prevent heap spraying, we propose the following approach:
    to create multiple copies of generic slab caches that will never be
    merged, and a random one of them will be used at allocation. The random
    selection is based on the address of code that calls `kmalloc()`, which
    means it is static at runtime (rather than dynamically determined at
    each time of allocation, which could be bypassed by repeatedly spraying
    in brute force). In other words, the randomness of cache selection will
    be with respect to the code address rather than time, i.e. allocations
    in different code paths would most likely pick different caches,
    although kmalloc() at each place would use the same cache copy whenever
    it is executed. In this way, the vulnerable object and memory allocated
    in other subsystems and modules will (most probably) be on different
    slab caches, which prevents the object from being sprayed.

    Meanwhile, the static random selection is further enhanced with a
    per-boot random seed, which prevents the attacker from finding a usable
    kmalloc that happens to pick the same cache as the vulnerable
    subsystem/module by analyzing the open source code. In other words, with
    the per-boot seed, the random selection is fixed for the lifetime of each
    boot, but not across different system startups.

    The overhead of performance has been tested on a 40-core x86 server by
    comparing the results of `perf bench all` between the kernels with and
    without this patch based on the latest linux-next kernel, which shows
    minor difference. A subset of benchmarks are listed below:

                    sched/  sched/  syscall/       mem/       mem/
                 messaging    pipe     basic     memcpy     memset
                     (sec)   (sec)     (sec)   (GB/sec)   (GB/sec)

    control1         0.019   5.459     0.733  15.258789  51.398026
    control2         0.019   5.439     0.730  16.009221  48.828125
    control3         0.019   5.282     0.735  16.009221  48.828125
    control_avg      0.019   5.393     0.733  15.759077  49.684759

    experiment1      0.019   5.374     0.741  15.500992  46.502976
    experiment2      0.019   5.440     0.746  16.276042  51.398026
    experiment3      0.019   5.242     0.752  15.258789  51.398026
    experiment_avg   0.019   5.352     0.746  15.678608  49.766343

    The overhead of memory usage was measured by executing `free` after boot
    on a QEMU VM with 1GB total memory, and as expected, it's positively
    correlated with # of cache copies:

               control  4 copies  8 copies  16 copies

    total       969.8M    968.2M    968.2M     968.2M
    used         20.0M     21.9M     24.1M      26.7M
    free        936.9M    933.6M    931.4M     928.6M
    available   932.2M    928.8M    926.6M     923.9M

    Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
    Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
    Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: Dennis Zhou <dennis@kernel.org> # percpu
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:19:49 -04:00
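To illustrate the static selection described in the commit above, here is a minimal userspace sketch (not the kernel implementation): the cache copy is chosen by hashing the caller's code address together with a per-boot seed, so each call site sticks to one copy for the lifetime of a boot. The hash, the copy count, and the demo_ helper names are illustrative assumptions.

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NR_CACHE_COPIES 16              /* illustrative; the kernel count is Kconfig-driven */

static uint64_t boot_seed;              /* stands in for the per-boot random seed */

/* Pick a cache copy from the caller's code address; fixed per call site per boot. */
static unsigned int pick_cache_copy(uintptr_t caller_ip)
{
	uint64_t h = caller_ip ^ boot_seed;

	/* cheap mix, standing in for the kernel's hash of (caller, seed) */
	h ^= h >> 33;
	h *= 0xff51afd7ed558ccdULL;
	h ^= h >> 33;
	return (unsigned int)(h % NR_CACHE_COPIES);
}

static void *demo_kmalloc(size_t size)
{
	unsigned int idx = pick_cache_copy((uintptr_t)__builtin_return_address(0));

	printf("size %zu -> kmalloc cache copy %u\n", size, idx);
	return malloc(size);            /* real code would allocate from copy 'idx' */
}

int main(void)
{
	boot_seed = (uint64_t)time(NULL);   /* per-"boot" seed */
	free(demo_kmalloc(64));             /* same call site -> same copy for this whole "boot" */
	free(demo_kmalloc(64));
	free(demo_kmalloc(128));            /* different call site -> likely a different copy */
	return 0;
}
```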
Rafael Aquini 5b0f4beec7 mm/slab: correct return values in comment for _kmem_cache_create()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 444f20c29e8b41a5aef5c34e3eab84e8d1cc4511
Author: zhaoxinchao <chrisxinchao@outlook.com>
Date:   Tue Apr 18 10:05:23 2023 +0800

    mm/slab: correct return values in comment for _kmem_cache_create()

    __kmem_cache_create() returns 0 on success and non-zero on failure.
    The comment is wrong in two instances, so fix the first one and remove
    the second one. Also make the comment non-doc, because it doesn't
    describe an API function, but a SLAB-specific implementation.

    Signed-off-by: zhaoxinchao <chrisxinchao@outlook.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:07 -04:00
Rafael Aquini f97b54a816 mm/slab: Replace invocation of weak PRNG
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit f7e466e951a15bc7cec496f22f6276b854d3c310
Author: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il>
Date:   Sun Apr 16 20:22:42 2023 +0300

    mm/slab: Replace invocation of weak PRNG

    The slab allocator randomization uses the prandom_u32 PRNG, which was
    added to prevent attackers from obtaining information on the heap state
    by randomizing the freelist state.

    However, this PRNG turned out to be weak, as noted in commit c51f8f88d7.
    To fix it, we changed the invocation of prandom_u32_state to get_random_u32
    to ensure the PRNG is strong. Since a modulo operation is applied right
    after that, we used get_random_u32_below to achieve uniformity.

    In addition, we changed the freelist_init_state union to a struct, since
    the rnd_state inside it, which was used to store the state of prandom_u32,
    is not needed anymore: get_random_u32 maintains its own state.

    Signed-off-by: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:06 -04:00
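As a rough userspace illustration of the change above: freelist randomization is a Fisher-Yates shuffle driven by a "uniform random below N" helper. The helper here merely wraps rand() and stands in for get_random_u32_below(), which keeps its own state so no seeded rnd_state is needed.

```
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for get_random_u32_below(): value in [0, ceil). Illustration only. */
static unsigned int random_below(unsigned int ceil)
{
	return (unsigned int)(rand() % ceil);
}

/* Fisher-Yates shuffle of a freelist index array, as slab freelist randomization does. */
static void shuffle_freelist(unsigned int *list, unsigned int count)
{
	for (unsigned int i = count - 1; i > 0; i--) {
		unsigned int j = random_below(i + 1);
		unsigned int tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}

int main(void)
{
	unsigned int freelist[8];

	for (unsigned int i = 0; i < 8; i++)
		freelist[i] = i;
	shuffle_freelist(freelist, 8);
	for (unsigned int i = 0; i < 8; i++)
		printf("%u ", freelist[i]);
	printf("\n");
	return 0;
}
```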
Nico Pache 5415c73d87 mm/slab: Finish struct page to struct slab conversion
commit dd35f71a1d98b87e0e3ee3d87fff1bc7004cf626
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Nov 2 13:26:56 2021 +0100

    mm/slab: Finish struct page to struct slab conversion

    Change cache_free_alien() to use slab_nid(virt_to_slab()). Otherwise
    just update of comments and some remaining variable names.

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Roman Gushchin <guro@fb.com>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:22 -06:00
Chris von Recklinghausen 41d58c77af mm: vmscan: refactor updating current->reclaim_state
Conflicts: mm/slob.c - We already have
	6630e950d532 ("mm/slob: remove slob.c")
	so the file is gone.

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit c7b23b68e2aa93f86a206222d23ccd9a21f5982a
Author: Yosry Ahmed <yosryahmed@google.com>
Date:   Thu Apr 13 10:40:34 2023 +0000

    mm: vmscan: refactor updating current->reclaim_state

    During reclaim, we keep track of pages reclaimed from other means than
    LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab,
    which we stash a pointer to in current task_struct.

    However, we keep track of more than just reclaimed slab pages through
    this.  We also use it for clean file pages dropped through pruned inodes,
    and xfs buffer pages freed.  Rename reclaimed_slab to reclaimed, and add a
    helper function that wraps updating it through current, so that future
    changes to this logic are contained within include/linux/swap.h.

    Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.com
    Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:59 -04:00
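A simplified userspace model of the helper pattern described above (the structure, field, and helper names here are illustrative, not the upstream ones): callers report freed pages through a small wrapper that updates the reclaim state stashed in current.

```
#include <stddef.h>
#include <stdio.h>

/* Simplified model of the task-local reclaim bookkeeping described above. */
struct reclaim_state {
	unsigned long reclaimed;        /* pages reclaimed outside LRU-based reclaim */
};

struct task {
	struct reclaim_state *reclaim_state;
};

static struct task current_task;        /* stands in for the kernel's 'current' */
#define current (&current_task)

/*
 * Helper that wraps the update through current, so callers (slab shrinkers,
 * inode pruning, buffer freeing, ...) never touch the field directly.
 */
static void report_freed_pages(unsigned long nr_pages)
{
	if (current->reclaim_state)
		current->reclaim_state->reclaimed += nr_pages;
}

int main(void)
{
	struct reclaim_state rs = { .reclaimed = 0 };

	current->reclaim_state = &rs;   /* set up by the reclaim path */
	report_freed_pages(3);          /* e.g. slab pages freed by a shrinker */
	report_freed_pages(1);          /* e.g. a clean file page dropped with an inode */
	current->reclaim_state = NULL;

	printf("reclaimed %lu pages outside LRU reclaim\n", rs.reclaimed);
	return 0;
}
```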
Aristeu Rozanski 32202ced15 mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 66a1c22b709178e7b823d44465d0c2e5ed7492fb
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Tue Mar 21 09:30:59 2023 +0100

    mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP

    sh/migor_defconfig:

        mm/slab.c: In function ‘slab_memory_callback’:
        mm/slab.c:1127:23: error: implicit declaration of function ‘init_cache_node_node’; did you mean ‘drain_cache_node_node’? [-Werror=implicit-function-declaration]
         1127 |                 ret = init_cache_node_node(nid);
              |                       ^~~~~~~~~~~~~~~~~~~~
              |                       drain_cache_node_node

    The #ifdef condition protecting the definition of init_cache_node_node()
    no longer matches the conditions protecting the (multiple) users.

    Fix this by syncing the conditions.

    Fixes: 76af6a054da40553 ("mm/migrate: add CPU hotplug to demotion #ifdef")
    Reported-by: Randy Dunlap <rdunlap@infradead.org>
    Link: https://lore.kernel.org/r/b5bdea22-ed2f-3187-6efe-0c72330270a4@infradead.org
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
    Acked-by: Randy Dunlap <rdunlap@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:25 -04:00
Aristeu Rozanski 672578399d mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit f5451547b8310868f5b5acff7cd4aa7c0267edb3
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Feb 7 15:16:53 2023 +0100

    mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early

    The memory allocators are available during early boot even in the phase
    where interrupts are disabled and scheduling is not yet possible.

    The setup is so that GFP_KERNEL allocations work in this phase without
    causing might_alloc() splats to be emitted because the system state is
    SYSTEM_BOOTING at that point, which prevents the warnings from triggering.

    Most allocation/free functions use local_irq_save()/restore() or a lock
    variant of that. But kmem_cache_alloc_bulk() and kmem_cache_free_bulk() use
    local_[lock]_irq_disable()/enable(), which leads to a lockdep warning when
    interrupts are enabled during the early boot phase.

    This went unnoticed so far as there are no early users of these
    interfaces. The upcoming conversion of the interrupt descriptor store from
    radix_tree to maple_tree triggered this warning as maple_tree uses the bulk
    interface.

    Cure this by moving the kmem_cache_alloc/free() bulk variants of SLUB and
    SLAB to local[_lock]_irq_save()/restore().

    There is obviously no reclaim possible and required at this point so there
    is no need to expand this coverage further.

    No functional change.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:13 -04:00
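A toy model of why the save/restore variants matter here: with interrupts still disabled during early boot, a plain disable/enable pair would re-enable them unconditionally, while save/restore puts the flag back exactly as it was found. This only models the control flow; the kernel primitives themselves are not reproduced.

```
#include <stdbool.h>
#include <stdio.h>

static bool irqs_enabled;                       /* models the CPU interrupt-enable flag */

static void local_irq_save_model(bool *flags)   /* models local_irq_save() */
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

static void local_irq_restore_model(bool flags) /* models local_irq_restore() */
{
	irqs_enabled = flags;
}

static void bulk_alloc_model(void)
{
	bool flags;

	local_irq_save_model(&flags);
	/* ... allocate objects from per-cpu caches ... */
	local_irq_restore_model(flags);             /* prior state comes back unchanged */
}

int main(void)
{
	irqs_enabled = false;                       /* early boot: interrupts not enabled yet */
	bulk_alloc_model();
	printf("after early-boot call: irqs %s\n", irqs_enabled ? "on (wrong)" : "off (correct)");

	irqs_enabled = true;                        /* normal runtime */
	bulk_alloc_model();
	printf("after runtime call:    irqs %s\n", irqs_enabled ? "on (correct)" : "off (wrong)");
	return 0;
}
```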
Aristeu Rozanski fa5f95f92f mm: introduce folio_is_pfmemalloc
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 02d65d6fb1aae151570c8bfd1bd77a8153d2e607
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 6 15:52:51 2023 -0600

    mm: introduce folio_is_pfmemalloc

    Add a folio equivalent for page_is_pfmemalloc. This removes two instances
    of page_is_pfmemalloc(folio_page(folio, 0)) so the folio can be used
    directly.

    Link: https://lkml.kernel.org/r/20230106215251.599222-1-sidhartha.kumar@oracle.com
    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:03 -04:00
Aristeu Rozanski 5dc5236643 mm/slab.c: cleanup is_debug_pagealloc_cache()
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 81ce2ebd194cf32027854ce1c703b7fd129c86b8
Author: lvqian <lvqian@nfschina.com>
Date:   Wed Jan 11 17:27:44 2023 +0800

    mm/slab.c: cleanup is_debug_pagealloc_cache()

    Remove the if statement to increase code readability.
    Also make the function inline, per David.

    Signed-off-by: lvqian <lvqian@nfschina.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:32:59 -04:00
Aristeu Rozanski 120c032c5d mm/sl{a,u}b: fix wrong usages of folio_page() for getting head pages
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit c034c6a45c977fdf33de5974d7def75bda9dcadc
Author: SeongJae Park <sj@kernel.org>
Date:   Tue Jan 10 00:51:24 2023 +0000

    mm/sl{a,u}b: fix wrong usages of folio_page() for getting head pages

    The standard idiom for getting head page of a given folio is
    '&folio->page', but some are wrongly using 'folio_page(folio, 0)' for
    the purpose.  Fix those to use the idiom.

    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:32:59 -04:00
Aristeu Rozanski e24d457b8d mm/slab: remove unused slab_early_init
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 35e3c36d438e05fcd4f846c76cf22cbda9b63abb
Author: Gou Hao <gouhao@uniontech.com>
Date:   Sun Dec 18 20:31:27 2022 +0800

    mm/slab: remove unused slab_early_init

    'slab_early_init' was introduced by commit e0a4272679
    ("[PATCH] mm/slab.c: fix early init assumption"); this
    flag was used to prevent off-slab caches from being
    created too early during bootup.

    The only user of 'slab_early_init' was removed in commit
    3217fd9bdf ("mm/slab: make criteria for off slab
    determination robust and simple").

    Signed-off-by: Gou Hao <gouhao@uniontech.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:32:58 -04:00
Audra Mitchell 086ffa2949 mm, slab: periodically resched in drain_freelist()
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit cc2e9d2b26c86c1dd8687f6916e5f621bcacd6f7
Author: David Rientjes <rientjes@google.com>
Date:   Tue Dec 27 22:05:48 2022 -0800

    mm, slab: periodically resched in drain_freelist()

    drain_freelist() can be called with a very large number of slabs to free,
    such as for kmem_cache_shrink(), or depending on various settings of the
    slab cache when doing periodic reaping.

    If there is a potentially long list of slabs to drain, periodically
    schedule to ensure we aren't saturating the cpu for too long.

    Signed-off-by: David Rientjes <rientjes@google.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:43:02 -04:00
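Schematically, the change amounts to yielding inside the long freeing loop; the userspace sketch below uses sched_yield() as a stand-in for cond_resched() and invented helper names.

```
#include <sched.h>
#include <stdio.h>

/* Schematic stand-in for freeing one slab's worth of objects. */
static void free_one_slab(int i)
{
	(void)i;        /* real code would unlink the slab and free its pages */
}

/*
 * Drain a potentially very long list of free slabs, yielding the CPU
 * periodically (the kernel patch adds cond_resched() inside the loop).
 */
static void drain_freelist_model(int nr_slabs)
{
	for (int i = 0; i < nr_slabs; i++) {
		free_one_slab(i);
		sched_yield();  /* userspace stand-in for cond_resched() */
	}
}

int main(void)
{
	drain_freelist_model(100000);
	printf("drained without monopolizing the CPU between yields\n");
	return 0;
}
```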
Audra Mitchell efb0626ae7 mm/migrate: make isolate_movable_page() skip slab pages
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 8b8817630ae80032e80b2eaf334de756ac1ff6a3
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Fri Nov 4 15:57:26 2022 +0100

    mm/migrate: make isolate_movable_page() skip slab pages

    In the next commit we want to rearrange struct slab fields to allow a larger
    rcu_head. Afterwards, the page->mapping field will overlap with SLUB's "struct
    list_head slab_list", where the value of prev pointer can become LIST_POISON2,
    which is 0x122 + POISON_POINTER_DELTA.  Unfortunately, bit 1 being set can
    make PageMovable() return a false positive and cause a GPF, as reported by
    lkp [1].

    To fix this, make isolate_movable_page() skip pages with the PageSlab flag set.
    This is a bit tricky as we need to add memory barriers to SLAB and SLUB's page
    allocation and freeing, and their counterparts to isolate_movable_page().

    Based on my RFC from [2]. Added a comment update from Matthew's variant in [3]
    and, as done there, moved the PageSlab checks to happen before trying to take
    the page lock.

    [1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/
    [2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/
    [3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/

    Reported-by: kernel test robot <yujie.liu@intel.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:53 -04:00
Audra Mitchell 1b22e98c99 mm/slab: move and adjust kernel-doc for kmem_cache_alloc
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 838de63b101147fc7d8af828465cf6d1d30232a8
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Thu Nov 10 09:10:30 2022 +0100

    mm/slab: move and adjust kernel-doc for kmem_cache_alloc

    Alexander reports an issue with the kmem_cache_alloc() comment in
    mm/slab.c:

    > The current comment mentioned that the flags only matters if the
    > cache has no available objects. It's different for the __GFP_ZERO
    > flag which will ensure that the returned object is always zeroed
    > in any case.

    > I have the feeling I run into this question already two times if
    > the user need to zero the object or not, but the user does not need
    > to zero the object afterwards. However another use of __GFP_ZERO
    > and only zero the object if the cache has no available objects would
    > also make no sense.

    and suggests thus mentioning __GFP_ZERO as the exception. But on closer
    inspection, the part about flags being only relevant if cache has no
    available objects is misleading. The slab user has no reliable way to
    determine if there are available objects, and e.g. the might_sleep()
    debug check can be performed even if objects are available, so passing
    correct flags given the allocation context always matters.

    Thus remove that sentence completely, and while at it, move the comment
    from the SLAB-specific mm/slab.c to the common include/linux/slab.h.
    The comment otherwise refers to the flags description for kmalloc(), so
    add a __GFP_ZERO comment there and remove a very misleading GFP_HIGHUSER
    (not applicable to slab) description from there. Mention the kzalloc()
    and kmem_cache_zalloc() shortcuts.

    Reported-by: Alexander Aring <aahringo@redhat.com>
    Link: https://lore.kernel.org/all/20221011145413.8025-1-aahringo@redhat.com/
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:52 -04:00
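For context, a small kernel-style sketch of the zeroing shortcuts the relocated comment points at; the struct and cache below are made up for illustration, only the allocator calls are real API names, and this is not code from the patch.

```
#include <linux/slab.h>

struct foo {
	int a;
	int b;
};

/* Illustrative only: each of the three calls below returns zeroed memory. */
static struct foo *alloc_zeroed_foo(struct kmem_cache *foo_cache)
{
	struct foo *p;

	p = kmalloc(sizeof(*p), GFP_KERNEL | __GFP_ZERO);   /* explicit flag */
	kfree(p);

	p = kzalloc(sizeof(*p), GFP_KERNEL);                 /* kmalloc + __GFP_ZERO shortcut */
	kfree(p);

	return kmem_cache_zalloc(foo_cache, GFP_KERNEL);     /* zeroed object from a dedicated cache */
}
```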
Audra Mitchell 65dfaa7487 mm/slab: Annotate kmem_cache_node->list_lock as raw
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit b539ce9f1a31c442098c3f351cb4d03ba27c2720
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Fri Oct 21 21:18:12 2022 +0200

    mm/slab: Annotate kmem_cache_node->list_lock as raw

    The list_lock can be taken in hardirq context when do_drain() is being
    called via IPI on all cores, and therefore lockdep complains about it,
    because it can't be preempted on PREEMPT_RT.

    That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but
    we still want to get rid of the warning on non-PREEMPT_RT builds.

    Annotate it therefore as a raw lock in order to get rid of the lockdep
    warning below.

             =============================
             [ BUG: Invalid wait context ]
             6.1.0-rc1-00134-ge35184f32151 #4 Not tainted
             -----------------------------
             swapper/3/0 is trying to lock:
             ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0
             other info that might help us debug this:
             context-{2:2}
             no locks held by swapper/3/0.
             stack backtrace:
             CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4
             Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
             Call Trace:
              <IRQ>
              dump_stack_lvl+0x6b/0x9d
              __lock_acquire+0x1519/0x1730
              ? build_sched_domains+0x4bd/0x1590
              ? __lock_acquire+0xad2/0x1730
              lock_acquire+0x294/0x340
              ? do_drain+0x57/0xb0
              ? sched_clock_tick+0x41/0x60
              _raw_spin_lock+0x2c/0x40
              ? do_drain+0x57/0xb0
              do_drain+0x57/0xb0
              __flush_smp_call_function_queue+0x138/0x220
              __sysvec_call_function+0x4f/0x210
              sysvec_call_function+0x4b/0x90
              </IRQ>
              <TASK>
              asm_sysvec_call_function+0x16/0x20
             RIP: 0010:mwait_idle+0x5e/0x80
             Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df
             RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246
             RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
             RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a
             RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001
             R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff
             R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000
              ? default_idle_call+0x1a/0xa0
              default_idle_call+0x4b/0xa0
              do_idle+0x1f1/0x2c0
              ? _raw_spin_unlock_irqrestore+0x56/0x70
              cpu_startup_entry+0x19/0x20
              start_secondary+0x122/0x150
              secondary_startup_64_no_verify+0xce/0xdb
              </TASK>

    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:49 -04:00
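Roughly what the annotation looks like at the source level, sketched with an invented structure rather than the actual kmem_cache_node diff: the lock type and its lock/unlock calls move to the raw_ variants, which remain true spinlocks even on PREEMPT_RT.

```
#include <linux/spinlock.h>
#include <linux/list.h>

/* Sketch of a per-node structure with its list lock annotated as raw. */
struct demo_node {
	raw_spinlock_t list_lock;       /* was: spinlock_t list_lock; */
	struct list_head partial;
};

static void demo_drain(struct demo_node *n)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&n->list_lock, flags);    /* was: spin_lock_irqsave(...) */
	/* ... walk and drain n->partial ... */
	raw_spin_unlock_irqrestore(&n->list_lock, flags);
}
```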
Jan Stancek 78042596b6 Merge: iommu: IOMMU and DMA-mapping API Updates for 9.4
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3180

# Merge Request Required Information

```
Bugzilla: https://bugzilla.redhat.com/2223717
JIRA: https://issues.redhat.com/browse/RHEL-10007
JIRA: https://issues.redhat.com/browse/RHEL-10026
JIRA: https://issues.redhat.com/browse/RHEL-10042
JIRA: https://issues.redhat.com/browse/RHEL-10094
JIRA: https://issues.redhat.com/browse/RHEL-3655
JIRA: https://issues.redhat.com/browse/RHEL-800

Depends: !3244
Depends: !3245

Omitted-fix: c7bd8a1f45ba ("iommu/apple-dart: Handle DMA_FQ domains in attach_dev()")
             - Apple Dart not supported

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Testing: A mix of fio jobs, and various stress-ng io stressors (--hdd, --readahead, --aio, --aiol, --seek,
         --sync_file) run with strict and lazy translation modes on amd, intel, and arm systems. pgtbl_v2
         tested on AMD Genoa host

Conflicts: Should be noted in individual commits. In particular one upstream merge in 6.4, 58390c8ce1bd, had a rather
           messy merge conflict resolution set, so a number of commits have those cleanups added in here.
```

## Summary of Changes

```
        Rebase through v6.5 with a good portion of v6.6 as well (minus the
	dynamic swiotlb mempool support, per numa dma cma support, and arm
	+ mm tlb invalidate changes). For iommufd changes there are
	backports of the underlying functionality in iommufd, but I have left
	the vfio commits that will eventually make use of it for Alex.

Highlights
	* AMD GA Log Overflow refactor and PPR Log support
	* AMD v2 page table support
	* AMD v2 5 level guest page table support
	* Various cleanups and fixes
	* Sync ipmmu-vmsa in preparation for Renesas support  (config not enabled)
	* Continuation of swiotlb rework
	* Continuation of the refactor of core iommu code as part of SVA, iommufd, and pasid support work
	* Continuation of the iommufd prep work (config still not enabled)
	* Support for bounce buffer usage with non cache-line aligned kmallocs on arm64
	* Clean up of in-kernel pasid use for vt-d
	* More cleanup of BUG_ON and warning use in vt-d

        This is based on top of MR !2843 and !3158.
```

Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>

## Approved Development Ticket
All submissions to CentOS Stream must reference an approved ticket in [Red Hat Jira](https://issues.redhat.com/). Please follow the CentOS Stream [contribution documentation](https://docs.centos.org/en-US/stream-contrib/quickstart/) for how to file this ticket and have it approved.

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Mika Penttilä <mpenttil@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Eric Auger <eric.auger@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-11-19 15:53:34 +01:00
Paolo Bonzini 538bf6f332 mm, treewide: redefine MAX_ORDER sanely
JIRA: https://issues.redhat.com/browse/RHEL-10059

MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.

This definition is counter-intuitive and has led to a number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 23baf831a32c04f9a968812511540b1b3e648bf5)

[RHEL: Fix conflicts by changing MAX_ORDER - 1 to MAX_ORDER,
       ">= MAX_ORDER" to "> MAX_ORDER", etc.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-30 09:12:37 +01:00
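The practical effect of the redefinition, and of the conflict fixups noted in the RHEL remark above, in a small self-contained sketch (the MAX_ORDER value is illustrative):

```
#include <stdio.h>

#define MAX_ORDER 10    /* illustrative value; the point is the bound, not the number */

int main(void)
{
	int order;

	/* Old convention: MAX_ORDER was exclusive, valid orders were 0 .. MAX_ORDER - 1. */
	for (order = 0; order < MAX_ORDER; order++)
		;
	printf("old convention: highest valid order = %d\n", order - 1);

	/* New convention: MAX_ORDER is inclusive, valid orders are 0 .. MAX_ORDER. */
	for (order = 0; order <= MAX_ORDER; order++)
		;
	printf("new convention: highest valid order = %d\n", order - 1);

	/* Bounds checks flip accordingly: "order >= MAX_ORDER" becomes "order > MAX_ORDER". */
	return 0;
}
```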
Jerry Snitselaar cca24b2885 mm/slab: simplify create_kmalloc_cache() args and make it static
JIRA: https://issues.redhat.com/browse/RHEL-10094
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Conflicts: Context diff at beginning of new_kmalloc_cache().

commit 0c474d31a6378f20cbe83f62d4177ebdc099c7fc
Author: Catalin Marinas <catalin.marinas@arm.com>
Date:   Mon Jun 12 16:31:47 2023 +0100

    mm/slab: simplify create_kmalloc_cache() args and make it static

    In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead
    of initialising the kmalloc_caches array directly.  With this,
    create_kmalloc_cache() is now only called from new_kmalloc_cache() in the
    same file, so make it static.  In addition, the useroffset argument is
    always 0 while usersize is the same as size.  Remove them.

    Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
    Cc: Alasdair Kergon <agk@redhat.com>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jerry Snitselaar <jsnitsel@redhat.com>
    Cc: Joerg Roedel <joro@8bytes.org>
    Cc: Jonathan Cameron <jic23@kernel.org>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Lars-Peter Clausen <lars@metafoo.de>
    Cc: Logan Gunthorpe <logang@deltatee.com>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Mike Snitzer <snitzer@kernel.org>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: Saravana Kannan <saravanak@google.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 0c474d31a6378f20cbe83f62d4177ebdc099c7fc)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
2023-10-27 01:26:58 -07:00
Chris von Recklinghausen 8cc9c44a1f mm/slub: only zero requested size of buffer for kzalloc when debug enabled
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 9ce67395f5a0cdec6ce152d26bfda13b98b25c01
Author: Feng Tang <feng.tang@intel.com>
Date:   Fri Oct 21 11:24:03 2022 +0800

    mm/slub: only zero requested size of buffer for kzalloc when debug enabled

    kzalloc/kmalloc will round up the request size to a fixed size
    (mostly power of 2), so the allocated memory could be more than
    requested. Currently kzalloc family APIs will zero all the
    allocated memory.

    To detect out-of-bounds usage of the extra allocated memory, only
    zero the requested part, so that a redzone sanity check can be
    added to the extra space later.

    For kzalloc users who will call ksize() later and utilize this
    extra space, please be aware that the space is not zeroed any
    more when debug is enabled. (Thanks to Kees Cook's effort to
    sanitize all ksize() user cases [1], this won't be a big issue).

    [1]. https://lore.kernel.org/all/20220922031013.2150682-1-keescook@chromium.org/#r

    Signed-off-by: Feng Tang <feng.tang@intel.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:14 -04:00
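A userspace sketch of the behaviour described above, with an invented round-up helper: the allocation bucket is larger than the request, and with debugging enabled only the requested bytes are zeroed so the tail can carry a redzone.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Round up to the next power of two, like kmalloc bucket sizing (simplified). */
static size_t roundup_pow2(size_t n)
{
	size_t s = 8;

	while (s < n)
		s <<= 1;
	return s;
}

/* Model of kzalloc with debugging on: zero only what was asked for. */
static void *demo_kzalloc_debug(size_t requested)
{
	size_t bucket = roundup_pow2(requested);
	unsigned char *p = malloc(bucket);

	if (!p)
		return NULL;
	memset(p, 0xcc, bucket);        /* pretend the whole bucket is redzone-patterned */
	memset(p, 0, requested);        /* zero only the requested part */
	printf("requested %zu, bucket %zu, bytes %zu..%zu left for redzone checks\n",
	       requested, bucket, requested, bucket - 1);
	return p;
}

int main(void)
{
	free(demo_kzalloc_debug(100)); /* bucket is 128; bytes 100..127 can carry a redzone */
	return 0;
}
```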
Chris von Recklinghausen 1f619343f6 treewide: use get_random_u32() when possible
Conflicts:
	drivers/gpu/drm/tests/drm_buddy_test.c
	drivers/gpu/drm/tests/drm_mm_test.c - We already have
		ce28ab1380e8 ("drm/tests: Add back seed value information")
		so keep calls to kunit_info.
	drop changes to drivers/misc/habanalabs/gaudi2/gaudi2.c
		fs/ntfs3/fslog.c - files not in CS9
	net/sunrpc/auth_gss/gss_krb5_wrap.c - We already have
		7f675ca7757b ("SUNRPC: Improve Kerberos confounder generation")
		so code to change is gone.
	drivers/gpu/drm/i915/i915_gem_gtt.c
	drivers/gpu/drm/i915/selftests/i915_selftest.c
	drivers/gpu/drm/tests/drm_buddy_test.c
	drivers/gpu/drm/tests/drm_mm_test.c
		change added under
		4cb818386e ("Merge DRM changes from upstream v6.0.8..v6.1")

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit a251c17aa558d8e3128a528af5cf8b9d7caae4fd
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Wed Oct 5 17:43:22 2022 +0200

    treewide: use get_random_u32() when possible

    The prandom_u32() function has been a deprecated inline wrapper around
    get_random_u32() for several releases now, and compiles down to the
    exact same code. Replace the deprecated wrapper with a direct call to
    the real function. The same also applies to get_random_int(), which is
    just a wrapper around get_random_u32(). This was done as a basic find
    and replace.

    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Yury Norov <yury.norov@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
    Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
    Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
    Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
    Acked-by: Helge Deller <deller@gmx.de> # for parisc
    Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:03 -04:00
Chris von Recklinghausen 7ef6d47fef mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit e36ce448a08d43de69e7449eb225805a7a8addf8
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Sat Oct 15 13:34:29 2022 +0900

    mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation

    After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than
    order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
    requests to buddy like SLUB does.

    SLAB has been using kmalloc caches to allocate freelist_idx_t array for
    off slab caches. But after the commit, freelist_size can be bigger than
    KMALLOC_MAX_CACHE_SIZE.

    Instead of using pointer to kmalloc cache, use kmalloc_node() and only
    check if the kmalloc cache is off slab during calculate_slab_order().
    If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens,
    as the freelist_idx_t array is allocated directly from the buddy allocator.

    Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
    Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
    Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:02 -04:00
Chris von Recklinghausen a54b2a2fb0 mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 2c1d697fb8ba6d2d44f914d4268ae1ccdf025f1b
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:24 2022 +0900

    mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using

    Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
    using TRACE_EVENT() macro.

    And then this patch does:
       - Do not pass pointer to struct kmem_cache to trace_kmalloc.
         gfp flag is enough to know if it's accounted or not.
       - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event.
       - Avoid dereferencing s->name in when not using kmem_cache_free event.
       - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB

    Cc: Vasily Averin <vasily.averin@linux.dev>
    Suggested-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:19 -04:00
Chris von Recklinghausen 3af5982dd6 mm/slab_common: unify NUMA and UMA version of tracepoints
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 11e9734bcb6a7361943f993eba4e97f5812120d8
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:23 2022 +0900

    mm/slab_common: unify NUMA and UMA version of tracepoints

    Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
    remove _node postfix for NUMA version of tracepoints.

    This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
    but at this point maintaining both kmem_alloc and kmem_alloc_node
    event classes does not make sense at all.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:18 -04:00
Chris von Recklinghausen b228dc7f49 mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 26a40990ba052e6f553256f9d0f112452b992a38
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:22 2022 +0900

    mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()

    Despite its name, kmem_cache_alloc[_node]_trace() is hook for inlined
    kmalloc. So rename it to kmalloc[_node]_trace().

    Move its implementation to slab_common.c by using
    __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n variants to save a
    function call when CONFIG_TRACING=n.

    Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of
    __assume_slab_alignment. Generally kmalloc has larger alignment
    requirements.

    Suggested-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:18 -04:00
Chris von Recklinghausen 7a80abf490 mm/sl[au]b: generalize kmalloc subsystem
Conflicts: We already have
	05a940656e1e ("slab: Introduce kmalloc_size_roundup()")
	so there is a difference in deleted code (comments).

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit b14051352465a24b3c9ceaccac4e39b3521bb370
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:21 2022 +0900

    mm/sl[au]b: generalize kmalloc subsystem

    Now everything in kmalloc subsystem can be generalized.
    Let's do it!

    Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
    kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
    to slab_common.c.

    In the meantime, rename kmalloc_large_node_notrace()
    to __kmalloc_large_node() and make it static as it's now only called in
    slab_common.c.

    [ feng.tang@intel.com: adjust kfence skip list to include
      __kmem_cache_free so that kfence kunit tests do not fail ]

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:18 -04:00
Chris von Recklinghausen 816794f3cb mm/sl[au]b: introduce common alloc/free functions without tracepoint
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit ed4cd17eb26d7f0c6a762608a3f30870929fbcdd
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:20 2022 +0900

    mm/sl[au]b: introduce common alloc/free functions without tracepoint

    To unify kmalloc functions in a later patch, introduce common alloc/free
    functions that do not have tracepoints.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:17 -04:00
Chris von Recklinghausen 6b205383b1 mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit d6a71648dbc0ca5520cba16a8fdce8d37ae74218
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:19 2022 +0900

    mm/slab: kmalloc: pass requests larger than order-1 page to page allocator

    There is not much benefit to serving large objects in kmalloc().
    Let's pass large requests to the page allocator, like SLUB does, for
    better maintenance of common code.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:17 -04:00
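A minimal sketch of the resulting dispatch, using malloc() as a stand-in for both backends: requests above an order-1 page bypass the kmalloc caches and go to the page allocator. The threshold constant and helper names are illustrative.

```
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL                /* illustrative */

/*
 * Model of the dispatch described above: small requests are served from
 * kmalloc slab buckets, anything larger than an order-1 page (2 pages)
 * goes straight to the page allocator.
 */
static void *demo_kmalloc(size_t size)
{
	if (size > 2 * PAGE_SIZE) {
		printf("size %zu: handed to the page allocator\n", size);
		return malloc(size);    /* stands in for an alloc_pages()-backed allocation */
	}
	printf("size %zu: served from a kmalloc slab cache\n", size);
	return malloc(size);            /* stands in for a slab bucket allocation */
}

int main(void)
{
	free(demo_kmalloc(512));
	free(demo_kmalloc(64 * 1024));
	return 0;
}
```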
Chris von Recklinghausen fe40a3e4cd mm/sl[au]b: factor out __do_kmalloc_node()
Conflicts: mm/slub.c - We already have
	5373b8a09d6e ("kasan: call kasan_malloc() from __kmalloc_*track_caller()")
	so there is a difference in deleted code

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 0f853b2e6dd9580103484a098e9c973a67d127ac
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:14 2022 +0900

    mm/sl[au]b: factor out __do_kmalloc_node()

    __kmalloc(), __kmalloc_node(), __kmalloc_node_track_caller()
    mostly do same job. Factor out common code into __do_kmalloc_node().

    Note that this patch also fixes missing kasan_kmalloc() in SLUB's
    __kmalloc_node_track_caller().

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:15 -04:00
Chris von Recklinghausen d40ac277c8 mm/slab_common: cleanup kmalloc_track_caller()
Conflicts: mm/slub.c - We already have
	5373b8a09d6e ("kasan: call kasan_malloc() from __kmalloc_*track_caller()")
	so there is a difference in deleted code.

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit c45248db04f8e3aca4798d67a394fb9cc2168118
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:13 2022 +0900

    mm/slab_common: cleanup kmalloc_track_caller()

    Make kmalloc_track_caller() a wrapper of kmalloc_node_track_caller().

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:15 -04:00
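The shape of that cleanup, modelled in userspace with invented demo_ names and simplified signatures: the plain variant becomes a thin wrapper that forwards NUMA_NO_NODE to the node-aware variant.

```
#include <stdio.h>
#include <stdlib.h>

#define NUMA_NO_NODE (-1)

/* Node-aware variant: the single underlying implementation. */
static void *demo_kmalloc_node_track_caller(size_t size, int node, const void *caller)
{
	printf("alloc %zu bytes on node %d for caller %p\n", size, node, caller);
	return malloc(size);
}

/* The plain variant becomes a thin wrapper that just passes NUMA_NO_NODE. */
#define demo_kmalloc_track_caller(size) \
	demo_kmalloc_node_track_caller(size, NUMA_NO_NODE, __builtin_return_address(0))

int main(void)
{
	free(demo_kmalloc_track_caller(64));
	return 0;
}
```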
Chris von Recklinghausen 22099a7033 mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit f78a03f6e28be0283f73d3c18b54837b638a8ccf
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:12 2022 +0900

    mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions

    Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n,
    remove CONFIG_NUMA ifdefs for common kmalloc functions.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:15 -04:00
Chris von Recklinghausen 425c969bfe mm/slab: cleanup slab_alloc() and slab_alloc_node()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 07588d726f8d320215dcf6c79a28fe6b1bab6255
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:11 2022 +0900

    mm/slab: cleanup slab_alloc() and slab_alloc_node()

    Make slab_alloc_node() available even when CONFIG_NUMA=n and
    make slab_alloc() a wrapper of slab_alloc_node().

    This is necessary for further cleanup.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:14 -04:00
Chris von Recklinghausen 2a274f2047 mm/slab: move NUMA-related code to __do_cache_alloc()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit c31a910c74ed558461dc7eecf6168ccf805775ec
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Aug 17 19:18:10 2022 +0900

    mm/slab: move NUMA-related code to __do_cache_alloc()

    To implement slab_alloc_node() independent of NUMA configuration,
    move NUMA fallback/alternate allocation code into __do_cache_alloc().

    One functional change here is that node availability is no longer
    checked when allocating from the local node.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:14 -04:00
Chris von Recklinghausen b15982fc79 mm/sl[au]b: use own bulk free function when bulk alloc failed
Bugzilla: https://bugzilla.redhat.com/2160210

commit 2055e67bb6a8fbb6aabdb9536443688ef52456c4
Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date:   Wed Jun 15 00:26:34 2022 +0900

    mm/sl[au]b: use own bulk free function when bulk alloc failed

    There is no benefit to calling the generic bulk free function when
    kmem_cache_alloc_bulk() fails. Use the allocator's own
    kmem_cache_free_bulk() instead of the generic function.

    Note that if kmem_cache_alloc_bulk() fails to allocate the first object
    in SLUB, size is zero. So allow passing size == 0 to
    kmem_cache_free_bulk(), like SLAB's version does.

    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:27 -04:00
Chris von Recklinghausen b7497cd088 mm: slab: optimize memcg_slab_free_hook()
Bugzilla: https://bugzilla.redhat.com/2160210

commit b77d5b1b83e3e14870224de7c63f115a2dc44e9a
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Fri Apr 29 20:30:44 2022 +0800

    mm: slab: optimize memcg_slab_free_hook()

    Most callers of memcg_slab_free_hook() already know the slab, which could
    be passed to memcg_slab_free_hook() directly to reduce the overhead of
    another call of virt_to_slab().  For bulk freeing of objects, the call of
    slab_objcgs() in the loop in memcg_slab_free_hook() is redundant as well.
    Rework memcg_slab_free_hook() and build_detached_freelist() to reduce
    that unnecessary overhead and make memcg_slab_free_hook() able to handle
    bulk freeing in slab_free().

    Move the calling site of memcg_slab_free_hook() from do_slab_free() to
    slab_free() for slub to make the code clearer since the logic is weird
    (e.g. the caller needs to judge whether it needs to call
    memcg_slab_free_hook()). It is easy to make mistakes, like missing a call
    to memcg_slab_free_hook(), as in these fixes:

      commit d1b2cf6cb8 ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()")
      commit ae085d7f9365 ("mm: kfence: fix missing objcg housekeeping for SLAB")

    This optimization is mainly for bulk object freeing.  The following
    numbers are shown for 16-object freeing.

                               before      after
      kmem_cache_free_bulk:   ~430 ns     ~400 ns

    The overhead is reduced by about 7% for 16-object freeing.

    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Link: https://lore.kernel.org/r/20220429123044.37885-1-songmuchun@bytedance.com
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:21 -04:00
Chris von Recklinghausen a664e283b8 mm/tracing: add 'accounted' entry into output of allocation tracepoints
Bugzilla: https://bugzilla.redhat.com/2160210

commit b347aa7b57477f71c740e2bbc6d1078a7109ba23
Author: Vasily Averin <vvs@openvz.org>
Date:   Fri Jun 3 06:21:49 2022 +0300

    mm/tracing: add 'accounted' entry into output of allocation tracepoints

    Slab caches marked with SLAB_ACCOUNT force accounting for every
    allocation from this cache even if __GFP_ACCOUNT flag is not passed.
    Unfortunately, at the moment this flag is not visible in ftrace output,
    and this makes it difficult to analyze the accounted allocations.

    This patch adds a boolean "accounted" entry into the trace output,
    and sets it to 'true' for calls that used the __GFP_ACCOUNT flag and
    for allocations from caches marked with SLAB_ACCOUNT.
    It is set to 'false' if accounting is disabled in the config.

    Signed-off-by: Vasily Averin <vvs@openvz.org>
    Acked-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Acked-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:21 -04:00
Chris von Recklinghausen 038177e235 mm, slab: fix bad alignments
Bugzilla: https://bugzilla.redhat.com/2160210

commit d1ca263d0d518b4918473768aee0cfb2770014bc
Author: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Date:   Thu Jun 9 12:01:32 2022 +0800

    mm, slab: fix bad alignments

    As reported by coccicheck:

    ./mm/slab.c:3253:2-59: code aligned with following code on line 3255.

    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:21 -04:00
Chris von Recklinghausen 4004d229b5 mm/slab: delete cache_alloc_debugcheck_before()
Bugzilla: https://bugzilla.redhat.com/2160210

commit a3967244430eb91698ac8dca7db8bd0871251305
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun Jun 5 17:25:38 2022 +0200

    mm/slab: delete cache_alloc_debugcheck_before()

    It only does a might_sleep_if(GFP_RECLAIM) check, which is already covered
    by the might_alloc() in slab_pre_alloc_hook().  And all callers of
    cache_alloc_debugcheck_before() call that beforehand already.

    Link: https://lkml.kernel.org/r/20220605152539.3196045-2-daniel.vetter@ffwll.ch
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:14 -04:00
Chris von Recklinghausen 83549ef6f8 mm/slab.c: fix comments
Bugzilla: https://bugzilla.redhat.com/2160210

commit a8f23dd166651dcda2c02f16e524f56a4bd49084
Author: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Date:   Thu Apr 7 16:09:58 2022 +0800

    mm/slab.c: fix comments

    While reading the source code,
    I noticed some language errors in the comments, so I fixed them.

    Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
    Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Link: https://lore.kernel.org/r/20220407080958.3667-1-caoyixuan2019@email.szu.edu.cn

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:50 -04:00
Chris von Recklinghausen 5d60d4d004 mm/slab: remove some unused functions
Bugzilla: https://bugzilla.redhat.com/2160210

commit 1e703d0548e0a2766e198c64797737d50349f46e
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Tue Mar 22 17:14:21 2022 +0800

    mm/slab: remove some unused functions

    alternate_node_alloc and ____cache_alloc_node are only called when
    CONFIG_NUMA is enabled, so we can remove the unused !CONFIG_NUMA variant.
    The forward declaration for alternate_node_alloc is also unnecessary;
    remove it too.

    [ vbabka@suse.cz: move ____cache_alloc_node() declaration closer to
      its callers ]

    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Link: https://lore.kernel.org/r/20220322091421.25285-1-linmiaohe@huawei.com

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:49 -04:00
Mark Salter 53b03ecaba mm: make minimum slab alignment a runtime property
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2122232

commit d949a8155d139aa890795b802004a196b7f00598
Author: Peter Collingbourne <pcc@google.com>
Date:   Mon May 9 18:20:53 2022 -0700

    mm: make minimum slab alignment a runtime property

    When CONFIG_KASAN_HW_TAGS is enabled we currently increase the minimum
    slab alignment to 16.  This happens even if MTE is not supported in
    hardware or disabled via kasan=off, which creates an unnecessary memory
    overhead in those cases.  Eliminate this overhead by making the minimum
    slab alignment a runtime property and only aligning to 16 if KASAN is
    enabled at runtime.
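
    A minimal sketch of the approach (simplified; the actual patch wires this
    through ARCH_SLAB_MINALIGN and an arch override): the architecture reports
    the minimum at runtime instead of a compile-time constant, defaulting to
    the natural alignment:

        #ifndef arch_slab_minalign
        static inline unsigned int arch_slab_minalign(void)
        {
                /* default: no KASAN tag granule requirement */
                return __alignof__(unsigned long long);
        }
        #endif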

    On a DragonBoard 845c (non-MTE hardware) with a kernel built with
    CONFIG_KASAN_HW_TAGS, waiting for quiescence after a full Android boot I
    see the following Slab measurements in /proc/meminfo (median of 3
    reboots):

    Before: 169020 kB
    After:  167304 kB

    [akpm@linux-foundation.org: make slab alignment type `unsigned int' to avoid casting]
    Link: https://linux-review.googlesource.com/id/I752e725179b43b144153f4b6f584ceb646473ead
    Link: https://lkml.kernel.org/r/20220427195820.1716975-2-pcc@google.com
    Signed-off-by: Peter Collingbourne <pcc@google.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Kees Cook <keescook@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Mark Salter <msalter@redhat.com>
2023-01-28 11:34:57 -05:00
Michal Schmidt de2f4dee96 slab: Introduce kmalloc_size_roundup()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143368

commit 05a940656e1eb2026d9ee31019d5b47e9545124d
Author: Kees Cook <keescook@chromium.org>
Date:   Fri Sep 23 13:28:08 2022 -0700

    slab: Introduce kmalloc_size_roundup()

    In the effort to help the compiler reason about buffer sizes, the
    __alloc_size attribute was added to allocators. This improves the scope
    of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
    future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
    as the vast majority of callers are not expecting to use more memory
    than what they asked for.

    There is, however, one common exception to this: anticipatory resizing
    of kmalloc allocations. These cases all use ksize() to determine the
    actual bucket size of a given allocation (e.g. 128 when 126 was asked
    for). This comes in two styles in the kernel:

    1) An allocation has been determined to be too small, and needs to be
       resized. Instead of the caller choosing its own next best size, it
       wants to minimize the number of calls to krealloc(), so it just uses
       ksize() plus some additional bytes, forcing the realloc into the next
       bucket size, from which it can learn how large it is now. For example:

            data = krealloc(data, ksize(data) + 1, gfp);
            data_len = ksize(data);

    2) The minimum size of an allocation is calculated, but since it may
       grow in the future, just use all the space available in the chosen
       bucket immediately, to avoid needing to reallocate later. A good
       example of this is skbuff's allocators:

            data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
            ...
            /* kmalloc(size) might give us more room than requested.
             * Put skb_shared_info exactly at the end of allocated zone,
             * to allow max possible filling before reallocation.
             */
            osize = ksize(data);
            size = SKB_WITH_OVERHEAD(osize);

    In both cases, the "how much was actually allocated?" question is answered
    _after_ the allocation, where the compiler hint is no longer in a position
    to make the association.  This mismatch between the compiler's view of the
    buffer length and the amount of memory the code actually intends to use has
    already caused problems[1].  It is possible to fix this by reordering the
    use of the "actual size" information.

    We can serve the needs of users of ksize() and still have accurate buffer
    length hinting for the compiler by doing the bucket size calculation
    _before_ the allocation. Code can instead ask "how large an allocation
    would I get for a given size?".

    Introduce kmalloc_size_roundup() to serve this purpose, so we can start
    replacing the "anticipatory resizing" uses of ksize().
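
    As a sketch of how style 1 above could then be rewritten (illustrative,
    not a specific upstream caller):

        size_t bucket = kmalloc_size_roundup(data_len + 1);

        data = krealloc(data, bucket, gfp);
        if (data)
                data_len = bucket;   /* size is known before the allocation */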

    [1] https://github.com/ClangBuiltLinux/linux/issues/1599
        https://github.com/KSPP/linux/issues/183

    [ vbabka@suse.cz: add SLOB version ]

    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: linux-mm@kvack.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2022-11-22 16:08:59 +01:00
Chris von Recklinghausen 9258ee6d65 mm, kfence: support kmem_dump_obj() for KFENCE objects
Bugzilla: https://bugzilla.redhat.com/2120352

commit 2dfe63e61cc31ee59ce951672b0850b5229cd5b0
Author: Marco Elver <elver@google.com>
Date:   Thu Apr 14 19:13:40 2022 -0700

    mm, kfence: support kmem_dump_obj() for KFENCE objects

    Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been
    producing garbage data due to the object not actually being maintained
    by SLAB or SLUB.

    Fix this by implementing __kfence_obj_info() that copies relevant
    information to struct kmem_obj_info when the object was allocated by
    KFENCE; this is called by a common kmem_obj_info(), which also calls the
    slab/slub/slob specific variant now called __kmem_obj_info().
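
    A simplified sketch of the dispatch described above (not the verbatim
    patch):

        void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
        {
                /* KFENCE-managed objects are filled in by the KFENCE helper */
                if (__kfence_obj_info(kpp, object, slab))
                        return;
                /* otherwise fall back to the SLAB/SLUB/SLOB specific variant */
                __kmem_obj_info(kpp, object, slab);
        }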

    For completeness, kmem_dump_obj() now displays if the object was
    allocated by KFENCE.

    Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/
    Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com
    Fixes: b89fb5ef0c ("mm, kfence: insert KFENCE hooks for SLUB")
    Fixes: d3fb45f370 ("mm, kfence: insert KFENCE hooks for SLAB")
    Signed-off-by: Marco Elver <elver@google.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>      [slab]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:06 -04:00
Chris von Recklinghausen 241d3da5fe mm: kfence: fix missing objcg housekeeping for SLAB
Bugzilla: https://bugzilla.redhat.com/2120352

commit ae085d7f9365de7da27ab5c0d16b12d51ea7fca9
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Sun Mar 27 13:18:52 2022 +0800

    mm: kfence: fix missing objcg housekeeping for SLAB

    The objcg is not cleared and put for a kfence object when it is freed,
    which could lead to a memory leak of struct obj_cgroup and wrong
    statistics for NR_SLAB_RECLAIMABLE_B or NR_SLAB_UNRECLAIMABLE_B.

    Since the last freed object's objcg is not cleared,
    mem_cgroup_from_obj() could return the wrong memcg when this kfence
    object, which is not charged to any objcgs, is reallocated to other
    users.

    A real-world issue [1] is caused by this bug.

    Link: https://lore.kernel.org/all/000000000000cabcb505dae9e577@google.com/ [1]
    Reported-by: syzbot+f8c45ccc7d5d45fc5965@syzkaller.appspotmail.com
    Fixes: d3fb45f370 ("mm, kfence: insert KFENCE hooks for SLAB")
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Marco Elver <elver@google.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:03 -04:00
Chris von Recklinghausen 9d61cca226 mm/kasan: Convert to struct folio and struct slab
Bugzilla: https://bugzilla.redhat.com/2120352

commit 6e48a966dfd18987fec9385566a67d36e2b5fc11
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Oct 4 14:46:46 2021 +0100

    mm/kasan: Convert to struct folio and struct slab

    KASAN accesses some slab-related struct page fields, so we need to
    convert it to struct slab.  Some places are a bit simplified thanks to
    kasan_addr_to_slab() encapsulating the PageSlab flag check through
    virt_to_slab().  When resolving an object address to either a real slab or
    a large kmalloc allocation, use struct folio as the intermediate type for
    testing the slab flag to avoid an unnecessary implicit compound_head().
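
    A simplified sketch of the folio-based resolution described above:

        struct folio *folio = virt_to_folio(addr);

        if (folio_test_slab(folio)) {
                /* object belongs to a slab cache */
                struct slab *slab = folio_slab(folio);
                /* ... report via slab metadata ... */
        } else {
                /* large kmalloc: the folio itself is the allocation */
        }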

    [ vbabka@suse.cz: use struct folio, adjust to differences in previous
      patches ]

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: <kasan-dev@googlegroups.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:35 -04:00
Chris von Recklinghausen c6690c8c61 mm: Convert struct page to struct slab in functions used by other subsystems
Bugzilla: https://bugzilla.redhat.com/2120352

commit 40f3bf0cb04c91d33531b1b95788ad2f0e4062cf
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Nov 2 15:42:04 2021 +0100

    mm: Convert struct page to struct slab in functions used by other subsystems

    KASAN, KFENCE and memcg interact with SLAB or SLUB internals through
    functions nearest_obj(), obj_to_index() and objs_per_slab() that take
    struct page as a parameter.  This patch converts them to struct slab,
    including all callers, through a coccinelle semantic patch.

    // Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c
    // Note: needs coccinelle 1.1.1 to avoid breaking whitespace

    @@
    @@

    -objs_per_slab_page(
    +objs_per_slab(
     ...
     )
     { ... }

    @@
    @@

    -objs_per_slab_page(
    +objs_per_slab(
     ...
     )

    @@
    identifier fn =~ "obj_to_index|objs_per_slab";
    @@

     fn(...,
    -   const struct page *page
    +   const struct slab *slab
        ,...)
     {
    <...
    (
    - page_address(page)
    + slab_address(slab)
    |
    - page
    + slab
    )
    ...>
     }

    @@
    identifier fn =~ "nearest_obj";
    @@

     fn(...,
    -   struct page *page
    +   const struct slab *slab
        ,...)
     {
    <...
    (
    - page_address(page)
    + slab_address(slab)
    |
    - page
    + slab
    )
    ...>
     }

    @@
    identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab";
    expression E;
    @@

     fn(...,
    (
    - slab_page(E)
    + E
    |
    - virt_to_page(E)
    + virt_to_slab(E)
    |
    - virt_to_head_page(E)
    + virt_to_slab(E)
    |
    - page
    + page_slab(page)
    )
      ,...)
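
    The net effect at a call site looks roughly like this (hypothetical
    caller, shown only to illustrate the rules above):

        /* before */ void *nearby = nearest_obj(cache, virt_to_head_page(x), x);
        /* after  */ void *nearby = nearest_obj(cache, virt_to_slab(x), x);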

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Julia Lawall <julia.lawall@inria.fr>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Marco Elver <elver@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: <kasan-dev@googlegroups.com>
    Cc: <cgroups@vger.kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:35 -04:00
Chris von Recklinghausen a9b569137b mm/slab: Convert most struct page to struct slab by spatch
Bugzilla: https://bugzilla.redhat.com/2120352

commit 7981e67efb85908d9c4924c8e6669c5d5fe365b7
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Nov 2 13:23:10 2021 +0100

    mm/slab: Convert most struct page to struct slab by spatch

    The majority of the conversion from struct page to struct slab in SLAB
    internals can be delegated to a coccinelle semantic patch.  This includes
    renaming variables with 'page' in their name to 'slab', and similar.

    Big thanks to Julia Lawall and Luis Chamberlain for help with
    coccinelle.

    // Options: --include-headers --no-includes --smpl-spacing mm/slab.c
    // Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the
    // embedded script

    // build list of functions for applying the next rule
    @initialize:ocaml@
    @@

    let ok_function p =
      not (List.mem (List.hd p).current_element ["kmem_getpages";"kmem_freepages"])

    // convert the type in selected functions
    @@
    position p : script:ocaml() { ok_function p };
    @@

    - struct page@p
    + struct slab

    @@
    @@

    -PageSlabPfmemalloc(page)
    +slab_test_pfmemalloc(slab)

    @@
    @@

    -ClearPageSlabPfmemalloc(page)
    +slab_clear_pfmemalloc(slab)

    @@
    @@

    obj_to_index(
     ...,
    - page
    + slab_page(slab)
    ,...)

    // for all functions, change any "struct slab *page" parameter to "struct slab
    // *slab" in the signature, and generally all occurrences of "page" to "slab" in
    // the body - with some special cases.
    @@
    identifier fn;
    expression E;
    @@

     fn(...,
    -   struct slab *page
    +   struct slab *slab
        ,...)
     {
    <...
    (
    - int page_node;
    + int slab_node;
    |
    - page_node
    + slab_node
    |
    - page_slab(page)
    + slab
    |
    - page_address(page)
    + slab_address(slab)
    |
    - page_size(page)
    + slab_size(slab)
    |
    - page_to_nid(page)
    + slab_nid(slab)
    |
    - virt_to_head_page(E)
    + virt_to_slab(E)
    |
    - page
    + slab
    )
    ...>
     }

    // rename a function parameter
    @@
    identifier fn;
    expression E;
    @@

     fn(...,
    -   int page_node
    +   int slab_node
        ,...)
     {
    <...
    - page_node
    + slab_node
    ...>
     }

    // functions converted by previous rules that were temporarily called using
    // slab_page(E) so we want to remove the wrapper now that they accept struct
    // slab ptr directly
    @@
    identifier fn =~ "index_to_obj";
    expression E;
    @@

     fn(...,
    - slab_page(E)
    + E
     ,...)

    // functions that were returning struct page ptr and now will return struct
    // slab ptr, including slab_page() wrapper removal
    @@
    identifier fn =~ "cache_grow_begin|get_valid_first_slab|get_first_slab";
    expression E;
    @@

     fn(...)
     {
    <...
    - slab_page(E)
    + E
    ...>
     }

    // rename any former struct page * declarations
    @@
    @@

    struct slab *
    -page
    +slab
    ;

    // all functions (with exceptions) with a local "struct slab *page" variable
    // that will be renamed to "struct slab *slab"
    @@
    identifier fn !~ "kmem_getpages|kmem_freepages";
    expression E;
    @@

     fn(...)
     {
    <...
    (
    - page_slab(page)
    + slab
    |
    - page_to_nid(page)
    + slab_nid(slab)
    |
    - kasan_poison_slab(page)
    + kasan_poison_slab(slab_page(slab))
    |
    - page_address(page)
    + slab_address(slab)
    |
    - page_size(page)
    + slab_size(slab)
    |
    - page->pages
    + slab->slabs
    |
    - page = virt_to_head_page(E)
    + slab = virt_to_slab(E)
    |
    - virt_to_head_page(E)
    + virt_to_slab(E)
    |
    - page
    + slab
    )
    ...>
     }

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Cc: Julia Lawall <julia.lawall@inria.fr>
    Cc: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:35 -04:00
Chris von Recklinghausen 4a1e6707a9 mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
Bugzilla: https://bugzilla.redhat.com/2120352

commit 42c0faac3192352867f6e6ba815b28ed58bf7388
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Fri Oct 29 17:54:55 2021 +0200

    mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab

    These functions sit at the boundary to the page allocator.  Also use
    struct folio internally to avoid an extra compound_head() when dealing
    with page flags.
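
    A heavily simplified sketch of the direction (helper names and details
    approximate, not the full patch): the boundary function returns a struct
    slab built from a freshly allocated folio:

        static struct slab *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
        {
                struct folio *folio;

                folio = (struct folio *)__alloc_pages_node(nodeid, flags, cachep->gfporder);
                if (!folio)
                        return NULL;

                /* mark the folio as a slab and hand back the slab view of it */
                __folio_set_slab(folio);
                return folio_slab(folio);
        }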

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:34 -04:00