Commit Graph

69 Commits

Author SHA1 Message Date
Eric Chanudet ec94e5c650 mm/mm_init.c: print mem_init info after defer_init is done
JIRA: https://issues.redhat.com/browse/RHEL-85565

commit 4f66da89d31ca56d4c41de01dd663f79d697904b
Author: Wei Yang <richard.weiyang@gmail.com>
Date:   Tue Jun 11 14:52:23 2024 +0000

    mm/mm_init.c: print mem_init info after defer_init is done

    Current call flow looks like this:

    start_kernel
      mm_core_init
        mem_init
        mem_init_print_info
      rest_init
        kernel_init
          kernel_init_freeable
            page_alloc_init_late
              deferred_init_memmap

    With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, at the time
    mem_init_print_info() is called, pages are not yet fully initialized
    and freed to the buddy allocator.

    This has one issue:

      * nr_free_pages() only reflects the pages freed so far, partial
        free pages in the system, which is not what we expect.

    Let's print the mem info after defer_init is done.

    Also, this helps with changing the totalram_pages accounting, since
    we plan to move that accounting into __free_pages_core().

    Link: https://lkml.kernel.org/r/20240611145223.16872-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Eric Chanudet <echanude@redhat.com>
2025-03-31 12:16:36 -04:00
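The reordering described in the commit above can be illustrated with a toy model (Python, purely illustrative; the names are hypothetical stand-ins, not kernel APIs): printing memory info before deferred init undercounts free pages, while printing after reports the full count.

```python
def nr_free_pages(freed_pages):
    # Stand-in for the kernel's nr_free_pages(): only pages already
    # released to the buddy allocator are counted.
    return freed_pages

def boot(total_pages, deferred_pages, print_after_defer_init):
    # Toy model of the boot flow in the commit message: with
    # CONFIG_DEFERRED_STRUCT_PAGE_INIT, only part of memory is freed to
    # buddy before page_alloc_init_late() runs deferred_init_memmap().
    freed = total_pages - deferred_pages    # early boot, pre-deferred init
    if not print_after_defer_init:
        return nr_free_pages(freed)         # old placement: partial count
    freed += deferred_pages                 # deferred_init_memmap() done
    return nr_free_pages(freed)             # new placement: full count
```

With 1000 total pages of which 600 are deferred, the old placement reports 400 free pages while the new placement reports all 1000.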
Eric Chanudet 161697faac mm/mm_init: use node's number of cpus in deferred_page_init_max_threads
JIRA: https://issues.redhat.com/browse/RHEL-77271

commit 188f87f2648b13f5de17d5e068f18d317e0c1f98
Author: Eric Chanudet <echanude@redhat.com>
Date:   Wed May 22 16:38:01 2024 -0400

    mm/mm_init: use node's number of cpus in deferred_page_init_max_threads

    x86_64 already uses the node's CPU count as the maximum number of
    threads.  Make that the default for all arches that set
    DEFERRED_STRUCT_PAGE_INIT.

    This restores the behavior from before the function was made
    arch-specific in commit ecd0965069 ("mm: make deferred init's max
    threads arch-specific").

    Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 platforms
    shows faster deferred_init_memmap completions:

    |         | x13s        | SA8775p-ride | Ampere R137-P31 | Ampere HR330 |
    |         | Metal, 32GB | VM, 36GB     | VM, 58GB        | Metal, 128GB |
    |         | 8cpus       | 8cpus        | 8cpus           | 32cpus       |
    |---------|-------------|--------------|-----------------|--------------|
    | threads |  ms     (%) | ms       (%) |  ms         (%) |  ms      (%) |
    |---------|-------------|--------------|-----------------|--------------|
    | 1       | 108    (0%) | 72      (0%) | 224        (0%) | 324     (0%) |
    | cpus    |  24  (-77%) | 36    (-50%) |  40      (-82%) |  56   (-82%) |

    Michael Ellerman reported:

    : On a machine here (1TB, 40 cores, 4KB pages) the existing code gives:
    :
    :   [    0.500124] node 2 deferred pages initialised in 210ms
    :   [    0.515790] node 3 deferred pages initialised in 230ms
    :   [    0.516061] node 0 deferred pages initialised in 230ms
    :   [    0.516522] node 7 deferred pages initialised in 230ms
    :   [    0.516672] node 4 deferred pages initialised in 230ms
    :   [    0.516798] node 6 deferred pages initialised in 230ms
    :   [    0.517051] node 5 deferred pages initialised in 230ms
    :   [    0.523887] node 1 deferred pages initialised in 240ms
    :
    : vs with the patch:
    :
    :   [    0.379613] node 0 deferred pages initialised in 90ms
    :   [    0.380388] node 1 deferred pages initialised in 90ms
    :   [    0.380540] node 4 deferred pages initialised in 100ms
    :   [    0.390239] node 6 deferred pages initialised in 100ms
    :   [    0.390249] node 2 deferred pages initialised in 100ms
    :   [    0.390786] node 3 deferred pages initialised in 110ms
    :   [    0.396721] node 5 deferred pages initialised in 110ms
    :   [    0.397095] node 7 deferred pages initialised in 110ms
    :
    : Which is a nice speedup.

    [echanude@redhat.com: v3]
      Link: https://lkml.kernel.org/r/20240528185455.643227-4-echanude@redhat.com
    Link: https://lkml.kernel.org/r/20240522203758.626932-4-echanude@redhat.com
    Signed-off-by: Eric Chanudet <echanude@redhat.com>
    Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Eric Chanudet <echanude@redhat.com>
2025-01-31 16:59:53 -05:00
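The generic helper this commit introduces amounts to capping the thread count at the node's CPU count, with a floor of one; a minimal Python model of that logic (illustrative only, not the kernel C source):

```python
def deferred_page_init_max_threads(node_cpumask):
    # Model of the arch-generic helper: use the number of CPUs on the
    # node as the maximum number of deferred-init threads, but never
    # fewer than one so initialization can still make progress.
    return max(len(node_cpumask), 1)
```

With 8 CPUs on a node this allows 8 threads, matching the "cpus" row of the benchmark table above.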
Waiman Long 4a455bf63c mm/mm_init: Fix incorrect alignment between deferred_free_pages() & deferred_free_range()
JIRA: https://issues.redhat.com/browse/RHEL-72551
Upstream Status: RHEL only
Tested: A test kernel with this patch applied was booted, and the
	deferred pages initialization time was back to normal, comparable
	to that of the upstream kernel.

RHEL commit 1845b92dcf ("mm: move most of core MM initialization to
mm/mm_init.c") moved deferred_free_pages() and deferred_free_range()
from mm/page_alloc.c to mm/mm_init.c. However, mm/page_alloc.c already
had the later commit 3f6dac0fd1b8 ("mm/page_alloc: make deferred page init
free pages in MAX_ORDER blocks") applied on top of these two functions,
and commit 1845b92dcf didn't carry that change forward. That was OK at
the time, as the page alignment (pageblock_aligned()) in
deferred_free_pages() and deferred_free_range() still matched.

Later, RHEL commit c8c9c0b259 ("mm, treewide: rename MAX_ORDER
to MAX_PAGE_ORDER") changed the alignment of deferred_free_range()
from pageblock_aligned() to IS_MAX_ORDER_ALIGNED(), but didn't change
the alignment of deferred_free_pages(). This misalignment caused a
100X increase in the deferred pages initialization time. So a 50ms
initialization time became more than 5000ms, for example. This is
because the middle portion of deferred_free_range(), which frees a large
naturally-aligned chunk in a single call, was never reached. Instead,
__free_pages_core() is now called for every page. MAX_ORDER_NR_PAGES is
2^10 pages on x86-64, so __free_pages_core() is now called roughly a
thousand times more often.

For systems with large amounts of memory, soft lockup warnings will now
be displayed on the console at boot time.

Fix this problem by updating deferred_free_pages() to use
IS_MAX_ORDER_ALIGNED() alignment. This patch also includes other
minor changes to match the expected output if commit 1845b92dcf and
3f6dac0fd1b8 are applied in the correct order without merge conflict.

Fixes: 1845b92dcf ("mm: move most of core MM initialization to mm/mm_init.c")
Fixes: c8c9c0b259 ("mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER")
Signed-off-by: Waiman Long <longman@redhat.com>
2025-01-03 22:29:03 -05:00
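The cost of the mismatch can be seen with a small model (Python, illustrative only; constants follow the x86-64 values cited above: pageblock order 9, MAX_PAGE_ORDER 10). Chunking ranges at pageblock boundaries while the free path demands MAX_ORDER alignment means the single-call fast path never fires, and every page is freed individually:

```python
PAGEBLOCK_NR_PAGES = 1 << 9    # pageblock_aligned() granularity
MAX_ORDER_NR_PAGES = 1 << 10   # IS_MAX_ORDER_ALIGNED() granularity

def deferred_free_range(pfn, nr_pages, block):
    # Model of deferred_free_range(): a naturally aligned chunk of
    # exactly `block` pages goes to __free_pages_core() in one call;
    # anything else is freed one page at a time.
    if pfn % block == 0 and nr_pages == block:
        return 1          # one __free_pages_core() call for the chunk
    return nr_pages       # one __free_pages_core() call per page

def free_span(start_pfn, total, chunk, block):
    # Model of deferred_free_pages() splitting a span into chunks and
    # counting the resulting __free_pages_core() calls.
    calls = 0
    for pfn in range(start_pfn, start_pfn + total, chunk):
        nr = min(chunk, start_pfn + total - pfn)
        calls += deferred_free_range(pfn, nr, block)
    return calls
```

For a 2^20-page span, mismatched granularities (512-page chunks checked against 1024-page alignment) produce 2^20 calls, while matched granularities produce only 2^10: the thousand-fold blowup described above.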
Rafael Aquini cfad6829e6 efi: disable mirror feature during crashkernel
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 7ea6ec4c25294e8bc8788148ef854df92ee8dc5e
Author: Ma Wupeng <mawupeng1@huawei.com>
Date:   Tue Jan 9 12:15:36 2024 +0800

    efi: disable mirror feature during crashkernel

    If the system has no mirrored memory, or uses crashkernel.high while
    kernelcore=mirror is enabled on the command line, then the crash
    kernel will have only limited mirrored memory available, which
    usually leads to OOM.

    To solve this problem, disable the mirror feature during crashkernel.

    Link: https://lkml.kernel.org/r/20240109041536.3903042-1-mawupeng1@huawei.com
    Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:18 -05:00
Rafael Aquini c8c9c0b259 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
  * arch/*/Kconfig: all hunks dropped as there were only text blurbs and comments
     being changed with no functional changes whatsoever, and RHEL9 is missing
     several (unrelated) commits to these arches that tranform the text blurbs in
     the way these non-functional hunks were expecting;
  * drivers/accel/qaic/qaic_data.c: hunk dropped due to RHEL-only commit
     083c0cdce2 ("Merge DRM changes from upstream v6.8..v6.9");
  * drivers/gpu/drm/i915/gem/selftests/huge_pages.c: hunk dropped due to RHEL-only
     commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
  * drivers/gpu/drm/ttm/tests/ttm_pool_test.c: all hunks dropped due to RHEL-only
     commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
  * drivers/video/fbdev/vermilion/vermilion.c: hunk dropped as RHEL9 misses
     commit dbe7e429fe ("vmlfb: framebuffer driver for Intel Vermilion Range");
  * include/linux/pageblock-flags.h: differences due to out-of-order backport
    of upstream commits 72801513b2bf ("mm: set pageblock_order to HPAGE_PMD_ORDER
    in case with !CONFIG_HUGETLB_PAGE but THP enabled"), and 3a7e02c040b1
    ("minmax: avoid overly complicated constant expressions in VM code");
  * mm/mm_init.c: differences on the 3rd, and 4th hunks are due to RHEL
     backport commit 1845b92dcf ("mm: move most of core MM initialization to
     mm/mm_init.c") ignoring the out-of-order backport of commit 3f6dac0fd1b8
     ("mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks")
     thus partially reverting the changes introduced by the latter;

This patch is a backport of the following upstream commit:
commit 5e0a760b44417f7cadd79de2204d6247109558a0
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Dec 28 17:47:04 2023 +0300

    mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER

    commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
    changed the definition of MAX_ORDER to be inclusive.  This has caused
    issues with code that was not yet upstream and depended on the previous
    definition.

    To draw attention to the altered meaning of the define, rename MAX_ORDER
    to MAX_PAGE_ORDER.

    Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:17 -05:00
Rafael Aquini 070f8b6fd5 mm: remove unnecessary ia64 code and comment
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit e99fb98d478a0480d50e334df21bef12fb74e17f
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Fri Dec 22 15:02:03 2023 +0800

    mm: remove unnecessary ia64 code and comment

    IA64 has gone with commit cf8e8658100d ("arch: Remove Itanium (IA-64)
    architecture"), remove unnecessary ia64 special mm code and comment too.

    Link: https://lkml.kernel.org/r/20231222070203.2966980-1-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:11 -05:00
Rado Vrbovsky 570a71d7db Merge: mm: update core code to v6.6 upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5252

JIRA: https://issues.redhat.com/browse/RHEL-27743  
JIRA: https://issues.redhat.com/browse/RHEL-59459    
CVE: CVE-2024-46787    
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4961  
  
This MR brings the RHEL9 core MM code up to upstream's v6.6 LTS level.
This work follows up on the previous v6.5 update (RHEL-27742); as such,
the bulk of this changeset comprises refactoring and clean-ups of
the internal implementation of several APIs as it further advances the
conversion to folios and follows up on the per-VMA locking changes.

Also, with the rebase to v6.6 LTS, we complete the infrastructure to allow
Control-flow Enforcement Technology, a.k.a. Shadow Stacks, for x86 builds,
and we add a potential extra level of protection (assessment pending) to
help mitigate the kernel heap exploit technique dubbed "SlubStick".
    
Follow-up fixes are omitted from this series either because they are irrelevant to     
the bits we support on RHEL or because they depend on bigger changesets introduced     
upstream more recently. A follow-up ticket (RHEL-27745) will deal with these and other cases separately.    

Omitted-fix: e540b8c5da04 ("mips: mm: add slab availability checking in ioremap_prot")    
Omitted-fix: f7875966dc0c ("tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources")   
Omitted-fix: df39038cd895 ("s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()")    
Omitted-fix: 12bbaae7635a ("mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros")    
Omitted-fix: fd1a745ce03e ("mm: support page_mapcount() on page_has_type() pages")    
Omitted-fix: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")    
Omitted-fix: fa2690af573d ("mm: page_ref: remove folio_try_get_rcu()")    
Omitted-fix: f442fa614137 ("mm: gup: stop abusing try_grab_folio")    
Omitted-fix: cb0f01beb166 ("mm/mprotect: fix dax pud handling")    
    
Signed-off-by: Rafael Aquini <raquini@redhat.com>

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: David Airlie <airlied@redhat.com>
Approved-by: Michal Schmidt <mschmidt@redhat.com>
Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-30 07:22:28 +00:00
Rado Vrbovsky dd3203c2f2 Merge: padata: Rebase to v6.11
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5189

JIRA: https://issues.redhat.com/browse/RHEL-56164    
CVE: CVE-2024-43889    
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5189

The main purpose of this MR is to pull in commit 6d45e1c948a8
("padata: Fix possible divide-by-0 panic in padata_mt_helper()"), which
is a fix for CVE-2024-43889. In addition, prior padata commits (up to
v6.11) are pulled in as well, as some of them may be useful.

Note that patch 6 of the series has a strange subject line that does
not reflect what the patch does. The mistake was made upstream, and
this series follows the upstream version as-is.

Signed-off-by: Waiman Long <longman@redhat.com>

Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-25 16:16:29 +00:00
Rafael Aquini 062c2d2155 mm/mm_init: use helper macro BITS_PER_LONG and BITS_PER_BYTE
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit daee07bfba3340b07edcf9ae92044398e8a964db
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Mon Aug 7 10:35:28 2023 +0800

    mm/mm_init: use helper macro BITS_PER_LONG and BITS_PER_BYTE

    It's more readable to use helper macro BITS_PER_LONG and BITS_PER_BYTE.
    No functional change intended.

    Link: https://lkml.kernel.org/r/20230807023528.325191-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:21:12 -04:00
Rafael Aquini f5f98e718c mm: no need to export mm_kobj
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit dbdd2a989f2357d40f0c5a440ca81bf1390f11ba
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Fri Aug 4 08:43:37 2023 +0200

    mm: no need to export mm_kobj

    There are no modules using mm_kobj, so do not export it.

    Link: https://lkml.kernel.org/r/2023080436-algebra-cabana-417d@gregkh
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:20:59 -04:00
Rafael Aquini 51ea23f932 mm: disable kernelcore=mirror when no mirror memory
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
  * mm/internal.h: context difference due to a series of conflict resolutions
      given a series of out-of-order backports which made "mirrored_kernelcore"
      to end up in a slightly different context when comparing with its upstream
      placement. We leverage this backport to make it go into the "right" place.

This patch is a backport of the following upstream commit:
commit 0db31d63f27e5b8ca84b9fd5a3cff5b12ac88abf
Author: Ma Wupeng <mawupeng1@huawei.com>
Date:   Wed Aug 2 15:23:28 2023 +0800

    mm: disable kernelcore=mirror when no mirror memory

    If kernelcore=mirror is enabled on a system where no mirrored memory
    is reported by EFI, the kernel can OOM during startup, since all
    memory besides zone DMA is placed in the movable zone, which
    prevents the kernel from using it.

    Zone DMA/DMA32 initialization is independent of mirrored memory, and
    their max pfn is set in zone_sizes_init().  Since the kernel can fall
    back to zone DMA/DMA32 if there is no memory in zone Normal, these
    zones are treated as mirrored memory no matter what their memory
    attributes are.

    To solve this problem, disable kernelcore=mirror when no real
    mirrored memory exists.

    Link: https://lkml.kernel.org/r/20230802072328.2107981-1-mawupeng1@huawei.com
    Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
    Suggested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Suggested-by: Mike Rapoport <rppt@kernel.org>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Levi Yun <ppbuk5246@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:20:14 -04:00
Rafael Aquini 77f1f775e1 mm/vmemmap: improve vmemmap_can_optimize and allow architectures to override
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit c1a6c536fb088c01d6bdce77731d89ad5e1734c6
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Tue Jul 25 00:37:49 2023 +0530

    mm/vmemmap: improve vmemmap_can_optimize and allow architectures to override

    dax vmemmap optimization requires a minimum of 2 PAGE_SIZE area within
    vmemmap such that tail page mapping can point to the second PAGE_SIZE
    area.  Enforce that in vmemmap_can_optimize() function.

    Architectures like powerpc also want to enable vmemmap optimization
    conditionally (only with radix MMU translation).  Hence allow architecture
    override.

    Link: https://lkml.kernel.org/r/20230724190759.483013-4-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:19:44 -04:00
Rafael Aquini bdf551acca mm: kfence: allocate kfence_metadata at runtime
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit cabdf74e6b319c989eb8e812f1854291ae0af1c0
Author: Peng Zhang <zhangpeng.00@bytedance.com>
Date:   Tue Jul 18 15:30:19 2023 +0800

    mm: kfence: allocate kfence_metadata at runtime

    kfence_metadata is currently a static array.  For the purpose of
    allocating scalable __kfence_pool, we first change it to runtime
    allocation of metadata.  Since the size of an object of kfence_metadata is
    1160 bytes, we can save at least 72 pages (with default 256 objects)
    without enabling kfence.

    [akpm@linux-foundation.org: restore newline, per Marco]
    Link: https://lkml.kernel.org/r/20230718073019.52513-1-zhangpeng.00@bytedance.com
    Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
    Reviewed-by: Marco Elver <elver@google.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:19:09 -04:00
Rafael Aquini db5304cc3f mm/mm_init.c: drop node_start_pfn from adjust_zone_range_for_zone_movable()
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 0792e47d566244e150e320708e0be708a9db1a93
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Mon Jul 17 06:58:11 2023 +0000

    mm/mm_init.c: drop node_start_pfn from adjust_zone_range_for_zone_movable()

    node_start_pfn is not used in adjust_zone_range_for_zone_movable(), so it
    is pointless to waste a function argument.  Drop the parameter.

    Link: https://lkml.kernel.org/r/20230717065811.1262-1-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:19:03 -04:00
Rafael Aquini 574789c8d4 mm/mm_init.c: mark check_for_memory() as __init
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit b894da0468640f610d47624e872dc11f2ae5bb4b
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Mon Jul 10 09:37:50 2023 +0000

    mm/mm_init.c: mark check_for_memory() as __init

    The only caller of check_for_memory() is free_area_init(), which is
    annotated with __init, so it should be safe to also mark the former as
    __init.

    Link: https://lkml.kernel.org/r/20230710093750.1294-1-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:18:15 -04:00
Rafael Aquini 8189325ce2 mm/mm_init.c: update obsolete comment in get_pfn_range_for_nid()
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 3a29280afb25263c76212a8c140c29f280049ffb
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Sun Jun 25 11:33:40 2023 +0800

    mm/mm_init.c: update obsolete comment in get_pfn_range_for_nid()

    Since commit 633c0666b5 ("Memoryless nodes: drop one memoryless node boot
    warning"), the warning for a node with no available memory is removed.
    Update the corresponding comment.

    Link: https://lkml.kernel.org/r/20230625033340.1054103-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:18:15 -04:00
Rafael Aquini 144c8b3a02 mm/mm_init.c: remove obsolete macro HASH_SMALL
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 3fade62b62e84dd8dbf6e92d494b0e7eca750c43
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Sun Jun 25 10:13:23 2023 +0800

    mm/mm_init.c: remove obsolete macro HASH_SMALL

    HASH_SMALL only takes effect when the numentries parameter is 0.  But
    the sole caller, futex_init(), never calls alloc_large_system_hash()
    with numentries set to 0, so HASH_SMALL is obsolete; remove it.

    Link: https://lkml.kernel.org/r/20230625021323.849147-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: André Almeida <andrealmeid@igalia.com>
    Cc: Darren Hart <dvhart@infradead.org>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:18:14 -04:00
Waiman Long fa01745e1e Author: Gang Li padata: dispatch works on
JIRA: https://issues.redhat.com/browse/RHEL-56164

commit eb52286634f042432ec775077a73334603a1c6e4
Author: Gang Li Subject: padata: dispatch works on <gang.li@linux.dev>
Date:   Wed, 6 Mar 2024 13:04:17 -0800

    Author: Gang Li padata: dispatch works on

    different nodes Date: Thu, 22 Feb 2024 22:04:17 +0800

    When a group of tasks that access different nodes are scheduled on the
    same node, they may encounter bandwidth bottlenecks and access latency.

    Thus, the numa_aware flag is introduced here, allowing tasks to be
    distributed across different nodes to take full advantage of
    multi-node systems.

    Link: https://lkml.kernel.org/r/20240222140422.393911-5-gang.li@linux.dev
    Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
    Tested-by: David Rientjes <rientjes@google.com>
    Reviewed-by: Muchun Song <muchun.song@linux.dev>
    Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Alexey Dobriyan <adobriyan@gmail.com>
    Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-09-24 14:00:43 -04:00
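The numa_aware behavior described above can be sketched as round-robin placement of work items across nodes (a Python toy model under assumed semantics; the names are hypothetical, not the padata C API):

```python
def dispatch_works(works, node_cpus, numa_aware):
    # node_cpus: mapping of node id -> list of CPU ids on that node.
    # Without numa_aware, every work item lands on the first node's
    # CPUs; with it, items are spread round-robin across nodes so each
    # node's memory bandwidth is used.
    nodes = sorted(node_cpus)
    placement = []
    for i, work in enumerate(works):
        node = nodes[i % len(nodes)] if numa_aware else nodes[0]
        placement.append((work, node_cpus[node][0]))
    return placement
```

With two nodes, four items alternate between them when numa_aware is set, and all pile onto node 0 otherwise.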
Rafael Aquini 2d90ad12f0 mm/mm_init.c: drop 'nid' parameter from check_for_memory()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 91ff4d754a1895feb4216e94028edd76cbbc0770
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Wed Jun 7 03:24:02 2023 +0000

    mm/mm_init.c: drop 'nid' parameter from check_for_memory()

    The node_id in pgdat has already been set in free_area_init_node(),
    so use it internally instead of passing a redundant parameter.

    Link: https://lkml.kernel.org/r/20230607032402.4679-1-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:36:37 -04:00
Rafael Aquini 9bcffbf7e3 mm/mm_init.c: remove reset_node_present_pages()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 32b6a4a1745a46918f748f6fb7641e588fbec6f2
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Wed Jun 7 02:50:56 2023 +0000

    mm/mm_init.c: remove reset_node_present_pages()

    reset_node_present_pages() only gets called in hotadd_init_pgdat().
    Move the action that clears present pages into
    free_area_init_core_hotplug(), so the helper can be removed.

    Link: https://lkml.kernel.org/r/20230607025056.1348-1-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Suggested-by: David Hildenbrand <david@redhat.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Oscar Salvador <osalvador@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:36:36 -04:00
Rafael Aquini 4c7155b50f mm/mm_init.c: move set_pageblock_order() to free_area_init()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit e3d9b45fb17cfddb1c414b5981743d4245fcf486
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Thu Jun 1 06:35:35 2023 +0000

    mm/mm_init.c: move set_pageblock_order() to free_area_init()

    pageblock_order only needs to be set once; there is no need to
    initialize it in every zone/node.

    Link: https://lkml.kernel.org/r/20230601063536.26882-1-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:36:23 -04:00
Rafael Aquini da4ce5a82b mm/mm_init.c: remove free_area_init_memoryless_node()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 837c2ba56d6fd1ecf7a1c5aa0cdc872f3b74185b
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Sun May 28 04:57:20 2023 +0000

    mm/mm_init.c: remove free_area_init_memoryless_node()

    free_area_init_memoryless_node() is just a wrapper around
    free_area_init_node(); remove it to clean up.

    Link: https://lkml.kernel.org/r/20230528045720.4835-1-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Mike Rapoport <rppt@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:36:15 -04:00
Rafael Aquini 49426dd512 mm/mm_init.c: do not calculate zone_start_pfn/zone_end_pfn in zone_absent_pages_in_node()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 1c2d252f5b4289e1c6840bcf394157b70c639d6e
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Fri May 26 08:52:51 2023 +0000

    mm/mm_init.c: do not calculate zone_start_pfn/zone_end_pfn in zone_absent_pages_in_node()

    In calculate_node_totalpages(), zone_start_pfn/zone_end_pfn are already
    calculated in zone_spanned_pages_in_node(), so use them as parameters
    instead of node_start_pfn/node_end_pfn, and the duplicated calculation
    process can be dropped.

    Link: https://lkml.kernel.org/r/20230526085251.1977-2-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Suggested-by: Mike Rapoport <rppt@kernel.org>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Haifeng Xu <haifeng.xu@shopee.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:36:12 -04:00
Rafael Aquini a35879b0b8 mm/mm_init.c: introduce reset_memoryless_node_totalpages()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit ba1b67c79cb3c5f5d11cb475bb7045929b235538
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date:   Fri May 26 08:52:50 2023 +0000

    mm/mm_init.c: introduce reset_memoryless_node_totalpages()

    Currently, no matter whether a node actually has memory or not,
    calculate_node_totalpages() is used to account number of pages in
    zone/node.  However, for node without memory, these unnecessary
    calculations can be skipped.  All the zone/node page counts can be set to
    0 directly.  So introduce reset_memoryless_node_totalpages() to perform
    this action.

    Furthermore, calculate_node_totalpages() only gets called for the node
    with memory.

    Link: https://lkml.kernel.org/r/20230526085251.1977-1-haifeng.xu@shopee.com
    Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
    Suggested-by: Mike Rapoport <rppt@kernel.org>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:36:11 -04:00
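The idea above, zeroing the page counts directly for a memoryless node instead of running the full accounting, can be sketched in user-space C. The `toy_node`/`toy_zone` types and helper bodies here are illustrative stand-ins, not the kernel's actual `pg_data_t`/`struct zone` layout:

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for pg_data_t and struct zone; the names and
 * fields are illustrative, not the kernel's exact layout. */
struct toy_zone { unsigned long spanned_pages, present_pages; };
struct toy_node {
    struct toy_zone zones[4];
    unsigned long node_spanned_pages, node_present_pages;
    int has_memory;
};

/* Mirrors the idea of reset_memoryless_node_totalpages(): a node
 * without memory gets all zone/node page counts zeroed, with no
 * per-zone spanned/absent page math at all. */
static void toy_reset_memoryless_node_totalpages(struct toy_node *n)
{
    memset(n->zones, 0, sizeof(n->zones));
    n->node_spanned_pages = 0;
    n->node_present_pages = 0;
}

/* Placeholder for the real spanned/absent accounting, which is now
 * only reached for nodes that actually have memory. */
static void toy_calculate_node_totalpages(struct toy_node *n)
{
    n->node_spanned_pages = 1024;
    n->node_present_pages = 1000;
}

static void toy_init_node(struct toy_node *n)
{
    if (!n->has_memory)
        toy_reset_memoryless_node_totalpages(n);
    else
        toy_calculate_node_totalpages(n);
}
```

The split keeps the expensive path out of the memoryless case entirely, which is the point of the patch.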
Chris von Recklinghausen 28429327ca mm: page_alloc: move sysctls into its own file
JIRA: https://issues.redhat.com/browse/RHEL-20141

commit e95d372c4cd46b6ec4eeacc07adcb7260ab4cfa0
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Tue May 16 14:38:20 2023 +0800

    mm: page_alloc: move sysctls into its own file

    This moves all page-alloc-related sysctls to their own file as part of
    the kernel/sysctl.c spring cleaning, and also moves some function
    declarations from mm.h into internal.h.

    Link: https://lkml.kernel.org/r/20230516063821.121844-13-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Iurii Zaikin <yzaikin@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Len Brown <len.brown@intel.com>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-06-07 13:14:12 -04:00
Chris von Recklinghausen eaa3efc000 mm: page_alloc: move set_zone_contiguous() into mm_init.c
JIRA: https://issues.redhat.com/browse/RHEL-20141

commit 904d58578fce531be07619a2bc2cdc16c9fd49b6
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Tue May 16 14:38:11 2023 +0800

    mm: page_alloc: move set_zone_contiguous() into mm_init.c

    set_zone_contiguous() is only used in mm init/hotplug, and
    clear_zone_contiguous() only used in hotplug, move them from page_alloc.c
    to the more appropriate file.

    Link: https://lkml.kernel.org/r/20230516063821.121844-4-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Iurii Zaikin <yzaikin@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Len Brown <len.brown@intel.com>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-06-07 13:14:11 -04:00
Chris von Recklinghausen a5588506c2 mm: page_alloc: move init_on_alloc/free() into mm_init.c
JIRA: https://issues.redhat.com/browse/RHEL-20141

commit 5e7d5da2f41c1d762cd1dbdd97758be6c414ea29
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Tue May 16 14:38:10 2023 +0800

    mm: page_alloc: move init_on_alloc/free() into mm_init.c

    Since commit f2fc4b44ec2b ("mm: move init_mem_debugging_and_hardening() to
    mm/mm_init.c"), the init_on_alloc() and init_on_free() definitions are
    better moved there too.

    Link: https://lkml.kernel.org/r/20230516063821.121844-3-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Iurii Zaikin <yzaikin@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Len Brown <len.brown@intel.com>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-06-07 13:14:11 -04:00
Chris von Recklinghausen 042dca8adf mm: page_alloc: move mirrored_kernelcore into mm_init.c
JIRA: https://issues.redhat.com/browse/RHEL-20141

commit 072ba380cefc7722c9442cc14a9c2810898c13ac
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Tue May 16 14:38:09 2023 +0800

    mm: page_alloc: move mirrored_kernelcore into mm_init.c

    Patch series "mm: page_alloc: misc cleanup and refactor", v2.

    This aims to reduce more space in page_alloc.c, also do some cleanup, no
    functional changes intended.

    This patch (of 13):

    Since commit 9420f89db2dd ("mm: move most of core MM initialization to
    mm/mm_init.c"), mirrored_kernelcore should be moved into mm_init.c, as
    most related codes are already there.

    Link: https://lkml.kernel.org/r/20230516063821.121844-1-wangkefeng.wang@huawei.com
    Link: https://lkml.kernel.org/r/20230516063821.121844-2-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Iurii Zaikin <yzaikin@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Len Brown <len.brown@intel.com>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-06-07 13:14:11 -04:00
Eric Chanudet 2a44683bfc mm: pass nid to reserve_bootmem_region()
JIRA: https://issues.redhat.com/browse/RHEL-36126

commit 61167ad5fecdeaa037f3df1ba354dddd5f66a1ed
Author: Yajun Deng <yajun.deng@linux.dev>
Date:   Mon Jun 19 10:34:06 2023 +0800

    mm: pass nid to reserve_bootmem_region()

    early_pfn_to_nid() is called frequently in init_reserved_page(); it
    returns the node id of the PFN.  These PFNs are probably from the same
    memory region and have the same node id, so it's not necessary to call
    early_pfn_to_nid() for each PFN.

    Pass nid to reserve_bootmem_region() and drop the call to
    early_pfn_to_nid() in init_reserved_page().  Also, set nid on all reserved
    pages before doing this, as some reserved memory regions may not have
    their nid set.

    The most beneficial function is memmap_init_reserved_pages() if
    CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.

    The following data was tested on an x86 machine with 190GB of RAM.

    before:
    memmap_init_reserved_pages()  67ms

    after:
    memmap_init_reserved_pages()  20ms

    Link: https://lkml.kernel.org/r/20230619023406.424298-1-yajun.deng@linux.dev
    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Eric Chanudet <echanude@redhat.com>
2024-05-21 14:18:30 -04:00
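The speedup reported above comes from hoisting the node-id lookup out of the per-PFN loop. A minimal user-space sketch of the before/after shapes, with an assumed `toy_region` type and a counter standing in for the cost of `early_pfn_to_nid()`:

```c
#include <assert.h>

/* Toy model: a reserved region spans [start, end) PFNs on one node. */
struct toy_region { unsigned long start, end; int nid; };

static int lookup_calls; /* counts simulated early_pfn_to_nid() calls */

static int toy_early_pfn_to_nid(const struct toy_region *r, unsigned long pfn)
{
    (void)pfn;
    lookup_calls++;
    return r->nid;
}

/* Old scheme: one node-id lookup per PFN inside init_reserved_page(). */
static void toy_reserve_per_pfn(const struct toy_region *r)
{
    for (unsigned long pfn = r->start; pfn < r->end; pfn++)
        (void)toy_early_pfn_to_nid(r, pfn);
}

/* New scheme: resolve the nid once per region and pass it down, the
 * way the patch passes nid to reserve_bootmem_region(). */
static void toy_reserve_per_region(const struct toy_region *r)
{
    int nid = toy_early_pfn_to_nid(r, r->start);

    for (unsigned long pfn = r->start; pfn < r->end; pfn++)
        (void)nid; /* each page would be initialized with this nid */
}
```

For a region of N pages this turns N lookups into one, which is where the 67ms-to-20ms improvement in memmap_init_reserved_pages() comes from.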
Chris von Recklinghausen b196c334e8 mm: make arch_has_descending_max_zone_pfns() static
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 5f300fd59a2ae90b8a7fb5ed3d5fd43768236c38
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Fri Apr 14 10:03:53 2023 +0200

    mm: make arch_has_descending_max_zone_pfns() static

    clang produces a build failure on x86 for some randconfig builds after a
    change that moves around code to mm/mm_init.c:

    Cannot find symbol for section 2: .text.
    mm/mm_init.o: failed

    I have not been able to figure out why this happens, but the __weak
    annotation on arch_has_descending_max_zone_pfns() is the trigger here.

    Removing the weak function in favor of an open-coded Kconfig option check
    avoids the problem, is clearer, and is easier for the compiler to
    optimize.

    [arnd@arndb.de: fix logic bug]
      Link: https://lkml.kernel.org/r/20230415081904.969049-1-arnd@kernel.org
    Link: https://lkml.kernel.org/r/20230414080418.110236-1-arnd@kernel.org
    Fixes: 9420f89db2dd ("mm: move most of core MM initialization to mm/mm_init.c")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: kernel test robot <oliver.sang@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:48 -04:00
Chris von Recklinghausen 434306679e mm: be less noisy during memory hotplug
Conflicts: mm/mm_init.c - fuzz

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit dd31bad21980990d903133d2855ea0e2eccade5e
Author: Tomas Krcka <krckatom@amazon.de>
Date:   Thu Mar 23 17:43:49 2023 +0000

    mm: be less noisy during memory hotplug

    Turn a pr_info() into a pr_debug() to prevent dmesg spamming on systems
    where memory hotplug is a frequent operation.

    Link: https://lkml.kernel.org/r/20230323174349.35990-1-krckatom@amazon.de
    Signed-off-by: Tomas Krcka <krckatom@amazon.de>
    Suggested-by: Jan H. Schönherr <jschoenh@amazon.de>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:37 -04:00
Chris von Recklinghausen cb7dac2b1a mm: move kmem_cache_init() declaration to mm/slab.h
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit d5d2c02a4980c2e22037679457bf2d921b86a503
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:11 2023 +0200

    mm: move kmem_cache_init() declaration to mm/slab.h

    kmem_cache_init() is called only from mm_core_init(), there is no need to
    declare it in include/linux/slab.h

    Move kmem_cache_init() declaration to mm/slab.h

    Link: https://lkml.kernel.org/r/20230321170513.2401534-13-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:34 -04:00
Chris von Recklinghausen 5b7e1e7745 mm: move mem_init_print_info() to mm_init.c
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit eb8589b4f8c107c346421881963c0ee0b8367c2c
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:10 2023 +0200

    mm: move mem_init_print_info() to mm_init.c

    mem_init_print_info() is only called from mm_core_init().

    Move it close to the caller and make it static.

    Link: https://lkml.kernel.org/r/20230321170513.2401534-12-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:33 -04:00
Chris von Recklinghausen c183dc21cd init,mm: fold late call to page_ext_init() to page_alloc_init_late()
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit de57807e6f267a658a046dbca44dc40fe806d60f
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:09 2023 +0200

    init,mm: fold late call to page_ext_init() to page_alloc_init_late()

    When deferred initialization of struct pages is enabled, page_ext_init()
    must be called after all the deferred initialization is done, but there is
    no point in keeping it as a separate call from kernel_init_freeable() right
    after page_alloc_init_late().

    Fold the call to page_ext_init() into page_alloc_init_late() and localize
    deferred_struct_pages variable.

    Link: https://lkml.kernel.org/r/20230321170513.2401534-11-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:33 -04:00
Chris von Recklinghausen 1e880e2e96 mm: move init_mem_debugging_and_hardening() to mm/mm_init.c
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit f2fc4b44ec2bb94c51c7ae1af9b1177d72705992
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:08 2023 +0200

    mm: move init_mem_debugging_and_hardening() to mm/mm_init.c

    init_mem_debugging_and_hardening() is only called from mm_core_init().

    Move it close to the caller, make it static and rename it to
    mem_debugging_and_hardening_init() for consistency with surrounding
    convention.

    Link: https://lkml.kernel.org/r/20230321170513.2401534-10-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:33 -04:00
Chris von Recklinghausen 3138f85cac mm: call {ptlock,pgtable}_cache_init() directly from mm_core_init()
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 4cd1e9edf60efb20fad35cf5e9ade7ad75b34cd1
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:07 2023 +0200

    mm: call {ptlock,pgtable}_cache_init() directly from mm_core_init()

    and drop pgtable_init() as it has no real value and its name is
    misleading.

    Link: https://lkml.kernel.org/r/20230321170513.2401534-9-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Sergei Shtylyov <sergei.shtylyov@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:32 -04:00
Chris von Recklinghausen 532c3e9303 init,mm: move mm_init() to mm/mm_init.c and rename it to mm_core_init()
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit b7ec1bf3e7b9dd9d3335c937f5d834680d74addf
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:06 2023 +0200

    init,mm: move mm_init() to mm/mm_init.c and rename it to mm_core_init()

    Make mm_init() a part of mm/ codebase.  mm_core_init() better describes
    what the function does and does not clash with mm_init() in kernel/fork.c

    Link: https://lkml.kernel.org/r/20230321170513.2401534-8-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:32 -04:00
Chris von Recklinghausen 2e6baa337e mm: handle hashdist initialization in mm/mm_init.c
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 534ef4e19160b1034430c4c4fbc1bf94c0253a51
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:03 2023 +0200

    mm: handle hashdist initialization in mm/mm_init.c

    The hashdist variable must be initialized before the first call to
    alloc_large_system_hash(), and free_area_init() looks like a better place
    for it than page_alloc_init().

    Move hashdist handling to mm/mm_init.c

    Link: https://lkml.kernel.org/r/20230321170513.2401534-5-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:31 -04:00
Chris von Recklinghausen 1845b92dcf mm: move most of core MM initialization to mm/mm_init.c
Conflicts: mm/page_alloc.c, mm/mm_init.c - conflicts due to
	3f6dac0fd1b8 ("mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks")
	and
	87a7ae75d738 ("mm/vmemmap/devdax: fix kernel crash when probing devdax devices")

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 9420f89db2dd611c5b436a13e13f74d65ecc3a6a
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date:   Tue Mar 21 19:05:02 2023 +0200

    mm: move most of core MM initialization to mm/mm_init.c

    The bulk of memory management initialization code is spread all over
    mm/page_alloc.c and makes navigating through page allocator functionality
    difficult.

    Move most of the functions marked __init and __meminit to mm/mm_init.c to
    make it better localized and allow some more spare room before
    mm/page_alloc.c reaches 10k lines.

    No functional changes.

    Link: https://lkml.kernel.org/r/20230321170513.2401534-4-rppt@kernel.org
    Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Doug Berger <opendmb@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:31 -04:00
Audra Mitchell c9d5756843 memory: move hotplug memory notifier priority to same file for easy sorting
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 1eeaa4fd39b0b1b3e986f8eab6978e69b01e3c5e
Author: Liu Shixin <liushixin2@huawei.com>
Date:   Fri Sep 23 11:33:47 2022 +0800

    memory: move hotplug memory notifier priority to same file for easy sorting

    The priority of hotplug memory callback is defined in a different file.
    And there are some callers using numbers directly.  Collect them together
    into include/linux/memory.h for easy reading.  This allows us to sort
    their priorities more intuitively without additional comments.

    Link: https://lkml.kernel.org/r/20220923033347.3935160-9-liushixin2@huawei.com
    Signed-off-by: Liu Shixin <liushixin2@huawei.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Waiman Long <longman@redhat.com>
    Cc: zefan li <lizefan.x@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:51 -04:00
Audra Mitchell c5a1784372 mm/mm_init.c: use hotplug_memory_notifier() directly
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit d46722ef1c090541d56f706f3a90f3f2e84cdf0c
Author: Liu Shixin <liushixin2@huawei.com>
Date:   Fri Sep 23 11:33:44 2022 +0800

    mm/mm_init.c: use hotplug_memory_notifier() directly

    Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
    GCC to 5.1") updated the minimum gcc version to 5.1, so the problem
    mentioned in f02c696800 ("include/linux/memory.h: implement
    register_hotmemory_notifier()") no longer exists.  We can now switch to
    using hotplug_memory_notifier() directly rather than
    register_hotmemory_notifier().

    Link: https://lkml.kernel.org/r/20220923033347.3935160-6-liushixin2@huawei.com
    Signed-off-by: Liu Shixin <liushixin2@huawei.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Waiman Long <longman@redhat.com>
    Cc: zefan li <lizefan.x@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:51 -04:00
Chris von Recklinghausen 2af7596eac mm: multi-gen LRU: groundwork
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit ec1c86b25f4bdd9dce6436c0539d2a6ae676e1c4
Author: Yu Zhao <yuzhao@google.com>
Date:   Sun Sep 18 02:00:02 2022 -0600

    mm: multi-gen LRU: groundwork

    Evictable pages are divided into multiple generations for each lruvec.
    The youngest generation number is stored in lrugen->max_seq for both
    anon and file types as they are aged on an equal footing. The oldest
    generation numbers are stored in lrugen->min_seq[] separately for anon
    and file types as clean file pages can be evicted regardless of swap
    constraints. These three variables are monotonically increasing.

    Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
    in order to fit into the gen counter in folio->flags. Each truncated
    generation number is an index to lrugen->lists[]. The sliding window
    technique is used to track at least MIN_NR_GENS and at most
    MAX_NR_GENS generations. The gen counter stores a value within [1,
    MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
    stores 0.

    There are two conceptually independent procedures: "the aging", which
    produces young generations, and "the eviction", which consumes old
    generations.  They form a closed-loop system, i.e., "the page reclaim".
    Both procedures can be invoked from userspace for the purposes of working
    set estimation and proactive reclaim.  These techniques are commonly used
    to optimize job scheduling (bin packing) in data centers [1][2].

    To avoid confusion, the terms "hot" and "cold" will be applied to the
    multi-gen LRU, as a new convention; the terms "active" and "inactive" will
    be applied to the active/inactive LRU, as usual.

    The protection of hot pages and the selection of cold pages are based
    on page access channels and patterns. There are two access channels:
    one through page tables and the other through file descriptors. The
    protection of the former channel is by design stronger because:
    1. The uncertainty in determining the access patterns of the former
       channel is higher due to the approximation of the accessed bit.
    2. The cost of evicting the former channel is higher due to the TLB
       flushes required and the likelihood of encountering the dirty bit.
    3. The penalty of underprotecting the former channel is higher because
       applications usually do not prepare themselves for major page
       faults like they do for blocked I/O. E.g., GUI applications
       commonly use dedicated I/O threads to avoid blocking rendering
       threads.

    There are also two access patterns: one with temporal locality and the
    other without.  For the reasons listed above, the former channel is
    assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is
    present; the latter channel is assumed to follow the latter pattern unless
    outlying refaults have been observed [3][4].

    The next patch will address the "outlying refaults".  Three macros, i.e.,
    LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in
    this patch to make the entire patchset less diffy.

    A page is added to the youngest generation on faulting.  The aging needs
    to check the accessed bit at least twice before handing this page over to
    the eviction.  The first check takes care of the accessed bit set on the
    initial fault; the second check makes sure this page has not been used
    since then.  This protocol, AKA second chance, requires a minimum of two
    generations, hence MIN_NR_GENS.

    [1] https://dl.acm.org/doi/10.1145/3297858.3304053
    [2] https://dl.acm.org/doi/10.1145/3503222.3507731
    [3] https://lwn.net/Articles/495543/
    [4] https://lwn.net/Articles/815342/

    Link: https://lkml.kernel.org/r/20220918080010.2920238-6-yuzhao@google.com
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Acked-by: Brian Geffon <bgeffon@google.com>
    Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
    Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Acked-by: Steven Barrett <steven@liquorix.net>
    Acked-by: Suleiman Souhlal <suleiman@google.com>
    Tested-by: Daniel Byrne <djbyrne@mtu.edu>
    Tested-by: Donald Carr <d@chaos-reins.com>
    Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
    Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
    Tested-by: Sofia Trinh <sofia.trinh@edi.works>
    Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Barry Song <baohua@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michael Larabel <Michael@MichaelLarabel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Qi Zheng <zhengqi.arch@bytedance.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:45 -04:00
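The truncation described above, folding a monotonically increasing sequence number into the few bits available in folio->flags, can be illustrated with a toy sketch. The constants and helper name below are assumptions for illustration (the kernel's MAX_NR_GENS and lru_gen helpers live in mm/vmscan.c and differ in detail):

```c
#include <assert.h>

#define TOY_MAX_NR_GENS 4
/* order_base_2(MAX_NR_GENS + 1) == 3 bits: enough for values 0..4,
 * where 0 means "not on any lrugen list" and 1..MAX_NR_GENS mark a
 * page that is on one of lrugen->lists[]. */
#define TOY_GEN_BITS 3
#define TOY_GEN_MASK ((1u << TOY_GEN_BITS) - 1)

/* Fold an unbounded sequence number into the small gen counter that
 * fits in the page flags; the result stays within [1, MAX_NR_GENS],
 * and (result - 1) indexes lrugen->lists[] in the toy model. */
static unsigned int toy_seq_to_gen(unsigned long seq)
{
    return (unsigned int)(seq % TOY_MAX_NR_GENS) + 1;
}
```

The sliding window works because max_seq and min_seq[] advance monotonically, so at most MAX_NR_GENS truncated values are ever live at once and the wraparound is unambiguous.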
Yu Zhao 1587db62d8 include/linux/page-flags-layout.h: cleanups
Tidy things up and delete comments that state the obvious, contain typos,
or make no sense.

Link: https://lkml.kernel.org/r/20210303071609.797782-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:42 -07:00
Gustavo A. R. Silva 01359eb201 mm: fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple of
warnings by explicitly adding a break statement instead of just letting
the code fall through to the next, and by adding a fallthrough
pseudo-keyword in places where the code is intended to fall through.

Link: https://github.com/KSPP/linux/issues/115
Link: https://lkml.kernel.org/r/f5756988b8842a3f10008fbc5b0a654f828920a9.1605896059.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
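The two fixes this commit applies, an explicit `break` and the `fallthrough` pseudo-keyword, look like the sketch below. The kernel defines `fallthrough` in its own headers; the macro here is a user-space stand-in, and `classify()` is a made-up example function:

```c
#include <assert.h>

/* User-space stand-in for the kernel's fallthrough pseudo-keyword: it
 * expands to a compiler attribute that tells -Wimplicit-fallthrough
 * the missing break is intentional. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallback for older compilers */
#endif

static int classify(int c)
{
    int score = 0;

    switch (c) {
    case 2:
        score += 10;
        fallthrough;    /* intentional: case 2 includes case 1's work */
    case 1:
        score += 1;
        break;          /* explicit break silences the warning */
    default:
        break;
    }
    return score;
}
```

With `-Wimplicit-fallthrough` enabled, Clang and GCC warn on any case that drops through without either the annotation or a `break`, which is exactly what this commit prepares for.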
Feng Tang 56f3547bfa mm: adjust vm_committed_as_batch according to vm overcommit policy
When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Jing Xia 86fea8b494 mm/mm_init.c: report kasan-tag information stored in page->flags
The pageflags_layout_usage message printed via mminit_loglevel is
incorrect when KASAN runs in software tag-based mode
(CONFIG_KASAN_SW_TAGS).  This patch corrects it and reports the kasan-tag
information.

Signed-off-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Chunyan Zhang <chunyan.zhang@unisoc.com>
Cc: Orson Zhai <orson.zhai@unisoc.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/1586929370-10838-1-git-send-email-jing.xia.mail@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:12 -07:00
Mateusz Nosek e46b893dd1 mm/mm_init.c: clean code. Use BUILD_BUG_ON when comparing compile time constant
MAX_ZONELISTS is a compile-time constant, so it should be checked with
BUILD_BUG_ON, not BUG_ON.

Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Link: http://lkml.kernel.org/r/20200228224617.11343-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:41 -07:00
Thomas Gleixner 457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Arun KS ca79b0c211 mm: convert totalram_pages and totalhigh_pages variables to atomic
totalram_pages and totalhigh_pages are turned into static inline
functions.

The main motivation was that the managed_page_count_lock handling was
complicating things.  It was discussed at length here:
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seems better to remove the lock and convert the variables to atomic,
preventing potential store-to-read tearing as a bonus.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Pavel Tatashin c1093b746c mm: access zone->node via zone_to_nid() and zone_set_nid()
zone->node is only configured when CONFIG_NUMA=y, so it is a good idea to
have inline functions to access this field in order to avoid #ifdefs in
.c files.

Link: http://lkml.kernel.org/r/20180730101757.28058-3-osalvador@techadventures.net
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:45 -07:00