Centos-kernel-stream-9/mm
Rafael Aquini fe6c0243f4 mm/migrate: optimize hotplug-time demotion order updates
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396

This patch is a backport of the following upstream commit:
commit 295be91f7ef0027fca2f2e4788e99731aa931834
Author: Dave Hansen <dave.hansen@linux.intel.com>
Date:   Mon Oct 18 15:15:29 2021 -0700

    mm/migrate: optimize hotplug-time demotion order updates

    Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2.

    This contains two fixes for the "automatic demotion" code which was
    merged into 5.15:

     * Fix memory hotplug performance regression by suppressing
       any real action on irrelevant hotplug events.

     * Ensure CPU hotplug handler is registered when memory hotplug
       is disabled.

    This patch (of 2):

    == tl;dr ==

    Automatic demotion opted for a simple, lazy approach to handling hotplug
    events.  This noticeably slows down memory hotplug[1].  Optimize away
    updates to the demotion order when memory hotplug events should have no
    effect.

    This has no effect on CPU hotplug.  There is no known problem on the CPU
    side and any work there will be in a separate series.

    == Background ==

    Automatic demotion is a memory migration strategy to ensure that new
    allocations have room in faster memory tiers on tiered memory systems.
    The kernel maintains an array (node_demotion[]) to drive these
    migrations.

    The node_demotion[] path is calculated by starting at nodes with CPUs
    and then "walking" to nodes with memory.  Only hotplug events which
    online or offline a node with memory (N_MEMORY) or CPUs (N_CPU) will
    actually affect the migration order.
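
The "walk" described above can be illustrated with a small userspace sketch. This is not the kernel's implementation: the topology, the distance table, and every name below (has_cpu, distance, demotion_target, nearest_unused(), build_demotion_order()) are hypothetical and chosen only for illustration. It assumes each node gets at most one demotion target and that a node is never reused as a target, which keeps the resulting order free of cycles.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES  4
#define NO_TARGET (-1)

/* Hypothetical topology: nodes 0 and 1 have CPUs, all four have memory. */
static const bool has_cpu[NR_NODES]    = { true, true, false, false };
static const bool has_memory[NR_NODES] = { true, true, true,  true  };

/* Hypothetical NUMA distances (smaller means closer). */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 17, 28 },
	{ 20, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};

/* demotion_target[n] is the node that pages on node n are demoted to. */
static int demotion_target[NR_NODES];

/* Find the closest node with memory that has not been used yet. */
static int nearest_unused(int from, const bool used[NR_NODES])
{
	int best = NO_TARGET, best_dist = INT_MAX;

	assert(from >= 0 && from < NR_NODES);
	for (int n = 0; n < NR_NODES; n++) {
		if (n == from || used[n] || !has_memory[n])
			continue;
		if (distance[from][n] < best_dist) {
			best = n;
			best_dist = distance[from][n];
		}
	}
	return best;
}

/* Start at the nodes with CPUs, then walk outward pass by pass. */
static void build_demotion_order(void)
{
	bool used[NR_NODES], this_pass[NR_NODES], next_pass[NR_NODES];
	bool more = false;

	for (int n = 0; n < NR_NODES; n++) {
		demotion_target[n] = NO_TARGET;
		this_pass[n] = has_cpu[n];
		used[n] = has_cpu[n];	/* sources are never targets */
		more = more || has_cpu[n];
	}

	while (more) {
		more = false;
		for (int n = 0; n < NR_NODES; n++)
			next_pass[n] = false;

		for (int n = 0; n < NR_NODES; n++) {
			int target;

			if (!this_pass[n])
				continue;
			target = nearest_unused(n, used);
			if (target == NO_TARGET)
				continue;
			demotion_target[n] = target;
			used[target] = true;	/* each target is used once */
			next_pass[target] = true;
			more = true;
		}

		/* Targets found in this pass become sources in the next. */
		for (int n = 0; n < NR_NODES; n++)
			this_pass[n] = next_pass[n];
	}
}
```

With this made-up topology, the CPU nodes 0 and 1 each pick their nearest memory-only node, and nodes 2 and 3 end up as terminal nodes with no further target.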

    == Problem ==

    However, the current code is lazy.  It completely regenerates the
    migration order on *any* CPU or memory hotplug event.  The logic was
    that these events are extremely rare and that the overhead from
    indiscriminate order regeneration is minimal.

    Part of the update logic involves a synchronize_rcu(), which is a pretty
    big hammer.  Its overhead was large enough to be detected by some 0day
    tests that watch memory hotplug performance[1].

    == Solution ==

    Add a new helper (node_demotion_topo_changed()) which can differentiate
    between superfluous and impactful hotplug events.  Skip the expensive
    update operation for superfluous events.
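
As a rough illustration of the idea, the filter can be modeled in userspace as a comparison against the last seen topology. This is a minimal sketch, not the code from the patch: struct topo_snapshot, topo_changed(), handle_hotplug_event(), and rebuild_count are hypothetical names, and the two bitmasks are assumed stand-ins for the "nodes with memory" and "nodes with CPUs" sets that the demotion order depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64

/* Hypothetical snapshot of the node sets the demotion order depends on. */
struct topo_snapshot {
	uint64_t nodes_with_memory;	/* bit n set: node n has memory */
	uint64_t nodes_with_cpus;	/* bit n set: node n has CPUs */
};

static struct topo_snapshot last_seen;	/* state after the previous event */
static unsigned int rebuild_count;	/* number of expensive rebuilds */

static uint64_t node_bit(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return UINT64_C(1) << nid;
}

/* Return true only if the event changed a set the order depends on. */
static bool topo_changed(const struct topo_snapshot *now)
{
	bool changed =
		now->nodes_with_memory != last_seen.nodes_with_memory ||
		now->nodes_with_cpus != last_seen.nodes_with_cpus;

	last_seen = *now;
	return changed;
}

/* Stand-in for the costly update, which in the kernel involves synchronize_rcu(). */
static void rebuild_demotion_order(void)
{
	rebuild_count++;
}

static void handle_hotplug_event(const struct topo_snapshot *now)
{
	if (!topo_changed(now))
		return;		/* superfluous event: skip the rebuild */
	rebuild_demotion_order();
}
```

In this model, onlining more memory on a node that already has memory leaves both masks unchanged, so the rebuild is skipped. Only an event that adds or removes a node from either set pays for the update.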

    == Aside: Locking ==

    It took me a few moments to declare the locking to be safe enough for
    node_demotion_topo_changed() to work.  It all hinges on the memory
    hotplug lock:

    During memory hotplug events, 'mem_hotplug_lock' is held for write.
    This ensures that two memory hotplug events cannot be processed
    simultaneously.

    CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides
    mutual exclusion between CPU hotplug events.  In addition, the demotion
    code acquires and holds the mem_hotplug_lock for read during its CPU
    hotplug handlers.  This provides mutual exclusion between the demotion
    memory hotplug callbacks and the CPU hotplug callbacks.

    This effectively allows the migration target generation code to be
    treated as if it were single-threaded.

    [1] https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/

    Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com
    Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com
    Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Wei Xu <weixugc@google.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Yang Shi <yang.shi@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2021-11-29 11:44:02 -05:00
damon mm/damon: don't use strnlen() with known-bogus source length 2021-11-29 11:43:53 -05:00
kasan mm/kasan: move kasan.fault to mm/kasan/report.c 2021-11-29 11:41:44 -05:00
kfence kfence: test: fail fast if disabled at boot 2021-11-29 11:43:20 -05:00
Kconfig mm/idle_page_tracking: make PG_idle reusable 2021-11-29 11:43:23 -05:00
Kconfig.debug
Makefile mm: introduce Data Access MONitor (DAMON) 2021-11-29 11:43:21 -05:00
backing-dev.c writeback: fix bandwidth estimate for spiky workload 2021-11-29 11:40:48 -05:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c mm/bootmem_info.c: mark __init on register_page_bootmem_info_section 2021-11-29 11:41:36 -05:00
cleancache.c
cma.c mm/cma: mark CMA on x86_64 tech preview and print RHEL-specific infos 2021-08-30 14:31:13 -04:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE 2021-11-29 11:43:02 -05:00
debug.c mm/debug: sync up latest migrate_reason to migrate_reason_names 2021-11-29 11:43:56 -05:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: fix corrupted page flag 2021-11-29 11:40:42 -05:00
dmapool.c mm/dmapool: use DEVICE_ATTR_RO macro 2021-06-29 10:53:52 -07:00
early_ioremap.c mm/early_ioremap.c: remove redundant early_ioremap_shutdown() 2021-11-29 11:43:16 -05:00
fadvise.c
failslab.c
filemap.c mm: remove irqsave/restore locking from contexts with irqs enabled 2021-11-29 11:40:44 -05:00
frontswap.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
gup.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-11-29 11:42:59 -05:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
highmem.c mm: in_irq() cleanup 2021-11-29 11:43:17 -05:00
hmm.c mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled 2021-11-29 11:43:40 -05:00
huge_memory.c mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code 2021-11-29 11:41:33 -05:00
hugetlb.c mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY 2021-11-29 11:42:21 -05:00
hugetlb_cgroup.c hugetlb: make free_huge_page irq safe 2021-05-05 11:27:22 -07:00
hugetlb_vmemmap.c mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hwpoison-inject.c mm: hwpoison: don't drop slab caches for offlining non-LRU page 2021-11-29 11:41:59 -05:00
init-mm.c mm: add setup_initial_init_mm() helper 2021-07-08 11:48:21 -07:00
internal.h mm/numa: automatically generate node migration order 2021-11-29 11:42:06 -05:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-11-29 11:43:15 -05:00
khugepaged.c huge tmpfs: SGP_NOALLOC to stop collapse_file() on race 2021-11-29 11:41:11 -05:00
kmemleak.c mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp 2021-11-29 11:43:43 -05:00
ksm.c mm: KSM: fix data type 2021-11-29 11:42:30 -05:00
list_lru.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
maccess.c ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault 2021-11-29 11:40:28 -05:00
madvise.c mm/madvise: add MADV_WILLNEED to process_madvise() 2021-11-29 11:42:35 -05:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: remove double Note in kerneldoc 2021-07-01 11:06:02 -07:00
memblock.c memblock: exclude NOMAP regions from kmemleak 2021-11-29 11:44:00 -05:00
memcontrol.c memcg: flush lruvec stats in the refault 2021-11-29 11:43:51 -05:00
memfd.c Reimplement RLIMIT_MEMLOCK on top of ucounts 2021-04-30 14:14:02 -05:00
memory-failure.c mm/memory_failure: fix the missing pte_unmap() call 2021-11-29 11:43:58 -05:00
memory.c mm: fix the deadlock in finish_fault() 2021-07-23 17:43:28 -07:00
memory_hotplug.c mm/memory_hotplug: use helper zone_is_zone_device() to simplify the code 2021-11-29 11:43:13 -05:00
mempolicy.c mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task 2021-11-29 11:43:44 -05:00
mempool.c kasan: use separate (un)poison implementation for integrated init 2021-06-04 19:32:21 +01:00
memremap.c mm/memory_hotplug: remove nid parameter from arch_remove_memory() 2021-11-29 11:43:05 -05:00
memtest.c
migrate.c mm/migrate: optimize hotplug-time demotion order updates 2021-11-29 11:44:02 -05:00
mincore.c
mlock.c mm: introduce memfd_secret system call to create "secret" memory areas 2021-07-08 11:48:21 -07:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap.c remap_file_pages: Use vma_lookup() instead of find_vma() 2021-11-29 11:41:35 -05:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmu_gather.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c
mprotect.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
mremap.c mm/mremap: fix memory account on do_munmap() failure 2021-11-29 11:41:36 -05:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() 2021-11-29 11:40:33 -05:00
oom_kill.c mm: introduce process_mrelease system call 2021-11-29 11:42:25 -05:00
page-writeback.c writeback: use READ_ONCE for unlocked reads of writeback stats 2021-11-29 11:40:49 -05:00
page_alloc.c mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype 2021-11-29 11:43:42 -05:00
page_counter.c mm: page_counter: mitigate consequences of a page_counter underflow 2021-04-30 11:20:38 -07:00
page_ext.c mm/idle_page_tracking: make PG_idle reusable 2021-11-29 11:43:23 -05:00
page_idle.c mm/idle_page_tracking: make PG_idle reusable 2021-11-29 11:43:23 -05:00
page_io.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
page_isolation.c mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE 2021-11-29 11:43:02 -05:00
page_owner.c mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE 2021-11-29 11:43:02 -05:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_vma_mapped.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2021-07-01 17:17:24 -07:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c percpu: rework memcg accounting 2021-06-05 20:43:15 +00:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c percpu: remove export of pcpu_base_addr 2021-11-29 11:43:30 -05:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-16 09:24:42 -07:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c mm: Protect operations adding pages to page cache with invalidate_lock 2021-11-29 11:40:22 -05:00
rmap.c mm: remove redundant compound_head() calling 2021-11-29 11:43:14 -05:00
rodata_test.c
secretmem.c mm/secretmem: use refcount_t instead of atomic_t 2021-11-29 11:43:19 -05:00
shmem.c mm/shmem.c: fix judgment error in shmem_is_huge() 2021-11-29 11:43:54 -05:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
slab.h mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() 2021-07-30 10:14:39 -07:00
slab_common.c mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context 2021-11-29 11:42:54 -05:00
slob.c mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels 2021-03-08 14:18:46 -08:00
slub.c mm, slub: convert kmem_cpu_slab protection to local_lock 2021-11-29 11:42:57 -05:00
sparse-vmemmap.c mm: sparsemem: split the huge PMD mapping of vmemmap pages 2021-06-30 20:47:26 -07:00
sparse.c mm: introduce memmap_alloc() to unify memory map allocation 2021-11-29 11:41:51 -05:00
swap.c mm: fs: invalidate bh_lrus for only cold path 2021-11-29 11:43:55 -05:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: delete meaningless forward declarations 2021-06-29 10:53:49 -07:00
swap_state.c Revert "mm: swap: check if swap backing device is congested or not" 2021-08-20 11:31:42 -07:00
swapfile.c mm, memcg: inline swap-related functions to improve disabled memcg config 2021-11-29 11:41:15 -05:00
truncate.c fs: inode: count invalidated shadow pages in pginodesteal 2021-11-29 11:40:53 -05:00
usercopy.c
userfaultfd.c userfaultfd: change mmap_changing to atomic 2021-11-29 11:42:04 -05:00
util.c mm: fix uninitialized use in overcommit_policy_handler 2021-11-29 11:43:58 -05:00
vmacache.c
vmalloc.c mm: don't allow executable ioremap mappings 2021-11-29 11:43:15 -05:00
vmpressure.c mm/vmpressure: replace vmpressure_to_css() with vmpressure_to_memcg() 2021-11-29 11:42:13 -05:00
vmscan.c mm,vmscan: fix divide by zero in get_scan_count 2021-11-29 11:43:41 -05:00
vmstat.c mm/vmstat: protect per cpu variables with preempt disable on RT 2021-11-29 11:43:32 -05:00
workingset.c memcg: flush lruvec stats in the refault 2021-11-29 11:43:51 -05:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zsmalloc.c mm/zsmalloc.c: improve readability for async_free_zspage() 2021-07-01 11:06:02 -07:00
zswap.c mm/zswap.c: fix two bugs in zswap_writeback_entry() 2021-06-30 20:47:31 -07:00