linux-kernelorg-stable/mm
Mateusz Guzik d6ff4c8f65
fs: avoid mmap sem relocks when coredumping with many missing pages
Dumping processes with large allocated and mostly not-faulted areas is
very slow.

Borrowing a test case from Tavian Barnes:

int main(void) {
    char *mem = mmap(NULL, 1ULL << 40, PROT_READ | PROT_WRITE,
            MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0);
    printf("%p %m\n", mem);
    if (mem != MAP_FAILED) {
            mem[0] = 1;
    }
    abort();
}

That's 1TB of almost completely not-populated area.

On my test box it takes 13-14 seconds to dump.

The profile shows:
-   99.89%     0.00%  a.out
     entry_SYSCALL_64_after_hwframe
     do_syscall_64
     syscall_exit_to_user_mode
     arch_do_signal_or_restart
   - get_signal
      - 99.89% do_coredump
         - 99.88% elf_core_dump
            - dump_user_range
               - 98.12% get_dump_page
                  - 64.19% __get_user_pages
                     - 40.92% gup_vma_lookup
                        - find_vma
                           - mt_find
                                4.21% __rcu_read_lock
                                1.33% __rcu_read_unlock
                     - 3.14% check_vma_flags
                          0.68% vma_is_secretmem
                       0.61% __cond_resched
                       0.60% vma_pgtable_walk_end
                       0.59% vma_pgtable_walk_begin
                       0.58% no_page_table
                  - 15.13% down_read_killable
                       0.69% __cond_resched
                    13.84% up_read
                 0.58% __cond_resched

Almost 29% of the time is spent relocking the mmap semaphore between
calls to get_dump_page() which find nothing.

Whacking that results in times of 10 seconds (down from 13-14).

While here make the thing killable.

The real problem is the page-sized iteration and the real fix would
patch it up instead. It is left as an exercise for the mm-familiar
reader.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250119103205.2172432-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21 10:25:32 +01:00
..
damon mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() 2025-01-25 20:22:46 -08:00
kasan kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() 2025-01-25 20:22:46 -08:00
kfence kfence: skip __GFP_THISNODE allocations on NUMA systems 2025-02-01 03:53:26 -08:00
kmsan mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
Kconfig mm: add build-time option for hotplug memory default online type 2025-01-25 20:22:21 -08:00
Kconfig.debug
Makefile mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
backing-dev.c
balloon_compaction.c
bootmem_info.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
cma.c cma: enforce non-zero pageblock_order during cma_init_reserved_mem() 2024-11-14 22:49:19 -08:00
cma.h mm: change type of cma_area_count to unsigned int 2025-01-13 22:40:35 -08:00
cma_debug.c
cma_sysfs.c
compaction.c mm: compaction: use the proper flag to determine watermarks 2025-02-01 03:53:25 -08:00
debug.c mm/debug: introduce VM_WARN_ON_VMG() to dump VMA merge state 2025-01-25 20:22:23 -08:00
debug_page_alloc.c
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
dmapool.c
dmapool_test.c
early_ioremap.c mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereference 2025-01-13 22:40:59 -08:00
execmem.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
fadvise.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
folio-compat.c mm/writeback: add folio_mark_dirty_lock() 2024-11-05 11:14:32 +01:00
gup.c fs: avoid mmap sem relocks when coredumping with many missing pages 2025-02-21 10:25:32 +01:00
gup_test.c
gup_test.h
highmem.c
hmm.c
huge_memory.c mm/huge_memory: convert has_hwpoisoned into a pure folio flag 2025-01-25 20:22:41 -08:00
hugetlb.c mm/hugetlb: fix hugepage allocation for interleaved memory nodes 2025-02-01 03:53:27 -08:00
hugetlb_cgroup.c mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios 2025-01-25 20:22:42 -08:00
hugetlb_vmemmap.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c mm: convert mm_lock_seq to a proper seqcount 2025-01-13 22:40:50 -08:00
internal.h mm/truncate: add folio_unmap_invalidate() helper 2025-01-25 20:22:43 -08:00
interval_tree.c
io-mapping.c
ioremap.c
khugepaged.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
kmemleak.c mm: kmemleak: fix upper boundary check for physical address objects 2025-02-01 03:53:25 -08:00
ksm.c ksm: add ksm involvement information for each process 2025-01-25 20:22:40 -08:00
list_lru.c mm/list_lru: fix false warning of negative counter 2024-12-30 17:59:10 -08:00
maccess.c kasan: migrate copy_user_test to kunit 2024-11-11 00:26:44 -08:00
madvise.c mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
mapping_dirty_helpers.c
memblock.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
memcontrol-v1.c mm: remove the non-useful else after a break in a if statement 2025-01-13 22:40:40 -08:00
memcontrol-v1.h mm: memcg: declare do_memsw_account inline 2024-12-05 19:54:46 -08:00
memcontrol.c memcg: fix soft lockup in the OOM process 2025-01-25 20:22:35 -08:00
memfd.c mm/memfd: use strncpy_from_user() to read memfd name 2025-01-25 20:22:40 -08:00
memory-failure.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
memory_hotplug.c mm: add build-time option for hotplug memory default online type 2025-01-25 20:22:21 -08:00
mempolicy.c mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb() 2025-01-25 20:22:41 -08:00
mempool.c
memremap.c
memtest.c
migrate.c mm: separate move/undo parts from migrate_pages_batch() 2025-01-25 20:22:45 -08:00
migrate_device.c mm: remap unused subpages to shared zeropage when splitting isolated thp 2024-09-09 16:39:03 -07:00
mincore.c
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c mm/memmap: prevent double scanning of memmap by kmemleak 2025-01-25 20:22:30 -08:00
mm_slot.h
mmap.c mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
mmap_lock.c mm: mmap_lock: optimize mmap_lock tracepoints 2025-01-13 22:40:34 -08:00
mmu_gather.c mm: pgtable: move __tlb_remove_table_one() in x86 to generic file 2025-01-25 20:22:23 -08:00
mmu_notifier.c
mmzone.c
mprotect.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
mremap.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
mseal.c mseal: remove can_do_mseal() 2025-01-13 22:40:51 -08:00
msync.c
nommu.c fsnotify: generate pre-content permission event on page fault 2024-12-11 17:28:41 +01:00
numa.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
numa_emulation.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
numa_memblks.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
oom_kill.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page-writeback.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page_alloc.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page_counter.c kernel/cgroup: Add "dmem" memory accounting cgroup 2025-01-06 17:24:38 +01:00
page_ext.c
page_frag_cache.c mm/page_alloc: export free_frozen_pages() instead of free_unref_page() 2025-01-13 22:40:31 -08:00
page_idle.c mm/page_idle: constify 'struct bin_attribute' 2025-01-25 20:22:19 -08:00
page_io.c mm, swap: clean up device availability check 2025-01-25 20:22:36 -08:00
page_isolation.c mm/page_isolation: don't pass gfp flags to start_isolate_page_range() 2025-01-13 22:40:44 -08:00
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c
page_vma_mapped.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
pagewalk.c mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
pgalloc-track.h
pgtable-generic.c mm: add RCU annotation to pte_offset_map(_lock) 2024-12-18 19:04:43 -08:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
pt_reclaim.c mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
ptdump.c
readahead.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
rmap.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
rodata_test.c mm/rodata_test: verify test data is unchanged, rather than non-zero 2025-01-13 22:40:38 -08:00
secretmem.c add a string-to-qstr constructor 2025-01-27 19:25:45 -05:00
shmem.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
shmem_quota.c
show_mem.c mm/show_mem: use str_yes_no() helper in show_free_areas() 2024-11-07 14:38:08 -08:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c saner replacement for debugfs_rename() 2025-01-15 13:14:37 +01:00
shuffle.c
shuffle.h
slab.h mm/slab: fix kernel-doc func param names 2025-01-13 10:22:04 +01:00
slab_common.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
slub.c Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
sparse-vmemmap.c mm/memmap: prevent double scanning of memmap by kmemleak 2025-01-25 20:22:30 -08:00
sparse.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
swap.c mm/filemap: add read support for RWF_DONTCACHE 2025-01-25 20:22:43 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swap_cgroup.c mm/swap_cgroup: decouple swap cgroup recording and clearing 2025-01-25 20:22:19 -08:00
swap_slots.c mm, swap_slots: remove slot cache for freeing path 2025-01-25 20:22:37 -08:00
swap_state.c mm: remove unnecessary calls to lru_add_drain 2025-01-25 20:22:21 -08:00
swapfile.c mm, swap: fix reclaim offset calculation error during allocation 2025-02-01 03:53:26 -08:00
truncate.c mm/truncate: add folio_unmap_invalidate() helper 2025-01-25 20:22:43 -08:00
usercopy.c
userfaultfd.c mm: userfaultfd: recheck dst_pmd entry in move_pages_pte() 2025-01-13 22:40:46 -08:00
util.c mm: add comments to do_mmap(), mmap_region() and vm_mmap() 2025-01-13 22:40:59 -08:00
vma.c mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
vma.h mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
vma_internal.h mm/vma: move brk() internals to mm/vma.c 2025-01-13 22:40:42 -08:00
vmalloc.c mm: alloc_pages_bulk: rename API 2025-01-25 20:22:31 -08:00
vmpressure.c
vmscan.c mm/vmscan: accumulate nr_demoted for accurate demotion statistics 2025-02-01 03:53:24 -08:00
vmstat.c vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-12 19:03:38 -08:00
workingset.c mm/mglru: rework workingset protection 2025-01-25 20:22:39 -08:00
z3fold.c
zbud.c
zpdesc.h mm/zsmalloc: introduce __zpdesc_clear/set_zsmalloc() 2025-01-25 20:22:35 -08:00
zpool.c
zsmalloc.c mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc() 2025-02-01 03:53:23 -08:00
zswap.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00