JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
* arch/*/Kconfig: all hunks dropped as there were only text blurbs and comments
being changed with no functional changes whatsoever, and RHEL9 is missing
several (unrelated) commits to these arches that transform the text blurbs in
the way these non-functional hunks were expecting;
* drivers/accel/qaic/qaic_data.c: hunk dropped due to RHEL-only commit
083c0cdce2 ("Merge DRM changes from upstream v6.8..v6.9");
* drivers/gpu/drm/i915/gem/selftests/huge_pages.c: hunk dropped due to RHEL-only
commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
* drivers/gpu/drm/ttm/tests/ttm_pool_test.c: all hunks dropped due to RHEL-only
commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
* drivers/video/fbdev/vermilion/vermilion.c: hunk dropped as RHEL9 is missing
commit dbe7e429fe ("vmlfb: framebuffer driver for Intel Vermilion Range");
* include/linux/pageblock-flags.h: differences due to out-of-order backport
of upstream commits 72801513b2bf ("mm: set pageblock_order to HPAGE_PMD_ORDER
in case with !CONFIG_HUGETLB_PAGE but THP enabled"), and 3a7e02c040b1
("minmax: avoid overly complicated constant expressions in VM code");
* mm/mm_init.c: differences in the 3rd and 4th hunks are due to RHEL
backport commit 1845b92dcf ("mm: move most of core MM initialization to
mm/mm_init.c") ignoring the out-of-order backport of commit 3f6dac0fd1b8
("mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks")
thus partially reverting the changes introduced by the latter;
This patch is a backport of the following upstream commit:
commit 5e0a760b44417f7cadd79de2204d6247109558a0
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Thu Dec 28 17:47:04 2023 +0300
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
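Illustratively, call sites change along these lines (a generic sketch, not
the verbatim treewide diff):

-	if (order > MAX_ORDER)
+	if (order > MAX_PAGE_ORDER)
 		return -EINVAL;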
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
* mm/internal.h: context difference due to a series of conflict resolutions
from out-of-order backports, which caused "mirrored_kernelcore" to end up
in a slightly different context than its upstream placement. We leverage
this backport to move it into the "right" place.
This patch is a backport of the following upstream commit:
commit 0db31d63f27e5b8ca84b9fd5a3cff5b12ac88abf
Author: Ma Wupeng <mawupeng1@huawei.com>
Date: Wed Aug 2 15:23:28 2023 +0800
mm: disable kernelcore=mirror when no mirror memory
For a system with kernelcore=mirror enabled while no mirrored memory is
reported by EFI, this could lead to a kernel OOM during startup, since all
memory besides zone DMA is placed in the movable zone, which prevents the
kernel from using it.
Zone DMA/DMA32 initialization is independent of mirrored memory and their
max pfn is set in zone_sizes_init(). Since the kernel can fall back to zone
DMA/DMA32 if there is no memory in zone Normal, these zones are treated as
mirrored memory no matter what their memory attributes are.
To solve this problem, disable kernelcore=mirror when no real mirrored
memory exists.
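The fix boils down to a guard in the kernelcore=mirror handling (sketch;
memblock_has_mirror() is the helper the patch adds to report whether any
region was marked MEMBLOCK_MIRROR):

	if (mirrored_kernelcore && !memblock_has_mirror()) {
		pr_warn("The system has no mirror memory, ignore kernelcore=mirror.\n");
		goto out;
	}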
Link: https://lkml.kernel.org/r/20230802072328.2107981-1-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Suggested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Levi Yun <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit c442a957b2f4e116f28aeb55bf2719cb7bb2ad60
Author: Mike Rapoport (IBM) <rppt@kernel.org>
Date: Fri Jul 28 13:55:12 2023 +0300
Revert "mm,memblock: reset memblock.reserved to system init state to prevent UAF"
This reverts commit 9e46e4dcd9d6cd88342b028dbfa5f4fb7483d39c.
kbuild reports a warning in memblock_remove_region() because of a false
positive caused by partial reset of the memblock state.
Doing the full reset will remove the false positives, but will allow
late use of memblock_free() to go unnoticed, so it is better to revert
the offending commit.
WARNING: CPU: 0 PID: 1 at mm/memblock.c:352 memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1))
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00001-g9e46e4dcd9d6 #2
RIP: 0010:memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1))
Call Trace:
memblock_discard (kbuild/src/x86_64/mm/memblock.c:383)
page_alloc_init_late (kbuild/src/x86_64/include/linux/find.h:208 kbuild/src/x86_64/include/linux/nodemask.h:266 kbuild/src/x86_64/mm/mm_init.c:2405)
kernel_init_freeable (kbuild/src/x86_64/init/main.c:1325 kbuild/src/x86_64/init/main.c:1546)
kernel_init (kbuild/src/x86_64/init/main.c:1439)
ret_from_fork (kbuild/src/x86_64/arch/x86/kernel/process.c:145)
ret_from_fork_asm (kbuild/src/x86_64/arch/x86/entry/entry_64.S:298)
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202307271656.447aa17e-oliver.sang@intel.com
Signed-off-by: "Mike Rapoport (IBM)" <rppt@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit 9e46e4dcd9d6cd88342b028dbfa5f4fb7483d39c
Author: Rik van Riel <riel@surriel.com>
Date: Wed Jul 19 15:41:37 2023 -0400
mm,memblock: reset memblock.reserved to system init state to prevent UAF
The memblock_discard function frees the memblock.reserved.regions
array, which is good.
However, if a subsequent memblock_free (or memblock_phys_free) comes
in later, from for example ima_free_kexec_buffer, that will result in
a use after free bug in memblock_isolate_range.
When running a kernel with CONFIG_KASAN enabled, this will cause a
kernel panic very early in boot. Without CONFIG_KASAN, there is
a chance that memblock_isolate_range might scribble on memory
that is now in use by somebody else.
Avoid those issues by making sure that memblock_discard points
memblock.reserved.regions back at the static buffer.
If memblock_free is called after memblock memory is discarded, that will
print a warning in memblock_remove_region.
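A minimal sketch of the reset in memblock_discard(), assuming the static
array keeps its upstream name memblock_reserved_init_regions:

	/* make stray memblock_free() calls hit the static array, not freed memory */
	memblock.reserved.regions = memblock_reserved_init_regions;
	memblock.reserved.cnt = 1;
	memblock_remove_region(&memblock.reserved, 0);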
Signed-off-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20230719154137.732d8525@imladris.surriel.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit a668968f84265e698a122656c433809ab9f023fa
Author: Haifeng Xu <haifeng.xu@shopee.com>
Date: Wed Jun 7 02:45:48 2023 +0000
mm/memory_hotplug: remove reset_node_managed_pages() in hotadd_init_pgdat()
Managed pages have already been set to 0 in free_area_init_core_hotplug(),
via zone_init_internals() on each zone. It's pointless to reset them again.
Furthermore, reset_node_managed_pages() no longer needs to be exposed
outside of mm/memblock.c. Remove declaration in include/linux/memblock.h
and define it as static.
In addition to this, the only caller of reset_node_managed_pages() is
reset_all_zones_managed_pages(), which is annotated with __init, so it
should be safe to also mark reset_node_managed_pages() as __init.
Link: https://lkml.kernel.org/r/20230607024548.1240-1-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit de649e7f5edb2e61dbd3d64deae44cb165e657ad
Author: Yuwei Guan <ssawgyw@gmail.com>
Date: Thu Jun 1 21:31:49 2023 +0800
memblock: Update nid info in memblock debugfs
The node id for memblock reserved regions will be wrong, so let's show
'x' for reg->nid == MAX_NUMNODES in debugfs to keep the output aligned.
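Sketch of the resulting formatting in memblock_debug_show() (shape
assumed):

	nid = memblock_get_region_node(reg);
	if (nid != MAX_NUMNODES)
		seq_printf(m, "%4d ", nid);
	else
		seq_printf(m, "%4c ", 'x');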
Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
Co-developed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yuwei Guan <ssawgyw@gmail.com>
Link: https://lore.kernel.org/r/20230601133149.37160-1-ssawgyw@gmail.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit 493f349e38d022057b3b6e13f589f108269c42b0
Author: Yuwei Guan <ssawgyw@gmail.com>
Date: Fri May 19 18:53:21 2023 +0800
memblock: Add flags and nid info in memblock debugfs
Currently, the memblock debugfs can display the count of memblock_type
and the base and end of each reg. However, when memblock_mark_*() or
memblock_set_node() is executed on some range, the information in the
existing debugfs cannot make it clear why the addresses are not
consecutive.
For example,
cat /sys/kernel/debug/memblock/memory
0: 0x0000000080000000..0x00000000901fffff
1: 0x0000000090200000..0x00000000905fffff
2: 0x0000000090600000..0x0000000092ffffff
3: 0x0000000093000000..0x00000000973fffff
4: 0x0000000097400000..0x00000000b71fffff
5: 0x00000000c0000000..0x00000000dfffffff
6: 0x00000000e2500000..0x00000000f87fffff
7: 0x00000000f8800000..0x00000000fa7fffff
8: 0x00000000fa800000..0x00000000fd3effff
9: 0x00000000fd3f0000..0x00000000fd3fefff
10: 0x00000000fd3ff000..0x00000000fd7fffff
11: 0x00000000fd800000..0x00000000fd901fff
12: 0x00000000fd902000..0x00000000fd909fff
13: 0x00000000fd90a000..0x00000000fd90bfff
14: 0x00000000fd90c000..0x00000000ffffffff
15: 0x0000000880000000..0x0000000affffffff
So we can add flags and nid to this debugfs.
For example,
cat /sys/kernel/debug/memblock/memory
0: 0x0000000080000000..0x00000000901fffff 0 NONE
1: 0x0000000090200000..0x00000000905fffff 0 NOMAP
2: 0x0000000090600000..0x0000000092ffffff 0 NONE
3: 0x0000000093000000..0x00000000973fffff 0 NOMAP
4: 0x0000000097400000..0x00000000b71fffff 0 NONE
5: 0x00000000c0000000..0x00000000dfffffff 0 NONE
6: 0x00000000e2500000..0x00000000f87fffff 0 NONE
7: 0x00000000f8800000..0x00000000fa7fffff 0 NOMAP
8: 0x00000000fa800000..0x00000000fd3effff 0 NONE
9: 0x00000000fd3f0000..0x00000000fd3fefff 0 NOMAP
10: 0x00000000fd3ff000..0x00000000fd7fffff 0 NONE
11: 0x00000000fd800000..0x00000000fd901fff 0 NOMAP
12: 0x00000000fd902000..0x00000000fd909fff 0 NONE
13: 0x00000000fd90a000..0x00000000fd90bfff 0 NOMAP
14: 0x00000000fd90c000..0x00000000ffffffff 0 NONE
15: 0x0000000880000000..0x0000000affffffff 0 NONE
Signed-off-by: Yuwei Guan <ssawgyw@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20230519105321.333-1-ssawgyw@gmail.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit fc493f83a25835c14cd96379c1a07459230881bc
Author: Claudio Migliorelli <claudio.migliorelli@mail.polimi.it>
Date: Sun Apr 23 15:29:35 2023 +0200
Fix some coding style errors in memblock.c
This patch removes the initialization of some static variables to 0 and
`false` in the memblock source file, according to the coding style
guidelines.
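Representative shape of the cleanup (hypothetical variable name; the
patch touches the static flags in mm/memblock.c):

-static bool memblock_example_flag __initdata_memblock = false;
+static bool memblock_example_flag __initdata_memblock;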
Signed-off-by: Claudio Migliorelli <claudio.migliorelli@mail.polimi.it>
Link: https://lore.kernel.org/r/87r0sa7mm8.fsf@mail.polimi.it
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36126
Conflicts: backported out-of-order, before
commit 77e6c43e137c ("memblock: introduce MEMBLOCK_RSRV_NOINIT flag"),
which adds MEMBLOCK_RSRV_NOINIT and checks it in the loop.
commit 6a9531c3a88096a26cf3ac582f7ec44f94a7dcb2
Author: Yajun Deng <yajun.deng@linux.dev>
Date: Thu Jan 18 14:18:53 2024 +0800
memblock: fix crash when reserved memory is not added to memory
After commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
the nid of a reserved region is used by init_reserved_page() (with
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y) to access the node structure.
In many cases the nid of the reserved memory is not set and this causes
a crash.
When the nid of a reserved region is not set, fall back to
early_pfn_to_nid(), so that nid of the first_online_node will be passed
to init_reserved_page().
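The fallback has roughly this shape in memmap_init_reserved_pages() (a
sketch; variable handling simplified):

	for_each_reserved_mem_region(region) {
		nid = memblock_get_region_node(region);
		start = region->base;
		end = start + region->size;

		/* fall back when the region's nid was never set */
		if (nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)
			nid = early_pfn_to_nid(PFN_DOWN(start));

		reserve_bootmem_region(start, end, nid);
	}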
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Link: https://lore.kernel.org/r/20240118061853.2652295-1-yajun.deng@linux.dev
[rppt: massaged the commit message]
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Eric Chanudet <echanude@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-36126
commit 61167ad5fecdeaa037f3df1ba354dddd5f66a1ed
Author: Yajun Deng <yajun.deng@linux.dev>
Date: Mon Jun 19 10:34:06 2023 +0800
mm: pass nid to reserve_bootmem_region()
early_pfn_to_nid() is called frequently in init_reserved_page(); it
returns the node id of the PFN. These PFNs are probably from the same
memory region and have the same node id, so it's not necessary to call
early_pfn_to_nid() for each PFN.
Pass nid to reserve_bootmem_region() and drop the call to
early_pfn_to_nid() in init_reserved_page(). Also, set the nid on all
reserved pages before doing this, as some reserved memory regions may
not have their nid set.
The most beneficial function is memmap_init_reserved_pages() if
CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
The following data was tested on an x86 machine with 190GB of RAM.
before:
memmap_init_reserved_pages() 67ms
after:
memmap_init_reserved_pages() 20ms
Link: https://lkml.kernel.org/r/20230619023406.424298-1-yajun.deng@linux.dev
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Chanudet <echanude@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
commit 2fe03412e2e1be3d5ab37b8351a37c3aec506556
Author: Peng Zhang <zhangpeng.00@bytedance.com>
Date: Sun Jan 29 17:00:34 2023 +0800
memblock: Avoid useless checks in memblock_merge_regions().
memblock_merge_regions() is called after regions have been modified to
merge the neighboring compatible regions. That will check all regions
but most checks are useless.
Most of the time we only insert one or a few new regions, or modify one or
a few regions. At this time, we don't need to check all the regions. We
only need to check the changed regions, because other unrelated regions
cannot be merged.
Add two parameters to memblock_merge_regions() to indicate the lower and
upper boundary to scan.
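In sketch form (the boundary parameter names start_rgn/end_rgn are
assumed):

-static void memblock_merge_regions(struct memblock_type *type)
+static void memblock_merge_regions(struct memblock_type *type,
+				   unsigned long start_rgn,
+				   unsigned long end_rgn)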
Debug code that counts the number of total iterations in
memblock_merge_regions(), like for instance
void memblock_merge_regions(struct memblock_type *type)
{
	static int iteration_count = 0;
	static int max_nr_regions = 0;

	max_nr_regions = max(max_nr_regions, (int)type->cnt);
	...
	while () {
		iteration_count++;
		...
	}
	pr_info("iteration_count: %d max_nr_regions %d", iteration_count,
		max_nr_regions);
}
Produces the following numbers on a physical machine with 1T of memory:
before: [2.472243] iteration_count: 45410 max_nr_regions 178
after: [2.470869] iteration_count: 923 max_nr_regions 176
The actual startup speed seems to change little, but it does reduce the
scan overhead.
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Link: https://lore.kernel.org/r/20230129090034.12310-3-zhangpeng.00@bytedance.com
[rppt: massaged the changelog]
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
commit ad500fb2d11b3739dcbc17a31976828b9161ecf5
Author: Peng Zhang <zhangpeng.00@bytedance.com>
Date: Sun Jan 29 17:00:33 2023 +0800
memblock: Make a boundary tighter in memblock_add_range().
When type->cnt * 2 + 1 is less than or equal to type->max, there are
enough empty regions to insert.
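The change boils down to relaxing the fast-path condition in
memblock_add_range() (sketch):

-	if (type->cnt * 2 + 1 < type->max)
+	if (type->cnt * 2 + 1 <= type->max)
 		insert = true;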
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Link: https://lore.kernel.org/r/20230129090034.12310-2-zhangpeng.00@bytedance.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3738
JIRA: https://issues.redhat.com/browse/RHEL-27739
Depends: !3662
Dropped Patches and the reason they were dropped:
Needs to be evaluated by the FS team:
138060ba92b3 ("fs: pass dentry to set acl method")
3b4c7bc01727 ("xattr: use rbtree for simple_xattrs")
Needs to be evaluated by the NVME team:
4003f107fa2e ("mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages")
Needs to be evaluated by the ZRAM team:
7c2af309abd2 ("zram: add size class equals check into recompression")
Signed-off-by: Audra Mitchell <audra@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Jocelyn Falempe <jfalempe@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: David Airlie <airlied@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 647037adcad00f2bab8828d3d41cd0553d41f3bd
Author: Aaron Thompson <dev@aaront.org>
Date: Tue Feb 7 08:21:51 2023 +0000
Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
This reverts commit 115d9d77bb0f9152c60b6e8646369fa7f6167593.
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying to
coalesce buddies. This can, for example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8
RIP: 0010:__list_del_entry_valid+0x3f/0x70
<TASK>
__free_one_page+0x139/0x410
__free_pages_ok+0x21d/0x450
memblock_free_late+0x8c/0xb9
efi_free_boot_services+0x16b/0x25c
efi_enter_virtual_mode+0x403/0x446
start_kernel+0x678/0x714
secondary_startup_64_no_verify+0xd2/0xdb
</TASK>
A proper fix will be more involved so revert this change for the time
being.
Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 115d9d77bb0f9152c60b6e8646369fa7f6167593
Author: Aaron Thompson <dev@aaront.org>
Date: Fri Jan 6 22:22:44 2023 +0000
mm: Always release pages to the buddy allocator in memblock_free_late().
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.
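In memblock_free_late() this amounts to (sketch; surrounding loop
omitted):

-		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
+		/* reserved pages are already initialized, free them directly */
+		__free_pages_core(pfn_to_page(cursor), 0);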
One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
  spanned 4095
  present 3999
  managed 3840
Node 0, zone DMA32
  spanned 246652
  present 245868
  managed 178867
v6.2-rc2 + patch:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
  spanned 4095
  present 3999
  managed 3840
Node 0, zone DMA32
  spanned 246652
  present 245868
  managed 222816 # +43,949 pages
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit fa81ab49bbe4e1ce756581c970486de0ddb14309
Author: Miaoqian Lin <linmq006@gmail.com>
Date: Fri Dec 16 14:03:03 2022 +0400
memblock: Fix doc for memblock_phys_free
memblock_phys_free() is the counterpart to memblock_phys_alloc().
Replace memblock_alloc_xx() with memblock_phys_alloc_xx() in the doc
comment to keep consistency.
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20221216100304.688209-1-linmq006@gmail.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26871
commit 9b99c17f7510bed2adbe17751fb8abddba5620bc
Author: Alison Schofield <alison.schofield@intel.com>
Date: Fri Jan 12 12:09:50 2024 -0800
numa_fill_memblks() fills in the gaps in numa_meminfo memblks over a
physical address range. To do so, it first creates a list of existing
memblks that overlap that address range. The issue is that it is off
by one when comparing to the end of the address range, so memblks
that do not overlap are selected.
The impact of selecting a memblk that does not actually overlap is
that an existing memblk may be filled when the expected action is to
do nothing and return NUMA_NO_MEMBLK to the caller. The caller can
then add a new NUMA node and memblk.
Replace the broken open-coded search for address overlap with the
memblock helper memblock_addrs_overlap(). Update the kernel doc
and in code comments.
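Sketch of the replacement check (the surrounding numa_memblk loop and
the exact original condition are assumed):

	if (!memblock_addrs_overlap(start, end - start, bi->start,
				    bi->end - bi->start))
		continue;	/* memblk does not overlap the range */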
Suggested by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/10a3e6109c34c21a8dd4c513cf63df63481a2b07.1705085543.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mark Langsdorf <mlangsdo@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-10059
UEFI Specification version 2.9 introduces the concept of memory
acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, require memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific to the Virtual Machine
platform.
There are several ways the kernel can deal with unaccepted memory:
1. Accept all the memory during boot. It is easy to implement and it
doesn't have runtime cost once the system is booted. The downside is
very long boot time.
Accept can be parallelized to multiple CPUs to keep it manageable
(i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate
memory bandwidth and does not scale beyond that point.
2. Accept a block of memory on the first use. It requires more
infrastructure and changes in the page allocator to make it work, but
it provides good boot time.
On-demand memory acceptance means latency spikes every time the kernel
steps onto a new memory block. The spikes will go away once the workload
data set size stabilizes or all memory gets accepted.
3. Accept all memory in the background. Introduce a thread (or several)
that gets memory accepted proactively. This minimizes the time the
system experiences latency spikes on memory allocation while keeping
boot time low.
This approach cannot function on its own. It is an extension of #2:
background memory acceptance requires a functional scheduler, but the
page allocator may need to tap into unaccepted memory before that.
The downside of the approach is that these threads also steal CPU
cycles and memory bandwidth from the user's workload and may hurt
user experience.
Implement #1 and #2 for now. #2 is the default. Some workloads may want
to use #1 with accept_memory=eager on the kernel command line. #3 can be
implemented later based on user demand.
Support of unaccepted memory requires a few changes in core-mm code:
- memblock accepts memory on allocation. It serves early boot memory
allocations and doesn't limit them to the pre-accepted pool of memory.
- page allocator accepts memory on the first allocation of the page.
When the kernel runs out of accepted memory, it accepts memory until the
high watermark is reached. This helps to minimize fragmentation.
EFI code will provide two helpers if the platform supports unaccepted
memory:
- accept_memory() makes a range of physical addresses accepted.
- range_contains_unaccepted_memory() checks whether anything within the
range of physical addresses requires acceptance.
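In prototype form (as described above):

	void accept_memory(phys_addr_t start, phys_addr_t end);
	bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end);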
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport <rppt@linux.ibm.com> # memblock
Link: https://lore.kernel.org/r/20230606142637.5171-2-kirill.shutemov@linux.intel.com
(cherry picked from commit dcdfdd40fa82b6704d2841938e5c8ec3051eb0d6)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[RHEL: upstream has mm/mm_init.c split out of mm/page_alloc.c]
JIRA: https://issues.redhat.com/browse/RHEL-10059
23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") results in
various boot failures (hang) on arm targets. Debug messages reveal the
reason.
If start==0, __ffs(start) returns 0xfffffff or (as int) -1, which min_t()
interprets as such, while min() apparently uses the returned unsigned long
value. Obviously a negative order isn't received well by the rest of the
code.
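The fix special-cases start == 0 (sketch of the affected code in
__free_pages_memory()):

	if (start)
		order = min_t(int, MAX_ORDER, __ffs(start));
	else
		order = MAX_ORDER;	/* start == 0 is MAX_ORDER-aligned */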
[akpm@linux-foundation.org: fix comment, per Mike]
Link: https://lkml.kernel.org/r/ZDBa7HWZK69dKKzH@kernel.org
Link: https://lkml.kernel.org/r/20230406072529.vupqyrzqnhyozeyh@box.shutemov.name
Fixes: 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely")
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/9460377a-38aa-4f39-ad57-fb73725f92db@roeck-us.net
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 59f876fb9d68a4d8c20305d7a7a0daf4ee9478a8)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-10059
MAX_ORDER is currently defined as the number of orders the page allocator
supports: the user can ask the buddy allocator for page orders between 0
and MAX_ORDER-1. This definition is counter-intuitive and has led to a
number of bugs all over the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders
the user can ask from the buddy allocator is now 0..MAX_ORDER.
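Per the RHEL note below, callers change along these lines (illustrative,
generic bound check):

-	if (WARN_ON_ONCE(order >= MAX_ORDER))
+	if (WARN_ON_ONCE(order > MAX_ORDER))
 		return NULL;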
[kirill@shutemov.name: fix min() warning]
Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 23baf831a32c04f9a968812511540b1b3e648bf5)
[RHEL: Fix conflicts by changing MAX_ORDER - 1 to MAX_ORDER,
">= MAX_ORDER" to "> MAX_ORDER", etc.]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-1848
commit 5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Wed Sep 7 14:08:43 2022 +0800
mm: add pageblock_align() macro
Add pageblock_align() macro and use it to simplify code.
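The macro is a thin wrapper over ALIGN() (placement in pageblock-flags.h
assumed):

	#define pageblock_align(pfn)	ALIGN((pfn), pageblock_nr_pages)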
Link: https://lkml.kernel.org/r/20220907060844.126891-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-1848
commit 4f9bc69ac5ce34071a9a51343bc81ca76cb2e3f1
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Wed Sep 7 14:08:42 2022 +0800
mm: reuse pageblock_start/end_pfn() macro
Move pageblock_start_pfn()/pageblock_end_pfn() into pageblock-flags.h so
they can be used elsewhere, not only in compaction. Also use ALIGN_DOWN()
instead of round_down() to pair with ALIGN(), which should be the same for
pageblock usage.
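Sketch of the moved macros:

	#define pageblock_start_pfn(pfn)	ALIGN_DOWN((pfn), pageblock_nr_pages)
	#define pageblock_end_pfn(pfn)		ALIGN((pfn) + 1, pageblock_nr_pages)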
Link: https://lkml.kernel.org/r/20220907060844.126891-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 450d0e74d886c172ac2f72518b797a18ee8d1327
Author: Zhou Guanghui <zhouguanghui1@huawei.com>
Date: Wed Jun 15 10:27:42 2022 +0000
memblock,arm64: expand the static memblock memory table
In a system (Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error
occurs, and the BIOS will mark the corresponding area (for example, 2 MB)
as unusable. When the system restarts next time, these areas are not
reported or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an
increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to
a larger number of memblocks.
For example, if the EFI_UNUSABLE_MEMORY type is reported:
...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...
The EFI memory map is parsed to construct the memblock arrays before the
memblock arrays can be resized. As a result, memory regions beyond
INIT_MEMBLOCK_REGIONS are lost.
Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace
INIT_MEMBLOCK_REGIONS in defining the size of the static memblock.memory
array.
Allow overriding memblock.memory array size with architecture defined
INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
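Sketch of the override mechanism (exact guard and placement are
assumptions):

	/* mm/memblock.c: default to the generic size */
	#ifndef INIT_MEMBLOCK_MEMORY_REGIONS
	#define INIT_MEMBLOCK_MEMORY_REGIONS	INIT_MEMBLOCK_REGIONS
	#endif

	/* arm64, when CONFIG_EFI is enabled */
	#define INIT_MEMBLOCK_MEMORY_REGIONS	1024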
Link: https://lkml.kernel.org/r/20220615102742.96450-1-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Darren Hart <darren@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org> [arm64]
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Xu Qiang <xuqiang36@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 28e1a8f4b0ff1eafc320ec733b9c61ee7eb633ea
Author: Jinyu Tang <tjytimi@163.com>
Date: Wed Jun 15 17:40:15 2022 +0800
memblock: avoid some repeat when add new range
The worst case is that the new memory range overlaps all existing
regions, which requires type->cnt + 1 empty struct memblock_region slots in
the type->regions array.
So if type->cnt + 1 + type->cnt is less than type->max, we can insert
regions directly rather than calculating the needed amount before the
insertion.
And because of the merge operation at the end of the function, type->cnt
will increase slowly in many cases.
This change allows avoiding unnecessary repeated traversal of memblock
ranges in many cases when adding a new memory range.
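i.e., in sketch form, inside memblock_add_range():

	/* room even in the worst case: skip the counting pass */
	if (type->cnt * 2 + 1 < type->max)
		insert = true;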
Signed-off-by: Jinyu Tang <tjytimi@163.com>
[rppt: massaged comment and changelog text]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 902c2d91582c7ff0cb5f57ffb3766656f9b910c6
Author: Ma Wupeng <mawupeng1@huawei.com>
Date: Tue Jun 14 17:21:56 2022 +0800
memblock: Disable mirror feature if kernelcore is not specified
If system have some mirrored memory and mirrored feature is not specified
in boot parameter, the basic mirrored feature will be enabled and this will
lead to the following situations:
- memblock memory allocation prefers mirrored region. This may have some
unexpected influence on numa affinity.
- contiguous memory will be split into several parts if parts of them
is mirrored memory via memblock_mark_mirror().
To fix this, variable mirrored_kernelcore will be checked in
memblock_mark_mirror(). Mark mirrored memory with flag MEMBLOCK_MIRROR iff
kernelcore=mirror is added in the kernel parameters.
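Sketch of the guard at the top of memblock_mark_mirror():

	if (!mirrored_kernelcore)
		return 0;

	system_has_some_mirror = true;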
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614092156.1972846-6-mawupeng1@huawei.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 14d9a675fd0d414b7ca3d47d2ff70fbda4f6cfc2
Author: Ma Wupeng <mawupeng1@huawei.com>
Date: Tue Jun 14 17:21:53 2022 +0800
mm: Ratelimited mirrored memory related warning messages
If the system has mirrored memory, memblock will try to allocate mirrored
memory first and fall back to non-mirrored memory when that fails, but with
limited mirrored memory, or some numa nodes without mirrored memory, lots
of warning messages about memblock allocation will occur.
This patch ratelimits the warning messages to avoid a very long print
during bootup.
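i.e., switching the fallback warning to the ratelimited variant (sketch;
message text assumed):

-	pr_warn("Could not allocate %pap bytes of mirrored memory\n", &size);
+	pr_warn_ratelimited("Could not allocate %pap bytes of mirrored memory\n",
+			    &size);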
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220614092156.1972846-3-mawupeng1@huawei.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit f30b002ccfee8c60c8feb590e145c0b5e8fa4c67
Author: Miaohe Lin <linmiaohe@huawei.com>
Date: Thu Feb 17 22:07:54 2022 +0800
memblock: __next_mem_pfn_range_in_zone: remove unneeded local variable nid
The nid is only used to act as an output parameter of __next_mem_range.
Since NULL can be passed to __next_mem_range as out_nid, we can remove
nid by passing NULL here.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
[rppt: updated the commit message]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit c94afc46cae7ad41b2ad6a99368147879f4b0e56
Author: Miaohe Lin <linmiaohe@huawei.com>
Date: Thu Feb 17 22:53:27 2022 +0800
memblock: use kfree() to release kmalloced memblock regions
memblock.{reserved,memory}.regions may be allocated using kmalloc() in
memblock_double_array(). Use kfree() to release these kmalloced regions
indicated by memblock_{reserved,memory}_in_slab.
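Sketch of the change in memblock_discard():

-	memblock_free_late(addr, size);
+	if (memblock_reserved_in_slab)
+		kfree(memblock.reserved.regions);
+	else
+		memblock_free_late(addr, size);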
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Fixes: 3010f87650 ("mm: discard memblock data later")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit f7892d8e288d4b090176f26d9bf7943dbbb639a6
Author: David Hildenbrand <david@redhat.com>
Date: Fri Nov 5 13:44:53 2021 -0700
memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED
Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED,
indicating that we're dealing with a memory region that is never
indicated in the firmware-provided memory map, but always detected and
added by a driver.
Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such
memory regions like ordinary MEMBLOCK_NONE memory regions -- for
example, when selecting memory regions to add to the vmcore for dumping
in the crashkernel via for_each_mem_range().
However, especially kexec_file is not supposed to select such memblocks
via for_each_free_mem_range() / for_each_free_mem_range_reverse() to
place kexec images, similar to how we handle
IORESOURCE_SYSRAM_DRIVER_MANAGED without CONFIG_ARCH_KEEP_MEMBLOCK.
We'll make sure that memory hotplug code sets the flag where applicable
(IORESOURCE_SYSRAM_DRIVER_MANAGED) next. This prepares architectures
that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem
support.
Note that kexec *must not* indicate this memory to the second kernel and
*must not* place kexec-images on this memory. Let's add a comment to
kexec_walk_memblock(), documenting how we handle MEMBLOCK_DRIVER_MANAGED
now just like using IORESOURCE_SYSRAM_DRIVER_MANAGED in
locate_mem_hole_callback() for kexec_walk_resources().
Also note that MEMBLOCK_HOTPLUG cannot be reused due to different
semantics:
MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
firmware-provided memory map and added to the system early during
boot; kexec *has to* indicate this memory to the second kernel and
can place kexec-images on this memory. After memory hotunplug,
kexec has to be re-armed. We mostly ignore this flag when
"movable_node" is not set on the kernel command line, because
then we're told to not care about hotunpluggability of such
memory regions.
MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in
the firmware-provided memory map; this memory is always detected
and added to the system by a driver; memory might not actually be
physically hotunpluggable. kexec *must not* indicate this memory to
the second kernel and *must not* place kexec-images on this memory.
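The new flag takes the next free bit in enum memblock_flags (values as in
mainline):

	enum memblock_flags {
		MEMBLOCK_NONE		= 0x0,	/* no special request */
		MEMBLOCK_HOTPLUG	= 0x1,	/* hotpluggable region */
		MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
		MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
		MEMBLOCK_DRIVER_MANAGED	= 0x8,	/* always detected via a driver */
	};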
Link: https://lkml.kernel.org/r/20211004093605.5830-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shahab Vahedi <shahab@synopsys.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit 952eea9b01e4bbb7011329f1b7240844e61e5128
Author: David Hildenbrand <david@redhat.com>
Date: Fri Nov 5 13:44:49 2021 -0700
memblock: allow to specify flags with memblock_add_node()
We want to specify flags when hotplugging memory. Let's prepare to pass
flags to memblock_add_node() by adjusting all existing users.
Note that when hotplugging memory the system is already up and running
and we might have concurrent memblock users: for example, while we're
hotplugging memory, kexec_file code might search for suitable memory
regions to place kexec images. It's important to add the memory
directly to memblock via a single call with the right flags, instead of
adding the memory first and apply flags later: otherwise, concurrent
memblock users might temporarily stumble over memblocks with wrong
flags, which will be important in a follow-up patch that introduces a
new flag to properly handle add_memory_driver_managed().
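i.e., the signature gains a flags argument (sketch):

-int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
+int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid,
+		      enum memblock_flags flags);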
Link: https://lkml.kernel.org/r/20211004093605.5830-4-david@redhat.com
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shahab Vahedi <shahab@synopsys.com> [arch/arc]
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit 621d973901cf9fa6c6e31b31bdd36c5c5f3c9c9e
Author: Mike Rapoport <rppt@kernel.org>
Date: Fri Nov 5 13:43:16 2021 -0700
memblock: stop aliasing __memblock_free_late with memblock_free_late
memblock_free_late() is a NOP wrapper for __memblock_free_late(), there
is no point to keep this indirection.
Drop the wrapper and rename __memblock_free_late() to
memblock_free_late().
Link: https://lkml.kernel.org/r/20210930185031.18648-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840
Tested: This is one of a series of patch sets to enable Arm SystemReady IR
support in the kernel for NXP i.MX8 platforms. At this stage, this
has been tested by ensuring we can survive the CI/CD loop -- i.e.,
that we have not broken anything else, and a simple boot test. When
sufficient drivers have been brought in for i.MX8M, we will be able
to run further tests.
Conflicts:
init/main.c
This patch is being applied out of order, but is a simple
function name replacement, so applied manually.
commit 4421cca0a3e4833b3bf0f20de98eb580ab8c7290
Author: Mike Rapoport <rppt@kernel.org>
Date: Fri Nov 5 13:43:22 2021 -0700
memblock: use memblock_free for freeing virtual pointers
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()
The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.
@@
identifier vaddr;
expression size;
@@
(
- memblock_phys_free(__pa(vaddr), size);
+ memblock_free(vaddr, size);
|
- memblock_free_ptr(vaddr, size);
+ memblock_free(vaddr, size);
)
[sfr@canb.auug.org.au: fixup]
Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au
Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4421cca0a3e4833b3bf0f20de98eb580ab8c7290)
Signed-off-by: Al Stone <ahs3@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840
Tested: This is one of a series of patch sets to enable Arm SystemReady IR
support in the kernel for NXP i.MX8 platforms. At this stage, this
has been tested by ensuring we can survive the CI/CD loop -- i.e.,
that we have not broken anything else, and a simple boot test. When
sufficient drivers have been brought in for i.MX8M, we will be able
to run further tests.
Conflicts:
arch/s390/kernel/setup.c
arch/s390/kernel/smp.c
These have been modified in ways that no longer strictly
match the upstream code, throwing off the auto-merge; this
is a simple function name replacement, however, so easily
done manually instead.
commit 3ecc68349bbab6bff1d12cbc7951ca6019b2faf6
Author: Mike Rapoport <rppt@kernel.org>
Date: Fri Nov 5 13:43:19 2021 -0700
memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().
The callers are updated with the below semantic patch:
@@
expression addr;
expression size;
@@
- memblock_free(addr, size);
+ memblock_phys_free(addr, size);
Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3ecc68349bbab6bff1d12cbc7951ca6019b2faf6)
Signed-off-by: Al Stone <ahs3@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076088
commit c6975d7cab5b903aadbc0f78f9af4fae1bd23a50
Author: Qian Cai <quic_qiancai@quicinc.com>
Date: Fri, 5 Nov 2021 11:05:09 -0400
After switching the page size from 64KB to 4KB on several arm64 servers
here, kmemleak starts to run out of the early memory pool due to a huge
number of those early_pgtable_alloc() calls:
kmemleak_alloc_phys()
memblock_alloc_range_nid()
memblock_phys_alloc_range()
early_pgtable_alloc()
init_pmd()
alloc_init_pud()
__create_pgd_mapping()
__map_memblock()
paging_init()
setup_arch()
start_kernel()
Increasing the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
won't be enough for a server with 200GB+ memory. There isn't much
interest in checking memory leaks for those early page tables, and those
early memory mappings should not reference other memory. Hence, no
kmemleak false positives, and we can safely skip tracking those early
allocations from kmemleak like we did in commit fed84c7852
("mm/memblock.c: skip kmemleak for kasan_init()"), without needing to
introduce complications to automatically scale the value depending on the
runtime memory size etc. After the patch, the default value of
DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20211105150509.7826-1-quic_qiancai@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Mark Salter <msalter@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit 658aafc8139c23a6a23f6f4d9a0c4c95476838d4
Author: Mike Rapoport <rppt@kernel.org>
Date: Thu Oct 21 10:09:29 2021 +0300
memblock: exclude MEMBLOCK_NOMAP regions from kmemleak
Vladimir Zapolskiy reports:
Commit a7259df76702 ("memblock: make memblock_find_in_range method
private") invokes a kernel panic while running kmemleak on OF platforms
with nomap regions:
Unable to handle kernel paging request at virtual address fff000021e00000
[...]
scan_block+0x64/0x170
scan_gray_list+0xe8/0x17c
kmemleak_scan+0x270/0x514
kmemleak_write+0x34c/0x4ac
The memory allocated from memblock is registered with kmemleak, but if
it is marked MEMBLOCK_NOMAP it won't have linear map entries so an
attempt to scan such areas will fault.
Ideally, memblock_mark_nomap() would inform kmemleak to ignore
MEMBLOCK_NOMAP memory, but it can be called before kmemleak interfaces
operating on physical addresses can use __va() conversion.
Make sure that functions that mark allocated memory as MEMBLOCK_NOMAP
take care of informing kmemleak to ignore such memory.
Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Link: https://lore.kernel.org/all/c30ff0a2-d196-c50d-22f0-bd50696b1205@quicinc.com
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Reported-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit 5173ed72bcfcddda21ff274ee31c6472fa150f29
Author: Peng Fan <peng.fan@nxp.com>
Date: Mon Oct 18 15:15:45 2021 -0700
memblock: check memory total_size
mem=[X][G|M] is broken on the ARM64 platform: there are cases where even
though type.cnt is 1, total_size is not 0 because regions were merged into
one. So checking only 'cnt' is not enough; total_size should be used,
otherwise the bootarg 'mem=[X][G|M]' no longer works.
Link: https://lkml.kernel.org/r/20210930024437.32598-1-peng.fan@oss.nxp.com
Fixes: e888fa7bb882 ("memblock: Check memory add/cap ordering")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit 6e44bd6d34d659c44cd8e7fc925c8a97f49b3c33
Author: Mike Rapoport <rppt@kernel.org>
Date: Wed Oct 13 08:36:59 2021 +0300
memblock: exclude NOMAP regions from kmemleak
Vladimir Zapolskiy reports:
commit a7259df76702 ("memblock: make memblock_find_in_range method private")
invokes a kernel panic while running kmemleak on OF platforms with nomap
regions:
Unable to handle kernel paging request at virtual address fff000021e00000
[...]
scan_block+0x64/0x170
scan_gray_list+0xe8/0x17c
kmemleak_scan+0x270/0x514
kmemleak_write+0x34c/0x4ac
Indeed, NOMAP regions don't have linear map entries so an attempt to scan
these areas would fault.
Prevent such faults by excluding NOMAP regions from kmemleak.
Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit 77e02cf57b6cff9919949defb7fd9b8ac16399a2
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue Sep 14 13:23:22 2021 -0700
memblock: introduce saner 'memblock_free_ptr()' interface
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.
Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:
https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
or just random memory corruption if the debug checks don't catch it:
https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.
I started out looking at just automating a sane replacement sequence,
but because of this mix of virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.
So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer. And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit a7259df7670240ee03b0cfce8a3e5d3773911e24
Author: Mike Rapoport <rppt@kernel.org>
Date: Thu Sep 2 15:00:26 2021 -0700
memblock: make memblock_find_in_range method private
There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.
memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.
Replace the calls to memblock_find_in_range() with equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() a private method of memblock.
This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().
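The conversion pattern looks roughly like this (an illustrative call site,
not one specific to the series; min_addr, max_addr, size and align are
placeholders):
    phys_addr_t addr;

    /* before: find and reserve were separate steps, and the result of
     * memblock_reserve() was usually ignored
     */
    addr = memblock_find_in_range(min_addr, max_addr, size, align);
    if (addr)
            memblock_reserve(addr, size);

    /* after: a single call that finds and reserves, returning 0 on failure */
    addr = memblock_phys_alloc_range(size, align, min_addr, max_addr);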
Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit 08678804e0b305bbbf5b756ad365373e5fe885a2
Author: Mike Rapoport <rppt@kernel.org>
Date: Thu Sep 2 14:58:05 2021 -0700
memblock: stop poisoning raw allocations
Functions memblock_alloc_exact_nid_raw() and memblock_alloc_try_nid_raw()
are intended for early memory allocation without the overhead of zeroing
the allocated memory. Since these functions were used to allocate the
memory map, they ended up with the addition of a call to page_init_poison()
that poisoned the allocated memory when CONFIG_PAGE_POISONING was set.
Since the memory map is now allocated using a dedicated memmap_alloc()
function that takes care of the poisoning, remove page poisoning from the
memblock_alloc_*_raw() functions.
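For reference, the dedicated allocator concentrates the poisoning in one
place; a simplified sketch of memmap_alloc():
    void *__init memmap_alloc(phys_addr_t size, phys_addr_t align,
                              phys_addr_t min_addr, int nid, bool exact_nid)
    {
            void *ptr;

            if (exact_nid)
                    ptr = memblock_alloc_exact_nid_raw(size, align, min_addr,
                                                       MEMBLOCK_ALLOC_ACCESSIBLE,
                                                       nid);
            else
                    ptr = memblock_alloc_try_nid_raw(size, align, min_addr,
                                                     MEMBLOCK_ALLOC_ACCESSIBLE,
                                                     nid);

            /* page poisoning of the memory map now happens only here */
            if (ptr && size > 0)
                    page_init_poison(ptr, size);

            return ptr;
    }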
Link: https://lkml.kernel.org/r/20210714123739.16493-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit e888fa7bb882a1f305526d8f49d7016a7bc5f5ca
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Wed Aug 11 10:55:18 2021 +0200
memblock: Check memory add/cap ordering
For memblock_cap_memory_range() to work properly, it should be called
after memory is detected and added to memblock with memblock_add() or
memblock_add_node(). If memblock_cap_memory_range() were called before
memory is registered, we could silently corrupt memory later because the
crash kernel would see all memory as available.
Print a warning and bail out if this ordering is not satisfied.
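A sketch of the guard, assuming memblock.memory.total_size serves as the
"has any memory been added yet" signal:
    void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
    {
            if (!size)
                    return;

            /* bail out if no memory was registered before capping */
            if (!memblock.memory.total_size) {
                    pr_warn("%s: No memory registered yet\n", __func__);
                    return;
            }

            /* ... proceed with removing memory outside [base, base + size) ... */
    }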
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/aabc5bad008d49f07d542815c6c8d28ec90bb09e.1628672091.git.geert+renesas@glider.be
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396
This patch is a backport of the following upstream commit:
commit 00974b9a83cb233d9c8f9758f541d9aa2a80c5cd
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Wed Aug 11 10:54:36 2021 +0200
memblock: Add missing debug code to memblock_add_node()
All other memblock APIs built on top of memblock_add_range() contain
debug code to print their parameters; add the same to memblock_add_node().
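The addition mirrors what the sibling APIs already print; a sketch of the
instrumented function:
    int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
                                          int nid)
    {
            phys_addr_t end = base + size - 1;

            /* same debug output as memblock_add() and friends */
            memblock_dbg("%s: [%pa-%pa] nid=%d %pS\n", __func__,
                         &base, &end, nid, (void *)_RET_IP_);

            return memblock_add_range(&memblock.memory, base, size, nid, 0);
    }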
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/c45e5218b6fcf0e3aeb63d9a9d9792addae0bb7a.1628672041.git.geert+renesas@glider.be
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Commit b10d6bca87 ("arch, drivers: replace for_each_membock() with
for_each_mem_range()") didn't take into account that when the movable_node
parameter is present in the kernel command line, for_each_mem_range()
would skip ranges marked with MEMBLOCK_HOTPLUG.
The page table setup code on POWER uses for_each_mem_range() to create
the linear mapping of the physical memory and, since the regions marked
as MEMBLOCK_HOTPLUG are skipped, they never make it to the linear map.
A later access to the memory in those ranges will fail:
BUG: Unable to handle kernel data access on write at 0xc000000400000000
Faulting instruction address: 0xc00000000008a3c0
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7
NIP: c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040
REGS: c000000008a57770 TRAP: 0300 Not tainted (5.13.0)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84222202 XER: 20040000
CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0
GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000
GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200
GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300
GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000
GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0
GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680
NIP clear_user_page+0x50/0x80
LR __handle_mm_fault+0xc88/0x1910
Call Trace:
__handle_mm_fault+0xc44/0x1910 (unreliable)
handle_mm_fault+0x130/0x2a0
__get_user_pages+0x248/0x610
__get_user_pages_remote+0x12c/0x3e0
get_arg_page+0x54/0xf0
copy_string_kernel+0x11c/0x210
kernel_execve+0x16c/0x220
call_usermodehelper_exec_async+0x1b0/0x2f0
ret_from_kernel_thread+0x5c/0x70
Instruction dump:
79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050
7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec
---[ end trace 490b8c67e6075e09 ]---
Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the
traversal fixes this issue.
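The fix boils down to one flag in the iterator definition; roughly
(assuming the __for_each_mem_range() form of the macro):
    /*
     * Passing MEMBLOCK_HOTPLUG in the flags tells the iterator not to
     * skip hotpluggable regions when movable_node is enabled, so they
     * are mapped like any other memory.
     */
    #define for_each_mem_range(i, p_start, p_end)                         \
            __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, \
                                 MEMBLOCK_HOTPLUG, p_start, p_end, NULL)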
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100
Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'memblock-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Fix arm crashes caused by holes in the memory map.
The coordination between freeing of unused memory map, pfn_valid() and
core mm assumptions about validity of the memory map in various ranges
was not designed for complex layouts of the physical memory with a lot
of holes all over the place.
Kefeng Wang reported crashes in move_freepages() on a system with the
following memory layout [1]:
node 0: [mem 0x0000000080a00000-0x00000000855fffff]
node 0: [mem 0x0000000086a00000-0x0000000087dfffff]
node 0: [mem 0x000000008bd00000-0x000000008c4fffff]
node 0: [mem 0x000000008e300000-0x000000008ecfffff]
node 0: [mem 0x0000000090d00000-0x00000000bfffffff]
node 0: [mem 0x00000000cc000000-0x00000000dc9fffff]
node 0: [mem 0x00000000de700000-0x00000000de9fffff]
node 0: [mem 0x00000000e0800000-0x00000000e0bfffff]
node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
node 0: [mem 0x00000000fda00000-0x00000000ffffefff]
These crashes can be mitigated by enabling CONFIG_HOLES_IN_ZONE on ARM
and essentially turning pfn_valid_within() into pfn_valid() instead of
having it hardwired to 1 on that architecture, but this would require
keeping CONFIG_HOLES_IN_ZONE solely for this purpose.
A cleaner approach is to update ARM's implementation of pfn_valid() to
take into account the rounding of the freed memory map to pageblock
boundaries and make sure it returns true for PFNs that have memory map
entries even if there is no physical memory backing those PFNs" (sketched
below)
Link: https://lore.kernel.org/lkml/2a1592ad-bc9d-4664-fd19-f7448a37edc0@huawei.com [1]
* tag 'memblock-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
arm: extend pfn_valid to take into account freed memory map alignment
memblock: ensure there is no overflow in memblock_overlaps_region()
memblock: align freed memory map on pageblock boundaries with SPARSEMEM
memblock: free_unused_memmap: use pageblock units instead of MAX_ORDER
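For the first item above, the gist is that arm's pfn_valid() now answers
"is there a memory map entry" rather than "is there memory"; a condensed
sketch:
    int pfn_valid(unsigned long pfn)
    {
            phys_addr_t addr = __pfn_to_phys(pfn);
            unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;

            if (__phys_to_pfn(addr) != pfn)
                    return 0;

            /*
             * The freed memory map is rounded to pageblock boundaries,
             * so a PFN within pageblock_size of registered memory still
             * has a struct page even without physical memory behind it.
             */
            return memblock_overlaps_region(&memblock.memory,
                                            ALIGN_DOWN(addr, pageblock_size),
                                            pageblock_size);
    }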
The struct pages representing a reserved memory region are initialized
using the reserve_bootmem_region() function. This function is called for each
reserved region just before the memory is freed from memblock to the buddy
page allocator.
The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().
Split out initialization of the reserved pages into a function with a
meaningful name, treat the MEMBLOCK_NOMAP regions the same way as the
reserved regions, and mark struct pages for the NOMAP regions as
PageReserved.
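The split-out function ends up looking roughly like this (a sketch of the
upstream memmap_init_reserved_pages()):
    static void __init memmap_init_reserved_pages(void)
    {
            struct memblock_region *region;
            phys_addr_t start, end;
            u64 i;

            /* initialize struct pages for the reserved regions */
            for_each_reserved_mem_range(i, &start, &end)
                    reserve_bootmem_region(start, end);

            /* and also treat struct pages for the NOMAP regions as PageReserved */
            for_each_mem_region(region) {
                    if (memblock_is_nomap(region)) {
                            start = region->base;
                            end = start + region->size;
                            reserve_bootmem_region(start, end);
                    }
            }
    }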
Link: https://lkml.kernel.org/r/20210511100550.28178-3-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There may be an overflow in memblock_overlaps_region() if it is called with
base and size such that
base + size > PHYS_ADDR_MAX
Make sure that memblock_overlaps_region() caps the size to prevent such
an overflow and remove the now-duplicated call to memblock_cap_size() from
memblock_is_region_reserved().
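A sketch of the hardened helper; memblock_cap_size() clamps size so that
base + size cannot pass PHYS_ADDR_MAX:
    bool __init_memblock memblock_overlaps_region(struct memblock_type *type,
                                                  phys_addr_t base,
                                                  phys_addr_t size)
    {
            unsigned long i;

            /* clamp the size so the range cannot wrap around */
            memblock_cap_size(base, &size);

            for (i = 0; i < type->cnt; i++)
                    if (memblock_addrs_overlap(base, size,
                                               type->regions[i].base,
                                               type->regions[i].size))
                            return true;

            return false;
    }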
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
When CONFIG_SPARSEMEM=y, the ranges of the memory map that are freed are
not aligned to pageblock boundaries, which breaks assumptions about
homogeneity of the memory map throughout core mm code.
Make sure that the freed memory map is always aligned on pageblock
boundaries regardless of the memory model selection.
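Condensed, the loop in free_unused_memmap() now rounds both ends of every
freed hole unconditionally (a sketch that elides the SPARSEMEM section-end
handling):
    for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, NULL) {
            /* keep the memory map for any partially covered pageblock */
            start = round_down(start, pageblock_nr_pages);

            if (prev_end && prev_end < start)
                    free_memmap(prev_end, start);

            /* align the hole's end as well, for all memory models */
            prev_end = ALIGN(end, pageblock_nr_pages);
    }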
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
The code that frees the unused memory map rounds the start and end of the
holes that are freed to MAX_ORDER_NR_PAGES to preserve continuity of the
memory map for MAX_ORDER regions.
Lots of core memory management functionality relies on homogeneity of the
memory map within each pageblock, whose size may differ from MAX_ORDER in
certain configurations.
Although, for the architectures that currently use free_unused_memmap(),
pageblock_order and MAX_ORDER are equivalent, it is cleaner to use a common
notation throughout mm code.
Replace MAX_ORDER_NR_PAGES with pageblock_nr_pages and update the comments
to make it more clear why the alignment to pageblock boundaries is
required.
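The substitution itself is mechanical; for instance:
    /* before: holes were rounded to MAX_ORDER granularity */
    prev_end = ALIGN(end, MAX_ORDER_NR_PAGES);

    /* after: pageblock is the unit whose homogeneity core mm relies on */
    prev_end = ALIGN(end, pageblock_nr_pages);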
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>