hugetlb: prioritize surplus allocation from current node

JIRA: https://issues.redhat.com/browse/RHEL-68966
Tested: by me

commit d0f14f7ee0e2d5df447d54487ae0c3aee5a7208f
Author: Koichiro Den <koichiro.den@canonical.com>
Date:   Thu Dec 5 01:55:03 2024 +0900

    hugetlb: prioritize surplus allocation from current node

    Previously, surplus allocations triggered by mmap were typically made from
    the node where the process was running.  On a page fault, the area was
    reliably dequeued from the hugepage_freelists for that node.  However,
    since commit 003af997c8a9 ("hugetlb: force allocating surplus hugepages on
    mempolicy allowed nodes"), dequeue_hugetlb_folio_vma() may fall back to
    other nodes unnecessarily even if there is no MPOL_BIND policy, causing
    folios to be dequeued from nodes other than the current one.

    Also, allocating from the node where the current process is running is
    likely to result in a performance win, as mmap-ing processes often touch
    the area not so long after allocation.  This change minimizes surprises
    for users relying on the previous behavior while maintaining the benefit
    introduced by the commit.

    So, prioritize the node the current process is running on when possible.

    Link: https://lkml.kernel.org/r/20241204165503.628784-1-koichiro.den@canonical.com
    Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
    Acked-by: Aristeu Rozanski <aris@ruivo.org>
    Cc: Aristeu Rozanski <aris@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
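
For reference, a minimal userspace sketch (not part of the patch, names and build line are illustrative) that exercises the path this change touches and reports which NUMA node ends up backing the hugepage. Assuming nr_hugepages=0 and nr_overcommit_hugepages>=1, the mmap() reservation should be satisfied by a surplus hugepage via gather_surplus_pages(); after this change the reported node should normally match the node the mapping task is running on. Build with -lnuma.

/*
 * Hypothetical test sketch, not part of the patch: map and fault one
 * hugepage, then query which NUMA node backs it via move_pages(2).
 * Assumes a 2MB default hugepage size, nr_hugepages=0 and
 * nr_overcommit_hugepages>=1 so the mapping is backed by a surplus
 * hugepage. Build (example): gcc -o surplus-node surplus-node.c -lnuma
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <numaif.h>

int main(void)
{
	size_t len = 2UL << 20;			/* one 2MB hugepage */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(p, 0, len);			/* fault the hugepage in */

	void *pages[1] = { p };
	int status[1] = { -1 };
	/* nodes == NULL: only query the node each page resides on */
	if (move_pages(0, 1, pages, NULL, status, 0))
		perror("move_pages");
	else
		printf("hugepage backed by node %d\n", status[0]);

	munmap(p, len);
	return 0;
}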
 mm/hugetlb.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2541,7 +2541,13 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	long needed, allocated;
 	bool alloc_ok = true;
 	int node;
-	nodemask_t *mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
+	nodemask_t *mbind_nodemask, alloc_nodemask;
+
+	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
+	if (mbind_nodemask)
+		nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_mems_allowed);
+	else
+		alloc_nodemask = cpuset_current_mems_allowed;
 
 	lockdep_assert_held(&hugetlb_lock);
 	needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
@@ -2557,8 +2563,16 @@ retry:
 	spin_unlock_irq(&hugetlb_lock);
 	for (i = 0; i < needed; i++) {
 		folio = NULL;
-		for_each_node_mask(node, cpuset_current_mems_allowed) {
-			if (!mbind_nodemask || node_isset(node, *mbind_nodemask)) {
+
+		/* Prioritize current node */
+		if (node_isset(numa_mem_id(), alloc_nodemask))
+			folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+					numa_mem_id(), NULL);
+
+		if (!folio) {
+			for_each_node_mask(node, alloc_nodemask) {
+				if (node == numa_mem_id())
+					continue;
 				folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
 						node, NULL);
 				if (folio)