Commit Graph

1010 Commits

Author SHA1 Message Date
Nico Pache 32ce27b2f6 hugetlb: check for hugetlb folio before vmemmap_restore
commit 30a89adf872d2e46323840964c95dc0ae3bb5843
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Mon Oct 16 19:55:49 2023 -0700

    hugetlb: check for hugetlb folio before vmemmap_restore

    In commit d8f5f7e445f0 ("hugetlb: set hugetlb page flag before
    optimizing vmemmap") checks were added to print a warning if
    hugetlb_vmemmap_restore was called on a non-hugetlb page.

    This was mostly due to ordering issues in the hugetlb page set up and tear
    down sequences.  One place missed was the routine
    dissolve_free_huge_page.

    Naoya Horiguchi noted: "I saw that VM_WARN_ON_ONCE() in
    hugetlb_vmemmap_restore is triggered when memory_failure() is called on a
    free hugetlb page with vmemmap optimization disabled (the warning is not
    triggered if vmemmap optimization is enabled).  I think that we need check
    folio_test_hugetlb() before dissolve_free_huge_page() calls
    hugetlb_vmemmap_restore_folio()."

    Perform the check as suggested by Naoya.
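
    A minimal user-space sketch of the suggested guard (illustrative only; the
    demo_* types and helpers below are made up and merely model
    folio_test_hugetlb() gating the restore call):

        #include <assert.h>
        #include <stdbool.h>
        #include <stdio.h>

        /* Stand-ins for the kernel objects; not the real folio API. */
        struct demo_folio {
                bool is_hugetlb;
                bool vmemmap_optimized;
        };

        static void demo_vmemmap_restore(struct demo_folio *f)
        {
                /* Stands in for the VM_WARN_ON_ONCE() on non-hugetlb folios. */
                assert(f->is_hugetlb);
                f->vmemmap_optimized = false;
        }

        static void demo_dissolve_free_huge_page(struct demo_folio *f)
        {
                /* The suggested folio_test_hugetlb() check before restoring. */
                if (!f->is_hugetlb)
                        return;
                demo_vmemmap_restore(f);
        }

        int main(void)
        {
                struct demo_folio plain = { .is_hugetlb = false };
                struct demo_folio huge = { .is_hugetlb = true, .vmemmap_optimized = true };

                demo_dissolve_free_huge_page(&plain);   /* skipped, no warning fires */
                demo_dissolve_free_huge_page(&huge);
                printf("huge folio vmemmap restored: %d\n", !huge.vmemmap_optimized);
                return 0;
        }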

    Link: https://lkml.kernel.org/r/20231017032140.GA3680@monkey
    Fixes: d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Suggested-by: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Tested-by: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Barry Song <song.bao.hua@hisilicon.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-39710
Signed-off-by: Nico Pache <npache@redhat.com>
2024-06-13 10:42:17 -06:00
Nico Pache da7e25afc9 hugetlb: set hugetlb page flag before optimizing vmemmap
Conflicts:
       mm/hugetlb.c: missing 9c5ccf2db04b8 ("mm: remove HUGETLB_PAGE_DTOR")
        which changes folio_set_compound_dtor to folio_set_hugetlb.

commit d8f5f7e445f02eb10dee1a0a992146314cf460f8
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Tue Aug 29 14:37:34 2023 -0700

    hugetlb: set hugetlb page flag before optimizing vmemmap

    Currently, vmemmap optimization of hugetlb pages is performed before the
    hugetlb flag (previously hugetlb destructor) is set identifying it as a
    hugetlb folio.  This means there is a window of time where an ordinary
    folio does not have all associated vmemmap present.  The core mm only
    expects vmemmap to be potentially optimized for hugetlb and device dax.
    This can cause problems in code such as memory error handling that may
    want to write to tail struct pages.

    There is only one call to perform hugetlb vmemmap optimization today.  To
    fix this issue, simply set the hugetlb flag before that call.

    There was a similar issue in the free hugetlb path that was previously
    addressed.  The two routines that optimize or restore hugetlb vmemmap
    should only be passed hugetlb folios/pages.  To catch any callers not
    following this rule, add VM_WARN_ON calls to the routines.  In the hugetlb
    free code paths, some calls could be made to restore vmemmap after
    clearing the hugetlb flag.  This was 'safe' as in these cases vmemmap was
    already present and the call was a NOOP.  However, for consistency these
    calls were eliminated so that we can add the VM_WARN_ON checks.

    Link: https://lkml.kernel.org/r/20230829213734.69673-1-mike.kravetz@oracle.com
    Fixes: f41f2ed43c ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Usama Arif <usama.arif@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-39710
Signed-off-by: Nico Pache <npache@redhat.com>
2024-06-13 10:42:17 -06:00
Rafael Aquini 20fedd0c5f mm/hugetlb: fix missing hugetlb_lock for resv uncharge
JIRA: https://issues.redhat.com/browse/RHEL-37467
CVE: CVE-2024-36000

This commit is a backport of the following upstream commit:
commit b76b46902c2d0395488c8412e1116c2486cdfcb2
Author: Peter Xu <peterx@redhat.com>
Date:   Wed Apr 17 17:18:35 2024 -0400

    mm/hugetlb: fix missing hugetlb_lock for resv uncharge

    There is a recent report on UFFDIO_COPY over hugetlb:

    https://lore.kernel.org/all/000000000000ee06de0616177560@google.com/

    350:    lockdep_assert_held(&hugetlb_lock);

    This should be an issue in hugetlb itself but is triggered from a userfault
    context, where it goes into the unlikely path in which two threads modify
    the resv map together.  Mike has a fix in that path for resv uncharge, but
    it looks like the locking requirement was overlooked:
    hugetlb_cgroup_uncharge_folio_rsvd() will update the cgroup pointer, so it
    must be called with the lock held.
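
    A minimal user-space sketch of the lock requirement (illustrative only; the
    demo_* names are made-up stand-ins for hugetlb_lock and the uncharge
    helper, with a boolean playing the role of lockdep_assert_held()):

        #include <assert.h>
        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        static pthread_mutex_t demo_hugetlb_lock = PTHREAD_MUTEX_INITIALIZER;
        static bool demo_lock_held;
        static int demo_cgroup_rsvd_charge = 1;

        static void demo_lock(void)
        {
                pthread_mutex_lock(&demo_hugetlb_lock);
                demo_lock_held = true;
        }

        static void demo_unlock(void)
        {
                demo_lock_held = false;
                pthread_mutex_unlock(&demo_hugetlb_lock);
        }

        /* Updates the shared cgroup pointer/charge, so it insists on the lock. */
        static void demo_uncharge_folio_rsvd(void)
        {
                assert(demo_lock_held);   /* kernel: lockdep_assert_held(&hugetlb_lock) */
                demo_cgroup_rsvd_charge = 0;
        }

        int main(void)
        {
                demo_lock();              /* the fix: take the lock around the uncharge */
                demo_uncharge_folio_rsvd();
                demo_unlock();
                printf("rsvd charge after uncharge: %d\n", demo_cgroup_rsvd_charge);
                return 0;
        }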

    Link: https://lkml.kernel.org/r/20240417211836.2742593-3-peterx@redhat.com
    Fixes: 79aa925bf2 ("hugetlb_cgroup: fix reservation accounting")
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reported-by: syzbot+4b8077a5fccc61c385a1@syzkaller.appspotmail.com
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2024-05-20 14:19:00 -04:00
Nico Pache d1631d516e hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write
commit 187da0f8250aa94bd96266096aef6f694e0b4cd2
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Mon Nov 13 17:20:33 2023 -0800

    hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write

    The routine __vma_private_lock tests for the existence of a reserve map
    associated with a private hugetlb mapping.  A pointer to the reserve map
    is in vma->vm_private_data.  __vma_private_lock was checking the pointer
    for NULL.  However, it is possible that the low bits of the pointer could
    be used as flags.  In such instances, vm_private_data is not NULL and not
    a valid pointer.  This results in the null-ptr-deref reported by syzbot:

    general protection fault, probably for non-canonical address 0xdffffc000000001d:
     0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
    CPU: 0 PID: 5048 Comm: syz-executor139 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
    RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5004
    ...
    Call Trace:
     <TASK>
     lock_acquire kernel/locking/lockdep.c:5753 [inline]
     lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5718
     down_write+0x93/0x200 kernel/locking/rwsem.c:1573
     hugetlb_vma_lock_write mm/hugetlb.c:300 [inline]
     hugetlb_vma_lock_write+0xae/0x100 mm/hugetlb.c:291
     __hugetlb_zap_begin+0x1e9/0x2b0 mm/hugetlb.c:5447
     hugetlb_zap_begin include/linux/hugetlb.h:258 [inline]
     unmap_vmas+0x2f4/0x470 mm/memory.c:1733
     exit_mmap+0x1ad/0xa60 mm/mmap.c:3230
     __mmput+0x12a/0x4d0 kernel/fork.c:1349
     mmput+0x62/0x70 kernel/fork.c:1371
     exit_mm kernel/exit.c:567 [inline]
     do_exit+0x9ad/0x2a20 kernel/exit.c:861
     __do_sys_exit kernel/exit.c:991 [inline]
     __se_sys_exit kernel/exit.c:989 [inline]
     __x64_sys_exit+0x42/0x50 kernel/exit.c:989
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Mask off low bit flags before checking for NULL pointer.  In addition, the
    reserve map only 'belongs' to the OWNER (parent in parent/child
    relationships) so also check for the OWNER flag.
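
    A minimal user-space sketch of the masking fix (illustrative only; the flag
    values and demo_* helpers are made up, but they mirror a pointer whose low
    bits carry flags, as vma->vm_private_data does here):

        #include <assert.h>
        #include <stdio.h>

        /* Made-up flag layout: the low bits of the pointer are flags. */
        #define DEMO_OWNER_FLAG 0x1UL
        #define DEMO_OTHER_FLAG 0x2UL
        #define DEMO_FLAG_MASK  0x3UL

        struct demo_resv_map { int reserved; };

        /* Buggy shape: a flags-only value (e.g. 0x2) is non-NULL but is not a
         * valid pointer, so dereferencing it faults, as in the syzbot report. */
        static int demo_has_map_buggy(void *priv)
        {
                return priv != NULL;
        }

        /* Fixed shape: mask off the flag bits and also require the OWNER flag
         * before treating the value as a reserve map pointer. */
        static struct demo_resv_map *demo_owner_map(void *priv)
        {
                unsigned long v = (unsigned long)priv;

                if (!(v & DEMO_OWNER_FLAG))
                        return NULL;
                return (struct demo_resv_map *)(v & ~DEMO_FLAG_MASK);
        }

        int main(void)
        {
                struct demo_resv_map map = { .reserved = 1 };
                void *flags_only = (void *)DEMO_OTHER_FLAG;   /* flags, no map */
                void *tagged = (void *)((unsigned long)&map | DEMO_OWNER_FLAG);

                assert(demo_has_map_buggy(flags_only));   /* non-NULL: would be dereferenced */
                assert(demo_owner_map(flags_only) == NULL);
                assert(demo_owner_map(tagged) == &map);
                printf("masked pointer checks passed\n");
                return 0;
        }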

    Link: https://lkml.kernel.org/r/20231114012033.259600-1-mike.kravetz@oracle.com
    Reported-by: syzbot+6ada951e7c0f7bc8a71e@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/linux-mm/00000000000078d1e00608d7878b@google.com/
    Fixes: bf4916922c60 ("hugetlbfs: extend hugetlb_vma_lock to private VMAs")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Rik van Riel <riel@surriel.com>
    Cc: Edward Adam Davis <eadavis@qq.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Tom Rix <trix@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:33 -06:00
Nico Pache 734ceed97a mm/hugetlb: fix nodes huge page allocation when there are surplus pages
commit b72b3c9c34c825c81d205241c5f822fc7835923f
Author: Xueshi Hu <xueshi.hu@smartx.com>
Date:   Tue Aug 29 11:33:43 2023 +0800

    mm/hugetlb: fix nodes huge page allocation when there are surplus pages

    In set_nr_huge_pages(), the local variable "count" is used to record
    persistent_huge_pages(), but when it comes to per-node huge page
    allocation, its semantics change to nr_huge_pages.  When surplus huge
    pages exist and the interface under
    /sys/devices/system/node/node*/hugepages is used to change the huge page
    pool size, this difference can result in the allocation of an unexpected
    number of huge pages.

    Steps to reproduce the bug:

    Starting with:

                                      Node 0          Node 1    Total
            HugePages_Total             0.00            0.00     0.00
            HugePages_Free              0.00            0.00     0.00
            HugePages_Surp              0.00            0.00     0.00

    create 100 huge pages in Node 0 and consume them, then set Node 0's
    nr_hugepages to 0.

    yields:

                                      Node 0          Node 1    Total
            HugePages_Total           200.00            0.00   200.00
            HugePages_Free              0.00            0.00     0.00
            HugePages_Surp            200.00            0.00   200.00

    write 100 to Node 1's nr_hugepages

                    echo 100 > /sys/devices/system/node/node1/\
            hugepages/hugepages-2048kB/nr_hugepages

    gets:

                                      Node 0          Node 1    Total
            HugePages_Total           200.00          400.00   600.00
            HugePages_Free              0.00          400.00   400.00
            HugePages_Surp            200.00            0.00   200.00

    The kernel is expected to create only 100 huge pages, but it gives 200.

    Link: https://lkml.kernel.org/r/20230829033343.467779-1-xueshi.hu@smartx.com
    Fixes: 9a30523066 ("hugetlb: add per node hstate attributes")
    Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: Mel Gorman <mel@csn.ul.ie>
    Cc: Muchun Song <muchun.song@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:32 -06:00
Nico Pache 51217d9111 hugetlbfs: close race between MADV_DONTNEED and page fault
commit 2820b0f09be99f6406784b03a22dfc83e858449d
Author: Rik van Riel <riel@surriel.com>
Date:   Thu Oct 5 23:59:08 2023 -0400

    hugetlbfs: close race between MADV_DONTNEED and page fault

    Malloc libraries, like jemalloc and tcmalloc, make decisions on when to
    call madvise independently from the code in the main application.

    This sometimes results in the application page faulting on an address,
    right after the malloc library has shot down the backing memory with
    MADV_DONTNEED.

    Usually this is harmless, because we always have some 4kB pages sitting
    around to satisfy a page fault.  However, with hugetlbfs, systems often
    allocate only the exact number of huge pages that the application wants.

    Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
    any lock taken on the page fault path, which can open up the following
    race condition:

           CPU 1                            CPU 2

           MADV_DONTNEED
           unmap page
           shoot down TLB entry
                                           page fault
                                           fail to allocate a huge page
                                           killed with SIGBUS
           free page

    Fix that race by pulling the locking from __unmap_hugepage_final_range
    into helper functions called from zap_page_range_single.  This ensures
    page faults stay locked out of the MADV_DONTNEED VMA until the huge pages
    have actually been freed.
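
    A minimal user-space reproducer sketch (illustrative only; it assumes a 2MB
    default hugepage size and that exactly one hugetlb page has been reserved
    via /proc/sys/vm/nr_hugepages, and it may or may not win the race on a
    given kernel).  Without the fix, the faulting thread can observe an empty
    pool and be killed with SIGBUS:

        #define _GNU_SOURCE
        #include <pthread.h>
        #include <stdio.h>
        #include <sys/mman.h>

        #define DEMO_HPAGE_SIZE (2UL * 1024 * 1024)
        #define DEMO_ITERS      100000

        static char *demo_map;

        static void *demo_toucher(void *arg)
        {
                (void)arg;
                for (int i = 0; i < DEMO_ITERS; i++)
                        demo_map[0] = 1;   /* fault: needs a huge page from the pool */
                return NULL;
        }

        static void *demo_dropper(void *arg)
        {
                (void)arg;
                for (int i = 0; i < DEMO_ITERS; i++)
                        madvise(demo_map, DEMO_HPAGE_SIZE, MADV_DONTNEED);
                return NULL;
        }

        int main(void)
        {
                pthread_t a, b;

                demo_map = mmap(NULL, DEMO_HPAGE_SIZE, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                if (demo_map == MAP_FAILED) {
                        perror("mmap(MAP_HUGETLB)");
                        return 1;
                }
                pthread_create(&a, NULL, demo_toucher, NULL);
                pthread_create(&b, NULL, demo_dropper, NULL);
                pthread_join(a, NULL);
                pthread_join(b, NULL);
                printf("completed without SIGBUS\n");
                return 0;
        }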

    Link: https://lkml.kernel.org/r/20231006040020.3677377-4-riel@surriel.com
    Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:32 -06:00
Nico Pache 86ade21642 hugetlbfs: extend hugetlb_vma_lock to private VMAs
commit bf4916922c60f43efaa329744b3eef539aa6a2b2
Author: Rik van Riel <riel@surriel.com>
Date:   Thu Oct 5 23:59:07 2023 -0400

    hugetlbfs: extend hugetlb_vma_lock to private VMAs

    Extend the locking scheme used to protect shared hugetlb mappings from
    truncate vs page fault races, in order to protect private hugetlb mappings
    (with resv_map) against MADV_DONTNEED.

    Add a read-write semaphore to the resv_map data structure, and use that
    from the hugetlb_vma_(un)lock_* functions, in preparation for closing the
    race between MADV_DONTNEED and page faults.

    Link: https://lkml.kernel.org/r/20231006040020.3677377-3-riel@surriel.com
    Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:32 -06:00
Nico Pache 6d5662c985 hugetlbfs: clear resv_map pointer if mmap fails
commit 92fe9dcbe4e109a7ce6bab3e452210a35b0ab493
Author: Rik van Riel <riel@surriel.com>
Date:   Thu Oct 5 23:59:06 2023 -0400

    hugetlbfs: clear resv_map pointer if mmap fails

    Patch series "hugetlbfs: close race between MADV_DONTNEED and page fault", v7.

    Malloc libraries, like jemalloc and tcmalloc, make decisions on when to
    call madvise independently from the code in the main application.

    This sometimes results in the application page faulting on an address,
    right after the malloc library has shot down the backing memory with
    MADV_DONTNEED.

    Usually this is harmless, because we always have some 4kB pages sitting
    around to satisfy a page fault.  However, with hugetlbfs, systems often
    allocate only the exact number of huge pages that the application wants.

    Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
    any lock taken on the page fault path, which can open up the following
    race condition:

           CPU 1                            CPU 2

           MADV_DONTNEED
           unmap page
           shoot down TLB entry
                                           page fault
                                           fail to allocate a huge page
                                           killed with SIGBUS
           free page

    Fix that race by extending the hugetlb_vma_lock locking scheme to also
    cover private hugetlb mappings (with resv_map), and pulling the locking
    from __unmap_hugepage_final_range into helper functions called from
    zap_page_range_single.  This ensures page faults stay locked out of the
    MADV_DONTNEED VMA until the huge pages have actually been freed.

    This patch (of 3):

    Hugetlbfs leaves a dangling pointer in the VMA if mmap fails.  This has
    not been a problem so far, but other code in this patch series tries to
    follow that pointer.

    Link: https://lkml.kernel.org/r/20231006040020.3677377-1-riel@surriel.com
    Link: https://lkml.kernel.org/r/20231006040020.3677377-2-riel@surriel.com
    Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:32 -06:00
Nico Pache a62f4778e9 mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
Conflicts: mm/hugetlb.c: RHEL DRM commit 26418f1a34 partially
backported the c33c794828f21 ("mm: ptep_get() conversion") commit which
introduced this bug upstream. Although we don't have this chunk
downstream and we are not experiencing this issue, I still believe this
removes the "fragility" described in the patch and is the correct thing
to do. Please review with careful eyes.

commit 191fcdb6c9cf8b738b1628cbcf3af63d545c825c
Author: John Hubbard <jhubbard@nvidia.com>
Date:   Fri Jun 30 18:04:42 2023 -0700

    mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison

    The following crash happens for me when running the -mm selftests (below).
    Specifically, it happens while running the uffd-stress subtests:

    kernel BUG at mm/hugetlb.c:7249!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
    Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
    RIP: 0010:huge_pte_alloc+0x12c/0x1a0
    ...
    Call Trace:
     <TASK>
     ? __die_body+0x63/0xb0
     ? die+0x9f/0xc0
     ? do_trap+0xab/0x180
     ? huge_pte_alloc+0x12c/0x1a0
     ? do_error_trap+0xc6/0x110
     ? huge_pte_alloc+0x12c/0x1a0
     ? handle_invalid_op+0x2c/0x40
     ? huge_pte_alloc+0x12c/0x1a0
     ? exc_invalid_op+0x33/0x50
     ? asm_exc_invalid_op+0x16/0x20
     ? __pfx_put_prev_task_idle+0x10/0x10
     ? huge_pte_alloc+0x12c/0x1a0
     hugetlb_fault+0x1a3/0x1120
     ? finish_task_switch+0xb3/0x2a0
     ? lock_is_held_type+0xdb/0x150
     handle_mm_fault+0xb8a/0xd40
     ? find_vma+0x5d/0xa0
     do_user_addr_fault+0x257/0x5d0
     exc_page_fault+0x7b/0x1f0
     asm_exc_page_fault+0x22/0x30

    That happens because a BUG() statement in huge_pte_alloc() attempts to
    check that a pte, if present, is a hugetlb pte, but it does so in a
    non-lockless-safe manner that leads to a false BUG() report.

    We got here due to a couple of bugs, each of which by itself was not quite
    enough to cause a problem:

    First of all, before commit c33c794828f2 ("mm: ptep_get() conversion"), the
    BUG() statement in huge_pte_alloc() was itself fragile: it relied upon
    compiler behavior to only read the pte once, despite using it twice in the
    same conditional.

    Next, commit c33c794828f2 ("mm: ptep_get() conversion") broke that
    delicate situation, by causing all direct pte reads to be done via
    READ_ONCE().  And so READ_ONCE() got called twice within the same BUG()
    conditional, leading to comparing (potentially, occasionally) different
    versions of the pte, and thus to false BUG() reports.

    Fix this by taking a single snapshot of the pte before using it in the
    BUG conditional.
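
    A minimal user-space sketch of the snapshot pattern (illustrative only; an
    atomic word stands in for a pte that another thread may populate
    concurrently, and the demo_* helpers are made up):

        #include <assert.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define DEMO_PTE_PRESENT 0x1UL
        #define DEMO_PTE_HUGE    0x2UL

        /* Stands in for a pte that can change under us. */
        static _Atomic unsigned long demo_shared_pte;

        /* Fragile shape: the pte is read twice, so the two reads can observe
         * different values and the consistency check can fire spuriously. */
        static int demo_check_fragile(void)
        {
                return !(atomic_load(&demo_shared_pte) & DEMO_PTE_PRESENT) ||
                       (atomic_load(&demo_shared_pte) & DEMO_PTE_HUGE);
        }

        /* Fixed shape: take one snapshot and base every test on it, as the
         * patch does for the BUG() conditional in huge_pte_alloc(). */
        static int demo_check_snapshot(void)
        {
                unsigned long pte = atomic_load(&demo_shared_pte);

                return !(pte & DEMO_PTE_PRESENT) || (pte & DEMO_PTE_HUGE);
        }

        int main(void)
        {
                atomic_store(&demo_shared_pte, DEMO_PTE_PRESENT | DEMO_PTE_HUGE);
                assert(demo_check_fragile());
                assert(demo_check_snapshot());
                printf("snapshot-based check passed\n");
                return 0;
        }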

    Now, that commit is only partially to blame here, but people doing
    bisections will invariably land there, so this will help them find a fix
    for a real crash.  And also, the previous behavior was unlikely to ever
    expose this bug--it was fragile, yet not actually broken.

    So that's why I chose this commit for the Fixes tag, rather than the
    commit that created the original BUG() statement.

    Link: https://lkml.kernel.org/r/20230701010442.2041858-1-jhubbard@nvidia.com
    Fixes: c33c794828f2 ("mm: ptep_get() conversion")
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Acked-by: James Houghton <jthoughton@google.com>
    Acked-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Dave Airlie <airlied@gmail.com>
    Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:31 -06:00
Nico Pache bdb2b12d7b mm: replace mmap with vma write lock assertions when operating on a vma
commit e727bfd5e73a35ecbc4a01a15c659b9fafaa97c0
Author: Suren Baghdasaryan <surenb@google.com>
Date:   Fri Aug 4 08:27:21 2023 -0700

    mm: replace mmap with vma write lock assertions when operating on a vma

    Vma write lock assertion always includes mmap write lock assertion and
    additional vma lock checks when per-VMA locks are enabled. Replace
    weaker mmap_assert_write_locked() assertions with stronger
    vma_assert_write_locked() ones when we are operating on a vma which
    is expected to be locked.

    Link: https://lkml.kernel.org/r/20230804152724.3090321-4-surenb@google.com
    Suggested-by: Jann Horn <jannh@google.com>
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Linus Torvalds <torvalds@linuxfoundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:30 -06:00
Nico Pache bd3e265719 hugetlb: do not clear hugetlb dtor until allocating vmemmap
commit 32c877191e022b55fe3a374f3d7e9fb5741c514d
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Tue Jul 11 15:09:41 2023 -0700

    hugetlb: do not clear hugetlb dtor until allocating vmemmap

    Patch series "Fix hugetlb free path race with memory errors".

    In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
    HWPOISON hugepages" the race window was discovered.
    https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/

    Freeing a hugetlb page back to low level memory allocators is performed
    in two steps.
    1) Under hugetlb lock, remove page from hugetlb lists and clear destructor
    2) Outside lock, allocate vmemmap if necessary and call low level free
    Between these two steps, the hugetlb page will appear as a normal
    compound page.  However, vmemmap for tail pages could be missing.
    If a memory error occurs at this time, we could try to update page
    flags of non-existent page structs.

    A much more detailed description is in the first patch.

    The first patch addresses the race window.  However, it adds a
    hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
    free operation.  This could lead to slowdowns if one is freeing a large
    number of hugetlb pages.

    The second patch optimizes the update_and_free_pages_bulk routine to only
    take the lock once in bulk operations.

    The second patch is technically not a bug fix, but includes a Fixes tag
    and Cc stable to avoid a performance regression.  It can be combined with
    the first, but was done separately to make reviewing easier.

    This patch (of 2):

    Freeing a hugetlb page and releasing base pages back to the underlying
    allocator such as buddy or cma is performed in two steps:
    - remove_hugetlb_folio() is called to remove the folio from hugetlb
      lists, get a ref on the page and remove hugetlb destructor.  This
      all must be done under the hugetlb lock.  After this call, the page
      can be treated as a normal compound page or a collection of base
      size pages.
    - update_and_free_hugetlb_folio() is called to allocate vmemmap if
      needed and the free routine of the underlying allocator is called
      on the resulting page.  We can not hold the hugetlb lock here.

    One issue with this scheme is that a memory error could occur between
    these two steps.  In this case, the memory error handling code treats
    the old hugetlb page as a normal compound page or collection of base
    pages.  It will then try to SetPageHWPoison(page) on the page with an
    error.  If the page with error is a tail page without vmemmap, a write
    error will occur when trying to set the flag.

    Address this issue by modifying remove_hugetlb_folio() and
    update_and_free_hugetlb_folio() such that the hugetlb destructor is not
    cleared until after allocating vmemmap.  Since clearing the destructor
    requires holding the hugetlb lock, the clearing is done in
    remove_hugetlb_folio() if the vmemmap is present.  This saves a
    lock/unlock cycle.  Otherwise, the destructor is cleared in
    update_and_free_hugetlb_folio() after allocating vmemmap.

    Note that this will leave hugetlb pages in a state where they are marked
    free (by hugetlb specific page flag) and have a ref count.  This is not
    a normal state.  The only code that would notice is the memory error
    code, and it is set up to retry in such a case.

    A subsequent patch will create a routine to do bulk processing of
    vmemmap allocation.  This will eliminate a lock/unlock cycle for each
    hugetlb page in the case where we are freeing a large number of pages.

    Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com
    Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com
    Fixes: ad2fa3717b ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jiaqi Yan <jiaqiyan@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:27 -06:00
Nico Pache 0f336b05b8 hugetlb: revert use of page_cache_next_miss()
commit fd4aed8d985a3236d0877ff6d0c80ad39d4ce81a
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Wed Jun 21 14:24:03 2023 -0700

    hugetlb: revert use of page_cache_next_miss()

    Ackerley Tng reported an issue with hugetlbfs fallocate as noted in the
    Closes tag.  The issue showed up after the conversion of hugetlb page
    cache lookup code to use page_cache_next_miss.  User visible effects are:

    - hugetlbfs fallocate incorrectly returns -EEXIST if pages are present
      in the file.
    - hugetlb pages will not be included in core dumps if they need to be
      brought in via GUP.
    - userfaultfd UFFDIO_COPY will not notice pages already present in the
      cache.  It may try to allocate a new page and potentially return
      ENOMEM as opposed to EEXIST.

    Revert the use of page_cache_next_miss() in hugetlb code.

    IMPORTANT NOTE FOR STABLE BACKPORTS:
    This patch will apply cleanly to v6.3.  However, due to the change of
    filemap_get_folio() return values, it will not function correctly.  This
    patch must be modified for stable backports.

    [dan.carpenter@linaro.org: fix hugetlbfs_pagecache_present()]
      Link: https://lkml.kernel.org/r/efa86091-6a2c-4064-8f55-9b44e1313015@moroto.mountain
    Link: https://lkml.kernel.org/r/20230621212403.174710-2-mike.kravetz@oracle.com
    Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Reported-by: Ackerley Tng <ackerleytng@google.com>
    Closes: https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com
    Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Erdem Aktas <erdemaktas@google.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Vishal Annapurve <vannapurve@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:24 -06:00
Chris von Recklinghausen edecef2b58 mm/hugetlb: fix uffd-wp bit lost when unsharing happens
Conflicts: mm/hugetlb.c - We already have
	ec8832d007cb ("mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end()")
	so don't add back the mmu_notifier_invalidate_range call

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 0f230bc24b6e1399b86b95642704a962d8aa40e6
Author: Peter Xu <peterx@redhat.com>
Date:   Mon Apr 17 15:53:13 2023 -0400

    mm/hugetlb: fix uffd-wp bit lost when unsharing happens

    When we try to unshare a pinned page for a private hugetlb, the uffd-wp
    bit can get lost during unsharing.

    When the above condition is met, one can lose the uffd-wp bit on the
    privately mapped hugetlb page.  That allows the page to be writable even
    though it should still be wr-protected.  I assume it can mean data loss.

    This should be very rare, happening only if an unsharing occurs on a
    private hugetlb page that is uffd-wp protected (e.g. in a child which
    shares the same page with the parent, with UFFD_FEATURE_EVENT_FORK
    enabled).

    When I wrote the reproducer (provided in the last patch), I needed to
    use the newest gup_test cmd introduced by David to trigger it because I
    don't even know another way to do a proper RO longterm pin.

    Besides that, it needs a bunch of other conditions all met:

            (1) hugetlb being mapped privately,
            (2) userfaultfd registered with WP and EVENT_FORK,
            (3) the user app fork()s, then,
            (4) RO longterm pin onto a wr-protected anonymous page.

    If it's not impossible to hit in production I'd say extremely rare.

    Link: https://lkml.kernel.org/r/20230417195317.898696-3-peterx@redhat.com
    Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Mika Penttilä <mpenttil@redhat.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:01:02 -04:00
Chris von Recklinghausen 22af380202 mm/hugetlb: fix uffd-wp during fork()
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 5a2f8d22ace4c8ac8798fab836dca7350fa710b1
Author: Peter Xu <peterx@redhat.com>
Date:   Mon Apr 17 15:53:12 2023 -0400

    mm/hugetlb: fix uffd-wp during fork()

    Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins",
    v2.

    This patch (of 6):

    There're a bunch of things that were wrong:

      - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
        rather than huge_pte_uffd_wp().

      - When copying over a pte, we should drop uffd-wp bit when
        !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).

      - When doing early CoW for private hugetlb (e.g. when the parent page was
        pinned), uffd-wp bit should be properly carried over if necessary.

    No bug has been reported, probably because most people do not even care
    about these corner cases, but they are still bugs and can be exposed by
    the recently introduced unit tests, so fix all of them in one shot.

    Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com
    Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com
    Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Mika Penttilä <mpenttil@redhat.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:01:02 -04:00
Chris von Recklinghausen c4677d95e9 mm: hwpoison: support recovery from HugePage copy-on-write faults
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 1cb9dc4b475c7418f925ab0c97b6750007d9f52e
Author: Liu Shixin <liushixin2@huawei.com>
Date:   Thu Apr 13 21:13:49 2023 +0800

    mm: hwpoison: support recovery from HugePage copy-on-write faults

    Copy-on-write of hugetlb user pages with uncorrectable errors will result
    in a kernel crash.  This is because the copy is performed in kernel mode,
    and in general we cannot handle accessing memory with such errors while
    in kernel mode.  Commit a873dfe1032a ("mm, hwpoison: try to recover from
    copy-on write faults") introduced the routine copy_user_highpage_mc() to
    gracefully handle copying of user pages with uncorrectable errors.
    However, the separate hugetlb copy-on-write code paths were not modified
    as part of commit a873dfe1032a.

    Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage() so
    that they can also gracefully handle uncorrectable errors in user pages.
    This involves changing the hugetlb specific routine
    copy_user_large_folio() from type void to int so that it can return an
    error.  Modify the hugetlb userfaultfd code in the same way so that it can
    return -EHWPOISON if it encounters an uncorrectable error.

    Link: https://lkml.kernel.org/r/20230413131349.2524210-1-liushixin2@huawei.com
    Signed-off-by: Liu Shixin <liushixin2@huawei.com>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:58 -04:00
Chris von Recklinghausen a3e721c8e7 mm: convert copy_user_huge_page() to copy_user_large_folio()
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit c0e8150e144b62ae467520d0b51c4707c09e897b
Author: ZhangPeng <zhangpeng362@huawei.com>
Date:   Mon Apr 10 21:39:31 2023 +0800

    mm: convert copy_user_huge_page() to copy_user_large_folio()

    Replace copy_user_huge_page() with copy_user_large_folio().
    copy_user_large_folio() does the same as copy_user_huge_page(), but takes
    in folios instead of pages.  Remove pages_per_huge_page from
    copy_user_large_folio(), because we can get that from folio_nr_pages(dst).

    Convert copy_user_gigantic_page() to take in folios.

    Link: https://lkml.kernel.org/r/20230410133932.32288-6-zhangpeng362@huawei.com
    Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:57 -04:00
Chris von Recklinghausen 7fe6fb66af userfaultfd: convert mfill_atomic_hugetlb() to use a folio
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 0169fd518a8934d8d723659752b07589ecc9f692
Author: ZhangPeng <zhangpeng362@huawei.com>
Date:   Mon Apr 10 21:39:30 2023 +0800

    userfaultfd: convert mfill_atomic_hugetlb() to use a folio

    Convert hugetlb_mfill_atomic_pte() to take in a folio pointer instead of
    a page pointer.

    Convert mfill_atomic_hugetlb() to use a folio.

    Link: https://lkml.kernel.org/r/20230410133932.32288-5-zhangpeng362@huawei.com
    Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
    Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:56 -04:00
Chris von Recklinghausen 4b83c78b5d userfaultfd: convert copy_huge_page_from_user() to copy_folio_from_user()
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit e87340ca5c9cecc8a11daf1a2dcabf23f06a4e10
Author: ZhangPeng <zhangpeng362@huawei.com>
Date:   Mon Apr 10 21:39:29 2023 +0800

    userfaultfd: convert copy_huge_page_from_user() to copy_folio_from_user()

    Replace copy_huge_page_from_user() with copy_folio_from_user().
    copy_folio_from_user() does the same as copy_huge_page_from_user(), but
    takes in a folio instead of a page.

    Convert page_kaddr to kaddr in copy_folio_from_user() to do indenting
    cleanup.

    Link: https://lkml.kernel.org/r/20230410133932.32288-4-zhangpeng362@huawei.com
    Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
    Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:56 -04:00
Chris von Recklinghausen f7a54d8536 hugetlb: remove PageHeadHuge()
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 957ebbdf434013ee01f29f6c9174eced995ebbe7
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Mar 27 16:10:50 2023 +0100

    hugetlb: remove PageHeadHuge()

    Sidhartha Kumar removed the last caller of PageHeadHuge(), so we can now
    remove it and make folio_test_hugetlb() the real implementation.  Add
    kernel-doc for folio_test_hugetlb().

    Link: https://lkml.kernel.org/r/20230327151050.1787744-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:49 -04:00
Chris von Recklinghausen ab982ab697 mm: userfaultfd: combine 'mode' and 'wp_copy' arguments
Conflicts: mm/userfaultfd.c - We already have
	161e393c0f63 ("mm: Make pte_mkwrite() take a VMA")
	so pte_mkwrite takes 2  arguments

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit d9712937037e0ce887920f321429826e9dbfd960
Author: Axel Rasmussen <axelrasmussen@google.com>
Date:   Tue Mar 14 15:12:49 2023 -0700

    mm: userfaultfd: combine 'mode' and 'wp_copy' arguments

    Many userfaultfd ioctl functions take both a 'mode' and a 'wp_copy'
    argument.  In future commits we plan to plumb the flags through to more
    places, so we'd be proliferating the very long argument list even further.

    Let's take the time to simplify the argument list.  Combine the two
    arguments into one - and generalize, so when we add more flags in the
    future, it doesn't imply more function arguments.

    Since the modes (copy, zeropage, continue) are mutually exclusive, store
    them as an integer value (0, 1, 2) in the low bits.  Place combine-able
    flag bits in the high bits.
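
    A minimal sketch of the combined layout (illustrative only; the demo_*
    names and bit positions are made up, not the actual uffd_flags_t
    encoding):

        #include <assert.h>
        #include <stdio.h>

        typedef unsigned int demo_uffd_flags_t;

        /* Mutually exclusive modes live in the low bits... */
        #define DEMO_MODE_MASK      0x3u
        #define DEMO_MODE_COPY      0x0u
        #define DEMO_MODE_ZEROPAGE  0x1u
        #define DEMO_MODE_CONTINUE  0x2u
        /* ...and combinable flags live in the high bits. */
        #define DEMO_FLAG_WP        (1u << 2)

        static unsigned int demo_mode(demo_uffd_flags_t f)
        {
                return f & DEMO_MODE_MASK;
        }

        static int demo_wp(demo_uffd_flags_t f)
        {
                return !!(f & DEMO_FLAG_WP);
        }

        int main(void)
        {
                demo_uffd_flags_t f = DEMO_MODE_COPY | DEMO_FLAG_WP;

                assert(demo_mode(f) == DEMO_MODE_COPY && demo_wp(f));
                f = DEMO_MODE_CONTINUE;            /* a new mode replaces the old one */
                assert(demo_mode(f) == DEMO_MODE_CONTINUE && !demo_wp(f));
                printf("mode=%u wp=%d\n", demo_mode(f), demo_wp(f));
                return 0;
        }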

    This is quite similar to an earlier patch proposed by Nadav Amit
    ("userfaultfd: introduce uffd_flags" [1]).  The main difference is that
    patch only handled flags, whereas this patch *also* combines the "mode"
    argument into the same type to shorten the argument list.

    [1]: https://lore.kernel.org/all/20220619233449.181323-2-namit@vmware.com/

    Link: https://lkml.kernel.org/r/20230314221250.682452-4-axelrasmussen@google.com
    Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
    Acked-by: James Houghton <jthoughton@google.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:25 -04:00
Chris von Recklinghausen 75317fb06a mm: userfaultfd: don't pass around both mm and vma
Conflicts: mm/userfaultfd.c - We already have
	153132571f ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
	and
	73f37dbcfe17 ("mm: userfaultfd: fix UFFDIO_CONTINUE on fallocated shmem pages")
	so keep the setting of ret and possible jump to out.

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 61c5004022f56c443b86800e8985d8803f3a22aa
Author: Axel Rasmussen <axelrasmussen@google.com>
Date:   Tue Mar 14 15:12:48 2023 -0700

    mm: userfaultfd: don't pass around both mm and vma

    Quite a few userfaultfd functions took both mm and vma pointers as
    arguments.  Since the mm is trivially accessible via vma->vm_mm, there's
    no reason to pass both; it just needlessly extends the already long
    argument list.

    Get rid of the mm pointer, where possible, to shorten the argument list.

    Link: https://lkml.kernel.org/r/20230314221250.682452-3-axelrasmussen@google.com
    Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:25 -04:00
Chris von Recklinghausen 81108c01c8 mm: userfaultfd: rename functions for clarity + consistency
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit a734991ccaec1985fff42fb26bb6d789d35defb4
Author: Axel Rasmussen <axelrasmussen@google.com>
Date:   Tue Mar 14 15:12:47 2023 -0700

    mm: userfaultfd: rename functions for clarity + consistency

    Patch series "mm: userfaultfd: refactor and add UFFDIO_CONTINUE_MODE_WP",
    v5.

    - Commits 1-3 refactor userfaultfd ioctl code without behavior changes, with the
      main goal of improving consistency and reducing the number of function args.

    - Commit 4 adds UFFDIO_CONTINUE_MODE_WP.

    This patch (of 4):

    The basic problem is, over time we've added new userfaultfd ioctls, and
    we've refactored the code so functions which used to handle only one case
    are now re-used to deal with several cases.  While this happened, we
    didn't bother to rename the functions.

    Similarly, as we added new functions, we cargo-culted pieces of the
    now-inconsistent naming scheme, so those functions too ended up with names
    that don't make a lot of sense.

    A key point here is, "copy" in most userfaultfd code refers specifically
    to UFFDIO_COPY, where we allocate a new page and copy its contents from
    userspace.  There are many functions with "copy" in the name that don't
    actually do this (at least in some cases).

    So, rename things into a consistent scheme.  The high level idea is that
    the call stack for userfaultfd ioctls becomes:

    userfaultfd_ioctl
      -> userfaultfd_(particular ioctl)
        -> mfill_atomic_(particular kind of fill operation)
          -> mfill_atomic    /* loops over pages in range */
            -> mfill_atomic_pte    /* deals with single pages */
              -> mfill_atomic_pte_(particular kind of fill operation)
                -> mfill_atomic_install_pte

    There are of course some special cases (shmem, hugetlb), but this is the
    general structure which all function names now adhere to.

    Link: https://lkml.kernel.org/r/20230314221250.682452-1-axelrasmussen@google.com
    Link: https://lkml.kernel.org/r/20230314221250.682452-2-axelrasmussen@google.com
    Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:24 -04:00
Chris von Recklinghausen c0446613df mm: return an ERR_PTR from __filemap_get_folio
Conflicts:
	fs/nilfs2/page.c - We already have
		f6e0e1734424 ("nilfs2: Convert nilfs_copy_back_pages() to use filemap_get_folios()")
		so use folios instead of pages
	fs/smb/client/cifsfs.c - The backport of
		7b2404a886f8 ("cifs: Fix flushing, invalidation and file size with copy_file_range()")
		cited the lack of this patch as a conflict. Fix it.

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 66dabbb65d673aef40dd17bf62c042be8f6d4a4b
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Mar 7 15:34:10 2023 +0100

    mm: return an ERR_PTR from __filemap_get_folio

    Instead of returning NULL for all errors, distinguish between:

     - no entry found and not asked to allocated (-ENOENT)
     - failed to allocate memory (-ENOMEM)
     - would block (-EAGAIN)

    so that callers don't have to guess the error based on the passed in
    flags.

    Also pass the error through the direct callers: filemap_get_folio,
    filemap_lock_folio, filemap_grab_folio and filemap_get_incore_folio.
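
    A minimal user-space sketch of the calling convention (illustrative only;
    the ERR_PTR/IS_ERR/PTR_ERR helpers are re-implemented here and
    demo_get_folio() is a made-up stand-in, not __filemap_get_folio()):

        #include <errno.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define DEMO_MAX_ERRNO 4095

        static void *DEMO_ERR_PTR(long err) { return (void *)err; }
        static long DEMO_PTR_ERR(const void *p) { return (long)p; }
        static int DEMO_IS_ERR(const void *p)
        {
                return (unsigned long)p >= (unsigned long)-DEMO_MAX_ERRNO;
        }

        struct demo_folio { long index; };

        /* Distinct errors instead of a bare NULL for every failure. */
        static struct demo_folio *demo_get_folio(long index, int allow_alloc)
        {
                struct demo_folio *f;

                if (index < 0)
                        return DEMO_ERR_PTR(-ENOENT);  /* not found, not asked to allocate */
                if (!allow_alloc)
                        return DEMO_ERR_PTR(-EAGAIN);  /* would block */
                f = malloc(sizeof(*f));
                if (!f)
                        return DEMO_ERR_PTR(-ENOMEM);  /* allocation failed */
                f->index = index;
                return f;
        }

        int main(void)
        {
                struct demo_folio *f = demo_get_folio(-1, 1);

                if (DEMO_IS_ERR(f))
                        printf("lookup failed: %ld\n", DEMO_PTR_ERR(f));

                f = demo_get_folio(3, 1);
                if (!DEMO_IS_ERR(f)) {
                        printf("got folio at index %ld\n", f->index);
                        free(f);
                }
                return 0;
        }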

    [hch@lst.de: fix null-pointer deref]
      Link: https://lkml.kernel.org/r/20230310070023.GA13563@lst.de
      Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004
    Link: https://lkml.kernel.org/r/20230307143410.28031-8-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2]
    Cc: Andreas Gruenbacher <agruenba@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:24 -04:00
Chris von Recklinghausen 79a922db6c mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
Conflicts:
	include/linux/mm.h - We already have
		b5054174ac7c ("mm: move FOLL_* defs to mm_types.h")
		so that's where they were moved to
	mm/gup.c - We already have
		52650c8b46 ("mm/gup: remove the vma allocation from gup_longterm_locked()")
		so keep the check for dax and potentially return ENOTSUPP

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 4003f107fa2eabb0aab90e37a1ed7b74c6f0d132
Author: Logan Gunthorpe <logang@deltatee.com>
Date:   Fri Oct 21 11:41:09 2022 -0600

    mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages

    GUP Callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
    allow obtaining P2PDMA pages. If GUP is called without the flag and a
    P2PDMA page is found, it will return an error in try_grab_page() or
    try_grab_folio().

    The check is safe to do before taking the reference to the page in both
    cases seeing the page should be protected by either the appropriate
    ptl or mmap_lock; or the gup fast guarantees preventing TLB flushes.

    try_grab_folio() has one call site that WARNs on failure and cannot
    actually deal with the failure of this function (it seems it will
    get into an infinite loop). Expand the comment there to document a
    couple more conditions on why it will not fail.

    FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set. This is to copy
    fsdax until pgmap refcounts are fixed (see the link below for more
    information).

    Link: https://lkml.kernel.org/r/Yy4Ot5MoOhsgYLTQ@ziepe.ca
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-3-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:01 -04:00
Aristeu Rozanski 39263f3448 mm: hugetlb: change to return bool for isolate_hugetlb()
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 9747b9e92418b61c2281561e0651803f1fad0159
Author: Baolin Wang <baolin.wang@linux.alibaba.com>
Date:   Wed Feb 15 18:39:36 2023 +0800

    mm: hugetlb: change to return bool for isolate_hugetlb()

    Now isolate_hugetlb() only returns 0 or -EBUSY, and most users do not
    care about the negative value, so we can convert isolate_hugetlb() to
    return a boolean value to make the code clearer when checking the
    hugetlb isolation state.  Moreover, convert the 2 users which do consider
    the negative value returned by isolate_hugetlb().

    No functional changes intended.

    [akpm@linux-foundation.org: shorten locked section, per SeongJae Park]
    Link: https://lkml.kernel.org/r/12a287c5bebc13df304387087bbecc6421510849.1676424378.git.baolin.wang@linux.alibaba.com
    Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:24 -04:00
Aristeu Rozanski 1ce48ef21a mm/hugetlb: convert hugetlb_wp() to take in a folio
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 371607a3c793d7183b0faecc1fb4aa88fadcf202
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 25 09:05:36 2023 -0800

    mm/hugetlb: convert hugetlb_wp() to take in a folio

    Change the pagecache_page argument of hugetlb_wp to pagecache_folio.
    Replaces a call to find_lock_page() with filemap_lock_folio().

    Link: https://lkml.kernel.org/r/20230125170537.96973-8-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reported-by: gerald.schaefer@linux.ibm.com
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski ef2c321895 mm/hugetlb: convert hugetlb_add_to_page_cache to take in a folio
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 9b91c0e277a3dbb165c2e4301be7a231dc2f76f7
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 25 09:05:35 2023 -0800

    mm/hugetlb: convert hugetlb_add_to_page_cache to take in a folio

    Every caller of hugetlb_add_to_page_cache() is now passing in
    &folio->page, change the function to take in a folio directly and clean up
    the call sites.

    Link: https://lkml.kernel.org/r/20230125170537.96973-7-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski e453fe9b0f mm/hugetlb: convert restore_reserve_on_error to take in a folio
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit d2d7bb44bfbd29200426ba17741550d36e081f91
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 25 09:05:34 2023 -0800

    mm/hugetlb: convert restore_reserve_on_error to take in a folio

    Every caller of restore_reserve_on_error() is now passing in &folio->page,
    change the function to take in a folio directly and clean up the call
    sites.

    Link: https://lkml.kernel.org/r/20230125170537.96973-6-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski 456efc9e7d mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: reverting 830fb0c1df, which was a backport of da9a298f5fa twice by mistake

commit d0ce0e47b323a8d7fb5dc3314ce56afa650ade2d
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 25 09:05:33 2023 -0800

    mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()

    Change alloc_huge_page() to alloc_hugetlb_folio() by changing all callers
    to handle the function's new folio return type.  In this conversion,
    alloc_huge_page_vma() is also changed to alloc_hugetlb_folio_vma() and
    hugepage_add_new_anon_rmap() is changed to take in a folio directly.  Many
    additions of '&folio->page' are cleaned up in subsequent patches.

    hugetlbfs_fallocate() is also refactored to use the RCU +
    page_cache_next_miss() API.
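
    A rough sketch of the fault-path shape after the conversion (variable and
    label names are assumed for illustration; error handling is elided):

        struct folio *folio;

        folio = alloc_hugetlb_folio(vma, haddr, 0);   /* was alloc_huge_page() */
        if (IS_ERR(folio))
                goto backout;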

    Link: https://lkml.kernel.org/r/20230125170537.96973-5-sidhartha.kumar@oracle.com
    Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski 5475350fe6 mm/hugetlb: convert putback_active_hugepage to take in a folio
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit ea8e72f4116a995c2aba3fb738ac372c4115375a
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 25 09:05:32 2023 -0800

    mm/hugetlb: convert putback_active_hugepage to take in a folio

    Convert putback_active_hugepage() to folio_putback_active_hugetlb(), this
    removes one user of the Huge Page macros which take in a page.  The
    callers in migrate.c are also cleaned up by being able to directly use the
    src and dst folio variables.
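
    Illustrative sketch of the rename at a call site (assumed shape, not
    copied from the patch):

        /* before */
        putback_active_hugepage(page);

        /* after */
        folio_putback_active_hugetlb(page_folio(page));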

    Link: https://lkml.kernel.org/r/20230125170537.96973-4-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski 127034405a mm/hugetlb: convert hugetlbfs_pagecache_present() to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 91a2fb956ad993f3cbcfc632611e17e3699fb652
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 25 09:05:31 2023 -0800

    mm/hugetlb: convert hugetlbfs_pagecache_present() to folios

    Refactor hugetlbfs_pagecache_present() to avoid getting and dropping a
    refcount on a page.  Use RCU and page_cache_next_miss() instead.
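
    A minimal sketch of the RCU + page_cache_next_miss() pattern described
    above (variable names are assumed):

        bool present;

        rcu_read_lock();
        present = page_cache_next_miss(mapping, idx, 1) != idx;
        rcu_read_unlock();

        return present;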

    Link: https://lkml.kernel.org/r/20230125170537.96973-3-sidhartha.kumar@oracle.com
    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: kernel test robot <lkp@intel.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski 7435e4f3bf mm/hugetlb: convert hugetlb_install_page to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit ea4c353df37750d170dc0dcbfa8c47c984779733
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 25 09:05:30 2023 -0800

    mm/hugetlb: convert hugetlb_install_page to folios

    Patch series "convert hugetlb fault functions to folios", v2.

    This series converts the hugetlb page faulting functions to operate on
    folios. These include hugetlb_no_page(), hugetlb_wp(),
    copy_hugetlb_page_range(), and hugetlb_mcopy_atomic_pte().

    This patch (of 8):

    Change hugetlb_install_page() to hugetlb_install_folio().  This removes
    one user of the Huge Page flag macros which take in a page.

    Link: https://lkml.kernel.org/r/20230125170537.96973-1-sidhartha.kumar@oracle.com
    Link: https://lkml.kernel.org/r/20230125170537.96973-2-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski 67e4bca93f mm/hugetlb: convert demote_free_huge_page to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit bdd7be075acb650cc57d8ee752b5375b966ad07e
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:57 2023 -0600

    mm/hugetlb: convert demote_free_huge_page to folios

    Change demote_free_huge_page() to demote_free_hugetlb_folio() and change
    demote_pool_huge_page() to pass in a folio.

    Link: https://lkml.kernel.org/r/20230113223057.173292-9-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:21 -04:00
Aristeu Rozanski b4f5bb189b mm/hugetlb: convert restore_reserve_on_error() to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 0ffdc38eb564c1c71a58bbaf874945ba54293ff9
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:56 2023 -0600

    mm/hugetlb: convert restore_reserve_on_error() to folios

    Use the hugetlb folio flag macros inside restore_reserve_on_error() and
    update the comments to reflect the use of folios.

    Link: https://lkml.kernel.org/r/20230113223057.173292-8-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:20 -04:00
Aristeu Rozanski 866c2662ab mm/hugetlb: convert alloc_migrate_huge_page to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit e37d3e838d9078538f920957d1e89682b6764977
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:55 2023 -0600

    mm/hugetlb: convert alloc_migrate_huge_page to folios

    Change alloc_huge_page_nodemask() to alloc_hugetlb_folio_nodemask() and
    alloc_migrate_huge_page() to alloc_migrate_hugetlb_folio().  Both
    functions now return a folio rather than a page.

    Link: https://lkml.kernel.org/r/20230113223057.173292-7-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:20 -04:00
Aristeu Rozanski c31aaaaf03 mm/hugetlb: increase use of folios in alloc_huge_page()
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit ff7d853b031302376a0d3640fa1c463d94079637
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:54 2023 -0600

    mm/hugetlb: increase use of folios in alloc_huge_page()

    Change hugetlb_cgroup_commit_charge{,_rsvd}(), dequeue_huge_page_vma() and
    alloc_buddy_huge_page_with_mpol() to use folios so alloc_huge_page() is
    cleaned up by operating on folios until its return.

    Link: https://lkml.kernel.org/r/20230113223057.173292-6-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:20 -04:00
Aristeu Rozanski ac517792d8 mm/hugetlb: convert alloc_surplus_huge_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 3a740e8bb56ef7ee6b9098b694caabab843be067
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:53 2023 -0600

    mm/hugetlb: convert alloc_surplus_huge_page() to folios

    Change alloc_surplus_huge_page() to alloc_surplus_hugetlb_folio() and
    update its callers.

    Link: https://lkml.kernel.org/r/20230113223057.173292-5-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:20 -04:00
Aristeu Rozanski 5d0572483a mm/hugetlb: convert dequeue_hugetlb_page functions to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit a36f1e9024740c3820427afca4cd375e32a1bb15
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:52 2023 -0600

    mm/hugetlb: convert dequeue_hugetlb_page functions to folios

    dequeue_huge_page_node_exact() is changed to
    dequeue_hugetlb_folio_node_exact() and dequeue_huge_page_nodemask() is
    changed to dequeue_hugetlb_folio_nodemask().  Update their callers to pass
    in a folio.

    Link: https://lkml.kernel.org/r/20230113223057.173292-4-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:20 -04:00
Aristeu Rozanski 95b2331b3b mm/hugetlb: convert __update_and_free_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 6f6956cf7e6a3034f61780446547e849aa4e216d
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:51 2023 -0600

    mm/hugetlb: convert __update_and_free_page() to folios

    Change __update_and_free_page() to __update_and_free_hugetlb_folio() by
    changing its callers to pass in a folio.

    Link: https://lkml.kernel.org/r/20230113223057.173292-3-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:20 -04:00
Aristeu Rozanski edf79d9715 mm/hugetlb: convert isolate_hugetlb to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 6aa3a920125e9f58891e2b5dc2efd4d0c1ff05a6
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Fri Jan 13 16:30:50 2023 -0600

    mm/hugetlb: convert isolate_hugetlb to folios

    Patch series "continue hugetlb folio conversion", v3.

    This series continues the conversion of core hugetlb functions to use
    folios.  This series converts many helper functions in the hugetlb fault
    path. This is in preparation for another series to convert the hugetlb
    fault code paths to operate on folios.

    This patch (of 8):

    Convert isolate_hugetlb() to take in a folio and convert its callers to
    pass a folio.  Using page_folio() to convert the callers is safe because
    isolate_hugetlb() operates on a head page.
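
    A sketch of the caller-side conversion (illustrative only; variable names
    are assumed):

        /* before */
        isolate_hugetlb(page, &pagelist);

        /* after: page_folio() is safe here, only head pages were ever passed */
        isolate_hugetlb(page_folio(page), &pagelist);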

    Link: https://lkml.kernel.org/r/20230113223057.173292-1-sidhartha.kumar@oracle.com
    Link: https://lkml.kernel.org/r/20230113223057.173292-2-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:20 -04:00
Aristeu Rozanski fec82fff3c mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit e430a95a04efc557bc4ff9b3035c7c85aee5d63f
Author: Suren Baghdasaryan <surenb@google.com>
Date:   Thu Jan 26 11:37:48 2023 -0800

    mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

    To simplify the usage of VM_LOCKED_CLEAR_MASK in vm_flags_clear(), replace
    it with VM_LOCKED_MASK bitmask and convert all users.
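
    Illustrative example of the converted usage (sketch only):

        /* before */
        vma->vm_flags &= VM_LOCKED_CLEAR_MASK;

        /* after */
        vm_flags_clear(vma, VM_LOCKED_MASK);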

    Link: https://lkml.kernel.org/r/20230126193752.297968-4-surenb@google.com
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arjun Roy <arjunroy@google.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Joel Fernandes <joelaf@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kent Overstreet <kent.overstreet@linux.dev>
    Cc: Laurent Dufour <ldufour@linux.ibm.com>
    Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Minchan Kim <minchan@google.com>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Cc: Peter Oskolkov <posk@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Punit Agrawal <punit.agrawal@bytedance.com>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Soheil Hassas Yeganeh <soheil@google.com>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:17 -04:00
Aristeu Rozanski db7d9d8a0e mm/hugetlb: convert get_hwpoison_huge_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 04bac040bc71b4b37550eed5854f34ca161756f9
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Jan 18 09:40:39 2023 -0800

    mm/hugetlb: convert get_hwpoison_huge_page() to folios

    Straightforward conversion of get_hwpoison_huge_page() to
    get_hwpoison_hugetlb_folio().  This removes two references to a head page
    in memory-failure.c.

    [arnd@arndb.de: fix get_hwpoison_hugetlb_folio() stub]
      Link: https://lkml.kernel.org/r/20230119111920.635260-1-arnd@kernel.org
    Link: https://lkml.kernel.org/r/20230118174039.14247-1-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:11 -04:00
Aristeu Rozanski 2ca0a475ff mm/memory-failure: convert hugetlb_clear_page_hwpoison to folios
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 2ff6cecee669bf0fc63eadebac8cfc81f74b9a4c
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Thu Jan 12 14:46:03 2023 -0600

    mm/memory-failure: convert hugetlb_clear_page_hwpoison to folios

    Change hugetlb_clear_page_hwpoison() to folio_clear_hugetlb_hwpoison() by
    changing the function to take in a folio.  This converts one use of
    ClearPageHWPoison and HPageRawHwpUnreliable to their folio equivalents.

    Link: https://lkml.kernel.org/r/20230112204608.80136-4-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:07 -04:00
Aristeu Rozanski 5c7727de2d hugetlb: remove uses of compound_dtor and compound_nr
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 2d678c641a4625d2b1cfeb50d7426fab6d3740b3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jan 11 14:29:07 2023 +0000

    hugetlb: remove uses of compound_dtor and compound_nr

    Convert the entire file to use the folio equivalents.

    Link: https://lkml.kernel.org/r/20230111142915.1001531-22-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:06 -04:00
Aristeu Rozanski 7a174469c9 hugetlb: remove uses of folio_mapcount_ptr
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 46f2722825983a51e849eb0ef2814e5c7f040fef
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jan 11 14:28:59 2023 +0000

    hugetlb: remove uses of folio_mapcount_ptr

    Use the entire_mapcount field directly.

    Link: https://lkml.kernel.org/r/20230111142915.1001531-14-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:05 -04:00
Aristeu Rozanski 5455c3da6d mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 7d4a8be0c4b2b7ffb367929d2b352651f083806b
Author: Alistair Popple <apopple@nvidia.com>
Date:   Tue Jan 10 13:57:22 2023 +1100

    mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export

    mmu_notifier_range_update_to_read_only() was originally introduced in
    commit c6d23413f8 ("mm/mmu_notifier:
    mmu_notifier_range_update_to_read_only() helper") as an optimisation for
    device drivers that know a range has only been mapped read-only.  However,
    there are no users of this feature, so remove it.  As it is the only user
    of the struct mmu_notifier_range.vma field, remove that also.

    Link: https://lkml.kernel.org/r/20230110025722.600912-1-apopple@nvidia.com
    Signed-off-by: Alistair Popple <apopple@nvidia.com>
    Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ira Weiny <ira.weiny@intel.com>
    Cc: Jerome Glisse <jglisse@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:05 -04:00
Aristeu Rozanski 757f797bb2 hugetlb: initialize variable to avoid compiler warning
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit c5094ec79cbe487983e3a96548a7eb1c1c82c727
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Fri Dec 16 14:45:07 2022 -0800

    hugetlb: initialize variable to avoid compiler warning

    With the gcc 'maybe-uninitialized' warning enabled, gcc will produce:

      mm/hugetlb.c:6896:20: warning: `chg' may be used uninitialized

    This is a false positive, but may be difficult for the compiler to
    determine.  maybe-uninitialized is disabled by default, but this gets
    flagged as a 0-DAY build regression.

    Initialize the variable to silence the warning.

    Link: https://lkml.kernel.org/r/20221216224507.106789-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:00 -04:00
Audra Mitchell fe1df9d089 hugetlb: unshare some PMDs when splitting VMAs
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit b30c14cd61025eeea2f2e8569606cd167ba9ad2d
Author: James Houghton <jthoughton@google.com>
Date:   Wed Jan 4 23:19:10 2023 +0000

    hugetlb: unshare some PMDs when splitting VMAs

    PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs; however,
    it is possible that HugeTLB VMAs are split without unsharing the PMDs
    first.

    Without this fix, it is possible to hit the uffd-wp-related WARN_ON_ONCE
    in hugetlb_change_protection [1].  The key there is that
    hugetlb_unshare_all_pmds will not attempt to unshare PMDs in
    non-PUD_SIZE-aligned sections of the VMA.

    It might seem ideal to unshare in hugetlb_vm_op_open, but we need to
    unshare in both the new and old VMAs, so unsharing in hugetlb_vm_op_split
    seems natural.
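
    A rough sketch of the split-time unshare (boundary math shown for
    illustration; the exact code in the patch may differ):

        /* in hugetlb_vm_op_split(), when addr is not PUD_SIZE aligned */
        if (addr & ~PUD_MASK) {
                unsigned long floor = addr & PUD_MASK;
                unsigned long ceil  = floor + PUD_SIZE;

                if (floor >= vma->vm_start && ceil <= vma->vm_end)
                        hugetlb_unshare_pmds(vma, floor, ceil);
        }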

    [1]: https://lore.kernel.org/linux-mm/CADrL8HVeOkj0QH5VZZbRzybNE8CG-tEGFshnA+bG9nMgcWtBSg@mail.gmail.com/

    Link: https://lkml.kernel.org/r/20230104231910.1464197-1-jthoughton@google.com
    Fixes: 6dfeaff93b ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
    Signed-off-by: James Houghton <jthoughton@google.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:43:02 -04:00
Audra Mitchell 85e2ce12f3 mm/gup: reliable R/O long-term pinning in COW mappings
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 84209e87c6963f928194a890399e24e8ad299db1
Author: David Hildenbrand <david@redhat.com>
Date:   Wed Nov 16 11:26:48 2022 +0100

    mm/gup: reliable R/O long-term pinning in COW mappings

    We already support reliable R/O pinning of anonymous memory. However,
    assume we end up pinning (R/O long-term) a pagecache page or the shared
    zeropage inside a writable private ("COW") mapping. The next write access
    will trigger a write-fault and replace the pinned page by an exclusive
    anonymous page in the process page tables to break COW: the pinned page no
    longer corresponds to the page mapped into the process' page table.

    Now that FAULT_FLAG_UNSHARE can break COW on anything mapped into a
    COW mapping, let's properly break COW first before R/O long-term
    pinning something that's not an exclusive anon page inside a COW
    mapping. FAULT_FLAG_UNSHARE will break COW and map an exclusive anon page
    instead that can get pinned safely.

    With this change, we can stop using FOLL_FORCE|FOLL_WRITE for reliable
    R/O long-term pinning in COW mappings.

    With this change, the new R/O long-term pinning tests for non-anonymous
    memory succeed:
      # [RUN] R/O longterm GUP pin ... with shared zeropage
      ok 151 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP pin ... with memfd
      ok 152 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP pin ... with tmpfile
      ok 153 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP pin ... with huge zeropage
      ok 154 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP pin ... with memfd hugetlb (2048 kB)
      ok 155 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP pin ... with memfd hugetlb (1048576 kB)
      ok 156 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP-fast pin ... with shared zeropage
      ok 157 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP-fast pin ... with memfd
      ok 158 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP-fast pin ... with tmpfile
      ok 159 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP-fast pin ... with huge zeropage
      ok 160 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (2048 kB)
      ok 161 Longterm R/O pin is reliable
      # [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (1048576 kB)
      ok 162 Longterm R/O pin is reliable

    Note 1: We don't care about short-term R/O-pinning, because they have
    snapshot semantics: they are not supposed to observe modifications that
    happen after pinning.

    As one example, assume we start direct I/O to read from a page and store
    page content into a file: modifications to page content after starting
    direct I/O are not guaranteed to end up in the file. So even if we'd pin
    the shared zeropage, the end result would be as expected -- getting zeroes
    stored to the file.

    Note 2: For shared mappings we'll now always fall back to the slow path to
    look up the VMA when R/O long-term pinning. While that's the necessary price
    we have to pay right now, it's actually not that bad in practice: most
    FOLL_LONGTERM users already specify FOLL_WRITE, for example, along with
    FOLL_FORCE because they tried dealing with COW mappings correctly ...

    Note 3: For users that use FOLL_LONGTERM right now without FOLL_WRITE,
    such as VFIO, we'd now no longer pin the shared zeropage. Instead, we'd
    populate exclusive anon pages that we can pin. There was a concern that
    this could affect the memlock limit of existing setups.

    For example, a VM running with VFIO could run into the memlock limit and
    fail to run. However, we essentially had the same behavior already in
    commit 17839856fd ("gup: document and work around "COW can break either
    way" issue") which got merged into some enterprise distros, and there were
    not any such complaints. So most probably, we're fine.

    Link: https://lkml.kernel.org/r/20221116102659.70287-10-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:57 -04:00
Audra Mitchell d115504bd5 mm: add early FAULT_FLAG_UNSHARE consistency checks
JIRA: https://issues.redhat.com/browse/RHEL-27739
Conflicts:
    Minor context difference due to out of order backports:
    c007e2df2e ("mm/hugetlb: fix uffd wr-protection for CoW optimization path")
    92a1aa89946b ("mm: rework handling in do_wp_page() based on private vs. shared mappings")
    887f390a3d60 ("mm: ptep_get() conversion")

This patch is a backport of the following upstream commit:
commit cdc5021cda194112bc0962d6a0e90b379968c504
Author: David Hildenbrand <david@redhat.com>
Date:   Wed Nov 16 11:26:43 2022 +0100

    mm: add early FAULT_FLAG_UNSHARE consistency checks

    For now, FAULT_FLAG_UNSHARE only applies to anonymous pages, which
    implies a COW mapping. Let's hide FAULT_FLAG_UNSHARE early if we're not
    dealing with a COW mapping, such that we treat it like a read fault as
    documented and don't have to worry about the flag throughout all fault
    handlers.

    While at it, centralize the check for mutual exclusion of
    FAULT_FLAG_UNSHARE and FAULT_FLAG_WRITE and just drop the check that
    either flag is set in the WP handler.
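
    A condensed sketch of the early check (exact placement and helper usage
    are assumptions, shown only to illustrate the idea):

        if (*flags & FAULT_FLAG_UNSHARE) {
                /* FAULT_FLAG_UNSHARE and FAULT_FLAG_WRITE are mutually exclusive */
                if (WARN_ON_ONCE(*flags & FAULT_FLAG_WRITE))
                        return VM_FAULT_SIGSEGV;
                /* outside of COW mappings, treat unshare like an ordinary read fault */
                if (!is_cow_mapping(vma->vm_flags))
                        *flags &= ~FAULT_FLAG_UNSHARE;
        }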

    Link: https://lkml.kernel.org/r/20221116102659.70287-5-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:57 -04:00
Audra Mitchell 52c4085ea0 mm: allow multiple error returns in try_grab_page()
JIRA: https://issues.redhat.com/browse/RHEL-27739
Conflicts:
    Major context differences as both of the following commits got merged at the
    same time upstream (see merge commit for details
    57a196a58421 ("hugetlb: simplify hugetlb handling in follow_page_mask")
    0f0892356fa1 ("mm: allow multiple error returns in try_grab_page()")
    e2ca6ba6ba01 ("Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")

    Essentially, commit 0f0892356fa1 references functions follow_huge_pmd_pte and
    follow_huge_pud, which were merged into one function (hugetlb_follow_page_mask)
    in commit 57a196a58421. Because both of these commits were accepted around the
    same time, please refer to how upstream resolved the conflicts in commit
    e2ca6ba6ba01.

This patch is a backport of the following upstream commit:
commit 0f0892356fa174bdd8bd655c820ee3658c4c9f01
Author: Logan Gunthorpe <logang@deltatee.com>
Date:   Fri Oct 21 11:41:08 2022 -0600

    mm: allow multiple error returns in try_grab_page()

    In order to add checks for P2PDMA memory into try_grab_page(), expand
    the error return from a bool to an int/error code. Update all the
    callsites handle change in usage.

    Also remove the WARN_ON_ONCE() call at the callsites seeing there
    already is a WARN_ON_ONCE() inside the function if it fails.
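
    Call sites change roughly like this (sketch; the surrounding code and
    variable names are assumed):

        /* before: boolean result, error reason lost */
        if (!try_grab_page(page, flags)) {
                page = ERR_PTR(-ENOMEM);
                goto out;
        }

        /* after: the specific error code is propagated */
        ret = try_grab_page(page, flags);
        if (ret) {
                page = ERR_PTR(ret);
                goto out;
        }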

    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20221021174116.7200-2-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:52 -04:00
Chris von Recklinghausen 80a5bb5a00 hugetlb: remove duplicate mmu notifications
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 369258ce41c6d7663a7b6d509356fecad577378d
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Mon Nov 14 15:55:07 2022 -0800

    hugetlb: remove duplicate mmu notifications

    The common hugetlb unmap routine __unmap_hugepage_range performs mmu
    notification calls.  However, in the case where __unmap_hugepage_range is
    called via __unmap_hugepage_range_final, mmu notification calls are
    performed earlier in other calling routines.

    Remove mmu notification calls from __unmap_hugepage_range.  Add
    notification calls to the only other caller: unmap_hugepage_range.
    unmap_hugepage_range is called for truncation and hole punch, so change
    notification type from UNMAP to CLEAR as this is more appropriate.

    Link: https://lkml.kernel.org/r/20221114235507.294320-4-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Suggested-by: Peter Xu <peterx@redhat.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:05 -04:00
Jerry Snitselaar feb173f234 mmu_notifiers: rename invalidate_range notifier
JIRA: https://issues.redhat.com/browse/RHEL-26541
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Conflicts: flush_tlb_page_nosync has vma struct arg

commit 1af5a8109904b7f00828e7f9f63f5695b42f8215
Author: Alistair Popple <apopple@nvidia.com>
Date:   Tue Jul 25 23:42:07 2023 +1000

    mmu_notifiers: rename invalidate_range notifier

    There are two main use cases for mmu notifiers.  One is by KVM which uses
    mmu_notifier_invalidate_range_start()/end() to manage a software TLB.

    The other is to manage hardware TLBs which need to use the
    invalidate_range() callback because HW can establish new TLB entries at
    any time.  Hence using start/end() can lead to memory corruption as these
    callbacks happen too soon/late during page unmap.

    mmu notifier users should therefore either use the start()/end() callbacks
    or the invalidate_range() callbacks.  To make this usage clearer, rename
    the invalidate_range() callback to arch_invalidate_secondary_tlbs() and
    update the documentation.

    Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com
    Signed-off-by: Alistair Popple <apopple@nvidia.com>
    Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Andrew Donnellan <ajd@linux.ibm.com>
    Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
    Cc: Frederic Barrat <fbarrat@linux.ibm.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Kevin Tian <kevin.tian@intel.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Nicolin Chen <nicolinc@nvidia.com>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Zhi Wang <zhi.wang.linux@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 1af5a8109904b7f00828e7f9f63f5695b42f8215)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
2024-02-26 15:51:24 -07:00
Jerry Snitselaar efb6748971 mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end()
JIRA: https://issues.redhat.com/browse/RHEL-26541
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Conflicts: Context diff due to some commits not being backported yet such as c33c794828f2 ("mm: ptep_get() conversion"),
           and 959a78b6dd45 ("mm/hugetlb: use a folio in hugetlb_wp()").

commit ec8832d007cb7b50229ad5745eec35b847cc9120
Author: Alistair Popple <apopple@nvidia.com>
Date:   Tue Jul 25 23:42:06 2023 +1000

    mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end()

    Secondary TLBs are now invalidated from the architecture specific TLB
    invalidation functions.  Therefore there is no need to explicitly notify
    or invalidate as part of the range end functions.  This means we can
    remove mmu_notifier_invalidate_range_end_only() and some of the
    ptep_*_notify() functions.

    Link: https://lkml.kernel.org/r/90d749d03cbab256ca0edeb5287069599566d783.1690292440.git-series.apopple@nvidia.com
    Signed-off-by: Alistair Popple <apopple@nvidia.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Andrew Donnellan <ajd@linux.ibm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
    Cc: Frederic Barrat <fbarrat@linux.ibm.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Kevin Tian <kevin.tian@intel.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Nicolin Chen <nicolinc@nvidia.com>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Zhi Wang <zhi.wang.linux@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit ec8832d007cb7b50229ad5745eec35b847cc9120)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
2024-02-26 15:49:51 -07:00
Scott Weaver 8c2c8bf31a Merge: DRM Backport 9.4 dependencies
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3094

JIRA: https://issues.redhat.com/browse/RHEL-1349

Depends: !2843

Depends: !3129

These are the dependencies needed for the 9.4 DRM backport.

Omitted-fix: cf683e8870bd4be0fd6b98639286700a35088660 (fix is included)

Omitted-fix: c042030aa15e9265504a034243a8cae062e900a1 (fix is included)

Signed-off-by: Mika Penttilä <mpenttil@redhat.com>

Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Michel Dänzer <mdaenzer@redhat.com>

Signed-off-by: Scott Weaver <scweaver@redhat.com>
2023-11-02 12:33:47 -04:00
Paolo Bonzini 538bf6f332 mm, treewide: redefine MAX_ORDER sanely
JIRA: https://issues.redhat.com/browse/RHEL-10059

MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.

This definition is counter-intuitive and leads to a number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.
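
A typical call-site adjustment implied by the new, inclusive definition
(sketch only; check_order() is a hypothetical stand-in for the loop body):

    /* before: the highest valid order was MAX_ORDER - 1 */
    for (order = 0; order < MAX_ORDER; order++)
            check_order(order);

    /* after: MAX_ORDER itself is a valid order */
    for (order = 0; order <= MAX_ORDER; order++)
            check_order(order);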

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 23baf831a32c04f9a968812511540b1b3e648bf5)

[RHEL: Fix conflicts by changing MAX_ORDER - 1 to MAX_ORDER,
       ">= MAX_ORDER" to "> MAX_ORDER", etc.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-30 09:12:37 +01:00
Mika Penttilä 7ef8f6ec98 mm: fix a few rare cases of using swapin error pte marker
JIRA: https://issues.redhat.com/browse/RHEL-1349
Upstream Status: v6.2-rc7

commit 7e3ce3f8d2d235f916baad1582f6cf12e0319013
Author:     Peter Xu <peterx@redhat.com>
AuthorDate: Wed Dec 14 15:04:53 2022 -0500
Commit:     Andrew Morton <akpm@linux-foundation.org>
CommitDate: Wed Jan 18 17:02:19 2023 -0800

    mm: fix a few rare cases of using swapin error pte marker

    This patch should harden commit 15520a3f0469 ("mm: use pte markers for
    swap errors") on using pte markers for swapin errors on a few corner
    cases.

    1. Propagate swapin errors across fork()s: if there're swapin errors in
       the parent mm, after fork()s the child should sigbus too when an error
       page is accessed.

    2. Fix a rare condition race in pte_marker_clear() where a uffd-wp pte
       marker can be quickly switched to a swapin error.

    3. Explicitly ignore swapin error pte markers in change_protection().

    I mostly don't worry about (2) or (3) at all, but we should still have them.
    Case (1) is special because it can potentially cause silent data corruption
    in the child when the parent has a swapin error triggered with swapoff, but
    since a swapin error is already rare itself, it's probably not easy to
    trigger either.

    Currently there is a priority difference between the uffd-wp bit and the
    swapin error entry, in which the swapin error always has higher priority
    (e.g.  we don't need to wr-protect a swapin error pte marker).

    If there will be a 3rd bit introduced, we'll probably need to consider a
    more involved approach so we may need to start operate on the bits.  Let's
    leave that for later.

    This patch is tested with case (1) explicitly: before, we'd get corrupted
    data in the child if there are existing swapin error pte markers; after the
    patch is applied, the child is rightfully killed.

    We don't need to copy stable for this one since 15520a3f0469 just landed
    as part of v6.2-rc1, only "Fixes" applied.

    Link: https://lkml.kernel.org/r/20221214200453.1772655-3-peterx@redhat.com
    Fixes: 15520a3f0469 ("mm: use pte markers for swap errors")
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Pengfei Xu <pengfei.xu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Mika Penttilä <mpenttil@redhat.com>
2023-10-30 07:03:06 +02:00
Chris von Recklinghausen d384489054 mm: convert head_subpages_mapcount() into folio_nr_pages_mapped()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit eec20426d48bd7b63c69969a793943ed1a99b731
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jan 11 14:28:48 2023 +0000

    mm: convert head_subpages_mapcount() into folio_nr_pages_mapped()

    Calling this 'mapcount' is confusing since mapcount is usually the number
    of times something is mapped; instead this is the number of mapped pages.
    It's also better to enforce that this is a folio rather than a head page.

    Move folio_nr_pages_mapped() into mm/internal.h since this is not
    something we want device drivers or filesystems poking at.  Get rid of
    folio_subpages_mapcount_ptr() and use folio->_nr_pages_mapped directly.

    Link: https://lkml.kernel.org/r/20230111142915.1001531-3-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:52 -04:00
Chris von Recklinghausen fe5f50def7 mm: remove folio_pincount_ptr() and head_compound_pincount()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 94688e8eb453e616098cb930e5f6fed4a6ea2dfa
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jan 11 14:28:47 2023 +0000

    mm: remove folio_pincount_ptr() and head_compound_pincount()

    We can use folio->_pincount directly, since all users are guarded by tests
    of compound/large.

    Link: https://lkml.kernel.org/r/20230111142915.1001531-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:52 -04:00
Chris von Recklinghausen 40638b50bc mm/hugetlb: introduce hugetlb_walk()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 9c67a20704e763f9cb8cd262c3e45de7bd2816bc
Author: Peter Xu <peterx@redhat.com>
Date:   Fri Dec 16 10:52:29 2022 -0500

    mm/hugetlb: introduce hugetlb_walk()

    huge_pte_offset() is the main walker function for hugetlb pgtables.  The
    name does not really represent what it does, though.

    Instead of renaming it, introduce a wrapper function called hugetlb_walk()
    which will use huge_pte_offset() inside.  Assert on the locks when walking
    the pgtable.
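
    A minimal sketch of such a wrapper (the lock assertion shown here is
    indicative only; the real helper's checks may differ):

        static inline pte_t *hugetlb_walk(struct vm_area_struct *vma,
                                          unsigned long addr, unsigned long sz)
        {
                /* pgtable walkers must hold the hugetlb vma lock or i_mmap_rwsem */
                hugetlb_vma_assert_locked(vma);

                return huge_pte_offset(vma->vm_mm, addr, sz);
        }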

    Note, the vma lock assertion will be a no-op for private mappings.

    Document the last special case in the page_vma_mapped_walk() path where we
    don't need any more lock to call hugetlb_walk().

    Taking vma lock there is not needed because either: (1) potential callers
    of hugetlb pvmw holds i_mmap_rwsem already (from one rmap_walk()), or (2)
    the caller will not walk a hugetlb vma at all so the hugetlb code path not
    reachable (e.g.  in ksm or uprobe paths).

    That lock requirement is slightly implicit for future
    page_vma_mapped_walk() callers.  But anyway, when one day this rule
    breaks, one will get
    a straightforward warning in hugetlb_walk() with lockdep, then there'll be
    a way out.

    [akpm@linux-foundation.org: coding-style cleanups]
    Link: https://lkml.kernel.org/r/20221216155229.2043750-1-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:46 -04:00
Chris von Recklinghausen 342b235b99 mm/hugetlb: make follow_hugetlb_page() safe to pmd unshare
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit eefc7fa53608920203a1402ecf7255ecfa8bb030
Author: Peter Xu <peterx@redhat.com>
Date:   Fri Dec 16 10:52:23 2022 -0500

    mm/hugetlb: make follow_hugetlb_page() safe to pmd unshare

    Since follow_hugetlb_page() walks the pgtable, it needs the vma lock to
    make sure the pgtable page will not be freed concurrently.

    Link: https://lkml.kernel.org/r/20221216155223.2043727-1-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:45 -04:00
Chris von Recklinghausen 18455d905f mm/hugetlb: make hugetlb_follow_page_mask() safe to pmd unshare
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 7d049f3a03ea705522210d70b9d3e223ef86d663
Author: Peter Xu <peterx@redhat.com>
Date:   Fri Dec 16 10:52:19 2022 -0500

    mm/hugetlb: make hugetlb_follow_page_mask() safe to pmd unshare

    Since hugetlb_follow_page_mask() walks the pgtable, it needs the vma lock
    to make sure the pgtable page will not be freed concurrently.

    Link: https://lkml.kernel.org/r/20221216155219.2043714-1-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:45 -04:00
Chris von Recklinghausen 20b7a6fe2d mm/hugetlb: move swap entry handling into vma lock when faulted
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit fcd48540d188876c917a377d81cd24c100332a62
Author: Peter Xu <peterx@redhat.com>
Date:   Fri Dec 16 10:50:55 2022 -0500

    mm/hugetlb: move swap entry handling into vma lock when faulted

    In hugetlb_fault(), there used to have a special path to handle swap entry
    at the entrance using huge_pte_offset().  That's unsafe because
    huge_pte_offset() for a pmd sharable range can access freed pgtables if
    without any lock to protect the pgtable from being freed after pmd
    unshare.

    Here the simplest solution to make it safe is to move the swap handling to
    be after the vma lock being held.  We may need to take the fault mutex on
    either migration or hwpoison entries now (also the vma lock, but that's
    really needed), however neither of them is hot path.

    Note that the vma lock cannot be released in hugetlb_fault() when the
    migration entry is detected, because in migration_entry_wait_huge() the
    pgtable page will be used again (by taking the pgtable lock), so that also
    need to be protected by the vma lock.  Modify migration_entry_wait_huge()
    so that it must be called with vma read lock held, and properly release
    the lock in __migration_entry_wait_huge().

    Link: https://lkml.kernel.org/r/20221216155100.2043537-5-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:44 -04:00
Chris von Recklinghausen ff598ff493 mm/hugetlb: don't wait for migration entry during follow page
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit bb373dce2c7b473023f9e69f041a22d81171b71a
Author: Peter Xu <peterx@redhat.com>
Date:   Fri Dec 16 10:50:53 2022 -0500

    mm/hugetlb: don't wait for migration entry during follow page

    That's what the code does with !hugetlb pages, so we should logically do
    the same for hugetlb: a migration entry will also be treated as no page.

    This is probably also the last piece in follow_page code that may sleep,
    the last one should be removed in cf994dd8af27 ("mm/gup: remove
    FOLL_MIGRATION", 2022-11-16).

    Link: https://lkml.kernel.org/r/20221216155100.2043537-3-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:44 -04:00
Chris von Recklinghausen 809793f9b9 hugetlb: update vma flag check for hugetlb vma lock
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 379c2e60e82ff71510a949033bf8431f39f66c75
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Mon Dec 12 15:50:42 2022 -0800

    hugetlb: update vma flag check for hugetlb vma lock

    The check for whether a hugetlb vma lock exists partially depends on the
    vma's flags.  Currently, it checks for either VM_MAYSHARE or VM_SHARED.
    The reason both flags are used is that VM_MAYSHARE was previously
    cleared in hugetlb vmas as they were torn down.  This is no longer the
    case, and only the VM_MAYSHARE check is required.
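
    Sketch of the simplified check (helper name and shape assumed for
    illustration):

        static bool __vma_shareable_lock(struct vm_area_struct *vma)
        {
                /* VM_SHARED no longer needs to be checked */
                return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
        }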

    Link: https://lkml.kernel.org/r/20221212235042.178355-2-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:43 -04:00
Chris von Recklinghausen 653ae76632 mm/uffd: always wr-protect pte in pte|pmd_mkuffd_wp()
Conflicts: mm/userfaultfd.c - RHEL-only patch
	8e95bedaa1a ("mm: Fix CVE-2022-2590 by reverting "mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte"")
	causes a merge conflict with this patch. Since upstream commit
	5535be309971 ("mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW")
	actually fixes the CVE we can safely remove the conflicted lines
	and replace them with the lines the upstream version of this
	patch adds

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit f1eb1bacfba9019823b2fce42383f010cd561fa6
Author: Peter Xu <peterx@redhat.com>
Date:   Wed Dec 14 15:15:33 2022 -0500

    mm/uffd: always wr-protect pte in pte|pmd_mkuffd_wp()

    This patch is a cleanup to always wr-protect pte/pmd in mkuffd_wp paths.

    The reasons I still think this patch is worthwhile, are:

      (1) It is a cleanup already; diffstat tells.

      (2) It just feels natural after I thought about this, if the pte is uffd
          protected, let's remove the write bit no matter what it was.

      (3) Since x86 is the only arch that supports uffd-wp, it also redefines
          pte|pmd_mkuffd_wp() in that it should always contain removals of
          write bits.  It means any future arch that want to implement uffd-wp
          should naturally follow this rule too.  It's good to make it a
          default, even if with vm_page_prot changes on VM_UFFD_WP.

      (4) It covers more than vm_page_prot.  So no chance of any potential
          future "accident" (like pte_mkdirty() on sparc64 or loongarch, even
          though it just got its pte_mkdirty fixed <1 month ago).  It'll be
          fairly clear when reading the code too that we don't worry anything
          before a pte_mkuffd_wp() on uncertainty of the write bit.

    We may call pte_wrprotect() one more time in some paths (e.g.  thp split),
    but that should be fully local bitop instruction so the overhead should be
    negligible.

    Although this patch should logically also fix all the recently known
    uffd-wp issues on page migration (not for numa hint recovery - that may
    need another explicit pte_wrprotect), this is not the plan for that fix.
    So no Fixes tag, and stable doesn't need this.

    Link: https://lkml.kernel.org/r/20221214201533.1774616-1-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ives van Hoorne <ives@codesandbox.io>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:43 -04:00
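
To illustrate the rule above, an x86-style helper can fold the write-protect into the uffd-wp marker itself (a sketch of the idea, not the verbatim arch code):

        /* Sketch: marking a pte uffd-wp always removes the write bit too,
         * regardless of what the write bit was before. */
        static inline pte_t pte_mkuffd_wp(pte_t pte)
        {
                return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD_WP));
        }
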
Chris von Recklinghausen 8e0969ab45 mm: move folio_set_compound_order() to mm/internal.h
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 04a42e72d77a93a166b79c34b7bc862f55a53967
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Wed Dec 14 22:17:57 2022 -0800

    mm: move folio_set_compound_order() to mm/internal.h

    folio_set_compound_order() is moved to an mm-internal location so external
    folio users cannot misuse this function.  Change the name of the function
    to folio_set_order() and use WARN_ON_ONCE() rather than BUG_ON.  Also,
    handle the case where a non-large folio is passed and add clarifying comments
    to the function.

    Link: https://lore.kernel.org/lkml/20221207223731.32784-1-sidhartha.kumar@oracle.com/T/
    Link: https://lkml.kernel.org/r/20221215061757.223440-1-sidhartha.kumar@oracle.com
    Fixes: 9fd330582b2f ("mm: add folio dtor and order setter functions")
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
    Suggested-by: Muchun Song <songmuchun@bytedance.com>
    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Suggested-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:43 -04:00
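
A sketch of the mm-internal setter described above (field name illustrative; the real helper also tracks the folio's page count on 64-bit):

        /* Sketch: only large (compound) folios carry an order; warn and
         * bail out instead of BUG if a non-large folio slips through. */
        static void folio_set_order(struct folio *folio, unsigned int order)
        {
                if (WARN_ON_ONCE(!folio_test_large(folio)))
                        return;

                folio->_folio_order = order;
        }
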
Chris von Recklinghausen dc21656712 hugetlb: really allocate vma lock for all sharable vmas
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit e700898fa075c69b3ae02b702ab57fb75e1a82ec
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Mon Dec 12 15:50:41 2022 -0800

    hugetlb: really allocate vma lock for all sharable vmas

    Commit bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
    removed the pmd sharable checks in the vma lock helper routines.  However,
    it left the functional version of helper routines behind #ifdef
    CONFIG_ARCH_WANT_HUGE_PMD_SHARE.  Therefore, the vma lock is not being
    used for sharable vmas on architectures that do not support pmd sharing.
    On these architectures, a potential fault/truncation race is exposed that
    could leave pages in a hugetlb file past i_size until the file is removed.

    Move the functional vma lock helpers outside the ifdef, and remove the
    non-functional stubs.  Since the vma lock is not just for pmd sharing,
    rename the routine __vma_shareable_flags_pmd.

    Link: https://lkml.kernel.org/r/20221212235042.178355-1-mike.kravetz@oracle.com
    Fixes: bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:38 -04:00
Chris von Recklinghausen e546340977 mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit c45bc55a99957b20e4e0333bcd42e12d1833a7f5
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Mon Dec 12 14:55:29 2022 -0800

    mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio

    folio_set_compound_order() checks if the passed in folio is a large folio.
    A large folio is indicated by the PG_head flag.  Call __folio_set_head()
    before setting the order.

    Link: https://lkml.kernel.org/r/20221212225529.22493-1-sidhartha.kumar@oracle.com
    Fixes: d1c6095572d0 ("mm/hugetlb: convert hugetlb prep functions to folios")
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reported-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:36 -04:00
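
The required ordering can be sketched as follows (names per the folio helpers above; illustrative):

        /* Sketch: folio_set_order() checks PG_head, so mark the folio as a
         * compound head before recording its order. */
        __folio_set_head(folio);
        folio_set_order(folio, order);
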
Chris von Recklinghausen c6d772b121 mm/hugetlb: change hugetlb allocation functions to return a folio
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 19fc1a7e8b2b3b0e18fbea84ee26517e1b0f1a6e
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:39 2022 -0800

    mm/hugetlb: change hugetlb allocation functions to return a folio

    Many hugetlb allocation helper functions have now been converted to
    folios; update their higher level callers to be compatible with folios.
    alloc_pool_huge_page is reorganized to avoid a smatch warning reporting
    the folio variable is uninitialized.

    [sidhartha.kumar@oracle.com: update alloc_and_dissolve_hugetlb_folio comments]
      Link: https://lkml.kernel.org/r/20221206233512.146535-1-sidhartha.kumar@oracle.com
    Link: https://lkml.kernel.org/r/20221129225039.82257-11-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reported-by: Wei Chen <harperchen1110@gmail.com>
    Suggested-by: John Hubbard <jhubbard@nvidia.com>
    Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:34 -04:00
Chris von Recklinghausen 7e650ba2b1 mm/hugetlb: convert hugetlb prep functions to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit d1c6095572d0cf00c0cd30378639ff9387b34edd
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:38 2022 -0800

    mm/hugetlb: convert hugetlb prep functions to folios

    Convert prep_new_huge_page() and __prep_compound_gigantic_page() to
    folios.

    Link: https://lkml.kernel.org/r/20221129225039.82257-10-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:33 -04:00
Chris von Recklinghausen f381670865 mm/hugetlb: convert free_gigantic_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 7f325a8d25631e68cd75afaeaf330187e45e0eb5
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:37 2022 -0800

    mm/hugetlb: convert free_gigantic_page() to folios

    Convert callers of free_gigantic_page() to use folios, function is then
    renamed to free_gigantic_folio().

    Link: https://lkml.kernel.org/r/20221129225039.82257-9-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:33 -04:00
Chris von Recklinghausen 3d85e464e2 mm/hugetlb: convert enqueue_huge_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 240d67a86ecb0fa18863821a0cb55783ad50ef30
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:36 2022 -0800

    mm/hugetlb: convert enqueue_huge_page() to folios

    Convert callers of enqueue_huge_page() to pass in a folio, function is
    renamed to enqueue_hugetlb_folio().

    Link: https://lkml.kernel.org/r/20221129225039.82257-8-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:33 -04:00
Chris von Recklinghausen 9a4125ce96 mm/hugetlb: convert add_hugetlb_page() to folios and add hugetlb_cma_folio()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 2f6c57d696abcd2d27d07b8506d5e6bcc060e77a
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:35 2022 -0800

    mm/hugetlb: convert add_hugetlb_page() to folios and add hugetlb_cma_folio()

    Convert add_hugetlb_page() to take in a folio, also convert
    hugetlb_cma_page() to take in a folio.

    Link: https://lkml.kernel.org/r/20221129225039.82257-7-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:32 -04:00
Chris von Recklinghausen 1814b3b531 mm/hugetlb: convert update_and_free_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit d6ef19e25df2aa50f932a78c368d7bb710eaaa1b
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:34 2022 -0800

    mm/hugetlb: convert update_and_free_page() to folios

    Make more progress on converting the free_huge_page() destructor to
    operate on folios by converting update_and_free_page() to folios.

    Link: https://lkml.kernel.org/r/20221129225039.82257-6-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:32 -04:00
Chris von Recklinghausen 4940f2d374 mm/hugetlb: convert remove_hugetlb_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit cfd5082b514765f873504cc60a50cce30738bfd3
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:33 2022 -0800

    mm/hugetlb: convert remove_hugetlb_page() to folios

    Removes page_folio() call by converting callers to directly pass a folio
    into __remove_hugetlb_page().

    Link: https://lkml.kernel.org/r/20221129225039.82257-5-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:32 -04:00
Chris von Recklinghausen b611c893df mm/hugetlb: convert dissolve_free_huge_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 1a7cdab59b22465b850501e3897a3f3aa01670d8
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:32 2022 -0800

    mm/hugetlb: convert dissolve_free_huge_page() to folios

    Removes compound_head() call by using a folio rather than a head page.

    Link: https://lkml.kernel.org/r/20221129225039.82257-4-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:31 -04:00
Chris von Recklinghausen 22f017224c mm/hugetlb: convert destroy_compound_gigantic_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 911565b8285381e62d3bfd0cae2889a022737c37
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:31 2022 -0800

    mm/hugetlb: convert destroy_compound_gigantic_page() to folios

    Convert page operations within __destroy_compound_gigantic_page() to the
    corresponding folio operations.

    Link: https://lkml.kernel.org/r/20221129225039.82257-3-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:31 -04:00
Chris von Recklinghausen 99c827d6e4 mm: add folio dtor and order setter functions
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 9fd330582b2fe43c49ebcd02b2480f051f85aad4
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 29 14:50:30 2022 -0800

    mm: add folio dtor and order setter functions

    Patch series "convert core hugetlb functions to folios", v5.

    ============== OVERVIEW ===========================
    Now that many hugetlb helper functions that deal with hugetlb specific
    flags[1] and hugetlb cgroups[2] are converted to folios, higher level
    allocation, prep, and freeing functions within hugetlb can also be
    converted to operate in folios.

    Patch 1 of this series implements the wrapper functions around setting the
    compound destructor and compound order for a folio.  Besides the user
    added in patch 1, patch 2 and patch 9 also use these helper functions.

    Patches 2-10 convert the higher level hugetlb functions to folios.

    ============== TESTING ===========================
    LTP:
            Ran 10 back to back rounds of the LTP hugetlb test suite.

    Gigantic Huge Pages:
            Test allocation and freeing via hugeadm commands:
                    hugeadm --pool-pages-min 1GB:10
                    hugeadm --pool-pages-min 1GB:0

    Demote:
            Demote one 1GB hugepage to 512 2MB hugepages
                    echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
                    echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
                    cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
                            # 512
                    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
                            # 0

    [1] https://lore.kernel.org/lkml/20220922154207.1575343-1-sidhartha.kumar@oracle.com/
    [2] https://lore.kernel.org/linux-mm/20221101223059.460937-1-sidhartha.kumar@oracle.com/

    This patch (of 10):

    Add folio equivalents for set_compound_order() and
    set_compound_page_dtor().

    Also remove extra new-lines introduced by "mm/hugetlb: convert
    move_hugetlb_state() to folios" and "mm/hugetlb_cgroup: convert
    hugetlb_cgroup_uncharge_page() to folios".

    [sidhartha.kumar@oracle.com: clarify folio_set_compound_order() zero support]
      Link: https://lkml.kernel.org/r/20221207223731.32784-1-sidhartha.kumar@oracle.com
    Link: https://lkml.kernel.org/r/20221129225039.82257-1-sidhartha.kumar@oracle.com
    Link: https://lkml.kernel.org/r/20221129225039.82257-2-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
    Suggested-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Tarun Sahu <tsahu@linux.ibm.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Wei Chen <harperchen1110@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:31 -04:00
Chris von Recklinghausen e1c02a97f1 mm,thp,rmap: simplify compound page mapcount handling
Conflicts:
	include/linux/mm.h - We already have
		a1554c002699 ("include/linux/mm.h: move nr_free_buffer_pages from swap.h to mm.h")
		so keep declaration of nr_free_buffer_pages
	mm/huge_memory.c - We already have RHEL-only commit
		0837bdd68b ("Revert "mm: thp: stabilize the THP mapcount in page_remove_anon_compound_rmap"")
		so there is a difference in deleted code.

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit cb67f4282bf9693658dbda934a441ddbbb1446df
Author: Hugh Dickins <hughd@google.com>
Date:   Wed Nov 2 18:51:38 2022 -0700

    mm,thp,rmap: simplify compound page mapcount handling

    Compound page (folio) mapcount calculations have been different for anon
    and file (or shmem) THPs, and involved the obscure PageDoubleMap flag.
    And each huge mapping and unmapping of a file (or shmem) THP involved
    atomically incrementing and decrementing the mapcount of every subpage of
    that huge page, dirtying many struct page cachelines.

    Add subpages_mapcount field to the struct folio and first tail page, so
    that the total of subpage mapcounts is available in one place near the
    head: then page_mapcount() and total_mapcount() and page_mapped(), and
    their folio equivalents, are so quick that anon and file and hugetlb don't
    need to be optimized differently.  Delete the unloved PageDoubleMap.

    page_add and page_remove rmap functions must now maintain the
    subpages_mapcount as well as the subpage _mapcount, when dealing with pte
    mappings of huge pages; and correct maintenance of NR_ANON_MAPPED and
    NR_FILE_MAPPED statistics still needs reading through the subpages, using
    nr_subpages_unmapped() - but only when first or last pmd mapping finds
    subpages_mapcount raised (double-map case, not the common case).

    But are those counts (used to decide when to split an anon THP, and in
    vmscan's pagecache_reclaimable heuristic) correctly maintained?  Not
    quite: since page_remove_rmap() (and also split_huge_pmd()) is often
    called without page lock, there can be races when a subpage pte mapcount
    0<->1 while compound pmd mapcount 0<->1 is scanning - races which the
    previous implementation had prevented.  The statistics might become
    inaccurate, and even drift down until they underflow through 0.  That is
    not good enough, but is better dealt with in a followup patch.

    Update a few comments on first and second tail page overlaid fields.
    hugepage_add_new_anon_rmap() has to "increment" compound_mapcount, but
    subpages_mapcount and compound_pincount are already correctly at 0, so
    delete its reinitialization of compound_pincount.

    A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
    18 seconds on small pages, and used to take 1 second on huge pages, but
    now takes 119 milliseconds on huge pages.  Mapping by pmds a second time
    used to take 860ms and now takes 92ms; mapping by pmds after mapping by
    ptes (when the scan is needed) used to take 870ms and now takes 495ms.
    But there might be some benchmarks which would show a slowdown, because
    tail struct pages now fall out of cache until final freeing checks them.

    Link: https://lkml.kernel.org/r/47ad693-717-79c8-e1ba-46c3a6602e48@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Zach O'Keefe <zokeefe@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:26 -04:00
Chris von Recklinghausen 311d13ef90 mm/hugetlb: convert move_hugetlb_state() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 345c62d163496ae4b5c1ce530b1588067d8f5a8b
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 1 15:30:59 2022 -0700

    mm/hugetlb: convert move_hugetlb_state() to folios

    Clean up unmap_and_move_huge_page() by converting move_hugetlb_state() to
    take in folios.

    [akpm@linux-foundation.org: fix CONFIG_HUGETLB_PAGE=n build]
    Link: https://lkml.kernel.org/r/20221101223059.460937-10-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bui Quang Minh <minhquangbui99@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:25 -04:00
Chris von Recklinghausen b96436486a mm/hugetlb_cgroup: convert hugetlb_cgroup_uncharge_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit d4ab0316cc33aeedf6dcb1c2c25e097a25766132
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 1 15:30:57 2022 -0700

    mm/hugetlb_cgroup: convert hugetlb_cgroup_uncharge_page() to folios

    Continue to use a folio inside free_huge_page() by converting
    hugetlb_cgroup_uncharge_page*() to folios.

    Link: https://lkml.kernel.org/r/20221101223059.460937-8-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bui Quang Minh <minhquangbui99@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:25 -04:00
Chris von Recklinghausen b9544876bc mm/hugetlb: convert free_huge_page to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 0356c4b96f6890dd61af4c902f681764f4bdba09
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 1 15:30:56 2022 -0700

    mm/hugetlb: convert free_huge_page to folios

    Use folios inside free_huge_page(), this is in preparation for converting
    hugetlb_cgroup_uncharge_page() to take in a folio.

    Link: https://lkml.kernel.org/r/20221101223059.460937-7-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bui Quang Minh <minhquangbui99@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:24 -04:00
Chris von Recklinghausen 12ff8e1504 mm/hugetlb: convert isolate_or_dissolve_huge_page to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit d5e33bd8c16b6f5f47665d378f078bee72b85225
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 1 15:30:55 2022 -0700

    mm/hugetlb: convert isolate_or_dissolve_huge_page to folios

    Removes a call to compound_head() by using a folio when operating on the
    head page of a hugetlb compound page.

    Link: https://lkml.kernel.org/r/20221101223059.460937-6-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bui Quang Minh <minhquangbui99@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:24 -04:00
Chris von Recklinghausen 6658973279 mm/hugetlb_cgroup: convert hugetlb_cgroup_migrate to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 29f394304f624b06fafb3cc9c3da8779f71f4bee
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 1 15:30:54 2022 -0700

    mm/hugetlb_cgroup: convert hugetlb_cgroup_migrate to folios

    Cleans up intermediate page to folio conversion code in
    hugetlb_cgroup_migrate() by changing its arguments from pages to folios.

    Link: https://lkml.kernel.org/r/20221101223059.460937-5-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bui Quang Minh <minhquangbui99@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:24 -04:00
Chris von Recklinghausen 6200fa5886 mm/hugetlb_cgroup: convert set_hugetlb_cgroup*() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit de656ed376c4cb47c5713fba52f8bbfbea44f387
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 1 15:30:53 2022 -0700

    mm/hugetlb_cgroup: convert set_hugetlb_cgroup*() to folios

    Allows __prep_new_huge_page() to operate on a folio by converting
    set_hugetlb_cgroup*() to take in a folio.

    Link: https://lkml.kernel.org/r/20221101223059.460937-4-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bui Quang Minh <minhquangbui99@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:23 -04:00
Chris von Recklinghausen d153aac91e mm/hugetlb_cgroup: convert hugetlb_cgroup_from_page() to folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit f074732d599e19a2a5b12e54743ad5eaccbe6550
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Nov 1 15:30:52 2022 -0700

    mm/hugetlb_cgroup: convert hugetlb_cgroup_from_page() to folios

    Introduce folios in __remove_hugetlb_page() by converting
    hugetlb_cgroup_from_page() to use folios.

    Also gets rid of the unused hugetlb_cgroup_from_page_resv() function.

    Link: https://lkml.kernel.org/r/20221101223059.460937-3-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bui Quang Minh <minhquangbui99@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:23 -04:00
Chris von Recklinghausen 0e8d7c85ff mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit e591ef7d96d6ea249916f351dc26a636e565c635
Author: Naoya Horiguchi <naoya.horiguchi@nec.com>
Date:   Mon Oct 24 15:20:09 2022 +0900

    mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage

    Patch series "mm, hwpoison: improve handling workload related to hugetlb
    and memory_hotplug", v7.

    This patchset tries to solve the issue among memory_hotplug, hugetlb and hwpoison.
    In this patchset, memory hotplug handles hwpoison pages like below:

      - hwpoison pages should not prevent memory hotremove,
      - memory block with hwpoison pages should not be onlined.

    This patch (of 4):

    HWPoisoned page is not supposed to be accessed once marked, but currently
    such accesses can happen during memory hotremove because
    do_migrate_range() can be called before dissolve_free_huge_pages() is
    called.

    Clear HPageMigratable for hwpoisoned hugepages to prevent them from being
    migrated.  This should be done in hugetlb_lock to avoid race against
    isolate_hugetlb().

    get_hwpoison_huge_page() needs to have a flag to show it's called from
    unpoison to take refcount of hwpoisoned hugepages, so add it.

    [naoya.horiguchi@linux.dev: remove TestClearHPageMigratable and reduce to test and clear separately]
      Link: https://lkml.kernel.org/r/20221025053559.GA2104800@ik1-406-35019.vs.sakura.ne.jp
    Link: https://lkml.kernel.org/r/20221024062012.1520887-1-naoya.horiguchi@linux.dev
    Link: https://lkml.kernel.org/r/20221024062012.1520887-2-naoya.horiguchi@linux.dev
    Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Reported-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:18 -04:00
Chris von Recklinghausen ac4694cf43 Revert "mm/uffd: fix warning without PTE_MARKER_UFFD_WP compiled in"
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit b12fdbf15f92b6cf5fecdd8a1855afe8809e5c58
Author: Peter Xu <peterx@redhat.com>
Date:   Mon Oct 24 15:33:36 2022 -0400

    Revert "mm/uffd: fix warning without PTE_MARKER_UFFD_WP compiled in"

    With " mm/uffd: Fix vma check on userfault for wp" to fix the
    registration, we'll be safe to remove the macro hacks now.

    Link: https://lkml.kernel.org/r/20221024193336.1233616-3-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:18 -04:00
Chris von Recklinghausen bdc3c88db4 mm/hugetlb: unify clearing of RestoreReserve for private pages
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 4781593d5dbae50500d1c7975be03b590ae2b92a
Author: Peter Xu <peterx@redhat.com>
Date:   Thu Oct 20 15:38:32 2022 -0400

    mm/hugetlb: unify clearing of RestoreReserve for private pages

    A trivial cleanup to move clearing of RestoreReserve into adding anon rmap
    of private hugetlb mappings.  It matches with the shared mappings where we
    only clear the bit when adding into page cache, rather than spreading it
    around the code paths.

    Link: https://lkml.kernel.org/r/20221020193832.776173-1-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:17 -04:00
Chris von Recklinghausen 3fe0d67558 hugetlb: simplify hugetlb handling in follow_page_mask
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 57a196a58421a4b0c45949ae7309f21829aaa77f
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Sun Sep 18 19:13:48 2022 -0700

    hugetlb: simplify hugetlb handling in follow_page_mask

    During discussions of this series [1], it was suggested that hugetlb
    handling code in follow_page_mask could be simplified.  At the beginning
    of follow_page_mask, there currently is a call to follow_huge_addr which
    'may' handle hugetlb pages.  ia64 is the only architecture which provides
    a follow_huge_addr routine that does not return error.  Instead, at each
    level of the page table a check is made for a hugetlb entry.  If a hugetlb
    entry is found, a call to a routine associated with that entry is made.

    Currently, there are two checks for hugetlb entries at each page table
    level.  The first check is of the form:

            if (p?d_huge())
                    page = follow_huge_p?d();

    the second check is of the form:

            if (is_hugepd())
                    page = follow_huge_pd().

    We can replace these checks, as well as the special handling routines such
    as follow_huge_p?d() and follow_huge_pd() with a single routine to handle
    hugetlb vmas.

    A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
    beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
    existing routine huge_pte_offset to walk page tables looking for hugetlb
    entries.  huge_pte_offset can be overwritten by architectures, and already
    handles special cases such as hugepd entries.

    [1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.wang@linux.alibaba.com/

    [mike.kravetz@oracle.com: remove vma (pmd sharing) per Peter]
      Link: https://lkml.kernel.org/r/20221028181108.119432-1-mike.kravetz@oracle.com
    [mike.kravetz@oracle.com: remove left over hugetlb_vma_unlock_read()]
      Link: https://lkml.kernel.org/r/20221030225825.40872-1-mike.kravetz@oracle.com
    Link: https://lkml.kernel.org/r/20220919021348.22151-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Suggested-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:15 -04:00
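
The resulting dispatch at the top of follow_page_mask() can be pictured roughly like this (a sketch of the idea, not the exact upstream diff):

        /* Sketch: handle hugetlb vmas once, up front, instead of checking
         * for hugetlb entries at every page table level. */
        if (is_vm_hugetlb_page(vma)) {
                page = hugetlb_follow_page_mask(vma, address, flags);
                if (!page)
                        page = no_page_table(vma, flags);
                return page;
        }
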
Chris von Recklinghausen 2e4f279847 hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 04ada095dcfc4ae359418053c0be94453bdf1e84
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Mon Nov 14 15:55:06 2022 -0800

    hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing

    madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
    tables associated with the address range.  For hugetlb vmas,
    zap_page_range will call __unmap_hugepage_range_final.  However,
    __unmap_hugepage_range_final assumes the passed vma is about to be removed
    and deletes the vma_lock to prevent pmd sharing as the vma is on the way
    out.  In the case of madvise(MADV_DONTNEED) the vma remains, but the
    missing vma_lock prevents pmd sharing and could potentially lead to issues
    with truncation/fault races.

    This issue was originally reported here [1] as a BUG triggered in
    page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
    vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
    prevent pmd sharing.  Subsequent faults on this vma were confused as
    VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping was
    not set in new pages added to the page table.  This resulted in pages that
    appeared anonymous in a VM_SHARED vma and triggered the BUG.

    Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap
    call from unmap_vmas().  This is used to indicate the 'final' unmapping of
    a hugetlb vma.  When called via MADV_DONTNEED, this flag is not set and
    the vm_lock is not deleted.

    [1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/

    Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com
    Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reported-by: Wei Chen <harperchen1110@gmail.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:13 -04:00
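
A sketch of how the new flag separates the two callers (flag and helper names per the description above; simplified):

        /* Sketch: only the 'final' unmap from unmap_vmas() passes
         * ZAP_FLAG_UNMAP; a MADV_DONTNEED zap does not, so that vma keeps
         * its vma_lock and pmd sharing stays possible afterwards. */
        if (zap_flags & ZAP_FLAG_UNMAP)
                hugetlb_vma_lock_free(vma);
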
Chris von Recklinghausen 89b8017a38 hugetlb: fix __prep_compound_gigantic_page page flag setting
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 7fb0728a9b005b8fc55e835529047cca15191031
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Fri Nov 18 11:52:49 2022 -0800

    hugetlb: fix __prep_compound_gigantic_page page flag setting

    Commit 2b21624fc232 ("hugetlb: freeze allocated pages before creating
    hugetlb pages") changed the order page flags were cleared and set in the
    head page.  It moved the __ClearPageReserved after __SetPageHead.
    However, there is a check to make sure __ClearPageReserved is never done
    on a head page.  If CONFIG_DEBUG_VM_PGFLAGS is enabled, the following BUG
    will be hit when creating a hugetlb gigantic page:

        page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
        ------------[ cut here ]------------
        kernel BUG at include/linux/page-flags.h:500!
        Call Trace will differ depending on whether hugetlb page is created
        at boot time or run time.

    Make sure to __ClearPageReserved BEFORE __SetPageHead.

    Link: https://lkml.kernel.org/r/20221118195249.178319-1-mike.kravetz@oracle.com
    Fixes: 2b21624fc232 ("hugetlb: freeze allocated pages before creating hugetlb pages")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Acked-by: Muchun Song <songmuchun@bytedance.com>
    Tested-by: Tarun Sahu <tsahu@linux.ibm.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:12 -04:00
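
The ordering that avoids the VM_BUG_ON can be sketched as:

        /* Sketch: clear PG_reserved while the page is still an ordinary
         * page, and only then mark it as a compound head. */
        __ClearPageReserved(page);
        __SetPageHead(page);
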
Chris von Recklinghausen 24a1691241 hugetlb: fix memory leak associated with vma_lock structure
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 612b8a317023e1396965aacac43d80053c6e77db
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Wed Oct 19 13:19:57 2022 -0700

    hugetlb: fix memory leak associated with vma_lock structure

    The hugetlb vma_lock structure hangs off the vm_private_data pointer of
    sharable hugetlb vmas.  The structure is vma specific and can not be
    shared between vmas.  At fork and various other times, vmas are duplicated
    via vm_area_dup().  When this happens, the pointer in the newly created
    vma must be cleared and the structure reallocated.  Two hugetlb specific
    routines deal with this: hugetlb_dup_vma_private and hugetlb_vm_op_open.
    Both routines are called for newly created vmas.  hugetlb_dup_vma_private
    would always clear the pointer and hugetlb_vm_op_open would allocate the
    new vma_lock structure.  This did not work in the case of this calling
    sequence pointed out in [1].

      move_vma
        copy_vma
          new_vma = vm_area_dup(vma);
          new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
        is_vm_hugetlb_page(vma)
          clear_vma_resv_huge_pages
            hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL

    When clearing hugetlb_dup_vma_private we actually leak the associated
    vma_lock structure.

    The vma_lock structure contains a pointer to the associated vma.  This
    information can be used in hugetlb_dup_vma_private and hugetlb_vm_op_open
    to ensure we only clear the vm_private_data of newly created (copied)
    vmas.  In such cases, the vma->vma_lock->vma field will not point to the
    vma.

    Update hugetlb_dup_vma_private and hugetlb_vm_op_open to not clear
    vm_private_data if vma->vma_lock->vma == vma.  Also, log a warning if
    hugetlb_vm_op_open ever encounters the case where vma_lock has already
    been correctly allocated for the vma.

    [1] https://lore.kernel.org/linux-mm/5154292a-4c55-28cd-0935-82441e512fc3@huawei.com/

    Link: https://lkml.kernel.org/r/20221019201957.34607-1-mike.kravetz@oracle.com
    Fixes: 131a79b474e9 ("hugetlb: fix vma lock handling during split vma and range unmapping")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: James Houghton <jthoughton@google.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:04 -04:00
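
A sketch of the back-pointer check described above (structure layout per the upstream vma lock series; the helper name here is hypothetical):

        struct hugetlb_vma_lock {
                struct kref refs;
                struct rw_semaphore rw_sema;
                struct vm_area_struct *vma;     /* owning vma */
        };

        /* Hypothetical helper illustrating the fix: only a freshly copied
         * vma carries a vma_lock whose back-pointer names a different vma,
         * and only then may vm_private_data be cleared. */
        static void hugetlb_clear_copied_vma_lock(struct vm_area_struct *vma)
        {
                struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

                if (vma_lock && vma_lock->vma != vma)
                        vma->vm_private_data = NULL;
        }
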
Chris von Recklinghausen 62938ffdf0 mm/hugetlb.c: make __hugetlb_vma_unlock_write_put() static
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit acfac37851e01b40c30a7afd0d93ad8db8914f25
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Fri Oct 7 12:59:20 2022 -0700

    mm/hugetlb.c: make __hugetlb_vma_unlock_write_put() static

    Reported-by: kernel test robot <lkp@intel.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:00 -04:00
Chris von Recklinghausen 58c07ff87d hugetlb: allocate vma lock for all sharable vmas
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit bbff39cc6cbcb86ccfacb2dcafc79912a9f9df69
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Tue Oct 4 18:17:07 2022 -0700

    hugetlb: allocate vma lock for all sharable vmas

    The hugetlb vma lock was originally designed to synchronize pmd sharing.
    As such, it was only necessary to allocate the lock for vmas that were
    capable of pmd sharing.  Later in the development cycle, it was discovered
    that it could also be used to simplify fault/truncation races as described
    in [1].  However, a subsequent change to allocate the lock for all vmas
    that use the page cache was never made.  A fault/truncation race could
    leave pages in a file past i_size until the file is removed.

    Remove the previous restriction and allocate lock for all VM_MAYSHARE
    vmas.  Warn in the unlikely event of allocation failure.

    [1] https://lore.kernel.org/lkml/Yxiv0SkMkZ0JWGGp@monkey/#t

    Link: https://lkml.kernel.org/r/20221005011707.514612-4-mike.kravetz@oracle.com
    Fixes: "hugetlb: clean up code checking for fault/truncation races"
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: James Houghton <jthoughton@google.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:14:55 -04:00
Chris von Recklinghausen 51803c7ce0 hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit ecfbd733878da48ed03a5b8a9c301366a03e3cca
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Tue Oct 4 18:17:06 2022 -0700

    hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer

    hugetlb file truncation/hole punch code may need to back out and take
    locks in order in the routine hugetlb_unmap_file_folio().  This code could
    race with vma freeing as pointed out in [1] and result in accessing a
    stale vma pointer.  To address this, take the vma_lock when clearing the
    vma_lock->vma pointer.

    [1] https://lore.kernel.org/linux-mm/01f10195-7088-4462-6def-909549c75ef4@huawei.com/

    [mike.kravetz@oracle.com: address build issues]
      Link: https://lkml.kernel.org/r/Yz5L1uxQYR1VqFtJ@monkey
    Link: https://lkml.kernel.org/r/20221005011707.514612-3-mike.kravetz@oracle.com
    Fixes: "hugetlb: use new vma_lock for pmd sharing synchronization"
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: James Houghton <jthoughton@google.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:14:55 -04:00
Chris von Recklinghausen 02174dae48 hugetlb: fix vma lock handling during split vma and range unmapping
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 131a79b474e973f023c5c75e2323a940332103be
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Tue Oct 4 18:17:05 2022 -0700

    hugetlb: fix vma lock handling during split vma and range unmapping

    Patch series "hugetlb: fixes for new vma lock series".

    In review of the series "hugetlb: Use new vma lock for huge pmd sharing
    synchronization", Miaohe Lin pointed out two key issues:

    1) There is a race in the routine hugetlb_unmap_file_folio when locks
       are dropped and reacquired in the correct order [1].

    2) With the switch to using vma lock for fault/truncate synchronization,
       we need to make sure lock exists for all VM_MAYSHARE vmas, not just
       vmas capable of pmd sharing.

    These two issues are addressed here.  In addition, having a vma lock
    present in all VM_MAYSHARE vmas, uncovered some issues around vma
    splitting.  Those are also addressed.

    [1] https://lore.kernel.org/linux-mm/01f10195-7088-4462-6def-909549c75ef4@huawei.com/

    This patch (of 3):

    The hugetlb vma lock hangs off the vm_private_data field and is specific
    to the vma.  When vm_area_dup() is called as part of vma splitting, the
    vma lock pointer is copied to the new vma.  This will result in issues
    such as double freeing of the structure.  Update the hugetlb open vm_ops
    to allocate a new vma lock for the new vma.

    The routine __unmap_hugepage_range_final unconditionally unset VM_MAYSHARE
    to prevent subsequent pmd sharing.  hugetlb_vma_lock_free attempted to
    anticipate this by checking both VM_MAYSHARE and VM_SHARED.  However, if
    only VM_MAYSHARE was set we would miss the free.  With the introduction of
    the vma lock, a vma can not participate in pmd sharing if vm_private_data
    is NULL.  Instead of clearing VM_MAYSHARE in __unmap_hugepage_range_final,
    free the vma lock to prevent sharing.  Also, update the sharing code to
    make sure vma lock is indeed a condition for pmd sharing.
    hugetlb_vma_lock_free can then key off VM_MAYSHARE and not miss any vmas.

    Link: https://lkml.kernel.org/r/20221005011707.514612-1-mike.kravetz@oracle.com
    Link: https://lkml.kernel.org/r/20221005011707.514612-2-mike.kravetz@oracle.com
    Fixes: "hugetlb: add vma based lock for pmd sharing"
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: James Houghton <jthoughton@google.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:14:55 -04:00
Chris von Recklinghausen a43bab41ba mm/hugetlb: add available_huge_pages() func
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 8346d69d8bcb6c526a0d8bd126241dff41a60723
Author: Xin Hao <xhao@linux.alibaba.com>
Date:   Thu Sep 22 10:19:29 2022 +0800

    mm/hugetlb: add available_huge_pages() func

    In hugetlb.c there are several places which compare the values of
    'h->free_huge_pages' and 'h->resv_huge_pages', it looks a bit messy, so
    add a new available_huge_pages() function to do these.

    Link: https://lkml.kernel.org/r/20220922021929.98961-1-xhao@linux.alibaba.com
    Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:14:53 -04:00
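
A sketch of the helper, following the description above:

        /* Sketch: a huge page is only "available" if it is on the free list
         * and not already promised to an existing reservation. */
        static bool available_huge_pages(struct hstate *h)
        {
                return h->free_huge_pages - h->resv_huge_pages > 0;
        }
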
Chris von Recklinghausen d69f8317cf hugetlb: freeze allocated pages before creating hugetlb pages
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 2b21624fc23277553ef254b3ad02c37afa1c484d
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Fri Sep 16 14:46:38 2022 -0700

    hugetlb: freeze allocated pages before creating hugetlb pages

    When creating hugetlb pages, the hugetlb code must first allocate
    contiguous pages from a low level allocator such as buddy, cma or
    memblock.  The pages returned from these low level allocators are ref
    counted.  This creates potential issues with other code taking speculative
    references on these pages before they can be transformed to a hugetlb
    page.  This issue has been addressed with methods and code such as that
    provided in [1].

    Recent discussions about vmemmap freeing [2] have indicated that it would
    be beneficial to freeze all sub pages, including the head page of pages
    returned from low level allocators before converting to a hugetlb page.
    This helps avoid races if we want to replace the page containing vmemmap
    for the head page.

    There have been proposals to change at least the buddy allocator to return
    frozen pages as described at [3].  If such a change is made, it can be
    employed by the hugetlb code.  However, as mentioned above hugetlb uses
    several low level allocators so each would need to be modified to return
    frozen pages.  For now, we can manually freeze the returned pages.  This
    is done in two places:

    1) alloc_buddy_huge_page, only the returned head page is ref counted.
       We freeze the head page, retrying once in the VERY rare case where
       there may be an inflated ref count.
    2) prep_compound_gigantic_page, for gigantic pages the current code
       freezes all pages except the head page.  New code will simply freeze
       the head page as well.

    In a few other places, code checks for inflated ref counts on newly
    allocated hugetlb pages.  With the modifications to freeze after
    allocating, this code can be removed.

    After hugetlb pages are freshly allocated, they are often added to the
    hugetlb free lists.  Since these pages were previously ref counted, this
    was done via put_page() which would end up calling the hugetlb destructor:
    free_huge_page.  With changes to freeze pages, we simply call
    free_huge_page directly to add the pages to the free list.

    In a few other places, freshly allocated hugetlb pages were immediately
    put into use, and the expectation was they were already ref counted.  In
    these cases, we must manually ref count the page.

    [1] https://lore.kernel.org/linux-mm/20210622021423.154662-3-mike.kravetz@oracle.com/
    [2] https://lore.kernel.org/linux-mm/20220802180309.19340-1-joao.m.martins@oracle.com/
    [3] https://lore.kernel.org/linux-mm/20220809171854.3725722-1-willy@infradead.org/

    [mike.kravetz@oracle.com: fix NULL pointer dereference]
      Link: https://lkml.kernel.org/r/20220921202702.106069-1-mike.kravetz@oracle.com
    Link: https://lkml.kernel.org/r/20220916214638.155744-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:14:48 -04:00
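
A sketch of the head-page freeze with a single retry, per point 1 above (the wrapper name is hypothetical; the real code uses the hugetlb gfp mask and node handling):

        /* Hypothetical wrapper: freeze the freshly allocated head page so no
         * other reference exists; in the very rare case a speculative
         * reference inflated the count, give the page back and retry once. */
        static struct page *alloc_frozen_head(gfp_t gfp_mask, unsigned int order,
                                              int nid, nodemask_t *nmask)
        {
                struct page *page;
                bool retried = false;
        retry:
                page = __alloc_pages(gfp_mask, order, nid, nmask);
                if (page && !page_ref_freeze(page, 1)) {
                        /* a transient reference raced with us */
                        __free_pages(page, order);
                        if (!retried) {
                                retried = true;
                                goto retry;
                        }
                        page = NULL;
                }
                return page;
        }
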