Centos-kernel-stream-9/Documentation/mm
Rafael Aquini aab4f9828f mm: fix race between __split_huge_pmd_locked() and GUP-fast
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 3a5a8d343e1cf96eb9971b17cbd4b832ab19b8e7
Author: Ryan Roberts <ryan.roberts@arm.com>
Date:   Wed May 1 15:33:10 2024 +0100

    mm: fix race between __split_huge_pmd_locked() and GUP-fast

    __split_huge_pmd_locked() can be called for a present THP, devmap or
    (non-present) migration entry.  It calls pmdp_invalidate() unconditionally
    on the pmdp and only determines if it is present or not based on the
    returned old pmd.  This is a problem for the migration entry case because
    pmd_mkinvalid(), called by pmdp_invalidate(), must only be called for a
    present pmd.

    On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future
    call to pmd_present() will return true.  And therefore any lockless
    pgtable walker could see the migration entry pmd in this state and start
    interpreting the fields as if it were present, leading to BadThings (TM).
    GUP-fast appears to be one such lockless pgtable walker.

    x86 does not suffer the above problem, but instead pmd_mkinvalid() will
    corrupt the offset field of the swap entry within the swap pte.  See link
    below for discussion of that problem.

    Fix all of this by only calling pmdp_invalidate() for a present pmd.  And
    for good measure let's add a warning to all implementations of
    pmdp_invalidate[_ad]().  I've manually reviewed all other
    pmdp_invalidate[_ad]() call sites and believe all others to be conformant.

    This is a theoretical bug found during code review.  I don't have any test
    case to trigger it in practice.

    Link: https://lkml.kernel.org/r/20240501143310.1381675-1-ryan.roberts@arm.com
    Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/
    Fixes: 84c3fc4e9c ("mm: thp: check pmd migration entry in common path")
    Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:25:09 -05:00
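The race described in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the `toy_` names, the bit layout, and the two split helpers are all hypothetical and merely mimic the arm64 behaviour the message describes, where invalidating a pmd leaves it looking "present" to a later pmd_present() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PMD value. The bit layout below is hypothetical, not any real arch. */
typedef uint64_t toy_pmd_t;

#define TOY_PMD_VALID           (1ULL << 0)  /* hardware-valid mapping */
#define TOY_PMD_PRESENT_INVALID (1ULL << 1)  /* software: present but invalidated */
#define TOY_SWP_OFFSET_SHIFT    12           /* swap offset lives above the low bits */

/* Present means: valid, or invalidated-but-still-present. */
static bool toy_pmd_present(toy_pmd_t pmd)
{
    return (pmd & (TOY_PMD_VALID | TOY_PMD_PRESENT_INVALID)) != 0;
}

/* Models pmd_mkinvalid(): clear VALID, set the software "present" marker. */
static toy_pmd_t toy_pmd_mkinvalid(toy_pmd_t pmd)
{
    return (pmd & ~TOY_PMD_VALID) | TOY_PMD_PRESENT_INVALID;
}

/* A non-present migration entry: only the offset field is populated. */
static toy_pmd_t toy_mk_migration_entry(uint64_t offset)
{
    return offset << TOY_SWP_OFFSET_SHIFT;
}

/* Pre-fix behaviour: invalidate unconditionally, inspect the old value later. */
static toy_pmd_t toy_split_buggy(toy_pmd_t *pmdp)
{
    toy_pmd_t old = *pmdp;

    *pmdp = toy_pmd_mkinvalid(old);
    return old;
}

/* Post-fix behaviour: only invalidate a pmd that is actually present. */
static toy_pmd_t toy_split_fixed(toy_pmd_t *pmdp)
{
    toy_pmd_t old = *pmdp;

    if (toy_pmd_present(old))
        *pmdp = toy_pmd_mkinvalid(old);
    return old;
}

/*
 * Runs both variants on the same migration entry. Returns true when the
 * buggy variant makes the entry look present to a lockless reader while
 * the fixed variant leaves it untouched.
 */
bool toy_demo(void)
{
    toy_pmd_t entry = toy_mk_migration_entry(42);
    toy_pmd_t buggy = entry;
    toy_pmd_t fixed = entry;

    assert(!toy_pmd_present(entry));  /* sanity: starts out non-present */

    toy_split_buggy(&buggy);
    toy_split_fixed(&fixed);

    return toy_pmd_present(buggy) && !toy_pmd_present(fixed) && fixed == entry;
}
```

In this model a lockless walker that checks `toy_pmd_present()` after `toy_split_buggy()` would treat the migration entry as a mapped huge page, which is the failure mode the commit guards against. The fixed variant leaves non-present entries untouched and still invalidates present ones.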
..
damon
active_mm.rst lazy tlb: allow lazy tlb mm refcounting to be configurable 2024-11-04 09:14:17 -05:00
arch_pgtable_helpers.rst mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-12-09 12:25:09 -05:00
balance.rst mm: discard __GFP_ATOMIC 2024-04-29 14:33:09 -04:00
bootmem.rst
free_page_reporting.rst
highmem.rst mm: add orphaned kernel-doc to the rst files. 2024-10-01 11:21:59 -04:00
hmm.rst
hugetlbfs_reserv.rst mm: convert free_huge_page() to free_huge_folio() 2024-10-01 11:21:47 -04:00
hwpoison.rst
index.rst mm: kill frontswap 2024-06-28 12:24:06 -04:00
ksm.rst ksm: add the ksm prefix to the names of the ksm private structures 2023-10-20 06:13:49 -04:00
memory-model.rst
mmu_notifier.rst
multigen_lru.rst mm: multi-gen LRU: improve design doc 2024-03-13 10:35:40 -06:00
numa.rst
oom.rst
overcommit-accounting.rst docs: mm: fix vm overcommit documentation for OVERCOMMIT_GUESS 2023-11-10 10:13:43 +01:00
page_allocation.rst
page_cache.rst
page_frags.rst net: remove gfp_mask from napi_alloc_skb() 2024-04-23 08:33:11 +02:00
page_migration.rst mm: convert migrate_pages() to work on folios 2024-09-05 20:35:25 -04:00
page_owner.rst tools/vm: rename tools/vm to tools/mm 2024-04-29 14:33:03 -04:00
page_reclaim.rst
page_table_check.rst mm/page_table_check: support userfault wr-protect entries 2024-12-09 12:25:08 -05:00
page_tables.rst
physical_memory.rst
process_addrs.rst
remap_file_pages.rst
shmfs.rst
slab.rst
slub.rst tools/vm: rename tools/vm to tools/mm 2024-04-29 14:33:03 -04:00
split_page_table_lock.rst mm: remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers 2024-10-01 11:21:30 -04:00
swap.rst
transhuge.rst mm: convert deferred_split_huge_page() to deferred_split_folio() 2024-04-29 14:33:06 -04:00
unevictable-lru.rst shmem: add support to ignore swap 2024-04-30 07:00:13 -04:00
vmalloc.rst
vmalloced-kernel-stacks.rst
vmemmap_dedup.rst powerpc/book3s64/radix: add support for vmemmap optimization for radix 2024-10-01 11:19:58 -04:00
z3fold.rst
zsmalloc.rst mm: add orphaned kernel-doc to the rst files. 2024-10-01 11:21:59 -04:00