Commit Graph

833 Commits

Author SHA1 Message Date
Rafael Aquini 160e863cf9 mm: shmem: remove unnecessary warning in shmem_writepage()
JIRA: https://issues.redhat.com/browse/RHEL-84184

This patch is a backport of the following upstream commit:
commit adae46ac1e38a288b14f0298e27412adcba83f8e
Author: Ricardo Cañuelo Navarro <rcn@igalia.com>
Date:   Wed Feb 26 13:26:27 2025 +0100

    mm: shmem: remove unnecessary warning in shmem_writepage()

    Although the scenario where shmem_writepage() is called with info->flags &
    VM_LOCKED is unlikely to happen, it's still possible, as evidenced by
    syzbot [1].  However, the warning in this case isn't necessary because the
    situation is already handled correctly [2].

    [2] https://lore.kernel.org/lkml/8afe1f7f-31a2-4fc0-1fbd-f9ba8a116fe3@google.com/

    Link: https://lkml.kernel.org/r/20250226-20250221-warning-in-shmem_writepage-v1-1-5ad19420e17e@igalia.com
    Fixes: 9a976f0c847b ("shmem: skip page split if we're not reclaiming")
    Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com>
    Reported-by: Pengfei Xu <pengfei.xu@intel.com>
    Closes: https://lore.kernel.org/lkml/ZZ9PShXjKJkVelNm@xpf.sh.intel.com/ [1]
    Suggested-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: Florent Revest <revest@chromium.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2025-04-18 08:39:59 -04:00
Ryan Sullivan 2e2ebe63c2 fs: super_set_uuid()
JIRA: https://issues.redhat.com/browse/RHEL-8810

Some weird old filesystems have UUID-like things that we wish to expose
as UUIDs, but are smaller; add a length field so that the new
FS_IOC_(GET|SET)UUID ioctls can handle them in generic code.

And add a helper super_set_uuid(), for setting nonstandard length uuids.

Helper is now required for the new FS_IOC_GETUUID ioctl; if
super_set_uuid() hasn't been called, the ioctl won't be supported.
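
A minimal userspace sketch of the length-field idea described above. The
struct and the model_* functions are hypothetical stand-ins written for
illustration; they are not the kernel's definitions, and the clamping of
oversized input is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the relevant superblock fields. */
struct sb_model {
	uint8_t s_uuid[16];   /* storage sized for a standard 16-byte UUID */
	uint8_t s_uuid_len;   /* 0 means no UUID was set, so "get" is refused */
};

/* Model of the helper: store the bytes and remember how many are valid. */
static void model_super_set_uuid(struct sb_model *sb, const uint8_t *uuid,
				 size_t len)
{
	if (len > sizeof(sb->s_uuid))
		len = sizeof(sb->s_uuid);   /* clamp oversized input (assumption) */
	sb->s_uuid_len = (uint8_t)len;
	memcpy(sb->s_uuid, uuid, len);
}

/*
 * Model of the "get UUID" path: returns the number of bytes copied,
 * or -1 when no UUID was ever set or the caller's buffer is too small.
 */
static int model_get_uuid(const struct sb_model *sb, uint8_t *out,
			  size_t out_size)
{
	if (sb->s_uuid_len == 0 || out_size < sb->s_uuid_len)
		return -1;
	memcpy(out, sb->s_uuid, sb->s_uuid_len);
	return sb->s_uuid_len;
}
```

With this shape, a filesystem that only has an 8-byte identifier reports a
length of 8 instead of padding to 16, and a filesystem that never calls the
setter is reported as unsupported.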

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit a4af51ce229b1e1eab003966dbfebf9d80093a77)
Signed-off-by: Ryan Sullivan <rysulliv@redhat.com>
2025-02-07 17:06:38 -05:00
Herton R. Krzesinski 30fd4705e8 mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
JIRA: https://issues.redhat.com/browse/RHEL-68912
Conflicts: small context differences in headers inclusion and because
           we do not have "mm: mmap: map MAP_STACK to VM_NOHUGEPAGE"
           applied

commit 5baf8b037debf4ec60108ccfeccb8636d1dbad81
Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date:   Tue Oct 29 18:11:47 2024 +0000

    mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling

    Currently MTE is permitted in two circumstances (desiring to use MTE
    having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is
    specified, as checked by arch_calc_vm_flag_bits() and actualised by
    setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is
    shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap
    hook is activated in mmap_region().

    The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also
    set is the arm64 implementation of arch_validate_flags().

    Unfortunately, we intend to refactor mmap_region() to perform this check
    earlier, meaning that in the case of a shmem backing we will not have
    invoked shmem_mmap() yet, causing the mapping to fail spuriously.

    It is inappropriate to set this architecture-specific flag in general mm
    code anyway, so a sensible resolution of this issue is to instead move the
    check somewhere else.

    We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
    the arch_calc_vm_flag_bits() call.

    This is an appropriate place to do this as we already check for the
    MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
    the same idea - we permit RAM-backed memory.

    This requires a modification to the arch_calc_vm_flag_bits() signature to
    pass in a pointer to the struct file associated with the mapping, however
    this is not too egregious as this is only used by two architectures anyway
    - arm64 and parisc.

    So this patch performs this adjustment and removes the unnecessary
    assignment of VM_MTE_ALLOWED in shmem_mmap().
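
The reordering described above can be sketched as a small userspace model.
The flag values, struct file_model, and the model_* function names are
assumptions made for illustration; the real kernel constants and signatures
differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; not the kernel's constants. */
#define MODEL_MAP_ANONYMOUS   0x20UL
#define MODEL_VM_MTE          0x1UL
#define MODEL_VM_MTE_ALLOWED  0x2UL

/* Hypothetical stand-in for struct file: only records "is this shmem?". */
struct file_model {
	bool is_shmem;
};

/*
 * Model of the refactored flag calculation: taking the file lets the
 * early flag computation cover both RAM-backed cases in one place.
 */
static unsigned long model_calc_vm_flag_bits(const struct file_model *file,
					     unsigned long map_flags)
{
	bool ram_backed = (map_flags & MODEL_MAP_ANONYMOUS) ||
			  (file != NULL && file->is_shmem);

	return ram_backed ? MODEL_VM_MTE_ALLOWED : 0;
}

/* Model of the validation step: VM_MTE requires VM_MTE_ALLOWED. */
static bool model_validate_flags(unsigned long vm_flags)
{
	if (!(vm_flags & MODEL_VM_MTE))
		return true;
	return (vm_flags & MODEL_VM_MTE_ALLOWED) != 0;
}
```

Because the allowed bit is computed before any file-specific mmap hook
runs, the validation can move earlier without rejecting shmem mappings.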

    [akpm@linux-foundation.org: fix whitespace, per Catalin]
    Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Reported-by: Jann Horn <jannh@google.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Helge Deller <deller@gmx.de>
    Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2024-12-09 16:30:34 -03:00
Rafael Aquini d17d9c61a3 mm: revert "mm: shmem: fix data-race in shmem_getattr()"
JIRA: https://issues.redhat.com/browse/RHEL-27745
JIRA: https://issues.redhat.com/browse/RHEL-70053
CVE: CVE-2024-53136
Conflicts:
  * minor context difference due to RHEL9 missing upstream commit
    0d72b92883c6 ("fs: pass the request_mask to generic_fillattr")
    and its related series, as well as upstream commit e1e4cfd01a6e
    ("mm,tmpfs: consider end of file write in shmem_is_huge")

This patch is a backport of the following upstream commit:
commit d1aa0c04294e29883d65eac6c2f72fe95cc7c049
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Fri Nov 15 16:57:24 2024 -0800

    mm: revert "mm: shmem: fix data-race in shmem_getattr()"

    Revert d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as
    suggested by Chuck [1].  It is causing deadlocks when accessing tmpfs over
    NFS.

    As Hugh commented, "added just to silence a syzbot sanitizer splat: added
    where there has never been any practical problem".

    Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1]
    Fixes: d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()")
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: Jeongjun Park <aha310510@gmail.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:26:00 -05:00
Rafael Aquini 70afff39cc mm: shmem: fix data-race in shmem_getattr()
JIRA: https://issues.redhat.com/browse/RHEL-27745
JIRA: https://issues.redhat.com/browse/RHEL-66818
CVE: CVE-2024-50228
Conflicts:
  * minor context difference due to RHEL9 missing upstream commit
    0d72b92883c6 ("fs: pass the request_mask to generic_fillattr")
    and its related series, as well as upstream commit e1e4cfd01a6e
    ("mm,tmpfs: consider end of file write in shmem_is_huge")

This patch is a backport of the following upstream commit:
commit d949d1d14fa281ace388b1de978e8f2cd52875cf
Author: Jeongjun Park <aha310510@gmail.com>
Date:   Mon Sep 9 21:35:58 2024 +0900

    mm: shmem: fix data-race in shmem_getattr()

    I got the following KCSAN report during syzbot testing:

    ==================================================================
    BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current

    write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
     inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
     inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
     shmem_mknod+0x117/0x180 mm/shmem.c:3443
     shmem_create+0x34/0x40 mm/shmem.c:3497
     lookup_open fs/namei.c:3578 [inline]
     open_last_lookups fs/namei.c:3647 [inline]
     path_openat+0xdbc/0x1f00 fs/namei.c:3883
     do_filp_open+0xf7/0x200 fs/namei.c:3913
     do_sys_openat2+0xab/0x120 fs/open.c:1416
     do_sys_open fs/open.c:1431 [inline]
     __do_sys_openat fs/open.c:1447 [inline]
     __se_sys_openat fs/open.c:1442 [inline]
     __x64_sys_openat+0xf3/0x120 fs/open.c:1442
     x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

    read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0:
     inode_get_ctime_nsec include/linux/fs.h:1623 [inline]
     inode_get_ctime include/linux/fs.h:1629 [inline]
     generic_fillattr+0x1dd/0x2f0 fs/stat.c:62
     shmem_getattr+0x17b/0x200 mm/shmem.c:1157
     vfs_getattr_nosec fs/stat.c:166 [inline]
     vfs_getattr+0x19b/0x1e0 fs/stat.c:207
     vfs_statx_path fs/stat.c:251 [inline]
     vfs_statx+0x134/0x2f0 fs/stat.c:315
     vfs_fstatat+0xec/0x110 fs/stat.c:341
     __do_sys_newfstatat fs/stat.c:505 [inline]
     __se_sys_newfstatat+0x58/0x260 fs/stat.c:499
     __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
     x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

    value changed: 0x2755ae53 -> 0x27ee44d3

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
    ==================================================================

    When generic_fillattr() is called without holding the inode read lock,
    a data race can occur on inode member variables, which can cause
    unexpected behavior.

    Since there is no special protection when shmem_getattr() calls
    generic_fillattr(), data-race occurs by functions such as shmem_unlink()
    or shmem_mknod(). This can cause unexpected results, so commenting it out
    is not enough.

    Therefore, when calling generic_fillattr() from shmem_getattr(), it is
    appropriate to protect the inode using inode_lock_shared() and
    inode_unlock_shared() to prevent data-race.

    Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com
    Fixes: 44a30220bc ("shmem: recalculate file inode when fstat")
    Signed-off-by: Jeongjun Park <aha310510@gmail.com>
    Reported-by: syzbot <syzkaller@googlegroup.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:25:54 -05:00
Rafael Aquini cb26b8cc87 mm/shmem: inline shmem_is_huge() for disabled transparent hugepages
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 1f737846aa3c45f07a06fa0d018b39e1afb8084a
Author: Sumanth Korikkar <sumanthk@linux.ibm.com>
Date:   Tue Apr 9 17:54:07 2024 +0200

    mm/shmem: inline shmem_is_huge() for disabled transparent hugepages

    In order to minimize code size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y), the
    compiler might choose to make a regular function call (out-of-line) for
    shmem_is_huge() instead of inlining it. When transparent hugepages are
    disabled (CONFIG_TRANSPARENT_HUGEPAGE=n), this can cause a compilation
    error.

    mm/shmem.c: In function `shmem_getattr':
    ./include/linux/huge_mm.h:383:27: note: in expansion of macro `BUILD_BUG'
      383 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; })
          |                           ^~~~~~~~~
    mm/shmem.c:1148:33: note: in expansion of macro `HPAGE_PMD_SIZE'
     1148 |                 stat->blksize = HPAGE_PMD_SIZE;

    To prevent the possible error, always inline shmem_is_huge() when
    transparent hugepages are disabled.

    Link: https://lkml.kernel.org/r/20240409155407.2322714-1-sumanthk@linux.ibm.com
    Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Alexander Gordeev <agordeev@linux.ibm.com>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ilya Leoshkevich <iii@linux.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:49 -05:00
Rafael Aquini fe6b91357e zswap: memcontrol: implement zswap writeback disabling
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 501a06fe8e4c185bbda371b8cedbdf1b23a633d8
Author: Nhat Pham <nphamcs@gmail.com>
Date:   Thu Dec 7 11:24:06 2023 -0800

    zswap: memcontrol: implement zswap writeback disabling

    During our experiment with zswap, we sometimes observe swap IOs due to
    occasional zswap store failures and writebacks-to-swap.  These swapping
    IOs prevent many users who cannot tolerate swapping from adopting zswap to
    save memory and improve performance where possible.

    This patch adds the option to disable this behavior entirely: do not
    write back to the backing swap device when a zswap store attempt fails,
    and do not write pages in the zswap pool back to the backing swap device
    (both when the pool is full, and when the new zswap shrinker is called).

    This new behavior can be opted in or out on a per-cgroup basis via a new
    cgroup file.  By default, writeback to the swap device is enabled, which
    is the previous behavior.  Initially, writeback is enabled for the root
    cgroup, and a newly created cgroup will inherit the current setting of its
    parent.
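
The inheritance rule in the paragraph above can be sketched as a small
userspace model. struct cgroup_model and the model_* functions are
hypothetical names used for illustration only; this is not the kernel's
memcg code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup carrying the writeback setting. */
struct cgroup_model {
	const struct cgroup_model *parent;
	bool zswap_writeback;
};

/* The root cgroup starts with writeback enabled (the previous behavior). */
static struct cgroup_model model_root(void)
{
	struct cgroup_model root = { NULL, true };
	return root;
}

/* A new cgroup copies its parent's setting at the moment of creation. */
static struct cgroup_model model_child(const struct cgroup_model *parent)
{
	struct cgroup_model child = { parent, parent->zswap_writeback };
	return child;
}

/* Decision point: may pages of this cgroup reach the backing swap device? */
static bool model_may_write_to_swap(const struct cgroup_model *cg)
{
	return cg->zswap_writeback;
}
```

In this model, disabling writeback on one cgroup affects that cgroup and
any children created afterwards, while its parent keeps its own setting.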

    Note that this is subtly different from setting memory.swap.max to 0, as
    it still allows for pages to be stored in the zswap pool (which itself
    consumes swap space in its current form).

    This patch should be applied on top of the zswap shrinker series:

    https://lore.kernel.org/linux-mm/20231130194023.4102148-1-nphamcs@gmail.com/

    as it also disables the zswap shrinker, a major source of zswap
    writebacks.

    For the most part, this feature is motivated by internal parties who
    have already established their opinions regarding swapping - the
    workloads that are highly sensitive to IO, and especially those who are
    using servers with really slow disk performance (for instance, massive
    but slow HDDs).  For these folks, it's impossible to convince them to
    even entertain zswap if swapping also comes as a packaged deal.
    Writeback disabling is quite a useful feature in these situations - on
    a mixed workloads deployment, they can disable writeback for the more
    IO-sensitive workloads, and enable writeback for other background
    workloads.

    For instance, on a server with HDD, I allocate memories and populate
    them with random values (so that zswap store will always fail), and
    specify memory.high low enough to trigger reclaim.  The time it takes
    to allocate the memories and just read through it a couple of times
    (doing silly things like computing the values' average etc.):

    zswap.writeback disabled:
    real 0m30.537s
    user 0m23.687s
    sys 0m6.637s
    0 pages swapped in
    0 pages swapped out

    zswap.writeback enabled:
    real 0m45.061s
    user 0m24.310s
    sys 0m8.892s
    712686 pages swapped in
    461093 pages swapped out

    (the last two lines are from vmstat -s).

    [nphamcs@gmail.com: add a comment about recurring zswap store failures leading to reclaim inefficiency]
      Link: https://lkml.kernel.org/r/20231221005725.3446672-1-nphamcs@gmail.com
    Link: https://lkml.kernel.org/r/20231207192406.3809579-1-nphamcs@gmail.com
    Signed-off-by: Nhat Pham <nphamcs@gmail.com>
    Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
    Acked-by: Chris Li <chrisl@kernel.org>
    Cc: Dan Streetman <ddstreet@ieee.org>
    Cc: David Heidelberg <david@ixit.cz>
    Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Seth Jennings <sjenning@redhat.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Vitaly Wool <vitaly.wool@konsulko.com>
    Cc: Zefan Li <lizefan.x@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:12 -05:00
Rafael Aquini cffebe7f1d mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit a4575c4138db887bd27dc7f87cf7cfb0224c6f5e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Dec 13 21:58:42 2023 +0000

    mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio

    shmem_swapin_cluster() immediately converts the page back to a folio, and
    swapin_readahead() may as well call folio_file_page() once instead of
    having each function call it.

    [willy@infradead.org: avoid NULL pointer deref]
      Link: https://lkml.kernel.org/r/ZYI7OcVlM1voKfBl@casper.infradead.org
    Link: https://lkml.kernel.org/r/20231213215842.671461-14-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:09 -05:00
Rafael Aquini f9e926534b mempolicy: alloc_pages_mpol() for NUMA policy without vma
JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
  * mm/swap.h, mm/swap_state.c, and mm/zswap.c: minor context differences due to
    out-of-order backport of commit a65b0e7607cc ("zswap: make shrinking memcg-aware")

This patch is a backport of the following upstream commit:
commit ddc1a5cbc05dc62743a2f409b96faa5cf95ba064
Author: Hugh Dickins <hughd@google.com>
Date:   Thu Oct 19 13:39:08 2023 -0700

    mempolicy: alloc_pages_mpol() for NUMA policy without vma

    Shrink shmem's stack usage by eliminating the pseudo-vma from its folio
    allocation.  alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the
    principal actor for passing mempolicy choice down to __alloc_pages(),
    rather than vma_alloc_folio(gfp, order, vma, addr, hugepage).

    vma_alloc_folio() and alloc_pages() remain, but as wrappers around
    alloc_pages_mpol().  alloc_pages_bulk_*() untouched, except to provide the
    additional args to policy_nodemask(), which subsumes policy_node().
    Cleanup throughout, cutting out some unhelpful "helpers".

    It would all be much simpler without MPOL_INTERLEAVE, but that adds a
    dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8b
    ("tmpfs: distribute interleave better across nodes"), which added ino bias
    to the interleave, hidden from mm/mempolicy.c until this commit.

    Hence "ilx" throughout, the "interleave index".  Originally I thought it
    could be done just with nid, but that's wrong: the nodemask may come from
    the shared policy layer below a shmem vma, or it may come from the task
    layer above a shmem vma; and without the final nodemask then nodeid cannot
    be decided.  And how ilx is applied depends also on page order.

    The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
    with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
    passed down from vma-less alloc_pages() is also used as hint not to use
    THP-style hugepage allocation - to avoid the overhead of a hugepage arg
    (though I don't understand why we never just added a GFP bit for THP - if
    it actually needs a different allocation strategy from other pages of the
    same order).  vma_alloc_folio() still carries its hugepage arg here, but
    it is not used, and should be removed when agreed.

    get_vma_policy() no longer allows a NULL vma: over time I believe we've
    eradicated all the places which used to need it e.g.  swapoff and madvise
    used to pass NULL vma to read_swap_cache_async(), but now know the vma.

    [hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()]
      Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com
    Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com
    Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Nhat Pham <nphamcs@gmail.com>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:23:16 -05:00
Rafael Aquini dce5c250e1 shmem: _add_to_page_cache() before shmem_inode_acct_blocks()
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 3022fd7af9604d44ec43da8a4398872989599b18
Author: Hugh Dickins <hughd@google.com>
Date:   Fri Sep 29 20:32:40 2023 -0700

    shmem: _add_to_page_cache() before shmem_inode_acct_blocks()

    There has been a recurring problem, that when a tmpfs volume is being
    filled by racing threads, some fail with ENOSPC (or consequent SIGBUS or
    EFAULT) even though all allocations were within the permitted size.

    This was a problem since early days, but magnified and complicated by the
    addition of huge pages.  We have often worked around it by adding some
    slop to the tmpfs size, but it's hard to say how much is needed, and some
    users prefer not to do that e.g.  keeping sparse files in a tightly
    tailored tmpfs helps to prevent accidental writing to holes.

    This comes from the allocation sequence:
    1. check page cache for existing folio
    2. check and reserve from vm_enough_memory
    3. check and account from size of tmpfs
    4. if huge, check page cache for overlapping folio
    5. allocate physical folio, huge or small
    6. check and charge from mem cgroup limit
    7. add to page cache (but maybe another folio already got in).

    Concurrent tasks allocating at the same position could deplete the size
    allowance and fail.  Doing vm_enough_memory and size checks before the
    folio allocation was intentional (to limit the load on the page allocator
    from this source) and still has some virtue; but memory cgroup never did
    that, so I think it's better reordered to favour predictable behaviour.

    1. check page cache for existing folio
    2. if huge, check page cache for overlapping folio
    3. allocate physical folio, huge or small
    4. check and charge from mem cgroup limit
    5. add to page cache (but maybe another folio already got in)
    6. check and reserve from vm_enough_memory
    7. check and account from size of tmpfs.
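
The two orderings above can be compared with a deterministic userspace
model of two tasks racing for the same position when only one block of
tmpfs space remains. Everything here (struct tmpfs_model, the model_*
functions, the error constants) is a hypothetical simplification; memcg
charging, vm_enough_memory and folio locking are left out.

```c
#include <stdbool.h>

enum { MODEL_OK = 0, MODEL_ENOSPC = -1, MODEL_EEXIST = -2 };

/* Hypothetical tmpfs with a size limit and a single page cache slot. */
struct tmpfs_model {
	long max_blocks;
	long used_blocks;
	bool cached;   /* is a folio present at the contested index? */
};

static int model_account(struct tmpfs_model *fs)
{
	if (fs->used_blocks >= fs->max_blocks)
		return MODEL_ENOSPC;
	fs->used_blocks++;
	return MODEL_OK;
}

static int model_add_to_cache(struct tmpfs_model *fs)
{
	if (fs->cached)
		return MODEL_EEXIST;
	fs->cached = true;
	return MODEL_OK;
}

/* Old order: both tasks account before either adds to the cache.
 * Returns task B's result. */
static int model_old_order_race(struct tmpfs_model *fs)
{
	int a = model_account(fs);      /* A: account against tmpfs size */
	int b = model_account(fs);      /* B: same step, before A is cached */

	if (a == MODEL_OK)
		model_add_to_cache(fs); /* A: add to page cache */
	if (b != MODEL_OK)
		return b;               /* B already failed with ENOSPC */
	if (model_add_to_cache(fs) == MODEL_EEXIST)
		fs->used_blocks--;      /* B lost the race: undo accounting */
	return MODEL_OK;
}

/* New order: add to the cache first, account afterwards.
 * Returns task B's result. */
static int model_new_order_race(struct tmpfs_model *fs)
{
	int a = model_add_to_cache(fs); /* A: add to page cache */
	int b = model_add_to_cache(fs); /* B: finds A's folio already there */

	if (a == MODEL_OK && model_account(fs) != MODEL_OK)
		fs->cached = false;     /* A failed accounting: remove folio */
	if (b == MODEL_EEXIST && fs->cached)
		return MODEL_OK;        /* B reuses A's folio, no accounting */
	/* A's folio was removed: B retries the whole sequence itself. */
	if (model_add_to_cache(fs) == MODEL_OK && model_account(fs) != MODEL_OK) {
		fs->cached = false;
		return MODEL_ENOSPC;
	}
	return MODEL_OK;
}
```

With one free block, the old ordering makes task B fail with ENOSPC even
though both tasks wanted the same position, while the new ordering lets B
find A's folio before any duplicate accounting is attempted.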

    The folio lock held from allocation onwards ensures that the !uptodate
    folio cannot be used by others, and can safely be deleted from the cache
    if checks 6 or 7 subsequently fail (and those waiting on folio lock
    already check that the folio was not truncated once they get the lock);
    and the early addition to page cache ensures that racers find it before
    they try to duplicate the accounting.

    Seize the opportunity to tidy up shmem_get_folio_gfp()'s ENOSPC retrying,
    which can be combined inside the new shmem_alloc_and_add_folio(): doing 2
    splits twice (once huge, once nonhuge) is not exactly equivalent to trying
    5 splits (and giving up early on huge), but let's keep it simple unless
    more complication proves necessary.

    Userfaultfd is a foreign country: they do things differently there, and
    for good reason - to avoid mmap_lock deadlock.  Leave ordering in
    shmem_mfill_atomic_pte() untouched for now, but I would rather like to
    mesh it better with shmem_get_folio_gfp() in the future.

    Link: https://lkml.kernel.org/r/22ddd06-d919-33b-1219-56335c1bf28e@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Carlos Maiolino <cem@kernel.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:22:44 -05:00
Rafael Aquini a8e624508d shmem: move memcg charge out of shmem_add_to_page_cache()
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 054a9f7ccd0a60607fb9bbe1e06ca671494971bf
Author: Hugh Dickins <hughd@google.com>
Date:   Fri Sep 29 20:31:27 2023 -0700

    shmem: move memcg charge out of shmem_add_to_page_cache()

    Extract shmem's memcg charging out of shmem_add_to_page_cache(): it's
    misleadingly done there, because many calls are dealing with a swapcache
    page, whose memcg is nowadays always remembered while swapped out, then
    the charge re-levied when it's brought back into swapcache.

    Temporarily move it back up to the shmem_get_folio_gfp() level, where the
    memcg was charged before v5.8; but the next commit goes on to move it back
    down to a new home.

    In making this change, it becomes clear that shmem_swapin_folio() does not
    need to know the vma, just the fault mm (if any): call it fault_mm rather
    than charge_mm - let mem_cgroup_charge() decide whom to charge.

    Link: https://lkml.kernel.org/r/4b2143c5-bf32-64f0-841-81a81158dac@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Carlos Maiolino <cem@kernel.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:22:43 -05:00
Rafael Aquini e313de880d shmem: shmem_acct_blocks() and shmem_inode_acct_blocks()
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 4199f51a7eb2054d68964efbd8d39c68053a8714
Author: Hugh Dickins <hughd@google.com>
Date:   Fri Sep 29 20:30:03 2023 -0700

    shmem: shmem_acct_blocks() and shmem_inode_acct_blocks()

    By historical accident, shmem_acct_block() and shmem_inode_acct_block()
    were never pluralized when the pages argument was added, despite their
    complements being shmem_unacct_blocks() and shmem_inode_unacct_blocks()
    all along.  It has been an irritation: fix their naming at last.

    Link: https://lkml.kernel.org/r/9124094-e4ab-8be7-ef80-9a87bdc2e4fc@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Carlos Maiolino <cem@kernel.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:22:43 -05:00
Rafael Aquini 85fde1fa80 shmem: factor shmem_falloc_wait() out of shmem_fault()
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit f0a9ad1d4d9ba3c694bca91d8d67be9a4a33b902
Author: Hugh Dickins <hughd@google.com>
Date:   Fri Sep 29 20:27:53 2023 -0700

    shmem: factor shmem_falloc_wait() out of shmem_fault()

    That Trinity livelock shmem_falloc avoidance block is unlikely, and a
    distraction from the proper business of shmem_fault(): separate it out.
    (This used to help compilers save stack on the fault path too, but both
    gcc and clang nowadays seem to make better choices anyway.)

    Link: https://lkml.kernel.org/r/6fe379a4-6176-9225-9263-fe60d2633c0@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Carlos Maiolino <cem@kernel.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:22:42 -05:00
Rafael Aquini d6ca007045 shmem: remove vma arg from shmem_get_folio_gfp()
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit e3e1a5067fd2f1b3f4f7c651f5b33082962d1aa1
Author: Hugh Dickins <hughd@google.com>
Date:   Fri Sep 29 20:26:53 2023 -0700

    shmem: remove vma arg from shmem_get_folio_gfp()

    The vma is already there in vmf->vma, so no need for a separate arg.

    Link: https://lkml.kernel.org/r/d9ce6f65-a2ed-48f4-4299-fdb0544875c5@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Carlos Maiolino <cem@kernel.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:22:41 -05:00
Rafael Aquini 2495138fda shmem: Refactor shmem_symlink()
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 23a31d87645c652734f89f477f69ddac9aa402cb
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Fri Jun 30 13:48:56 2023 -0400

    shmem: Refactor shmem_symlink()

    De-duplicate the error handling paths. No change in behavior is
    expected.

    Suggested-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Message-Id: <168814733654.530310.9958360833543413152.stgit@manet.1015granger.net>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:21:44 -05:00
Rado Vrbovsky c154c6dc53 Merge: fs: backport mnt_idmap type
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4324

JIRA: https://issues.redhat.com/browse/RHEL-33888

This MR backports idmapping changes to sync our RHEL-9 kernel with the
upstream kernel at version 6.3.

Our current kernel has idmapped mounts support but there have been many
changes since this initial implementation in the base kernel. In
particular we need the type safety changes, and we have seen difficulty
backporting other requested changes on more than one occasion.

The Jira this MR has been raised for is another example of such a request.

It is needed for a backport of a BPF feature to RHEL 9 which allows BPF
programs to do file verification with LSM and fsverity. To satisfy this
request, changes made in the upstream 6.3 kernel are needed, which is the
reason we have chosen upstream 6.3 as the target release for the MR.

The first fix has been omitted because it appears to be the same as
24b5308cf5ee ("selftests/filesystems: grant executable permission to
run_fat_tests.sh"). In any case the requirement is to make the path
tools/testing/selftests/filesystems/fat/run_fat_tests.sh executable which
is done.

The second and third omitted patches are a straight apply and revert, leaving
the source unchanged.

Omitted-Fix: 1d4beeb4edc7 ("selftests/filesystems: grant executable permission to run_fat_tests.sh")

Omitted-Fix: 4a47c6385bb4 ("ovl: turn of SB_POSIXACL with idmapped layers temporarily")

Omitted-Fix: 7c4d37c269ac ("Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily"")

Signed-off-by: Ian Kent <ikent@redhat.com>

Approved-by: Scott Mayhew <smayhew@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-11 08:26:30 +00:00
Rado Vrbovsky 570a71d7db Merge: mm: update core code to v6.6 upstream
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5252

JIRA: https://issues.redhat.com/browse/RHEL-27743
JIRA: https://issues.redhat.com/browse/RHEL-59459
CVE: CVE-2024-46787
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4961
  
This MR brings RHEL9 core MM code up to upstream's v6.6 LTS level.
This work follows up on the previous v6.5 update (RHEL-27742) and as such,
the bulk of this changeset is comprised of refactoring and clean-ups of
the internal implementation of several APIs as it further advances the
conversion to folios, and follows up on the per-VMA locking changes.

Also, with the rebase to v6.6 LTS, we complete the infrastructure to allow
Control-flow Enforcement Technology, a.k.a. Shadow Stacks, for x86 builds,
and we add a potential extra level of protection (assessment pending) to help
mitigate kernel heap exploits dubbed "SlubStick".

Follow-up fixes are omitted from this series either because they are irrelevant to
the bits we support on RHEL or because they depend on bigger changesets introduced
upstream more recently. A follow-up ticket (RHEL-27745) will deal with these and other cases separately.

Omitted-fix: e540b8c5da04 ("mips: mm: add slab availability checking in ioremap_prot")
Omitted-fix: f7875966dc0c ("tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources")
Omitted-fix: df39038cd895 ("s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()")
Omitted-fix: 12bbaae7635a ("mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros")
Omitted-fix: fd1a745ce03e ("mm: support page_mapcount() on page_has_type() pages")
Omitted-fix: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")
Omitted-fix: fa2690af573d ("mm: page_ref: remove folio_try_get_rcu()")
Omitted-fix: f442fa614137 ("mm: gup: stop abusing try_grab_folio")
Omitted-fix: cb0f01beb166 ("mm/mprotect: fix dax pud handling")
    
Signed-off-by: Rafael Aquini <raquini@redhat.com>

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: David Airlie <airlied@redhat.com>
Approved-by: Michal Schmidt <mschmidt@redhat.com>
Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-30 07:22:28 +00:00
Ian Kent 6836d0308d fs: port i_{g,u}id_{needs_}update() to mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: Update to add incremental changes needed due to CentOS Stream
	commit 469e1d13f6 ("shmem: quota support").

commit 0dbe12f2e49c046444461b5f4be49df2cafb3a40
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:29 2023 +0100

    fs: port i_{g,u}id_{needs_}update() to mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:32 +08:00
Ian Kent 95a4490e2f quota: port to mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: Hunk #1 against fs/f2fs/file.c failed but I cannot see any
	reason for it, manually apply change.
	Update to add incremental changes needed due to CentOS Stream commit 469e1d13f6
	("shmem: quota support").

commit f861646a65623bcff91d544acbc4413d62d97b79
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:28 2023 +0100

    quota: port to mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:31 +08:00
Ian Kent 2171c567b5 fs: port inode_init_owner() to mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	CentOS Stream does not have upstream commit 3db1de0e582c3 ("f2fs:
	change the current atomic write way") so there is no call to
	f2fs_get_tmpfile() in f2fs_ioc_start_atomic_write() to change.
	The above patch also adds the definition of f2fs_get_tmpfile()
	to fs/f2fs/f2fs.h so it's not there to change resulting in a
	hunk reject for fs/f2fs/f2fs.h.
	Upstream commit 787caf1bdcd9f ("f2fs: fix to enable compress for
	newly created file if extension matches") is not present in CentOS
	Stream resulting in a number of rejects against fs/f2fs/namei.c,
	manually apply these changes.
	Dropped hunks for ntfs3 because the source is not present in
	the CentOS Stream source tree.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, which causes a reject in mm/shmem.c,
	manually apply the hunk (note: taking account of these changes at the
	times they are needed will result in an updated mm/shmem.c once this
	series is completed).
	Update to add incremental changes needed due to CentOS Stream
	commit 469e1d13f6 ("shmem: quota support").

commit f2d40141d5d90b882e2c35b226f9244a63b82b6e
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:25 2023 +0100

    fs: port inode_init_owner() to mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:26 +08:00
Ian Kent 92d69b838d fs: port xattr to mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: The cifs source has been moved in CentOS Stream so manually
	apply rejected hunk to fs/smb/client/xattr.c.
	Dropped hunks for ntfs3 because the source is not present in
	the CentOS Stream source tree.
	CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
	to new xattrs.c file") moved ovl_own_xattr_set(), manually apply
	changes.
	CentOS Stream commit 67e2fcb2f3 ("evm: don't copy up
	'security.evm' xattr") is present causing hunk #1 against
	include/linux/evm.h to be rejected, manually apply.
	Upstream commit 5d1ef2ce13a90 ("ima: Introduce
	ima_get_current_hash_algo()") is not present in CentOS Stream
	which causes fuzz 1 for hunk #1 against include/linux/ima.h.
	There's a reject of hunk #1 for include/linux/lsm_hooks.h but
	I can't see any reason for it, manually applied the hunk.
	CentOS Stream does not have upstream commit ce5bb5a86e5eb
	("ima: Return int in the functions to measure a buffer") which
	results in a reject of hunk #2 against security/integrity/ima/ima.h
	and hunks #8 and #11 against security/integrity/ima/ima_main.c, so
	manually apply hunks. There also appears to be a whitespace
	mismatch causing hunk #7 to report fuzz 2 on application.
	CentOS Stream does not have upstream commit c7423dbdbc9ec
	("ima: Handle -ESTALE returned by ima_filter_rule_match()")
	which results in a reject of hunk #3 against
	security/integrity/ima/ima_policy.c, so manually apply hunk.

commit 39f60c1ccee72caa0104145b5dbf5d37cce1ea39
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:23 2023 +0100

    fs: port xattr to mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:21 +08:00
Ian Kent 060dc0b240 fs: port ->fileattr_set() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.

commit 8782a9aea3ab4d697ad67d1f8ebca38a4e1c24ab
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:21 2023 +0100

    fs: port ->fileattr_set() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:18 +08:00
Ian Kent be97228574 fs: port ->set_acl() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsacl.c and
	fs/smb/client/cifsproto.h.
	Dropped hunks for ntfs3 and ksmbd because the source is not
	present in the CentOS Stream source tree.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, which causes hunk #1 against
	mm/shmem.c to be rejected, manually apply the hunk.
	CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
	for ceph") is present which causes fuzz 1 of hunk #1 against
	fs/ceph/inode.c.

commit 13e83a4923bea7c4f2f6714030cb7e56d20ef7e5
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:20 2023 +0100

    fs: port ->set_acl() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:12 +08:00
Ian Kent 0dcf7b37eb fs: port ->tmpfile() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	Upstream commit 863f144f12add ("vfs: open inside ->tmpfile()") is
	not present which caused a reject in fs/f2fs/namei.c for hunk #1,
	applied manually.
	The hunk of the patch against fs/minix/namei.c was rejected but I
	can't see any reason for it, applied manually.
	CentOS Stream has commit 9e0a1fff8d ("ubifs: Implement
	RENAME_WHITEOUT") which caused a reject in the hunk against
	fs/ubifs/dir.c, manually applied.

commit 011e2b717b1b921d3706a9d48ff83a025563e826
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:18 2023 +0100

    fs: port ->tmpfile() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:10 +08:00
Ian Kent 956e3ad810 fs: port ->mknod() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/dir.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, which causes hunks #2-#4 to be
	rejected, manually apply the hunks.
	CentOS Stream commit f0f830cd7e ("ceph: create symlinks with
	encrypted and base64-encoded targets") is present and resulted
	in fuzz against fs/ceph/dir.c hunk #2.
	Upstream commit 863f144f12add ("vfs: open inside ->tmpfile()")
	is missing causing fuzz against fs/ext2/namei.c.
	Upstream commit 7d37539037c2f ("fuse: implement ->tmpfile()")
	is missing causing fuzz in hunk #4 against fs/fuse/dir.c.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, so a patch reorder was needed
	with appropriate adjustments.

commit 5ebb29bee8d5fc173b774e0755be8cb335503ee3
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:16 2023 +0100

    fs: port ->mknod() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:08 +08:00
Ian Kent 19f3b4f1ba fs: port ->rename() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/inode.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	Upstream commit cc14d24026704 ("hpfs: Convert symlinks to
	read_folio") is not present which causes fuzz 1 for hunk #1.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, so a patch reorder was needed
	with appropriate adjustments.

commit e18275ae55e07a2937e48134589c2f4c1d99a369
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:17 2023 +0100

    fs: port ->rename() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:07 +08:00
Ian Kent a7750be4f4 fs: port ->mkdir() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/inode.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.

commit c54bd91e9eaba43f09aadc25b52ea869ff3b5587
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:15 2023 +0100

    fs: port ->mkdir() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:00 +08:00
Ian Kent 5744ba0ee3 fs: port ->symlink() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/link.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	CentOS Stream commit f0f830cd7e ("ceph: create symlinks with
	encrypted and base64-encoded targets") is present and resulted
	in fuzz against fs/ceph/dir.c.

commit 7a77db95511c39be4b2db2ceca152ef589adc2dc
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:14 2023 +0100

    fs: port ->symlink() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:00 +08:00
Ian Kent a56d1daadf fs: port ->create() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/dir.c.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present, which causes fuzz in mm/shmem.c.

commit 6c960e68aaed335a0040f16654f3c5e5bfcf9249
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:13 2023 +0100

    fs: port ->create() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:44:53 +08:00
Ian Kent 6ad3fa5fce fs: port ->getattr() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: CentOS Stream has commit 3e0b6f1fa9 ("afs: use
	read_seqbegin() in afs_check_validity() and afs_getattr()"),
	manually apply hunk #2 to fs/afs/inode.c.
	CentOS Stream commit 3b06927229 ("afs: split
	afs_pagecache_valid() out of afs_validate()") is present which
	causes a reject in fs/afs/internal.h, manually apply hunk to
	fs/afs/internal.h.
	For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
	for ceph") alters the definition of _ceph_setattr() causing fuzz.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/inode.c.
	Upstream commit 2e1d66379e ("staging: erofs: drop the extern
	prefix for function definitions") caused strange behaviour when
	applying this patch: there was a conflict in fs/erofs/internal.h, but
	after a refresh the hunk and context looked ok. The hunk had to be
	manually applied.
	Upstream commit 2db0487faa211 ("f2fs: move f2fs_force_buffered_io()
	into file.c") is not present in CentOS Stream which causes fuzz
	when applying the first hunk to fs/f2fs/file.c.
	Upstream commit 30abce053f811 ("fat: report creation time in statx")
	is not present in CentOS Stream which caused a reject so apply change
	manually.
	Dropped hunks for ksmbd because the source is not present in the
	CentOS Stream source tree.
	Dropped hunks for ntfs3 because the source is not present in the
	CentOS Stream source tree.
	There was fuzz with hunk #2 against fs/nfs/inode.c but I was
	unable to see any difference.
	CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
	to new xattrs.c file") is present which caused fuzz in
	fs/overlayfs/overlayfs.h.
	Upstream commit d919a1e79bac8 ("proc: fix a dentry lock race
	between release_task and lookup") is not present in CentOS
	Stream causing fuzz applying hunk #1 against fs/proc/base.c.
	CentOS Stream commit 20c470188c ("vfs: plumb i_version
	handling into struct kstat") is present causing fuzz in hunk
	#2 against fs/stat.c.
	Upstream commit e0c49bd2b4d3c ("fs: sysv: Fix sysv_nblocks()
	returns wrong value") is not present in CentOS Stream causing
	fuzz applying hunk #1 against fs/sysv/itree.c.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present so it's ok to pass idmap to
	generic_fillattr().
	CentOS Stream commit f0f830cd7e ("ceph: create symlinks
	with encrypted and base64-encoded targets") uses the old
	struct user_namespace and so leaves those changes out; make
	those getattr() changes here.
	Allow for CentOS Stream commit 6c3396a0d8 ("kernfs: Introduce
	separate rwsem to protect inode attributes") which is already
	present.
	CentOS Stream commit f5219db0c0 ("KVM: fix Add KVM_CREATE_GUEST_MEMFD
	ioctl() for guest-specific backing memory") updated the upstream commit
	a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific
	backing memory") to account for missing idmapping commits. Now that we
	have updated the second and final place where these changes were made,
	make the final adjustment needed to match the original upstream patch.

commit b74d24f7a74ffd2d42ca883d84b7422b8d545901
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:12 2023 +0100

    fs: port ->getattr() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two,
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.
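
The type-safety argument above can be illustrated with a minimal userspace sketch. Everything prefixed `demo_` is hypothetical and is not the kernel API; the "mapping" is a toy offset, chosen only to show that two arguments of the same type can be swapped silently while a dedicated type cannot.

```c
#include <assert.h>

/* Hypothetical stand-ins; the kernel's real types are opaque and richer. */
struct demo_userns { long offset; };     /* a plain namespace */
struct demo_mnt_idmap { long offset; };  /* dedicated mount idmap type */

/*
 * Old style: the mount-level and filesystem-level namespaces share one
 * type, so swapping the first two arguments compiles without complaint.
 */
static long demo_map_old(const struct demo_userns *mnt_ns,
			 const struct demo_userns *fs_ns, long id)
{
	return id - fs_ns->offset + mnt_ns->offset;
}

/*
 * New style: the helper takes only the dedicated idmap type. Passing a
 * struct demo_userns pointer here is diagnosed by the compiler.
 */
static long demo_map_new(const struct demo_mnt_idmap *idmap, long id)
{
	return id + idmap->offset;
}
```

In this toy model, calling `demo_map_old()` with its namespace arguments reversed still builds but yields a different id, which is the class of mistake the conversion is meant to rule out.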

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 09:37:45 +08:00
Ian Kent 43ca440cdf fs: port ->setattr() to pass mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: CentOS Stream commit 3c29fadfb1 ("afs: split
	afs_pagecache_valid() out of afs_validate()") is present, manually
	adjust hunk #1 of fs/afs/internal.h.
	For consistency drop btrfs hunks because it isn't supported in
	CentOS Stream and other backports also drop such hunks.
	CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
	for ceph") alters the definition of _ceph_setattr(), adjust
	manually.
	CentOS Stream commit 34b2a2b5a3 ("ceph: add some fscrypt
	guardrails") introduces a call to fscrypt_prepare_setattr() which
	causes fuzz when applying.
	The cifs source has been moved in CentOS Stream so manually
	apply rejected hunks to fs/smb/client/cifsfs.h and
	fs/smb/client/inode.c.
	Upstream commit 5a646fb3a3e2d ("coda: avoid doing bad things on
	inode type changes during revalidation") is not present which
	causes fuzz in fs/coda/coda_linux.h.
	Dropped hunks for ntfs3 because the source is not present in
	the CentOS Stream source tree.
	CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
	to new xattrs.c file") is present so manually apply hunk.
	CentOS Stream commit 892da692fa ("shmem: support idmapped
	mounts for tmpfs") is present so it's ok to pass idmap to
	setattr_prepare() and setattr_copy().
	Update to add incremental changes needed due to CentOS Stream
	commit 469e1d13f6 ("shmem: quota support").
	Allow for CentOS Stream commit 6c3396a0d8 ("kernfs: Introduce
	separate rwsem to protect inode attributes") which is already
	present.
	CentOS Stream commit f5219db0c0 ("KVM: fix Add KVM_CREATE_GUEST_MEMFD
	ioctl() for guest-specific backing memory") updated the upstream commit
	a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific
	backing memory") to account for missing idmapping commits. Now that we
	have updated one of the two places where these changes were made, make
	one of the adjustments needed to match the original upstream patch.

commit c1632a0f11209338fc300c66252bcc4686e609e8
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:11 2023 +0100

    fs: port ->setattr() to pass mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two,
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 09:07:05 +08:00
Ian Kent 310906db16 fs: pass dentry to set acl method
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: I didn't want to just drop the btrfs hunks so I made the
    change to btrfs_setattr() using init_user_ns instead of the expected
    mnt_userns. That should at least cause a conflict if btrfs changes
    to a supported fs in the future.
    CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling for
    ceph") is present, make necessary adjustment.
    CentOS Stream commit 892da692fa ("shmem: support idmapped mounts
    for tmpfs") is present, make necessary adjustment.
    The changes for fs/ksmbd/* were dropped as the directory doesn't
    exist in CentOS Stream.
    The changes for fs/ntfs3/* were dropped as the directory doesn't
    exist in CentOS Stream.

commit 138060ba92b3b0d77c8e6818d0f33398b23ea42e
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Sep 23 10:29:39 2022 +0200

    fs: pass dentry to set acl method

    The current way of setting and getting posix acls through the generic
    xattr interface is error prone and type unsafe. The vfs needs to
    interpret and fix up posix acls before storing or reporting them to
    userspace. Various hacks exist to make this work. The code is hard to
    understand and difficult to maintain in its current form. Instead of
    making this work by hacking posix acls through xattr handlers we are
    building a dedicated posix acl api around the get and set inode
    operations. This removes a lot of hackiness and makes the codepaths
    easier to maintain. A lot of background can be found in [1].

    Since some filesystems rely on the dentry being available to them when
    setting posix acls (e.g., 9p and cifs) they cannot rely on the set acl
    inode operation. But since ->set_acl() is required in order to use the generic
    posix acl xattr handlers filesystems that do not implement this inode
    operation cannot use the handler and need to implement their own
    dedicated posix acl handlers.

    Update the ->set_acl() inode method to take a dentry argument. This
    allows all filesystems to rely on ->set_acl().

    As far as I can tell all codepaths can be switched to rely on the dentry
    instead of just the inode. Note that the original motivation for passing
    the dentry separate from the inode instead of just the dentry in the
    xattr handlers was because of security modules that call
    security_d_instantiate(). This hook is called during
    d_instantiate_new(), d_add(), __d_instantiate_anon(), and
    d_splice_alias() to initialize the inode's security context and possibly
    to set security.* xattrs. Since this only affects security.* xattrs this
    is completely irrelevant for posix acls.

    Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-15 16:11:25 +08:00
Ian Kent 8763195146 attr: port attribute changes to new types
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflict: Hunk 2 of fs/f2fs/file.c failed to apply but the source looked
	identical and required manual application.
	Hunks 2 and 3 failed to apply to fs/attr.c due to CentOS Stream
	commit 33c38120a3 ("fs: account for group membership") having
	already been applied requiring manual application.
	Update to add incremental changes needed due to CentOS Stream
	("shmem: quota support").

commit b27c82e1296572cfa3997e58db3118a33915f85c
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Jun 21 16:14:54 2022 +0200

    attr: port attribute changes to new types

    Now that we have introduced new infrastructure to increase the type
    safety for filesystems supporting idmapped mounts, port the first part
    of the vfs over to it.

    This ports the attribute changes codepaths to rely on the new better
    helpers using a dedicated type.

    Before this change we used to take a shortcut and place the actual
    values that would be written to inode->i_{g,u}id into struct iattr. This
    had the advantage that we moved idmappings mostly out of the picture
    early on but it made reasoning about changes more difficult than it
    should be.

    The filesystem was never explicitly told that it dealt with an idmapped
    mount. The transition to the value that needed to be stored in
    inode->i_{g,u}id appeared way too early and increased the probability of
    bugs in various codepaths.

    We now place the same value in struct iattr no matter if this is an
    idmapped mount or not. The vfs will only deal with type safe
    vfs{g,u}id_t. This makes it massively safer to perform permission checks
    as the type will tell us what checks we need to perform and what helpers
    we need to use.

    Filesystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
    inode->i_{g,u}id since they are different types. Instead they need to
    use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
    vfs{g,u}id into the filesystem.
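
A minimal userspace sketch of why the distinct types force an explicit conversion. All `demo_` names are hypothetical, and the mapping is a toy subtraction rather than the kernel's idmapping logic; the point is only that assigning one wrapper type to the other does not compile.

```c
#include <assert.h>

/* Hypothetical wrappers: same layout, deliberately different types. */
typedef struct { unsigned int val; } demo_kuid_t;    /* value stored in the inode */
typedef struct { unsigned int val; } demo_vfsuid_t;  /* value as seen through the mount */

struct demo_idmap { unsigned int shift; };  /* toy mount idmapping */
struct demo_inode { demo_kuid_t i_uid; };

/* Toy conversion; assumes vfsuid.val >= idmap->shift. */
static demo_kuid_t demo_vfsuid_to_kuid(const struct demo_idmap *idmap,
				       demo_vfsuid_t vfsuid)
{
	demo_kuid_t kuid = { vfsuid.val - idmap->shift };

	return kuid;
}

static void demo_setattr_uid(struct demo_inode *inode,
			     const struct demo_idmap *idmap,
			     demo_vfsuid_t ia_vfsuid)
{
	/* "inode->i_uid = ia_vfsuid;" would be rejected: incompatible types. */
	inode->i_uid = demo_vfsuid_to_kuid(idmap, ia_vfsuid);
}
```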

    The other nice effect is that filesystems like overlayfs don't need to
    care about idmappings explicitly anymore and can simply set up struct
    iattr accordingly directly.

    Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
    Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
    Cc: Seth Forshee <sforshee@digitalocean.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Aleksa Sarai <cyphar@cyphar.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    CC: linux-fsdevel@vger.kernel.org
    Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-15 16:10:59 +08:00
Ian Kent 0d2dd7a477 quota: port quota helpers mount ids
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflict: There was a conflict in a hunk applied to f2fs_setattr() but the
	source looked identical and required manual application.
	Update to account for changes to is_quota_modification() and
	dquot_transfer() from CentOS Stream commit 469e1d13f6 ("shmem:
	quota support").

commit 71e7b535b8900d7ce7d5279fa472711db5251ae5
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Jun 21 16:14:52 2022 +0200

    quota: port quota helpers mount ids

    Port the is_quota_modification() and dquot_transfer() helpers to type
    safe vfs{g,u}id_t. Since these helpers are only called by a few
    filesystems don't introduce a new helper but simply extend the existing
    helpers to pass down the mount's idmapping.

    Note that this is a non-functional change, i.e. nothing will have
    happened here or at the end of this series to how quota are done! This
    is a change necessary because we will at the end of this series make
    ownership changes easier to reason about by keeping the original value
    in struct iattr for both non-idmapped and idmapped mounts.

    For now we always pass the initial idmapping which makes the idmapping
    functions these helpers call nops.

    This is done because we currently always pass the actual value to be
    written to i_{g,u}id via struct iattr. While this allowed us to treat
    the {g,u}id values in struct iattr as values that can be directly
    written to inode->i_{g,u}id it also increases the potential for
    confusion for filesystems.

    Now that we have dedicated types to prevent this confusion we will
    ultimately only map the value from the idmapped mount into a filesystem
    value that can be written to inode->i_{g,u}id when the filesystem
    actually updates the inode. So pass down the initial idmapping until we
    have finished that conversion, at which point we pass down the mount's
    idmapping.

    Since struct iattr uses an anonymous union with overlapping types as
    supported by the C standard, filesystems that haven't converted to
    ia_vfs{g,u}id won't see any difference and things will continue to work
    as before. In other words, no functional changes intended with this
    change.
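
The anonymous-union arrangement described above can be sketched in userspace C11. The `demo_` types and field layout are assumptions made for illustration and are not copied from the kernel's struct iattr; the sketch only shows that code still using the old member name reads the same bits that converted code wrote through the new one.

```c
#include <assert.h>

typedef struct { unsigned int val; } demo_kuid_t;
typedef struct { unsigned int val; } demo_vfsuid_t;

/* Both members must overlap exactly for unconverted code to keep working. */
_Static_assert(sizeof(demo_kuid_t) == sizeof(demo_vfsuid_t),
	       "overlapping members must have the same size");

struct demo_iattr {
	unsigned int ia_valid;
	union {				/* anonymous union (C11) */
		demo_kuid_t ia_uid;	/* name used by unconverted code */
		demo_vfsuid_t ia_vfsuid;	/* name used by converted code */
	};
};

/* Converted caller: fills in the type-safe member. */
static void demo_vfs_set_owner(struct demo_iattr *attr, unsigned int id)
{
	attr->ia_vfsuid.val = id;
}

/* Unconverted filesystem: still reads the old member. */
static unsigned int demo_legacy_read_uid(const struct demo_iattr *attr)
{
	return attr->ia_uid.val;
}
```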

    Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org
    Cc: Seth Forshee <sforshee@digitalocean.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Aleksa Sarai <cyphar@cyphar.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    CC: linux-fsdevel@vger.kernel.org
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-15 16:10:58 +08:00
Ian Kent 00383cd059 fs: port to iattr ownership update helpers
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: Update to use the iattrs update helpers in mm/shmem.c due to
	the quota changes from CentOS Stream commit 469e1d13f6 ("shmem:
	quota support").

commit 35faf3109a78516f60ca13f957083d5e5535fde0
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Jun 21 16:14:51 2022 +0200

    fs: port to iattr ownership update helpers

    Earlier we introduced new helpers to abstract ownership update and
    remove code duplication. This converts all filesystems supporting
    idmapped mounts to make use of these new helpers.

    For now we always pass the initial idmapping which makes the idmapping
    functions these helpers call nops.

    This is done because we currently always pass the actual value to be
    written to i_{g,u}id via struct iattr. While this allowed us to treat
    the {g,u}id values in struct iattr as values that can be directly
    written to inode->i_{g,u}id it also increases the potential for
    confusion for filesystems.

    Now that we have dedicated types to prevent this confusion we will
    ultimately only map the value from the idmapped mount into a filesystem
    value that can be written to inode->i_{g,u}id when the filesystem
    actually updates the inode. So pass down the initial idmapping until we
    have finished that conversion, at which point we pass down the mount's
    idmapping.

    No functional changes intended.

    Link: https://lore.kernel.org/r/20220621141454.2914719-6-brauner@kernel.org
    Cc: Seth Forshee <sforshee@digitalocean.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Aleksa Sarai <cyphar@cyphar.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    CC: linux-fsdevel@vger.kernel.org
    Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-15 16:10:57 +08:00
Rafael Aquini cd1cd44bf9 mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 3d2c908768877714a354ee6d7bf93e801400d5e2
Author: David Hildenbrand <david@redhat.com>
Date:   Mon Aug 21 18:08:48 2023 +0200

    mm/swap: inline folio_set_swap_entry() and folio_swap_entry()

    Let's simply work on the folio directly and remove the helpers.

    Link: https://lkml.kernel.org/r/20230821160849.531668-4-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Reviewed-by: Chris Li <chrisl@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dan Streetman <ddstreet@ieee.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Seth Jennings <sjenning@redhat.com>
    Cc: Vitaly Wool <vitaly.wool@konsulko.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:22:06 -04:00
Rafael Aquini db6591e712 tmpfs: trivial support for direct IO
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit e88e0d366f9cfbb810b0c8509dc5d130d5a53e02
Author: Hugh Dickins <hughd@google.com>
Date:   Thu Aug 10 23:27:07 2023 -0700

    tmpfs: trivial support for direct IO

    Depending upon your philosophical viewpoint, either tmpfs always does
    direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
    stand in for a more sophisticated filesystem, it can be helpful for tmpfs
    to support O_DIRECT.  So, give tmpfs a shmem_file_open() method, to set
    the FMODE_CAN_ODIRECT flag: then unchanged shmem_file_read_iter() and new
    shmem_file_write_iter() do the work (without any shmem_direct_IO() stub).

    Perhaps later, once the direct_IO method has been eliminated from all
    filesystems, generic_file_write_iter() will be such that tmpfs can again
    use it, even for O_DIRECT.

    xfstests auto generic which were not run on tmpfs before but now pass:
    036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263
    323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729
    with no new failures.

    LTP dio tests which were not run on tmpfs before but now pass:
    dio01 through dio30, except for dio04 and dio10, which fail because
    tmpfs dio read and write allow odd count: tmpfs could be made stricter,
    but would that be an improvement?

    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Message-Id: <6f2742-6f1f-cae9-7c5b-ed20fc53215@google.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:21:37 -04:00
Rafael Aquini 81b2421a05 tmpfs: track free_ispace instead of free_inodes
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit e07c469e979c104464300aaa3b7923f929055cd0
Author: Hugh Dickins <hughd@google.com>
Date:   Tue Aug 8 21:32:21 2023 -0700

    tmpfs: track free_ispace instead of free_inodes

    In preparation for assigning some inode space to extended attributes,
    keep track of free_ispace instead of number of free_inodes: as if one
    tmpfs inode (and accompanying dentry) occupies very approximately 1KiB.

    Unsigned long is large enough for free_ispace, on 64-bit and on 32-bit:
    but take care to enforce the maximum.  And fix the nr_blocks maximum on
    32-bit: S64_MAX would be too big for it there, so say LONG_MAX instead.
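
A rough userspace sketch of byte-based inode accounting as described above. The 1024-byte per-inode cost comes from the "very approximately 1KiB" wording; the struct, the function names, and the overflow check are illustrative assumptions, not the shmem implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_INODE_SIZE 1024UL	/* assumed cost of one inode plus dentry */

struct demo_sbinfo { unsigned long free_ispace; };

/* Convert an inode limit into a byte budget, enforcing the maximum. */
static bool demo_set_max_inodes(struct demo_sbinfo *sbi, unsigned long max_inodes)
{
	if (max_inodes > ULONG_MAX / DEMO_INODE_SIZE)
		return false;	/* would overflow unsigned long */
	sbi->free_ispace = max_inodes * DEMO_INODE_SIZE;
	return true;
}

/* Charge space for an inode, or later for an extended attribute. */
static bool demo_reserve_ispace(struct demo_sbinfo *sbi, unsigned long bytes)
{
	if (sbi->free_ispace < bytes)
		return false;
	sbi->free_ispace -= bytes;
	return true;
}

static void demo_release_ispace(struct demo_sbinfo *sbi, unsigned long bytes)
{
	sbi->free_ispace += bytes;
}
```

Tracking bytes rather than a count lets the same budget be charged in amounts other than one whole inode, which is what the follow-up xattr work needs.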

    Delete the incorrect limited<->unlimited blocks/inodes comment above
    shmem_reconfigure(): leave it to the error messages below to describe.

    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Message-Id: <4fe1739-d9e7-8dfd-5bce-12e7339711da@google.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:21:35 -04:00
Rafael Aquini d41514ca9f xattr: simple_xattr_set() return old_xattr to be freed
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
  * mm/shmem.c: this commit had a merge conflict upstream with commit
      6528733416f1 ("shmem: convert to ctime accessor functions"), backported
      earlier in this set. The conflict was solved via merge commit ecd7db20474c
      ("Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs"),
      from which we borrow the hunk adjustment for this backport.

This patch is a backport of the following upstream commit:
commit 5de75970c9fd7220e394b76e6d20fbafa1369b5a
Author: Hugh Dickins <hughd@google.com>
Date:   Tue Aug 8 21:30:59 2023 -0700

    xattr: simple_xattr_set() return old_xattr to be freed

    tmpfs wants to support limited user extended attributes, but kernfs
    (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR)
    already supports user extended attributes through simple xattrs: but
    limited by a policy (128KiB per inode) too liberal to be used on tmpfs.

    To allow a different limiting policy for tmpfs, without affecting the
    policy for kernfs, change simple_xattr_set() to return the replaced or
    removed xattr (if any), leaving the caller to update their accounting
    then free the xattr (by simple_xattr_free(), renamed from the static
    free_simple_xattr()).
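
The ownership change can be sketched in userspace as follows. This is a single-slot toy with hypothetical `demo_` names, not the kernel's simple_xattr code; it shows only the pattern of the setter handing back the replaced entry so that the caller applies its own limit, updates its accounting, and frees the old value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_xattr { size_t size; char value[]; };
struct demo_inode { struct demo_xattr *xattr; size_t used; };

static struct demo_xattr *demo_xattr_alloc(const char *value)
{
	size_t size = strlen(value) + 1;
	struct demo_xattr *x = malloc(sizeof(*x) + size);

	if (!x)
		return NULL;
	x->size = size;
	memcpy(x->value, value, size);
	return x;
}

/* Install new_xattr (NULL removes) and return the old entry, unfreed. */
static struct demo_xattr *demo_xattr_set(struct demo_xattr **slot,
					 struct demo_xattr *new_xattr)
{
	struct demo_xattr *old = *slot;

	*slot = new_xattr;
	return old;
}

/* Caller-side policy: enforce a per-inode byte limit, then free the old entry. */
static int demo_inode_setxattr(struct demo_inode *inode, const char *value,
			       size_t limit)
{
	struct demo_xattr *new_xattr = NULL, *old;
	size_t old_size = inode->xattr ? inode->xattr->size : 0;
	size_t new_size = 0;

	if (value) {
		new_xattr = demo_xattr_alloc(value);
		if (!new_xattr)
			return -1;
		new_size = new_xattr->size;
	}
	if (inode->used - old_size + new_size > limit) {
		free(new_xattr);
		return -1;
	}
	old = demo_xattr_set(&inode->xattr, new_xattr);
	inode->used = inode->used - old_size + new_size;
	free(old);	/* free(NULL) is a no-op */
	return 0;
}
```

Because the setter itself never frees or counts anything, a second caller with a different limit could reuse it unchanged.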

    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Message-Id: <158c6585-2aa7-d4aa-90ff-f7c3f8fe407c@google.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:21:34 -04:00
Rafael Aquini f8d1f89f03 mm/shmem.c: use helper macro K()
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit b91742d84d29c39b643992b95560cfb7337eab18
Author: ZhangPeng <zhangpeng362@huawei.com>
Date:   Fri Aug 4 09:25:56 2023 +0800

    mm/shmem.c: use helper macro K()

    Use helper macro K() to improve code readability.  No functional
    modification involved.
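
For readers unfamiliar with the macro, a small sketch of what a pages-to-KiB helper of this kind typically looks like. The definition and the 4 KiB page size below are assumptions for illustration, not copied from the patch.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12	/* assumed: 4 KiB pages */

/* Convert a page count to KiB: shift by (page shift - log2(1024)). */
#define DEMO_K(x) ((x) << (DEMO_PAGE_SHIFT - 10))

/* Example use: report the size of a run of pages in KiB. */
static unsigned long demo_pages_to_kib(unsigned long nr_pages)
{
	return DEMO_K(nr_pages);
}
```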

    Link: https://lkml.kernel.org/r/20230804012559.2617515-5-zhangpeng362@huawei.com
    Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:20:55 -04:00
Rafael Aquini a87aa37b6f tmpfs: verify {g,u}id mount options correctly
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 0200679fc7953177941e41c2a4241d0b6c2c5de8
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Aug 1 18:17:04 2023 +0200

    tmpfs: verify {g,u}id mount options correctly

    A while ago we received the following report:

    "The other outstanding issue I noticed comes from the fact that
    fsconfig syscalls may occur in a different userns than that which
    called fsopen. That means that resolving the uid/gid via
    current_user_ns() can save a kuid that isn't mapped in the associated
    namespace when the filesystem is finally mounted. This means that it
    is possible for an unprivileged user to create files owned by any
    group in a tmpfs mount (since we can set the SUID bit on the tmpfs
    directory), or a tmpfs that is owned by any user, including the root
    group/user."

    The contract for {g,u}id mount options and {g,u}id values in general set
    from userspace has always been that they are translated according to the
    caller's idmapping. In so far, tmpfs has been doing the correct thing.
    But since tmpfs is mountable in unprivileged contexts it is also
    necessary to verify that the resulting k{g,u}id is representable in the
    namespace of the superblock to avoid such bugs as above.
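
The two-step check described above can be sketched in userspace. The single-range namespace model and every `demo_` name are assumptions made for illustration; the kernel's idmapping code is considerably richer. The shape is: translate the id through the caller's mapping first, then reject it unless the result is also representable in the superblock's namespace.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_INVALID_ID ((unsigned int)-1)

/* Toy namespace: ids [first, first + count) map to kernel ids starting at lower_first. */
struct demo_userns { unsigned int first, lower_first, count; };

/* Step 1: translate a userspace id according to the caller's idmapping. */
static unsigned int demo_make_kuid(const struct demo_userns *ns, unsigned int uid)
{
	if (uid < ns->first || uid - ns->first >= ns->count)
		return DEMO_INVALID_ID;
	return ns->lower_first + (uid - ns->first);
}

/* Step 2: is that kernel id representable in the given namespace? */
static bool demo_kuid_has_mapping(const struct demo_userns *ns, unsigned int kuid)
{
	return kuid != DEMO_INVALID_ID &&
	       kuid >= ns->lower_first && kuid - ns->lower_first < ns->count;
}

/* Accept a uid= mount option only if both steps succeed. */
static bool demo_parse_uid(const struct demo_userns *caller_ns,
			   const struct demo_userns *sb_ns,
			   unsigned int uid, unsigned int *kuid_out)
{
	unsigned int kuid = demo_make_kuid(caller_ns, uid);

	if (!demo_kuid_has_mapping(sb_ns, kuid))
		return false;
	*kuid_out = kuid;
	return true;
}
```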

    The new mount api's cross-namespace delegation abilities are already
    widely used. After having talked to a bunch of userspace, this is the
    most faithful solution with minimal regression risks. I know of one
    user - systemd - that makes use of the new mount api in this way and
    they don't set unresolvable {g,u}ids. So the regression risk is minimal.

    Link: https://lore.kernel.org/lkml/CALxfFW4BXhEwxR0Q5LSkg-8Vb4r2MONKCcUCVioehXQKr35eHg@mail.gmail.com
    Fixes: f32356261d ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
    Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org>
    Reported-by: Seth Jenkins <sethjenkins@google.com>
    Message-Id: <20230801-vfs-fs_context-uidgid-v1-1-daf46a050bbf@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:20:14 -04:00
Rafael Aquini 4b5fb83182 mm: make PTE_MARKER_SWAPIN_ERROR more general
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit af19487f00f34ff8643921d7909dbb3fedc7e329
Author: Axel Rasmussen <axelrasmussen@google.com>
Date:   Fri Jul 7 14:55:33 2023 -0700

    mm: make PTE_MARKER_SWAPIN_ERROR more general

    Patch series "add UFFDIO_POISON to simulate memory poisoning with UFFD",
    v4.

    This series adds a new userfaultfd feature, UFFDIO_POISON. See commit 4
    for a detailed description of the feature.

    This patch (of 8):

    Future patches will reuse PTE_MARKER_SWAPIN_ERROR to implement
    UFFDIO_POISON, so make various preparations for that:

    First, rename it to just PTE_MARKER_POISONED.  The "SWAPIN" can be
    confusing since we're going to re-use it for something not really related
    to swap.  This can be particularly confusing for things like hugetlbfs,
    which doesn't support swap whatsoever.  Also rename various helper
    functions.

    Next, fix pte marker copying for hugetlbfs.  Previously, it would WARN on
    seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap.
    But, since we're going to re-use it, we want it to go ahead and copy it
    just like non-hugetlbfs memory does today.  Since the code to do this is
    more complicated now, pull it out into a helper which can be re-used in
    both places.  While we're at it, also make it slightly more explicit in
    its handling of e.g.  uffd wp markers.

    For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for an
    error entry, return VM_FAULT_HWPOISON.  For most cases this change doesn't
    matter, e.g.  a userspace program would receive a SIGBUS either way.  But
    for UFFDIO_POISON, this change will let KVM guests get an MCE out of the
    box, instead of giving a SIGBUS to the hypervisor and requiring it to
    somehow inject an MCE.

    Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return
    VM_FAULT_HWPOISON_LARGE in such cases.  Note that this can't happen today
    because the lack of swap support means we'll never end up with such a PTE
    anyway, but this behavior will be needed once such entries *can* show up
    via UFFDIO_POISON.

    Link: https://lkml.kernel.org/r/20230707215540.2324998-1-axelrasmussen@google.com
    Link: https://lkml.kernel.org/r/20230707215540.2324998-2-axelrasmussen@google.com
    Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Brian Geffon <bgeffon@google.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
    Cc: Jiaqi Yan <jiaqiyan@google.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Suleiman Souhlal <suleiman@google.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: T.J. Alumbaugh <talumbau@google.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: ZhangPeng <zhangpeng362@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:18:03 -04:00
Rafael Aquini 79e59ae792 shmem: convert to ctime accessor functions
JIRA: https://issues.redhat.com/browse/RHEL-27743
Conflicts:
  * mm/shmem.c: minor context conflicts on the 2nd, 3rd, and 4th hunks
      due to RHEL missing commits 256c8aed2b42 ("fs: introduce dedicated
      idmap type for mounts") and its follow-up series, as well as
      due to out-of-order backport of commit 7a80e5b8c6fa ("shmem: support
      idmapped mounts for tmpfs") which folds bits of commit 138060ba92b3
      ("fs: pass dentry to set acl method")

This patch is a backport of the following upstream commit:
commit 6528733416f13dd67eda1f34e74a2242af36d638
Author: Jeff Layton <jlayton@kernel.org>
Date:   Wed Jul 5 15:01:52 2023 -0400

    shmem: convert to ctime accessor functions

    In later patches, we're going to change how the inode's ctime field is
    used. Switch to using accessor functions instead of raw accesses of
    inode->i_ctime.

    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Message-Id: <20230705190309.579783-85-jlayton@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:17:47 -04:00
Rafael Aquini d14035b708 shmem: convert to simple_rename_timestamp
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 944d0d9def9de37f0209ff73f3d8daa1baccab67
Author: Jeff Layton <jlayton@kernel.org>
Date:   Wed Jul 5 15:00:36 2023 -0400

    shmem: convert to simple_rename_timestamp

    A rename potentially involves updating 4 different inode timestamps.
    Convert to the new simple_rename_timestamp helper function.

    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Message-Id: <20230705190309.579783-9-jlayton@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:17:46 -04:00
Rafael Aquini 89b7c01962 mm: increase usage of folio_next_index() helper
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 87b11f862254396a93636f0998377ac3f6648f5f
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Jun 27 10:43:49 2023 -0700

    mm: increase usage of folio_next_index() helper

    Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
    the existing helper folio_next_index().

    Link: https://lkml.kernel.org/r/20230627174349.491803-1-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Suggested-by: Christoph Hellwig <hch@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:17:23 -04:00
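The pattern that the commit above consolidates can be illustrated with a small userspace sketch. The `mock_folio` type and the `mock_*` helpers are hypothetical stand-ins written for this illustration, not the kernel's definitions; they only assume that a folio starts at a page-cache index and covers a power-of-two number of pages.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace stand-in for the kernel's struct folio. */
struct mock_folio {
    unsigned long index;  /* page-cache index of the folio's first page */
    unsigned int order;   /* the folio covers 2^order pages */
};

/* Number of pages covered by the folio (assumed to be a power of two). */
static unsigned long mock_folio_nr_pages(const struct mock_folio *folio)
{
    assert(folio->order < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << folio->order;
}

/*
 * Same idea as folio_next_index(): the index immediately after the folio,
 * replacing the open-coded 'folio->index + folio_nr_pages(folio)'.
 */
static unsigned long mock_folio_next_index(const struct mock_folio *folio)
{
    return folio->index + mock_folio_nr_pages(folio);
}
```

Callers that walk a range folio by folio can then advance with a single call instead of repeating the addition at each site.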
Waiman Long 1af51c424f mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
JIRA: https://issues.redhat.com/browse/RHEL-56023

commit 9094b4a1c76cfe84b906cc152bab34d4ba26fa5c
Author: Baolin Wang <baolin.wang@linux.alibaba.com>
Date:   Thu, 13 Jun 2024 16:21:19 +0800

    mm: shmem: fix getting incorrect lruvec when replacing a shmem folio

    When testing shmem swapin, I encountered the warning below on my machine.
    The reason is that replacing an old shmem folio with a new one causes
    mem_cgroup_migrate() to clear the old folio's memcg data.  As a result,
    the old folio cannot get the correct memcg's lruvec needed to remove
    itself from the LRU list when it is being freed.  This could lead to
    possible serious problems, such as LRU list crashes due to holding the
    wrong LRU lock, and incorrect LRU statistics.

    To fix this issue, we can fall back to using mem_cgroup_replace_folio()
    to replace the old shmem folio.

    [ 5241.100311] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5d9960
    [ 5241.100317] head: order:4 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
    [ 5241.100319] flags: 0x17fffe0000040068(uptodate|lru|head|swapbacked|node=0|zone=2|lastcpupid=0x3ffff)
    [ 5241.100323] raw: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
    [ 5241.100325] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
    [ 5241.100326] head: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
    [ 5241.100327] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
    [ 5241.100328] head: 17fffe0000000204 fffffdffd6665801 ffffffffffffffff 0000000000000000
    [ 5241.100329] head: 0000000a00000010 0000000000000000 00000000ffffffff 0000000000000000
    [ 5241.100330] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
    [ 5241.100338] ------------[ cut here ]------------
    [ 5241.100339] WARNING: CPU: 19 PID: 78402 at include/linux/memcontrol.h:775 folio_lruvec_lock_irqsave+0x140/0x150
    [...]
    [ 5241.100374] pc : folio_lruvec_lock_irqsave+0x140/0x150
    [ 5241.100375] lr : folio_lruvec_lock_irqsave+0x138/0x150
    [ 5241.100376] sp : ffff80008b38b930
    [...]
    [ 5241.100398] Call trace:
    [ 5241.100399]  folio_lruvec_lock_irqsave+0x140/0x150
    [ 5241.100401]  __page_cache_release+0x90/0x300
    [ 5241.100404]  __folio_put+0x50/0x108
    [ 5241.100406]  shmem_replace_folio+0x1b4/0x240
    [ 5241.100409]  shmem_swapin_folio+0x314/0x528
    [ 5241.100411]  shmem_get_folio_gfp+0x3b4/0x930
    [ 5241.100412]  shmem_fault+0x74/0x160
    [ 5241.100414]  __do_fault+0x40/0x218
    [ 5241.100417]  do_shared_fault+0x34/0x1b0
    [ 5241.100419]  do_fault+0x40/0x168
    [ 5241.100420]  handle_pte_fault+0x80/0x228
    [ 5241.100422]  __handle_mm_fault+0x1c4/0x440
    [ 5241.100424]  handle_mm_fault+0x60/0x1f0
    [ 5241.100426]  do_page_fault+0x120/0x488
    [ 5241.100429]  do_translation_fault+0x4c/0x68
    [ 5241.100431]  do_mem_abort+0x48/0xa0
    [ 5241.100434]  el0_da+0x38/0xc0
    [ 5241.100436]  el0t_64_sync_handler+0x68/0xc0
    [ 5241.100437]  el0t_64_sync+0x14c/0x150
    [ 5241.100439] ---[ end trace 0000000000000000 ]---

    [baolin.wang@linux.alibaba.com: remove less helpful comments, per Matthew]
      Link: https://lkml.kernel.org/r/ccad3fe1375b468ebca3227b6b729f3eaf9d8046.1718423197.git.baolin.wang@linux.alibaba.com
    Link: https://lkml.kernel.org/r/3c11000dd6c1df83015a8321a859e9775ebbc23e.1718266112.git.baolin.wang@linux.alibaba.com
    Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration")
    Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Nhat Pham <nphamcs@gmail.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-09-30 09:47:03 -04:00
Rafael Aquini 0a98879655 shmem: minor fixes to splice-read implementation
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit fa598952fac059054316dccb2213478ccb81a0d1
Author: Hugh Dickins <hughd@google.com>
Date:   Sun Jul 23 14:05:54 2023 -0700

    shmem: minor fixes to splice-read implementation

    HWPoison: my reading of folio_test_hwpoison() is that it only tests the
    head page of a large folio, whereas splice_folio_into_pipe() will splice
    as much of the folio as it can: so for safety we should also check the
    has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned.
    (Perhaps that ugliness can be improved at the mm end later.)

    The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask
    it for "part" not "len".

    Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com
    Fixes: bd194b187115 ("shmem: Implement splice-read")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: David Howells <dhowells@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:53 -04:00
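The EOF overrun described in the commit above comes down to clamping the requested length to what remains before end of file. The following is a minimal sketch of that clamp only; `bytes_before_eof()` and its parameter names are hypothetical and do not reproduce the shmem splice code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many of the 'len' requested bytes lie before EOF when
 * reading from offset 'pos' in a file of size 'isize'. This plays the
 * role of "part" in the commit message; passing the unclamped "len"
 * instead could describe bytes past EOF.
 */
static size_t bytes_before_eof(long long pos, size_t len, long long isize)
{
    assert(pos >= 0 && isize >= 0);
    if (pos >= isize)
        return 0;

    unsigned long long remaining = (unsigned long long)(isize - pos);
    return remaining < len ? (size_t)remaining : len;
}
```

With a 4096-byte file, a 4096-byte request at offset 4000 is therefore limited to the 96 bytes that actually exist.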
Rafael Aquini 3af248f180 mm: shmem: fix UAF bug in shmem_show_options()
JIRA: https://issues.redhat.com/browse/RHEL-27742
Conflicts:
  * minor context diff due to out-of-order backport of upstream's v6.6
    commit b4d3de57cab2 ("shmem: properly report quota mount options")

This patch is a backport of the following upstream commit:
commit 283ebdee2da30f65cba04c8fe690b97acfc7f4c4
Author: Tu Jinjiang <tujinjiang@huawei.com>
Date:   Thu May 25 11:16:40 2023 +0800

    mm: shmem: fix UAF bug in shmem_show_options()

    shmem_show_options() uses sbinfo->mpol without incrementing its refcnt.
    This may lead to a race with replacement of the mpol by remount. The
    execution sequence is as follows.

           CPU0                                   CPU1
    shmem_show_options()                        shmem_reconfigure()
        shmem_show_mpol(seq, sbinfo->mpol)          mpol = sbinfo->mpol
                                                    mpol_put(mpol)
            mpol->mode

    The KASAN report is as follows.

    BUG: KASAN: slab-use-after-free in shmem_show_options+0x21b/0x340
    Read of size 2 at addr ffff888124324004 by task mount/2388

    CPU: 2 PID: 2388 Comm: mount Not tainted 6.4.0-rc3-00017-g9d646009f65d-dirty #8
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x37/0x50
     print_report+0xd0/0x620
     ? shmem_show_options+0x21b/0x340
     ? __virt_addr_valid+0xf4/0x180
     ? shmem_show_options+0x21b/0x340
     kasan_report+0xb8/0xe0
     ? shmem_show_options+0x21b/0x340
     shmem_show_options+0x21b/0x340
     ? __pfx_shmem_show_options+0x10/0x10
     ? strchr+0x2c/0x50
     ? strlen+0x23/0x40
     ? seq_puts+0x7d/0x90
     show_vfsmnt+0x1e6/0x260
     ? __pfx_show_vfsmnt+0x10/0x10
     ? __kasan_kmalloc+0x7f/0x90
     seq_read_iter+0x57a/0x740
     vfs_read+0x2e2/0x4a0
     ? __pfx_vfs_read+0x10/0x10
     ? down_write_killable+0xb8/0x140
     ? __pfx_down_write_killable+0x10/0x10
     ? __fget_light+0xa9/0x1e0
     ? up_write+0x3f/0x80
     ksys_read+0xb8/0x150
     ? __pfx_ksys_read+0x10/0x10
     ? fpregs_assert_state_consistent+0x55/0x60
     ? exit_to_user_mode_prepare+0x2d/0x120
     do_syscall_64+0x3c/0x90
     entry_SYSCALL_64_after_hwframe+0x72/0xdc

     </TASK>

    Allocated by task 2387:
     kasan_save_stack+0x22/0x50
     kasan_set_track+0x25/0x30
     __kasan_slab_alloc+0x59/0x70
     kmem_cache_alloc+0xdd/0x220
     mpol_new+0x83/0x150
     mpol_parse_str+0x280/0x4a0
     shmem_parse_one+0x364/0x520
     vfs_parse_fs_param+0xf8/0x1a0
     vfs_parse_fs_string+0xc9/0x130
     shmem_parse_options+0xb2/0x110
     path_mount+0x597/0xdf0
     do_mount+0xcd/0xf0
     __x64_sys_mount+0xbd/0x100
     do_syscall_64+0x3c/0x90
     entry_SYSCALL_64_after_hwframe+0x72/0xdc

    Freed by task 2389:
     kasan_save_stack+0x22/0x50
     kasan_set_track+0x25/0x30
     kasan_save_free_info+0x2e/0x50
     __kasan_slab_free+0x10e/0x1a0
     kmem_cache_free+0x9c/0x350
     shmem_reconfigure+0x278/0x370
     reconfigure_super+0x383/0x450
     path_mount+0xcc5/0xdf0
     do_mount+0xcd/0xf0
     __x64_sys_mount+0xbd/0x100
     do_syscall_64+0x3c/0x90
     entry_SYSCALL_64_after_hwframe+0x72/0xdc

    The buggy address belongs to the object at ffff888124324000
     which belongs to the cache numa_policy of size 32
    The buggy address is located 4 bytes inside of
     freed 32-byte region [ffff888124324000, ffff888124324020)
    ==================================================================

    To fix the bug, shmem_get_sbmpol() / mpol_put() need to be called
    before / after the shmem_show_mpol() call.

    Link: https://lkml.kernel.org/r/20230525031640.593733-1-tujinjiang@huawei.com
    Signed-off-by: Tu Jinjiang <tujinjiang@huawei.com>
    Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:52 -04:00
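The fix in the commit above follows the usual "take a reference before use, drop it afterwards" pattern. The sketch below models that pattern in userspace C11; every `mock_*` name is hypothetical, and the spinlock and refcount are simplified stand-ins rather than the kernel's mempolicy or shmem code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's mempolicy types. */
struct mock_policy {
    atomic_int refcnt;     /* holders, including the superblock itself */
    unsigned short mode;
};

struct mock_sb {
    atomic_flag lock;      /* tiny spinlock guarding the mpol pointer */
    struct mock_policy *mpol;
};

static void mock_sb_lock(struct mock_sb *sb)
{
    while (atomic_flag_test_and_set(&sb->lock))
        ;  /* spin until the flag is released */
}

static void mock_sb_unlock(struct mock_sb *sb)
{
    atomic_flag_clear(&sb->lock);
}

static struct mock_policy *mock_policy_new(unsigned short mode)
{
    struct mock_policy *p = malloc(sizeof(*p));
    assert(p != NULL);
    atomic_init(&p->refcnt, 1);  /* the creator owns the first reference */
    p->mode = mode;
    return p;
}

/* Drop one reference; free the policy when the last holder lets go. */
static void mock_policy_put(struct mock_policy *p)
{
    if (p && atomic_fetch_sub(&p->refcnt, 1) == 1)
        free(p);
}

static void mock_sb_init(struct mock_sb *sb, unsigned short mode)
{
    atomic_flag_clear(&sb->lock);
    sb->mpol = mock_policy_new(mode);
}

/* Take a reference under the lock so a concurrent replace cannot free it. */
static struct mock_policy *mock_get_sbmpol(struct mock_sb *sb)
{
    struct mock_policy *p;

    mock_sb_lock(sb);
    p = sb->mpol;
    if (p)
        atomic_fetch_add(&p->refcnt, 1);
    mock_sb_unlock(sb);
    return p;
}

/* Remount path: swap the pointer, then drop the superblock's old reference. */
static void mock_replace_policy(struct mock_sb *sb, struct mock_policy *new_pol)
{
    struct mock_policy *old;

    mock_sb_lock(sb);
    old = sb->mpol;
    sb->mpol = new_pol;
    mock_sb_unlock(sb);
    mock_policy_put(old);
}

/* Reader path: get, use, put. The buggy version read sb->mpol directly. */
static unsigned short mock_show_mode(struct mock_sb *sb)
{
    struct mock_policy *p = mock_get_sbmpol(sb);
    unsigned short mode = p ? p->mode : 0;

    mock_policy_put(p);
    return mode;
}
```

Because the reader holds its own reference between get and put, a replacement that runs in that window only drops the superblock's reference, and the old policy stays valid until the reader is done with it.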
Rafael Aquini b0287192e6 shmem: Implement splice-read
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit bd194b187115da7b98b660b049315f6c9c8267d1
Author: David Howells <dhowells@redhat.com>
Date:   Mon May 22 14:49:56 2023 +0100

    shmem: Implement splice-read

    The new filemap_splice_read() has an implicit expectation via
    filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't
    fully populate the pagecache of the file it is reading from[1], potentially
    leading to a jump to NULL if this doesn't exist.  shmem, however, (and by
    extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio().

    Work around this by equipping shmem with its own splice-read
    implementation, based on filemap_splice_read(), but able to paste in
    zero_page when there's a page missing.

    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Daniel Golle <daniel@makrotopia.org>
    cc: Guenter Roeck <groeck7@gmail.com>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: David Hildenbrand <david@redhat.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Hugh Dickins <hughd@google.com>
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    Link: https://lore.kernel.org/r/Y+pdHFFTk1TTEBsO@makrotopia.org/ [1]
    Link: https://lore.kernel.org/r/20230522135018.2742245-10-dhowells@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:52 -04:00
Lucas Zampieri 2424e8e040 Merge: mm: follow up work for the MM v6.4 update and disable CONFIG_PER_VMA_LOCK until it is fixed
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4749

JIRA: https://issues.redhat.com/browse/RHEL-48221

It was identified that our process to bring in code-base updates
has been unwittingly missing some of the peripheral commits that do
not directly touch the core code under the mm/ directory.
While most of these identified peripheral commits are simple
and basic clean-ups, some are relevant changesets that might end
up causing real (and subtle) issues for RHEL deployments if they
remain missing.

The intent of this patchset is to close the aforementioned gap
by bringing in the missing peripheral commits from v5.14 up to
v6.4, which is the level at which we're parking our codebase for RHEL-9.5.

A secondary intent of this patchset is to bring in upstream's
v6.5 commit that disables the PER_VMA_LOCK feature, which was
recently introduced (to RHEL-9.5) but was marked BROKEN upstream
circa release v6.5, in order to avoid the reported memory
corruption issues in the upstream builds.
  
Signed-off-by: Rafael Aquini <aquini@redhat.com>

Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-08-06 14:21:52 +00:00