Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Rafael Aquini	160e863cf9	mm: shmem: remove unnecessary warning in shmem_writepage() JIRA: https://issues.redhat.com/browse/RHEL-84184 This patch is a backport of the following upstream commit: commit adae46ac1e38a288b14f0298e27412adcba83f8e Author: Ricardo Cañuelo Navarro <rcn@igalia.com> Date: Wed Feb 26 13:26:27 2025 +0100 mm: shmem: remove unnecessary warning in shmem_writepage() Although the scenario where shmem_writepage() is called with info->flags & VM_LOCKED is unlikely to happen, it's still possible, as evidenced by syzbot [1]. However, the warning in this case isn't necessary because the situation is already handled correctly [2]. [2] https://lore.kernel.org/lkml/8afe1f7f-31a2-4fc0-1fbd-f9ba8a116fe3@google.com/ Link: https://lkml.kernel.org/r/20250226-20250221-warning-in-shmem_writepage-v1-1-5ad19420e17e@igalia.com Fixes: 9a976f0c847b ("shmem: skip page split if we're not reclaiming") Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Reported-by: Pengfei Xu <pengfei.xu@intel.com> Closes: https://lore.kernel.org/lkml/ZZ9PShXjKJkVelNm@xpf.sh.intel.com/ [1] Suggested-by: Hugh Dickins <hughd@google.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Florent Revest <revest@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Florent Revest <revest@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2025-04-18 08:39:59 -04:00
Ryan Sullivan	2e2ebe63c2	fs: super_set_uuid() JIRA: https://issues.redhat.com/browse/RHEL-8810 Some weird old filesystems have UUID-like things that we wish to expose as UUIDs, but are smaller; add a length field so that the new FS_IOC_(GET|SET)UUID ioctls can handle them in generic code. And add a helper super_set_uuid(), for setting nonstandard length uuids. Helper is now required for the new FS_IOC_GETUUID ioctl; if super_set_uuid() hasn't been called, the ioctl won't be supported. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit a4af51ce229b1e1eab003966dbfebf9d80093a77) Signed-off-by: Ryan Sullivan <rysulliv@redhat.com>	2025-02-07 17:06:38 -05:00
Herton R. Krzesinski	30fd4705e8	mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling JIRA: https://issues.redhat.com/browse/RHEL-68912 Conflicts: small context differences in headers inclusion and because we do not have "mm: mmap: map MAP_STACK to VM_NOHUGEPAGE" applied commit 5baf8b037debf4ec60108ccfeccb8636d1dbad81 Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Date: Tue Oct 29 18:11:47 2024 +0000 mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Currently MTE is permitted in two circumstances (desiring to use MTE having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is specified, as checked by arch_calc_vm_flag_bits() and actualised by setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is activated in mmap_region(). The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also set is the arm64 implementation of arch_validate_flags(). Unfortunately, we intend to refactor mmap_region() to perform this check earlier, meaning that in the case of a shmem backing we will not have invoked shmem_mmap() yet, causing the mapping to fail spuriously. It is inappropriate to set this architecture-specific flag in general mm code anyway, so a sensible resolution of this issue is to instead move the check somewhere else. We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via the arch_calc_vm_flag_bits() call. This is an appropriate place to do this as we already check for the MAP_ANONYMOUS case here, and the shmem file case is simply a variant of the same idea - we permit RAM-backed memory. This requires a modification to the arch_calc_vm_flag_bits() signature to pass in a pointer to the struct file associated with the mapping, however this is not too egregious as this is only used by two architectures anyway - arm64 and parisc. 
So this patch performs this adjustment and removes the unnecessary assignment of VM_MTE_ALLOWED in shmem_mmap(). [akpm@linux-foundation.org: fix whitespace, per Catalin] Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andreas Larsson <andreas@gaisler.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herton R. Krzesinski <herton@redhat.com>	2024-12-09 16:30:34 -03:00
Rafael Aquini	d17d9c61a3	mm: revert "mm: shmem: fix data-race in shmem_getattr()" JIRA: https://issues.redhat.com/browse/RHEL-27745 JIRA: https://issues.redhat.com/browse/RHEL-70053 CVE: CVE-2024-53136 Conflicts: * minor context difference due to RHEL9 missing upstream commit 0d72b92883c6 ("fs: pass the request_mask to generic_fillattr") and its related series, as well as upstream commit e1e4cfd01a6e ("mm,tmpfs: consider end of file write in shmem_is_huge") This patch is a backport of the following upstream commit: commit d1aa0c04294e29883d65eac6c2f72fe95cc7c049 Author: Andrew Morton <akpm@linux-foundation.org> Date: Fri Nov 15 16:57:24 2024 -0800 mm: revert "mm: shmem: fix data-race in shmem_getattr()" Revert d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as suggested by Chuck [1]. It is causing deadlocks when accessing tmpfs over NFS. As Hugh commented, "added just to silence a syzbot sanitizer splat: added where there has never been any practical problem". Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1] Fixes: d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") Acked-by: Hugh Dickins <hughd@google.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:26:00 -05:00
Rafael Aquini	70afff39cc	mm: shmem: fix data-race in shmem_getattr() JIRA: https://issues.redhat.com/browse/RHEL-27745 JIRA: https://issues.redhat.com/browse/RHEL-66818 CVE: CVE-2024-50228 Conflicts: * minor context difference due to RHEL9 missing upstream commit 0d72b92883c6 ("fs: pass the request_mask to generic_fillattr") and its related series, as well as upstream commit e1e4cfd01a6e ("mm,tmpfs: consider end of file write in shmem_is_huge") This patch is a backport of the following upstream commit: commit d949d1d14fa281ace388b1de978e8f2cd52875cf Author: Jeongjun Park <aha310510@gmail.com> Date: Mon Sep 9 21:35:58 2024 +0900 mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] 
vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reported-by: syzbot <syzkaller@googlegroup.com> Cc: Hugh Dickins <hughd@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:25:54 -05:00
Rafael Aquini	cb26b8cc87	mm/shmem: inline shmem_is_huge() for disabled transparent hugepages JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit 1f737846aa3c45f07a06fa0d018b39e1afb8084a Author: Sumanth Korikkar <sumanthk@linux.ibm.com> Date: Tue Apr 9 17:54:07 2024 +0200 mm/shmem: inline shmem_is_huge() for disabled transparent hugepages In order to minimize code size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y), compiler might choose to make a regular function call (out-of-line) for shmem_is_huge() instead of inlining it. When transparent hugepages are disabled (CONFIG_TRANSPARENT_HUGEPAGE=n), it can cause compilation error. mm/shmem.c: In function `shmem_getattr': ./include/linux/huge_mm.h:383:27: note: in expansion of macro `BUILD_BUG' 383 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; }) | ^~~~~~~~~ mm/shmem.c:1148:33: note: in expansion of macro `HPAGE_PMD_SIZE' 1148 | stat->blksize = HPAGE_PMD_SIZE; To prevent the possible error, always inline shmem_is_huge() when transparent hugepages are disabled. Link: https://lkml.kernel.org/r/20240409155407.2322714-1-sumanthk@linux.ibm.com Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:24:49 -05:00
Rafael Aquini	fe6b91357e	zswap: memcontrol: implement zswap writeback disabling JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit 501a06fe8e4c185bbda371b8cedbdf1b23a633d8 Author: Nhat Pham <nphamcs@gmail.com> Date: Thu Dec 7 11:24:06 2023 -0800 zswap: memcontrol: implement zswap writeback disabling During our experiment with zswap, we sometimes observe swap IOs due to occasional zswap store failures and writebacks-to-swap. These swapping IOs prevent many users who cannot tolerate swapping from adopting zswap to save memory and improve performance where possible. This patch adds the option to disable this behavior entirely: do not writeback to backing swapping device when a zswap store attempt fail, and do not write pages in the zswap pool back to the backing swap device (both when the pool is full, and when the new zswap shrinker is called). This new behavior can be opted-in/out on a per-cgroup basis via a new cgroup file. By default, writebacks to swap device is enabled, which is the previous behavior. Initially, writeback is enabled for the root cgroup, and a newly created cgroup will inherit the current setting of its parent. Note that this is subtly different from setting memory.swap.max to 0, as it still allows for pages to be stored in the zswap pool (which itself consumes swap space in its current form). This patch should be applied on top of the zswap shrinker series: https://lore.kernel.org/linux-mm/20231130194023.4102148-1-nphamcs@gmail.com/ as it also disables the zswap shrinker, a major source of zswap writebacks. For the most part, this feature is motivated by internal parties who have already established their opinions regarding swapping - the workloads that are highly sensitive to IO, and especially those who are using servers with really slow disk performance (for instance, massive but slow HDDs). 
For these folks, it's impossible to convince them to even entertain zswap if swapping also comes as a packaged deal. Writeback disabling is quite a useful feature in these situations - on a mixed workloads deployment, they can disable writeback for the more IO-sensitive workloads, and enable writeback for other background workloads. For instance, on a server with HDD, I allocate memories and populate them with random values (so that zswap store will always fail), and specify memory.high low enough to trigger reclaim. The time it takes to allocate the memories and just read through it a couple of times (doing silly things like computing the values' average etc.): zswap.writeback disabled: real 0m30.537s user 0m23.687s sys 0m6.637s 0 pages swapped in 0 pages swapped out zswap.writeback enabled: real 0m45.061s user 0m24.310s sys 0m8.892s 712686 pages swapped in 461093 pages swapped out (the last two lines are from vmstat -s). [nphamcs@gmail.com: add a comment about recurring zswap store failures leading to reclaim inefficiency] Link: https://lkml.kernel.org/r/20231221005725.3446672-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231207192406.3809579-1-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: David Heidelberg <david@ixit.cz> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: 
Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:24:12 -05:00
Rafael Aquini	cffebe7f1d	mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit a4575c4138db887bd27dc7f87cf7cfb0224c6f5e Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Wed Dec 13 21:58:42 2023 +0000 mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio shmem_swapin_cluster() immediately converts the page back to a folio, and swapin_readahead() may as well call folio_file_page() once instead of having each function call it. [willy@infradead.org: avoid NULL pointer deref] Link: https://lkml.kernel.org/r/ZYI7OcVlM1voKfBl@casper.infradead.org Link: https://lkml.kernel.org/r/20231213215842.671461-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:24:09 -05:00
Rafael Aquini	f9e926534b	mempolicy: alloc_pages_mpol() for NUMA policy without vma JIRA: https://issues.redhat.com/browse/RHEL-27745 Conflicts: * mm/swap.h, mm/swap_state.c, and mm/zswap.c: minor context differences due to out-of-order backport of commit a65b0e7607cc ("zswap: make shrinking memcg-aware") This patch is a backport of the following upstream commit: commit ddc1a5cbc05dc62743a2f409b96faa5cf95ba064 Author: Hugh Dickins <hughd@google.com> Date: Thu Oct 19 13:39:08 2023 -0700 mempolicy: alloc_pages_mpol() for NUMA policy without vma Shrink shmem's stack usage by eliminating the pseudo-vma from its folio allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the principal actor for passing mempolicy choice down to __alloc_pages(), rather than vma_alloc_folio(gfp, order, vma, addr, hugepage). vma_alloc_folio() and alloc_pages() remain, but as wrappers around alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the additional args to policy_nodemask(), which subsumes policy_node(). Cleanup throughout, cutting out some unhelpful "helpers". It would all be much simpler without MPOL_INTERLEAVE, but that adds a dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8b ("tmpfs: distribute interleave better across nodes"), which added ino bias to the interleave, hidden from mm/mempolicy.c until this commit. Hence "ilx" throughout, the "interleave index". Originally I thought it could be done just with nid, but that's wrong: the nodemask may come from the shared policy layer below a shmem vma, or it may come from the task layer above a shmem vma; and without the final nodemask then nodeid cannot be decided. And how ilx is applied depends also on page order. 
The interleave index is almost always irrelevant unless MPOL_INTERLEAVE: with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX passed down from vma-less alloc_pages() is also used as hint not to use THP-style hugepage allocation - to avoid the overhead of a hugepage arg (though I don't understand why we never just added a GFP bit for THP - if it actually needs a different allocation strategy from other pages of the same order). vma_alloc_folio() still carries its hugepage arg here, but it is not used, and should be removed when agreed. get_vma_policy() no longer allows a NULL vma: over time I believe we've eradicated all the places which used to need it e.g. swapoff and madvise used to pass NULL vma to read_swap_cache_async(), but now know the vma. [hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()] Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> 
Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:23:16 -05:00
Rafael Aquini	dce5c250e1	shmem: _add_to_page_cache() before shmem_inode_acct_blocks() JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit 3022fd7af9604d44ec43da8a4398872989599b18 Author: Hugh Dickins <hughd@google.com> Date: Fri Sep 29 20:32:40 2023 -0700 shmem: _add_to_page_cache() before shmem_inode_acct_blocks() There has been a recurring problem, that when a tmpfs volume is being filled by racing threads, some fail with ENOSPC (or consequent SIGBUS or EFAULT) even though all allocations were within the permitted size. This was a problem since early days, but magnified and complicated by the addition of huge pages. We have often worked around it by adding some slop to the tmpfs size, but it's hard to say how much is needed, and some users prefer not to do that e.g. keeping sparse files in a tightly tailored tmpfs helps to prevent accidental writing to holes. This comes from the allocation sequence: 1. check page cache for existing folio 2. check and reserve from vm_enough_memory 3. check and account from size of tmpfs 4. if huge, check page cache for overlapping folio 5. allocate physical folio, huge or small 6. check and charge from mem cgroup limit 7. add to page cache (but maybe another folio already got in). Concurrent tasks allocating at the same position could deplete the size allowance and fail. Doing vm_enough_memory and size checks before the folio allocation was intentional (to limit the load on the page allocator from this source) and still has some virtue; but memory cgroup never did that, so I think it's better reordered to favour predictable behaviour. 1. check page cache for existing folio 2. if huge, check page cache for overlapping folio 3. allocate physical folio, huge or small 4. check and charge from mem cgroup limit 5. add to page cache (but maybe another folio already got in) 6. check and reserve from vm_enough_memory 7. check and account from size of tmpfs. 
The folio lock held from allocation onwards ensures that the !uptodate folio cannot be used by others, and can safely be deleted from the cache if checks 6 or 7 subsequently fail (and those waiting on folio lock already check that the folio was not truncated once they get the lock); and the early addition to page cache ensures that racers find it before they try to duplicate the accounting. Seize the opportunity to tidy up shmem_get_folio_gfp()'s ENOSPC retrying, which can be combined inside the new shmem_alloc_and_add_folio(): doing 2 splits twice (once huge, once nonhuge) is not exactly equivalent to trying 5 splits (and giving up early on huge), but let's keep it simple unless more complication proves necessary. Userfaultfd is a foreign country: they do things differently there, and for good reason - to avoid mmap_lock deadlock. Leave ordering in shmem_mfill_atomic_pte() untouched for now, but I would rather like to mesh it better with shmem_get_folio_gfp() in the future. Link: https://lkml.kernel.org/r/22ddd06-d919-33b-1219-56335c1bf28e@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:22:44 -05:00
Rafael Aquini	a8e624508d	shmem: move memcg charge out of shmem_add_to_page_cache() JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit 054a9f7ccd0a60607fb9bbe1e06ca671494971bf Author: Hugh Dickins <hughd@google.com> Date: Fri Sep 29 20:31:27 2023 -0700 shmem: move memcg charge out of shmem_add_to_page_cache() Extract shmem's memcg charging out of shmem_add_to_page_cache(): it's misleadingly done there, because many calls are dealing with a swapcache page, whose memcg is nowadays always remembered while swapped out, then the charge re-levied when it's brought back into swapcache. Temporarily move it back up to the shmem_get_folio_gfp() level, where the memcg was charged before v5.8; but the next commit goes on to move it back down to a new home. In making this change, it becomes clear that shmem_swapin_folio() does not need to know the vma, just the fault mm (if any): call it fault_mm rather than charge_mm - let mem_cgroup_charge() decide whom to charge. Link: https://lkml.kernel.org/r/4b2143c5-bf32-64f0-841-81a81158dac@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:22:43 -05:00
Rafael Aquini	e313de880d	shmem: shmem_acct_blocks() and shmem_inode_acct_blocks() JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit 4199f51a7eb2054d68964efbd8d39c68053a8714 Author: Hugh Dickins <hughd@google.com> Date: Fri Sep 29 20:30:03 2023 -0700 shmem: shmem_acct_blocks() and shmem_inode_acct_blocks() By historical accident, shmem_acct_block() and shmem_inode_acct_block() were never pluralized when the pages argument was added, despite their complements being shmem_unacct_blocks() and shmem_inode_unacct_blocks() all along. It has been an irritation: fix their naming at last. Link: https://lkml.kernel.org/r/9124094-e4ab-8be7-ef80-9a87bdc2e4fc@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:22:43 -05:00
Rafael Aquini	85fde1fa80	shmem: factor shmem_falloc_wait() out of shmem_fault() JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit f0a9ad1d4d9ba3c694bca91d8d67be9a4a33b902 Author: Hugh Dickins <hughd@google.com> Date: Fri Sep 29 20:27:53 2023 -0700 shmem: factor shmem_falloc_wait() out of shmem_fault() That Trinity livelock shmem_falloc avoidance block is unlikely, and a distraction from the proper business of shmem_fault(): separate it out. (This used to help compilers save stack on the fault path too, but both gcc and clang nowadays seem to make better choices anyway.) Link: https://lkml.kernel.org/r/6fe379a4-6176-9225-9263-fe60d2633c0@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:22:42 -05:00
Rafael Aquini	d6ca007045	shmem: remove vma arg from shmem_get_folio_gfp() JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit e3e1a5067fd2f1b3f4f7c651f5b33082962d1aa1 Author: Hugh Dickins <hughd@google.com> Date: Fri Sep 29 20:26:53 2023 -0700 shmem: remove vma arg from shmem_get_folio_gfp() The vma is already there in vmf->vma, so no need for a separate arg. Link: https://lkml.kernel.org/r/d9ce6f65-a2ed-48f4-4299-fdb0544875c5@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:22:41 -05:00
Rafael Aquini	2495138fda	shmem: Refactor shmem_symlink() JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit 23a31d87645c652734f89f477f69ddac9aa402cb Author: Chuck Lever <chuck.lever@oracle.com> Date: Fri Jun 30 13:48:56 2023 -0400 shmem: Refactor shmem_symlink() De-duplicate the error handling paths. No change in behavior is expected. Suggested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <168814733654.530310.9958360833543413152.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:21:44 -05:00
Rado Vrbovsky	c154c6dc53	Merge: fs: backport mnt_idmap type MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4324 JIRA: https://issues.redhat.com/browse/RHEL-33888 This MR back ports idmapping changes to sync our RHEL-9 kernel with the upstream kernel to version 6.3. Our current kernel has idmapped mounts support but there have been many changes since this initial implementation in the base kernel. In particular we need the type safety changes and we have seen difficulty back porting other requested changes on more than one occasion. The Jira this MR has been raised for is another example of such a request. It is needed for a back port of a BPF feature to RHEL 9 which allows BPF programs to do file verification with LSM and fsverity. To satisfy this request changes made in the upstream 6.3 kernel are needed which is the reason we have chosen upstream 6.3 as the target release for the MR. The first fix has been omitted because it appears to be the same as 24b5308cf5ee ("selftests/filesystems: grant executable permission to run_fat_tests.sh"). In any case the requirement is to make the path tools/testing/selftests/filesystems/fat/run_fat_tests.sh executable which is done. The second and third omitted patches are a straight apply and revert leaving the source unchanged. Omitted-Fix: 1d4beeb4edc7 ("selftests/filesystems: grant executable permission to run_fat_tests.sh") Omitted-Fix: 4a47c6385bb4 ovl: turn of SB_POSIXACL with idmapped layers temporarily Omitted-Fix: 7c4d37c269ac Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily" Signed-off-by: Ian Kent <ikent@redhat.com> Approved-by: Scott Mayhew <smayhew@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Xin Long <lxin@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-11 08:26:30 +00:00
Rado Vrbovsky	570a71d7db	Merge: mm: update core code to v6.6 upstream MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5252 JIRA: https://issues.redhat.com/browse/RHEL-27743 JIRA: https://issues.redhat.com/browse/RHEL-59459 CVE: CVE-2024-46787 Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4961 This MR brings RHEL9 core MM code up to upstream's v6.6 LTS level. This work follows up on the previous v6.5 update (RHEL-27742) and as such, the bulk of this changeset is comprised of refactoring and clean-ups of the internal implementation of several APIs as it further advances the conversion to FOLIOS, and follow up on the per-VMA locking changes. Also, with the rebase to v6.6 LTS, we complete the infrastructure to allow Control-flow Enforcement Technology, a.k.a. Shadow Stacks, for x86 builds, and we add a potential extra level of protection (assessment pending) to help on mitigating kernel heap exploits dubbed as "SlubStick". Follow-up fixes are omitted from this series either because they are irrelevant to the bits we support on RHEL or because they depend on bigger changesets introduced upstream more recently. A follow-up ticket (RHEL-27745) will deal with these and other cases separately. 
Omitted-fix: e540b8c5da04 ("mips: mm: add slab availability checking in ioremap_prot") Omitted-fix: f7875966dc0c ("tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources") Omitted-fix: df39038cd895 ("s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()") Omitted-fix: 12bbaae7635a ("mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros") Omitted-fix: fd1a745ce03e ("mm: support page_mapcount() on page_has_type() pages") Omitted-fix: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Omitted-fix: fa2690af573d ("mm: page_ref: remove folio_try_get_rcu()") Omitted-fix: f442fa614137 ("mm: gup: stop abusing try_grab_folio") Omitted-fix: cb0f01beb166 ("mm/mprotect: fix dax pud handling") Signed-off-by: Rafael Aquini <raquini@redhat.com> Approved-by: John W. Linville <linville@redhat.com> Approved-by: Mark Salter <msalter@redhat.com> Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Steve Best <sbest@redhat.com> Approved-by: David Airlie <airlied@redhat.com> Approved-by: Michal Schmidt <mschmidt@redhat.com> Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-10-30 07:22:28 +00:00
Ian Kent	6836d0308d	fs: port i_{g,u}id_{needs_}update() to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: Update to add incremental changes needed due to CentOS Stream commit `469e1d13f6` ("shmem: quota support"). commit 0dbe12f2e49c046444461b5f4be49df2cafb3a40 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:29 2023 +0100 fs: port i_{g,u}id_{needs_}update() to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:32 +08:00
Ian Kent	95a4490e2f	quota: port to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: Hunk #1 against fs/f2fs/file.c failed but I cannot see any reason for it, manually apply change. Update to add incremental changes needed due to CentOS Stream commit `469e1d13f6` ("shmem: quota support"). commit f861646a65623bcff91d544acbc4413d62d97b79 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:28 2023 +0100 quota: port to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:31 +08:00
Ian Kent	2171c567b5	fs: port inode_init_owner() to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. CentOS Stream does not have upstream commit 3db1de0e582c3 ("f2fs: change the current atomic write way") so there is no call to f2fs_get_tmpfile() in f2fs_ioc_start_atomic_write() to change. The above patch also adds the definition of f2fs_get_tmpfile() to fs/f2fs/f2fs.h so it's not there to change resulting in a hunk reject for fs/f2fs/f2fs.h. Upstream commit 787caf1bdcd9f ("f2fs: fix to enable compress for newly created file if extension matches") is not present in CentOS Stream resulting in a number of rejects against fs/f2fs/namei.c, manually apply these changes. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") which causes a reject in fs/shmem.c, manually apply the hunk (note: taking account of these changes at the times they are needed will result in an updated mm/shmem.c once this series is completed). Update to add incremental changes needed due to CentOS Stream commit `469e1d13f6` ("shmem: quota support"). commit f2d40141d5d90b882e2c35b226f9244a63b82b6e Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:25 2023 +0100 fs: port inode_init_owner() to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. 
Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:26 +08:00
Ian Kent	92d69b838d	fs: port xattr to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: The cifs source has been moved in CentOS Stream so manually apply rejected hunk to fs/smb/client/xattr.c. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `98ba731fc7` ("ovl: Move xattr support to new xattrs.c file") moved ovl_own_xattr_set(), manually apply changes. CentOS Stream commit `67e2fcb2f3` ("evm: don't copy up 'security.evm' xattr") is present causing hunk #1 against include/linux/evm.h to be rejected, manually apply. Upstream commit 5d1ef2ce13a90 ("ima: Introduce ima_get_current_hash_algo()") is not present in CentOS Stream which causes fuzz 1 for hunk #1 against include/linux/ima.h. There's a reject of hunk #1 for include/linux/lsm_hooks.h but I can't see any reason for it, manually applied the hunk. CentOS Stream does not have upstream commit ce5bb5a86e5eb ("ima: Return int in the functions to measure a buffer") which results in a reject of hunk #2 against security/integrity/ima/ima.h and hunks #8 and #11 against security/integrity/ima/ima_main.c, so manually apply hunks. There also appears to be a whitespace mismatch causing hunk #7 to report fuzz 2 on application. CentOS Stream does not have upstream commit c7423dbdbc9ec ("ima: Handle -ESTALE returned by ima_filter_rule_match()") which results in a reject of hunk #3 against security/integrity/ima/ima_policy.c, so manually apply hunk. commit 39f60c1ccee72caa0104145b5dbf5d37cce1ea39 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:23 2023 +0100 fs: port xattr to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. 
This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:21 +08:00
Ian Kent	060dc0b240	fs: port ->fileattr_set() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. commit 8782a9aea3ab4d697ad67d1f8ebca38a4e1c24ab Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:21 2023 +0100 fs: port ->fileattr_set() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:18 +08:00
Ian Kent	be97228574	fs: port ->set_acl() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsacl.c and fs/smb/client/cifsproto.h. Dropped hunks for ntfs3 and ksmbd because the source is not present in the CentOS Stream source tree. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present, which causes hunk #1 against mm/shmem.c to be rejected, manually apply the hunk. CentOS Stream commit `48fa94aacd` ("ceph: fscrypt_auth handling for ceph") is present which causes fuzz 1 of hunk #1 against fs/ceph/inode.c. commit 13e83a4923bea7c4f2f6714030cb7e56d20ef7e5 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:20 2023 +0100 fs: port ->set_acl() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. 
Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:12 +08:00
Ian Kent	0dcf7b37eb	fs: port ->tmpfile() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. Upstream commit 863f144f12add ("vfs: open inside ->tmpfile()") is not present which caused a reject in fs/f2fs/namei.c for hunk #1, applied manually. The hunk of the patch against fs/minix/namei.c was rejected but I can't see any reason for it, applied manually. CentOS Stream has commit `9e0a1fff8d` ("ubifs: Implement RENAME_WHITEOUT") which caused a reject in the hunk against fs/ubifs/dir.c, manually applied. commit 011e2b717b1b921d3706a9d48ff83a025563e826 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:18 2023 +0100 fs: port ->tmpfile() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:10 +08:00
Ian Kent	956e3ad810	fs: port ->mknod() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsfs.h and fs/smb/client/dir.c. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present, which causes hunks #2-#4 to be rejected, manually apply the hunks. CentOS Stream commit `f0f830cd7e` ("ceph: create symlinks with encrypted and base64-encoded targets") is present and resulted in fuzz against fs/ceph/dir.c hunk #2. Upstream commit 863f144f12add ("vfs: open inside ->tmpfile()") is missing causing fuzz against fs/ext2/namei.c. Upstream commit 7d37539037c2f ("fuse: implement ->tmpfile()") is missing causing fuzz in hunk #4 against fs/fuse/dir.c. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present, so a patch reorder was needed with appropriate adjustments. commit 5ebb29bee8d5fc173b774e0755be8cb335503ee3 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:16 2023 +0100 fs: port ->mknod() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. 
Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:08 +08:00
Ian Kent	19f3b4f1ba	fs: port ->rename() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsfs.h and fs/smb/client/inode.c. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. Upstream commit cc14d24026704 ("hpfs: Convert symlinks to read_folio") is not present which causes fuzz 1 for hunk #1. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present, so a patch reorder was needed with appropriate adjustments. commit e18275ae55e07a2937e48134589c2f4c1d99a369 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:17 2023 +0100 fs: port ->rename() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. 
Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:07 +08:00
Ian Kent	a7750be4f4	fs: port ->mkdir() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsfs.h and fs/smb/client/inode.c. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. commit c54bd91e9eaba43f09aadc25b52ea869ff3b5587 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:15 2023 +0100 fs: port ->mkdir() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:00 +08:00
Ian Kent	5744ba0ee3	fs: port ->symlink() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsfs.h and fs/smb/client/link.c. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `f0f830cd7e` ("ceph: create symlinks with encrypted and base64-encoded targets") is present and resulted in fuzz against fs/ceph/dir.c. commit 7a77db95511c39be4b2db2ceca152ef589adc2dc Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:14 2023 +0100 fs: port ->symlink() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:00 +08:00
Ian Kent	a56d1daadf	fs: port ->create() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsfs.h and fs/smb/client/dir.c. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present, which causes fuzz in mm/shmem.c. commit 6c960e68aaed335a0040f16654f3c5e5bfcf9249 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:13 2023 +0100 fs: port ->create() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:44:53 +08:00
Ian Kent	6ad3fa5fce	fs: port ->getattr() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: CentOS Stream has commit `3e0b6f1fa9` ("afs: use read_seqbegin() in afs_check_validity() and afs_getattr()"), manually apply hunk #2 to fs/afs/inode.c. CentOS Stream commit `3b06927229` ("afs: split afs_pagecache_valid() out of afs_validate()") is present which causes a reject in fs/afs/internal.h, manually apply hunk to fs/afs/internal.h. For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. CentOS Stream commit `48fa94aacd` ("ceph: fscrypt_auth handling for ceph") alters the definition of _ceph_setattr() causing fuzz. The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsfs.h and fs/smb/client/inode.c. Upstream commit `2e1d66379e` ("staging: erofs: drop the extern prefix for function definitions") caused strange behaviour when applying this patch, there was a conflict in fs/erofs/internal.h but after a refresh the hunk and context looked ok. The hunk had to be manually applied. Upstream commit 2db0487faa211 ("f2fs: move f2fs_force_buffered_io() into file.c") is not present in CentOS Stream which causes fuzz when applying the first hunk to fs/f2fs/file.c. Upstream commit 30abce053f811 ("fat: report creation time in statx") is not present in CentOS Stream which caused a reject so apply change manually. Dropped hunks for ksmbd because the source is not present in the CentOS Stream source tree. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. There was fuzz with hunk #2 against fs/nfs/inode.c but I was unable to see any difference. CentOS Stream commit `98ba731fc7` ("ovl: Move xattr support to new xattrs.c file") is present which caused fuzz in fs/overlayfs/overlayfs.h. 
Upstream commit d919a1e79bac8 ("proc: fix a dentry lock race between release_task and lookup") is not present in CentOS Stream causing fuzz applying hunk #1 against fs/proc/base.c. CentOS Stream commit `20c470188c` ("vfs: plumb i_version handling into struct kstat") is present causing fuzz in hunk #2 against fs/stat.c. Upstream commit e0c49bd2b4d3c ("fs: sysv: Fix sysv_nblocks() returns wrong value") is not present in CentOS Stream causing fuzz applying hunk #1 against fs/sysv/itree.c. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present so it's ok to pass idmap to generic_fillattr(). CentOS Stream commit `f0f830cd7e` ("ceph: create symlinks with encrypted and base64-encoded targets") uses the old struct user_namespace and so leaves those changes out, make those getattr() changes here. Allow for CentOS Stream commit `6c3396a0d8` ("kernfs: Introduce separate rwsem to protect inode attributes") which is already present. CentOS Stream commit `f5219db0c0` ("KVM: fix Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") updated the upstream commit a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") to account for missing idmapping commits. Now that we have updated the second and final place these changes were made, make the final needed adjustment to match the original upstream patch. commit b74d24f7a74ffd2d42ca883d84b7422b8d545901 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:12 2023 +0100 fs: port ->getattr() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. 
This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 09:37:45 +08:00
Ian Kent	43ca440cdf	fs: port ->setattr() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: CentOS Stream commit `3c29fadfb1` ("afs: split afs_pagecache_valid() out of afs_validate()") is present, manually adjust hunk #1 of fs/afs/internal.h. For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. CentOS Stream commit `48fa94aacd` ("ceph: fscrypt_auth handling for ceph") alters the definition of _ceph_setattr(), adjust manually. CentOS Stream commit `34b2a2b5a3` ("ceph: add some fscrypt guardrails") introduces a call to fscrypt_prepare_setattr() which causes fuzz when applying. The cifs source has been moved in CentOS Stream so manually apply rejected hunks to fs/smb/client/cifsfs.h and fs/smb/client/inode.c. Upstream commit 5a646fb3a3e2d ("coda: avoid doing bad things on inode type changes during revalidation") is not present which causes fuzz in fs/coda/coda_linux.h. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `98ba731fc7` ("ovl: Move xattr support to new xattrs.c file") is present so manually apply hunk. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present so it's ok to pass idmap to setattr_prepare() and setattr_copy(). Update to add incremental changes needed due to CentOS Stream commit `469e1d13f6` ("shmem: quota support"). Allow for CentOS Stream commit `6c3396a0d8` ("kernfs: Introduce separate rwsem to protect inode attributes") which is already present. CentOS Stream commit `f5219db0c0` ("KVM: fix Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") updated the upstream commit a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") to account for missing idmapping commits. 
Now that we have updated one of the two places where these changes were made, make one of the needed adjustments to match the original upstream patch. commit c1632a0f11209338fc300c66252bcc4686e609e8 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:11 2023 +0100 fs: port ->setattr() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 09:07:05 +08:00
Ian Kent	310906db16	fs: pass dentry to set acl method JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: I didn't want to just drop the btrfs hunks so I made the change to btrfs_setattr() init_user_ns instead of the expected mnt_userns. That should at least cause a conflict if btrfs changes to a supported fs in the future. CentOS Stream commit `48fa94aacd` ("ceph: fscrypt_auth handling for ceph") is present, make necessary adjustment. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") is present, make necessary adjustment. The changes for fs/ksmbd/* were dropped as the directory doesn't exist in CentOS Stream. The changes for fs/ntfs3/* were dropped as the directory doesn't exist in CentOS Stream. commit 138060ba92b3b0d77c8e6818d0f33398b23ea42e Author: Christian Brauner <brauner@kernel.org> Date: Fri Sep 23 10:29:39 2022 +0200 fs: pass dentry to set acl method The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in its current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. Since some filesystems rely on the dentry being available to them when setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode operation. But since ->set_acl() is required in order to use the generic posix acl xattr handlers filesystems that do not implement this inode operation cannot use the handler and need to implement their own dedicated posix acl handlers. Update the ->set_acl() inode method to take a dentry argument.
This allows all filesystems to rely on ->set_acl(). As far as I can tell all codepaths can be switched to rely on the dentry instead of just the inode. Note that the original motivation for passing the dentry separate from the inode instead of just the dentry in the xattr handlers was because of security modules that call security_d_instantiate(). This hook is called during d_instantiate_new(), d_add(), __d_instantiate_anon(), and d_splice_alias() to initialize the inode's security context and possibly to set security.* xattrs. Since this only affects security.* xattrs this is completely irrelevant for posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:11:25 +08:00
Ian Kent	8763195146	attr: port attribute changes to new types JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflict: Hunk 2 of fs/f2fs/file.c failed to apply but the source looked identical and required manual application. Hunks 2 and 3 failed to apply to fs/attr.c due to CentOS Stream commit `33c38120a3` ("fs: account for group membership") having already been applied requiring manual application. Update to add incremental changes needed due to CentOS Stream ("shmem: quota support"). commit b27c82e1296572cfa3997e58db3118a33915f85c Author: Christian Brauner <brauner@kernel.org> Date: Tue Jun 21 16:14:54 2022 +0200 attr: port attribute changes to new types Now that we introduced new infrastructure to increase the type safety for filesystems supporting idmapped mounts port the first part of the vfs over to them. This ports the attribute changes codepaths to rely on the new better helpers using a dedicated type. Before this change we used to take a shortcut and place the actual values that would be written to inode->i_{g,u}id into struct iattr. This had the advantage that we moved idmappings mostly out of the picture early on but it made reasoning about changes more difficult than it should be. The filesystem was never explicitly told that it dealt with an idmapped mount. The transition to the value that needed to be stored in inode->i_{g,u}id appeared way too early and increased the probability of bugs in various codepaths. We now place the same value in struct iattr no matter if this is an idmapped mount or not. The vfs will only deal with type safe vfs{g,u}id_t. This makes it massively safer to perform permission checks as the type will tell us what checks we need to perform and what helpers we need to use. Filesystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to inode->i_{g,u}id since they are different types. Instead they need to use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the vfs{g,u}id into the filesystem.
The other nice effect is that filesystems like overlayfs don't need to care about idmappings explicitly anymore and can simply set up struct iattr accordingly directly. Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1] Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:10:59 +08:00
Ian Kent	0d2dd7a477	quota: port quota helpers mount ids JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflict: There was a conflict in a hunk applied to f2fs_setattr() but the source looked identical and required manual application. Update to account for changes to is_quota_modification() and dquot_transfer() from CentOS Stream commit `469e1d13f6` ("shmem: quota support"). commit 71e7b535b8900d7ce7d5279fa472711db5251ae5 Author: Christian Brauner <brauner@kernel.org> Date: Tue Jun 21 16:14:52 2022 +0200 quota: port quota helpers mount ids Port the is_quota_modification() and dquot_transfer() helper to type safe vfs{g,u}id_t. Since these helpers are only called by a few filesystems don't introduce a new helper but simply extend the existing helpers to pass down the mount's idmapping. Note, that this is a non-functional change, i.e. nothing will have happened here or at the end of this series to how quota are done! This is a change necessary because we will at the end of this series make ownership changes easier to reason about by keeping the original value in struct iattr for both non-idmapped and idmapped mounts. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping.
Since struct iattr uses an anonymous union with overlapping types as supported by the C standard, filesystems that haven't converted to ia_vfs{g,u}id won't see any difference and things will continue to work as before. In other words, no functional changes intended with this change. Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:10:58 +08:00
Ian Kent	00383cd059	fs: port to iattr ownership update helpers JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: Update to use the iattrs update helpers in mm/shmem.c due to the quota changes from CentOS Stream commit `469e1d13f6` ("shmem: quota support"). commit 35faf3109a78516f60ca13f957083d5e5535fde0 Author: Christian Brauner <brauner@kernel.org> Date: Tue Jun 21 16:14:51 2022 +0200 fs: port to iattr ownership update helpers Earlier we introduced new helpers to abstract ownership update and remove code duplication. This converts all filesystems supporting idmapped mounts to make use of these new helpers. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. No functional changes intended. Link: https://lore.kernel.org/r/20220621141454.2914719-6-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:10:57 +08:00
Rafael Aquini	cd1cd44bf9	mm/swap: inline folio_set_swap_entry() and folio_swap_entry() JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit 3d2c908768877714a354ee6d7bf93e801400d5e2 Author: David Hildenbrand <david@redhat.com> Date: Mon Aug 21 18:08:48 2023 +0200 mm/swap: inline folio_set_swap_entry() and folio_swap_entry() Let's simply work on the folio directly and remove the helpers. Link: https://lkml.kernel.org/r/20230821160849.531668-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Chris Li <chrisl@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:22:06 -04:00
Rafael Aquini	db6591e712	tmpfs: trivial support for direct IO JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit e88e0d366f9cfbb810b0c8509dc5d130d5a53e02 Author: Hugh Dickins <hughd@google.com> Date: Thu Aug 10 23:27:07 2023 -0700 tmpfs: trivial support for direct IO Depending upon your philosophical viewpoint, either tmpfs always does direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to stand in for a more sophisticated filesystem, it can be helpful for tmpfs to support O_DIRECT. So, give tmpfs a shmem_file_open() method, to set the FMODE_CAN_ODIRECT flag: then unchanged shmem_file_read_iter() and new shmem_file_write_iter() do the work (without any shmem_direct_IO() stub). Perhaps later, once the direct_IO method has been eliminated from all filesystems, generic_file_write_iter() will be such that tmpfs can again use it, even for O_DIRECT. xfstests auto generic which were not run on tmpfs before but now pass: 036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263 323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729 with no new failures. LTP dio tests which were not run on tmpfs before but now pass: dio01 through dio30, except for dio04 and dio10, which fail because tmpfs dio read and write allow odd count: tmpfs could be made stricter, but would that be an improvement? Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <6f2742-6f1f-cae9-7c5b-ed20fc53215@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:21:37 -04:00
Rafael Aquini	81b2421a05	tmpfs: track free_ispace instead of free_inodes JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit e07c469e979c104464300aaa3b7923f929055cd0 Author: Hugh Dickins <hughd@google.com> Date: Tue Aug 8 21:32:21 2023 -0700 tmpfs: track free_ispace instead of free_inodes In preparation for assigning some inode space to extended attributes, keep track of free_ispace instead of number of free_inodes: as if one tmpfs inode (and accompanying dentry) occupies very approximately 1KiB. Unsigned long is large enough for free_ispace, on 64-bit and on 32-bit: but take care to enforce the maximum. And fix the nr_blocks maximum on 32-bit: S64_MAX would be too big for it there, so say LONG_MAX instead. Delete the incorrect limited<->unlimited blocks/inodes comment above shmem_reconfigure(): leave it to the error messages below to describe. Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <4fe1739-d9e7-8dfd-5bce-12e7339711da@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:21:35 -04:00
Rafael Aquini	d41514ca9f	xattr: simple_xattr_set() return old_xattr to be freed JIRA: https://issues.redhat.com/browse/RHEL-27743 Conflicts: * mm/shmem.c: this commit had a merge conflict upstream with commit 6528733416f1 ("shmem: convert to ctime accessor functions"), backported earlier in this set. The conflict was solved via merge commit ecd7db20474c ("Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs"), from which we borrow the hunk adjustment for this backport. This patch is a backport of the following upstream commit: commit 5de75970c9fd7220e394b76e6d20fbafa1369b5a Author: Hugh Dickins <hughd@google.com> Date: Tue Aug 8 21:30:59 2023 -0700 xattr: simple_xattr_set() return old_xattr to be freed tmpfs wants to support limited user extended attributes, but kernfs (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR) already supports user extended attributes through simple xattrs: but limited by a policy (128KiB per inode) too liberal to be used on tmpfs. To allow a different limiting policy for tmpfs, without affecting the policy for kernfs, change simple_xattr_set() to return the replaced or removed xattr (if any), leaving the caller to update their accounting then free the xattr (by simple_xattr_free(), renamed from the static free_simple_xattr()). Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <158c6585-2aa7-d4aa-90ff-f7c3f8fe407c@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:21:34 -04:00
Rafael Aquini	f8d1f89f03	mm/shmem.c: use helper macro K() JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit b91742d84d29c39b643992b95560cfb7337eab18 Author: ZhangPeng <zhangpeng362@huawei.com> Date: Fri Aug 4 09:25:56 2023 +0800 mm/shmem.c: use helper macro K() Use helper macro K() to improve code readability. No functional modification involved. Link: https://lkml.kernel.org/r/20230804012559.2617515-5-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:20:55 -04:00
Rafael Aquini	a87aa37b6f	tmpfs: verify {g,u}id mount options correctly JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit 0200679fc7953177941e41c2a4241d0b6c2c5de8 Author: Christian Brauner <brauner@kernel.org> Date: Tue Aug 1 18:17:04 2023 +0200 tmpfs: verify {g,u}id mount options correctly A while ago we received the following report: "The other outstanding issue I noticed comes from the fact that fsconfig syscalls may occur in a different userns than that which called fsopen. That means that resolving the uid/gid via current_user_ns() can save a kuid that isn't mapped in the associated namespace when the filesystem is finally mounted. This means that it is possible for an unprivileged user to create files owned by any group in a tmpfs mount (since we can set the SUID bit on the tmpfs directory), or a tmpfs that is owned by any user, including the root group/user." The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock to avoid such bugs as above. The new mount api's cross-namespace delegation abilities are already widely used. After having talked to a bunch of userspace this is the most faithful solution with minimal regression risks. I know of one user - systemd - that makes use of the new mount api in this way and they don't set unresolvable {g,u}ids. So the regression risk is minimal.
Link: https://lore.kernel.org/lkml/CALxfFW4BXhEwxR0Q5LSkg-8Vb4r2MONKCcUCVioehXQKr35eHg@mail.gmail.com Fixes: `f32356261d` ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API") Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org> Reported-by: Seth Jenkins <sethjenkins@google.com> Message-Id: <20230801-vfs-fs_context-uidgid-v1-1-daf46a050bbf@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:20:14 -04:00
Rafael Aquini	4b5fb83182	mm: make PTE_MARKER_SWAPIN_ERROR more general JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit af19487f00f34ff8643921d7909dbb3fedc7e329 Author: Axel Rasmussen <axelrasmussen@google.com> Date: Fri Jul 7 14:55:33 2023 -0700 mm: make PTE_MARKER_SWAPIN_ERROR more general Patch series "add UFFDIO_POISON to simulate memory poisoning with UFFD", v4. This series adds a new userfaultfd feature, UFFDIO_POISON. See commit 4 for a detailed description of the feature. This patch (of 8): Future patches will reuse PTE_MARKER_SWAPIN_ERROR to implement UFFDIO_POISON, so make some various preparations for that: First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be confusing since we're going to re-use it for something not really related to swap. This can be particularly confusing for things like hugetlbfs, which doesn't support swap whatsoever. Also rename some various helper functions. Next, fix pte marker copying for hugetlbfs. Previously, it would WARN on seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap. But, since we're going to re-use it, we want it to go ahead and copy it just like non-hugetlbfs memory does today. Since the code to do this is more complicated now, pull it out into a helper which can be re-used in both places. While we're at it, also make it slightly more explicit in its handling of e.g. uffd wp markers. For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for an error entry, return VM_FAULT_HWPOISON. For most cases this change doesn't matter, e.g. a userspace program would receive a SIGBUS either way. But for UFFDIO_POISON, this change will let KVM guests get an MCE out of the box, instead of giving a SIGBUS to the hypervisor and requiring it to somehow inject an MCE. Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return VM_FAULT_HWPOISON_LARGE in such cases. 
Note that this can't happen today because the lack of swap support means we'll never end up with such a PTE anyway, but this behavior will be needed once such entries can show up via UFFDIO_POISON. Link: https://lkml.kernel.org/r/20230707215540.2324998-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20230707215540.2324998-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: T.J. Alumbaugh <talumbau@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:18:03 -04:00
Rafael Aquini	79e59ae792	shmem: convert to ctime accessor functions JIRA: https://issues.redhat.com/browse/RHEL-27743 Conflicts: * mm/shmem.c: minor context conflicts on the 2nd, 3rd, and 4th hunks due to RHEL missing commits 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts") and its follow-up series, as well as due to out-of-order backport of commit 7a80e5b8c6fa ("shmem: support idmapped mounts for tmpfs") which folds bits of commit 138060ba92b3 ("fs: pass dentry to set acl method") This patch is a backport of the following upstream commit: commit 6528733416f13dd67eda1f34e74a2242af36d638 Author: Jeff Layton <jlayton@kernel.org> Date: Wed Jul 5 15:01:52 2023 -0400 shmem: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-85-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:17:47 -04:00
Rafael Aquini	d14035b708	shmem: convert to simple_rename_timestamp JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit 944d0d9def9de37f0209ff73f3d8daa1baccab67 Author: Jeff Layton <jlayton@kernel.org> Date: Wed Jul 5 15:00:36 2023 -0400 shmem: convert to simple_rename_timestamp A rename potentially involves updating 4 different inode timestamps. Convert to the new simple_rename_timestamp helper function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-9-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:17:46 -04:00
Rafael Aquini	89b7c01962	mm: increase usage of folio_next_index() helper JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit 87b11f862254396a93636f0998377ac3f6648f5f Author: Sidhartha Kumar <sidhartha.kumar@oracle.com> Date: Tue Jun 27 10:43:49 2023 -0700 mm: increase usage of folio_next_index() helper Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Link: https://lkml.kernel.org/r/20230627174349.491803-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:17:23 -04:00
Waiman Long	1af51c424f	mm: shmem: fix getting incorrect lruvec when replacing a shmem folio JIRA: https://issues.redhat.com/browse/RHEL-56023 commit 9094b4a1c76cfe84b906cc152bab34d4ba26fa5c Author: Baolin Wang <baolin.wang@linux.alibaba.com> Date: Thu, 13 Jun 2024 16:21:19 +0800 mm: shmem: fix getting incorrect lruvec when replacing a shmem folio When testing shmem swapin, I encountered the warning below on my machine. The reason is that replacing an old shmem folio with a new one causes mem_cgroup_migrate() to clear the old folio's memcg data. As a result, the old folio cannot get the correct memcg's lruvec needed to remove itself from the LRU list when it is being freed. This could lead to possible serious problems, such as LRU list crashes due to holding the wrong LRU lock, and incorrect LRU statistics. To fix this issue, we can fallback to use the mem_cgroup_replace_folio() to replace the old shmem folio. [ 5241.100311] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5d9960 [ 5241.100317] head: order:4 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 5241.100319] flags: 0x17fffe0000040068(uptodate\|lru\|head\|swapbacked\|node=0\|zone=2\|lastcpupid=0x3ffff) [ 5241.100323] raw: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000 [ 5241.100325] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 5241.100326] head: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000 [ 5241.100327] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 5241.100328] head: 17fffe0000000204 fffffdffd6665801 ffffffffffffffff 0000000000000000 [ 5241.100329] head: 0000000a00000010 0000000000000000 00000000ffffffff 0000000000000000 [ 5241.100330] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled()) [ 5241.100338] ------------[ cut here ]------------ [ 5241.100339] WARNING: CPU: 19 PID: 78402 at include/linux/memcontrol.h:775 
folio_lruvec_lock_irqsave+0x140/0x150 [...] [ 5241.100374] pc : folio_lruvec_lock_irqsave+0x140/0x150 [ 5241.100375] lr : folio_lruvec_lock_irqsave+0x138/0x150 [ 5241.100376] sp : ffff80008b38b930 [...] [ 5241.100398] Call trace: [ 5241.100399] folio_lruvec_lock_irqsave+0x140/0x150 [ 5241.100401] __page_cache_release+0x90/0x300 [ 5241.100404] __folio_put+0x50/0x108 [ 5241.100406] shmem_replace_folio+0x1b4/0x240 [ 5241.100409] shmem_swapin_folio+0x314/0x528 [ 5241.100411] shmem_get_folio_gfp+0x3b4/0x930 [ 5241.100412] shmem_fault+0x74/0x160 [ 5241.100414] __do_fault+0x40/0x218 [ 5241.100417] do_shared_fault+0x34/0x1b0 [ 5241.100419] do_fault+0x40/0x168 [ 5241.100420] handle_pte_fault+0x80/0x228 [ 5241.100422] __handle_mm_fault+0x1c4/0x440 [ 5241.100424] handle_mm_fault+0x60/0x1f0 [ 5241.100426] do_page_fault+0x120/0x488 [ 5241.100429] do_translation_fault+0x4c/0x68 [ 5241.100431] do_mem_abort+0x48/0xa0 [ 5241.100434] el0_da+0x38/0xc0 [ 5241.100436] el0t_64_sync_handler+0x68/0xc0 [ 5241.100437] el0t_64_sync+0x14c/0x150 [ 5241.100439] ---[ end trace 0000000000000000 ]--- [baolin.wang@linux.alibaba.com: remove less helpful comments, per Matthew] Link: https://lkml.kernel.org/r/ccad3fe1375b468ebca3227b6b729f3eaf9d8046.1718423197.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/3c11000dd6c1df83015a8321a859e9775ebbc23e.1718266112.git.baolin.wang@linux.alibaba.com Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Waiman Long 
<longman@redhat.com>	2024-09-30 09:47:03 -04:00
Rafael Aquini	0a98879655	shmem: minor fixes to splice-read implementation JIRA: https://issues.redhat.com/browse/RHEL-27742 This patch is a backport of the following upstream commit: commit fa598952fac059054316dccb2213478ccb81a0d1 Author: Hugh Dickins <hughd@google.com> Date: Sun Jul 23 14:05:54 2023 -0700 shmem: minor fixes to splice-read implementation HWPoison: my reading of folio_test_hwpoison() is that it only tests the head page of a large folio, whereas splice_folio_into_pipe() will splice as much of the folio as it can: so for safety we should also check the has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned. (Perhaps that ugliness can be improved at the mm end later.) The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask it for "part" not "len". Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com Fixes: bd194b187115 ("shmem: Implement splice-read") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-09-05 20:35:53 -04:00
Rafael Aquini	3af248f180	mm: shmem: fix UAF bug in shmem_show_options() JIRA: https://issues.redhat.com/browse/RHEL-27742 Conflicts: * minor context diff due to out-of-order backport of upstream's v6.6 commit b4d3de57cab2 ("shmem: properly report quota mount options") This patch is a backport of the following upstream commit: commit 283ebdee2da30f65cba04c8fe690b97acfc7f4c4 Author: Tu Jinjiang <tujinjiang@huawei.com> Date: Thu May 25 11:16:40 2023 +0800 mm: shmem: fix UAF bug in shmem_show_options() shmem_show_options() uses sbinfo->mpol without adding its refcnt. This may lead to a race with replacement of the mpol by remount. The execution sequence is as follows. CPU0 CPU1 shmem_show_options() shmem_reconfigure() shmem_show_mpol(seq, sbinfo->mpol) mpol = sbinfo->mpol mpol_put(mpol) mpol->mode The KASAN report is as follows. BUG: KASAN: slab-use-after-free in shmem_show_options+0x21b/0x340 Read of size 2 at addr ffff888124324004 by task mount/2388 CPU: 2 PID: 2388 Comm: mount Not tainted 6.4.0-rc3-00017-g9d646009f65d-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x37/0x50 print_report+0xd0/0x620 ? shmem_show_options+0x21b/0x340 ? __virt_addr_valid+0xf4/0x180 ? shmem_show_options+0x21b/0x340 kasan_report+0xb8/0xe0 ? shmem_show_options+0x21b/0x340 shmem_show_options+0x21b/0x340 ? __pfx_shmem_show_options+0x10/0x10 ? strchr+0x2c/0x50 ? strlen+0x23/0x40 ? seq_puts+0x7d/0x90 show_vfsmnt+0x1e6/0x260 ? __pfx_show_vfsmnt+0x10/0x10 ? __kasan_kmalloc+0x7f/0x90 seq_read_iter+0x57a/0x740 vfs_read+0x2e2/0x4a0 ? __pfx_vfs_read+0x10/0x10 ? down_write_killable+0xb8/0x140 ? __pfx_down_write_killable+0x10/0x10 ? __fget_light+0xa9/0x1e0 ? up_write+0x3f/0x80 ksys_read+0xb8/0x150 ? __pfx_ksys_read+0x10/0x10 ? fpregs_assert_state_consistent+0x55/0x60 ? 
exit_to_user_mode_prepare+0x2d/0x120 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc </TASK> Allocated by task 2387: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x59/0x70 kmem_cache_alloc+0xdd/0x220 mpol_new+0x83/0x150 mpol_parse_str+0x280/0x4a0 shmem_parse_one+0x364/0x520 vfs_parse_fs_param+0xf8/0x1a0 vfs_parse_fs_string+0xc9/0x130 shmem_parse_options+0xb2/0x110 path_mount+0x597/0xdf0 do_mount+0xcd/0xf0 __x64_sys_mount+0xbd/0x100 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 2389: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x50 __kasan_slab_free+0x10e/0x1a0 kmem_cache_free+0x9c/0x350 shmem_reconfigure+0x278/0x370 reconfigure_super+0x383/0x450 path_mount+0xcc5/0xdf0 do_mount+0xcd/0xf0 __x64_sys_mount+0xbd/0x100 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc The buggy address belongs to the object at ffff888124324000 which belongs to the cache numa_policy of size 32 The buggy address is located 4 bytes inside of freed 32-byte region [ffff888124324000, ffff888124324020) ================================================================== To fix the bug, shmem_get_sbmpol() / mpol_put() needs to be called before / after shmem_show_mpol() call. Link: https://lkml.kernel.org/r/20230525031640.593733-1-tujinjiang@huawei.com Signed-off-by: Tu Jinjiang <tujinjiang@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-09-05 20:35:52 -04:00
Rafael Aquini	b0287192e6	shmem: Implement splice-read JIRA: https://issues.redhat.com/browse/RHEL-27742 This patch is a backport of the following upstream commit: commit bd194b187115da7b98b660b049315f6c9c8267d1 Author: David Howells <dhowells@redhat.com> Date: Mon May 22 14:49:56 2023 +0100 shmem: Implement splice-read The new filemap_splice_read() has an implicit expectation via filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't fully populate the pagecache of the file it is reading from[1], potentially leading to a jump to NULL if this doesn't exist. shmem, however, (and by extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio(). Work around this by equipping shmem with its own splice-read implementation, based on filemap_splice_read(), but able to paste in zero_page when there's a page missing. Signed-off-by: David Howells <dhowells@redhat.com> cc: Daniel Golle <daniel@makrotopia.org> cc: Guenter Roeck <groeck7@gmail.com> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Hugh Dickins <hughd@google.com> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/Y+pdHFFTk1TTEBsO@makrotopia.org/ [1] Link: https://lore.kernel.org/r/20230522135018.2742245-10-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-09-05 20:35:52 -04:00
Lucas Zampieri	2424e8e040	Merge: mm: follow up work for the MM v6.4 update and disable CONFIG_PER_VMA_LOCK until it is fixed MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4749 JIRA: https://issues.redhat.com/browse/RHEL-48221 It was identified that our process to bring in code-base updates has been unwittingly missing some of the peripheral commits that do not directly touch the core code under the mm/ directory. While most of these identified peripheral commits are simple and basic clean-ups, some are relevant changesets that might end up causing real (and subtle) issues for RHEL deployments if they remain missing. The intent of this patchset is to close the aforementioned gap by bringing in the missing peripheral commits from v5.14 up to v6.4, which is the level we're parking our codebase for RHEL-9.5. A secondary intent of this patchset is to bring in upstream's v6.5 commit that disables the PER_VMA_LOCK feature which was recently introduced (to RHEL-9.5) but was marked BROKEN upstream circa release v6.5, in order to avoid the reported issues with memory corruption in the upstream builds. Signed-off-by: Rafael Aquini <aquini@redhat.com> Approved-by: Mark Langsdorf <mlangsdo@redhat.com> Approved-by: Waiman Long <longman@redhat.com> Approved-by: David Arcari <darcari@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-08-06 14:21:52 +00:00

833 Commits