Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Ming Lei	ec1abb2bd5	fs: Move enum rw_hint into a new header file JIRA: https://issues.redhat.com/browse/RHEL-79409 commit fe3944fb245ab99570552a3bf970b00058a9ca6d Author: Bart Van Assche <bvanassche@acm.org> Date: Fri Feb 2 12:39:23 2024 -0800 fs: Move enum rw_hint into a new header file Move enum rw_hint into a new header file to prepare for using this data type in the block layer. Add the attribute __packed to reduce the space occupied by instances of this data type from four bytes to one byte. Change the data type of i_write_hint from u8 into enum rw_hint. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Chao Yu <chao@kernel.org> # for the F2FS part Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240202203926.2478590-5-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2025-03-14 16:48:20 +08:00
Rafael Aquini	43fc22497a	fs: improve dump_mapping() robustness JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit 8b3d838139bcd1e552f1899191f734264ce2a1a5 Author: Baolin Wang <baolin.wang@linux.alibaba.com> Date: Tue Jan 16 15:53:35 2024 +0800 fs: improve dump_mapping() robustness We met a kernel crash issue when running stress-ng testing, and the system crashes when printing the dentry name in dump_mapping(). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 pc : dentry_name+0xd8/0x224 lr : pointer+0x22c/0x370 sp : ffff800025f134c0 ...... Call trace: dentry_name+0xd8/0x224 pointer+0x22c/0x370 vsnprintf+0x1ec/0x730 vscnprintf+0x2c/0x60 vprintk_store+0x70/0x234 vprintk_emit+0xe0/0x24c vprintk_default+0x3c/0x44 vprintk_func+0x84/0x2d0 printk+0x64/0x88 __dump_page+0x52c/0x530 dump_page+0x14/0x20 set_migratetype_isolate+0x110/0x224 start_isolate_page_range+0xc4/0x20c offline_pages+0x124/0x474 memory_block_offline+0x44/0xf4 memory_subsys_offline+0x3c/0x70 device_offline+0xf0/0x120 ...... The root cause is that, one thread is doing page migration, and we will use the target page's ->mapping field to save 'anon_vma' pointer between page unmap and page move, and now the target page is locked and refcount is 1. Currently, there is another stress-ng thread performing memory hotplug, attempting to offline the target page that is being migrated. It discovers that the refcount of this target page is 1, preventing the offline operation, thus proceeding to dump the page. However, page_mapping() of the target page may return an incorrect file mapping to crash the system in dump_mapping(), since the target page->mapping only saves 'anon_vma' pointer without setting PAGE_MAPPING_ANON flag. The page migration issue has been fixed by commit d1adb25df711 ("mm: migrate: fix getting incorrect page mapping during page migration"). In addition, Matthew suggested we should also improve dump_mapping()'s robustness to resilient against the kernel crash [1]. With checking the 'dentry.parent' and 'dentry.d_name.name' used by dentry_name(), I can see dump_mapping() will output the invalid dentry instead of crashing the system when this issue is reproduced again. [12211.189128] page:fffff7de047741c0 refcount:1 mapcount:0 mapping:ffff989117f55ea0 index:0x1 pfn:0x211dd07 [12211.189144] aops:0x0 ino:1 invalid dentry:74786574206e6870 [12211.189148] flags: 0x57ffffc0000001(locked\|node=1\|zone=2\|lastcpupid=0x1fffff) [12211.189150] page_type: 0xffffffff() [12211.189153] raw: 0057ffffc0000001 0000000000000000 dead000000000122 ffff989117f55ea0 [12211.189154] raw: 0000000000000001 0000000000000001 00000001ffffffff 0000000000000000 [12211.189155] page dumped because: unmovable page [1] https://lore.kernel.org/all/ZXxn%2F0oixJxxAnpF@casper.infradead.org/ Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/937ab1f87328516821d39be672b6bc18861d9d3e.1705391420.git.baolin.wang@linux.alibaba.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:24:10 -05:00
CKI Backport Bot	a1abdec3cc	fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name JIRA: https://issues.redhat.com/browse/RHEL-64530 CVE: CVE-2024-49934 commit 7f7b850689ac06a62befe26e1fd1806799e7f152 Author: Li Zhijian <lizhijian@fujitsu.com> Date: Mon Aug 26 13:55:03 2024 +0800 fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name It's observed that a crash occurs during hot-remove a memory device, in which user is accessing the hugetlb. See calltrace as following: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s mirror dm_region_hash dm_log dm_mod CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:do_user_addr_fault+0x2a0/0x790 Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x8d/0x190 ? do_user_addr_fault+0x2a0/0x790 ? report_bug+0x1c3/0x1d0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? do_user_addr_fault+0x2a0/0x790 ? exc_page_fault+0x31/0x200 exc_page_fault+0x68/0x200 <...snip...> BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI ---[ end trace 0000000000000000 ]--- BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ #492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dentry_name+0x1f4/0x440 <...snip...> ? dentry_name+0x2fa/0x440 vsnprintf+0x1f3/0x4f0 vprintk_store+0x23a/0x540 vprintk_emit+0x6d/0x330 _printk+0x58/0x80 dump_mapping+0x10b/0x1a0 ? __pfx_free_object_rcu+0x10/0x10 __dump_page+0x26b/0x3e0 ? vprintk_emit+0xe0/0x330 ? _printk+0x58/0x80 ? dump_page+0x17/0x50 dump_page+0x17/0x50 do_migrate_range+0x2f7/0x7f0 ? do_migrate_range+0x42/0x7f0 ? offline_pages+0x2f4/0x8c0 offline_pages+0x60a/0x8c0 memory_subsys_offline+0x9f/0x1c0 ? lockdep_hardirqs_on+0x77/0x100 ? _raw_spin_unlock_irqrestore+0x38/0x60 device_offline+0xe3/0x110 state_store+0x6e/0xc0 kernfs_fop_write_iter+0x143/0x200 vfs_write+0x39f/0x560 ksys_write+0x65/0xf0 do_syscall_64+0x62/0x130 Previously, some sanity check have been done in dump_mapping() before the print facility parsing '%pd' though, it's still possible to run into an invalid dentry.d_name.name. Since dump_mapping() only needs to dump the filename only, retrieve it by itself in a safer way to prevent an unnecessary crash. Note that either retrieving the filename with '%pd' or strncpy_from_kernel_nofault(), the filename could be unreliable. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240826055503.1522320-1-lizhijian@fujitsu.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>	2024-11-28 00:29:48 +00:00
Rado Vrbovsky	6c6c3d9cfe	Merge: XFS: Update #2 for RHEL9.6 (upstream v6.6-6.7) MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5633 XFS: update for RHEL9.6. Backport upstream v6.6-6.7, including fixes patches post v6.7. JIRA: https://issues.redhat.com/browse/RHEL-62760 Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5188 Signed-off-by: Bill O'Donnell <bodonnel@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Brian Foster <bfoster@redhat.com> Approved-by: Rafael Aquini <raquini@redhat.com> Approved-by: Eric Sandeen <esandeen@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>	2024-11-25 13:17:37 +00:00
Bill O'Donnell	770d4ac268	filemap: add a per-mapping stable writes flag JIRA: https://issues.redhat.com/browse/RHEL-62760 Conflicts: context errors in pagemap.h commit 762321dab9a72760bf9aec48362f932717c9424d Author: Christoph Hellwig <hch@lst.de> Date: Wed Oct 25 16:10:17 2023 +0200 filemap: add a per-mapping stable writes flag folio_wait_stable waits for writeback to finish before modifying the contents of a folio again, e.g. to support check summing of the data in the block integrity code. Currently this behavior is controlled by the SB_I_STABLE_WRITES flag on the super_block, which means it is uniform for the entire file system. This is wrong for the block device pseudofs which is shared by all block devices, or file systems that can use multiple devices like XFS witht the RT subvolume or btrfs (although btrfs currently reimplements folio_wait_stable anyway). Add a per-address_space AS_STABLE_WRITES flag to control the behavior in a more fine grained way. The existing SB_I_STABLE_WRITES is kept to initialize AS_STABLE_WRITES to the existing default which covers most cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231025141020.192413-2-hch@lst.de Tested-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>	2024-11-09 10:06:46 -06:00
Scott Mayhew	f67a532577	fs: drop the timespec64 arg from generic_update_time JIRA: https://issues.redhat.com/browse/RHEL-59704 commit 541d4c798a598854fcce7326d947cbcbd35701d6 Author: Jeff Layton <jlayton@kernel.org> Date: Mon Aug 7 15:38:34 2023 -0400 fs: drop the timespec64 arg from generic_update_time In future patches we're going to change how the ctime is updated to keep track of when it has been queried. The way that the update_time operation works (and a lot of its callers) make this difficult, since they grab a timestamp early and then pass it down to eventually be copied into the inode. All of the existing update_time callers pass in the result of current_time() in some fashion. Drop the "time" parameter from generic_update_time, and rework it to fetch its own timestamp. This change means that an update_time could fetch a different timestamp than was seen in inode_needs_update_time. update_time is only ever called with one of two flag combinations: Either S_ATIME is set, or S_MTIME\|S_CTIME\|S_VERSION are set. With this change we now treat the flags argument as an indicator that some value needed to be updated when last checked, rather than an indication to update specific timestamps. Rework the logic for updating the timestamps and put it in a new inode_update_timestamps helper that other update_time routines can use. S_ATIME is as treated as we always have, but if any of the other three are set, then we attempt to update all three. Also, some callers of generic_update_time need to know what timestamps were actually updated. Change it to return an S_* flag mask to indicate that and rework the callers to expect it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-3-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Scott Mayhew <smayhew@redhat.com>	2024-10-25 12:35:45 -04:00
Scott Mayhew	0874600fa0	fs: convert to ctime accessor functions JIRA: https://issues.redhat.com/browse/RHEL-59704 commit 2276e5ba8567f683c49a36ba885d0fe6abe2b45e Author: Jeff Layton <jlayton@kernel.org> Date: Wed Jul 5 15:00:50 2023 -0400 fs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230705190309.579783-23-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Scott Mayhew <smayhew@redhat.com>	2024-10-25 12:35:44 -04:00
Ian Kent	4f34d24b81	fs: port fs{g,u}id helpers to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit c14329d39f2daa8132e1bbe5cc531da387bcf44a Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:31 2023 +0100 fs: port fs{g,u}id helpers to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 12:30:42 +08:00
Ian Kent	db8603ce12	fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: Dropped hunks for ksmbd because the source is not present in the CentOS Stream source tree. CentOS Stream has commit `bb901646d2` ("ovl: let helper ovl_i_path_real() return the realinode") which wasn't present upstream when this patch was applied, correct manually. CentOS Stream does not have upstream commit c7423dbdbc9ec ("ima: Handle -ESTALE returned by ima_filter_rule_match()") which results in a reject of hunk #3 against security/integrity/ima/ima_policy.c, so manually apply hunk. Upstream merge commit 05e6295f7b5e0 ("Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") together with Upstream commit facd61053cff1 ("fuse: fixes after adapting to new posix acl api") results in a conflict in fs/fuse/acl.c, adjust to suit. Update the call to i_uid_into_vfsuid() from 2740f64cb7f00 ("filelocks: use mount idmapping for setlease permission check") to pass an idmap instead of a user namespace. It looks like Linus made a change to the merge request "Merge tag 8834147f95056 ("fscache-rewrite-20220111") to account for idmap changes (probably the ones in this commit, so add the change here. commit e67fe63341b8117d7e0d9acf0f1222d5138b9266 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:30 2023 +0100 fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap Convert to struct mnt_idmap. Remove legacy file_mnt_user_ns() and mnt_user_ns(). Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 11:02:01 +08:00
Ian Kent	edf17476c7	fs: port privilege checking helpers to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. Upstream merge commit 05e6295f7b5e0 ("Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") together with Upstream commit facd61053cff1 ("fuse: fixes after adapting to new posix acl api") results in a conflict in fs/fuse/acl.c, adjust to suit. commit 9452e93e6dae862d7aeff2b11236d79bde6f9b66 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:27 2023 +0100 fs: port privilege checking helpers to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:31 +08:00
Ian Kent	dcafde7644	fs: port inode_owner_or_capable() to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. CentOS Stream does not have upstream commit 3db1de0e582c3 ("f2fs: change the current atomic write way") which results in a reject for hunk #1 in f2fs_ioc_start_atomic_write() of fs/f2fs/file.c, so make the required change manually. Upstream commit 7bc155fec5b371 ("f2fs: kill volatile write support") is not present in CentOS Stream so make the additional changes needed. commit 01beba7957a26f9b7179127e8ad56bb5a0f56138 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:26 2023 +0100 fs: port inode_owner_or_capable() to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:27 +08:00
Ian Kent	2171c567b5	fs: port inode_init_owner() to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. CentOS Stream does not have upstream commit 3db1de0e582c3 ("f2fs: change the current atomic write way") so there is no call to f2fs_get_tmpfile() in f2fs_ioc_start_atomic_write() to change. The above patch also adds the definition of f2fs_get_tmpfile() to fs/f2fs/f2fs.h so it's not there to change resulting in a hunk reject for fs/f2fs/f2fs.h. Upstream commit 787caf1bdcd9f ("f2fs: fix to enable compress for newly created file if extension matches") is not present in CentOS Stream resulting in a number of rejects against fs/f2fs/namei.c, manually apply these changes. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `892da692fa` ("shmem: support idmapped mounts for tmpfs") which causes a reject in fs/shmem.c, manually apply the hunk (note: taking account of these changes at the times they are needed will result in an updated mm/shmem.c once this series is completed). Update to add incremental changes needed due to CentOS Stream commit `469e1d13f6` ("shmem: quota support"). commit f2d40141d5d90b882e2c35b226f9244a63b82b6e Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:25 2023 +0100 fs: port inode_init_owner() to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:26 +08:00
Ian Kent	304ec491ee	fs: port ->permission() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. CentOS Stream commit `48fa94aacd` ("ceph: fscrypt_auth handling for ceph") is presnt which causes fuzz 2 in hunk #1 in fs/ceph/super.h. Upstream commit 427505ffeaa46 ("exportfs: use pr_debug for unreachable debug statements") is not present causing fuzz 2 in hunk #1 against fs/exportfs/expfs.c. Dropped hunks for ksmbd because the source is not present in the CentOS Stream source tree. Upstream commit 03fa86e9f79d8 ("namei: stash the sampled ->d_seq into nameidata") is not present causing a fuzz 1 for hunk #14 against fs/namei.c. CentOS Stream `c4f3dd0731` ("nfsd: handle failure to collect pre/post-op attrs more sanely") is present and causes a rejects for hunks #4 and #5 against fs/nfsd/vfs.c, apply manually. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `98ba731fc7` ("ovl: Move xattr support to new xattrs.c file") moves ovl_xattr_set() and ovl_xattr_get() from fs/overlayfs/inode.c to fs/overlayfs/xattrs.c which causes hunks #4 and #5 to fail, manually apply to fs/overlayfs/xattrs.c. CentOS Stream commit `55177e4b83` ("ovl: mark xwhiteouts directory with overlay.opaque='x'") and commit `d17b324bb6` ("ovl: use ovl_numlower() and ovl_lowerstack() accessors") change the first and third hunks of fs/overlayfs/namei.c causing them to fail, manually apply. CentOS Stream commit `98ba731fc7` ("ovl: Move xattr support to new xattrs.c file") causes fuzz 2 in hunk #5 of fs/overlayfs/overlayfs.h CentOS Stream commit `355a9c490a` ("ovl: Add an alternative type of whiteout") changes ovl_cache_update_ino() to ovl_cache_update() in fs/overlayfs/readdir.c, make the change manually. Upstream commit 217af7e2f4deb ("apparmor: refactor profile rules and attachments") is not in CentOS Stream causing hunk #1 to fail to apply so manually apply the change. commit 4609e1f18e19c3b302e1eb4858334bca1532f780 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:22 2023 +0100 fs: port ->permission() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:20 +08:00
Ian Kent	f7a70a9fc1	fs: port vfs_() helpers to struct mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: There was a whitespasce difference possibly due to CentOS Stream commit `c912400e45` ("fs: Fix description of vfs_tmpfile()") CentOS Stream commit `c4f3dd0731` ("nfsd: handle failure to collect pre/post-op attrs more sanely") is present which caused a hunk reject in fs/nfsd/nfs3proc.c and two hunks to be rejected in fs/nfsd/vfs.c the hunks were manually applied. Upstream commit 79b05beaa5c34 ("af_unix: Acquire/Release per-netns hash table's locks.") is not present in CentOS Stream fixed the conflict manually. Dropped ksmbd hunks, ksmbd source is not present. Upstream commit 3350607dc5637 ("security: Create file_truncate hook from path_truncate hook") is not present in CentOS Stream. commit abf08576afe31506b812c8c1be9714f78613f300 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:10 2023 +0100 fs: port vfs_() helpers to struct mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 08:29:51 +08:00
Ian Kent	bff9bc5749	fs: use type safe idmapping helpers JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit a2bd096fb2d7f50fb4db246b33e7bfcf5e2eda3a Author: Christian Brauner <brauner@kernel.org> Date: Wed Jun 22 22:12:16 2022 +0200 fs: use type safe idmapping helpers We already ported most parts and filesystems over for v6.0 to the new vfs{g,u}id_t type and associated helpers for v6.0. Convert the remaining places so we can remove all the old helpers. This is a non-functional change. Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:38 +08:00
Ian Kent	dc1f3bea48	attr: use consistent sgid stripping checks JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit ed5a7047d2011cb6b2bf84ceb6680124cc6a7d95 Author: Christian Brauner <brauner@kernel.org> Date: Mon Oct 17 17:06:37 2022 +0200 attr: use consistent sgid stripping checks Currently setgid stripping in file_remove_privs()'s should_remove_suid() helper is inconsistent with other parts of the vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups and the caller isn't privileged over the inode although we require this already in setattr_prepare() and setattr_copy() and so all filesystem implement this requirement implicitly because they have to use setattr_{prepare,copy}() anyway. But the inconsistency shows up in setgid stripping bugs for overlayfs in xfstests (e.g., generic/673, generic/683, generic/685, generic/686, generic/687). For example, we test whether suid and setgid stripping works correctly when performing various write-like operations as an unprivileged user (fallocate, reflink, write, etc.): echo "Test 1 - qa_user, non-exec file $verb" setup_testfile chmod a+rws $junk_file commit_and_check "$qa_user" "$verb" 64k 64k The test basically creates a file with 6666 permissions. While the file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a regular filesystem like xfs what will happen is: sys_fallocate() -> vfs_fallocate() -> xfs_file_fallocate() -> file_modified() -> __file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = ATTR_FORCE \| kill; -> notify_change() -> setattr_copy() In should_remove_suid() we can see that ATTR_KILL_SUID is raised unconditionally because the file in the test has S_ISUID set. But we also see that ATTR_KILL_SGID won't be set because while the file is S_ISGID it is not S_IXGRP (see above) which is a condition for ATTR_KILL_SGID being raised. So by the time we call notify_change() we have attr->ia_valid set to ATTR_KILL_SUID \| ATTR_FORCE. Now notify_change() sees that ATTR_KILL_SUID is set and does: ia_valid = attr->ia_valid \|= ATTR_MODE attr->ia_mode = (inode->i_mode & ~S_ISUID); which means that when we call setattr_copy() later we will definitely update inode->i_mode. Note that attr->ia_mode still contains S_ISGID. Now we call into the filesystem's ->setattr() inode operation which will end up calling setattr_copy(). Since ATTR_MODE is set we will hit: if (ia_valid & ATTR_MODE) { umode_t mode = attr->ia_mode; vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode); if (!vfsgid_in_group_p(vfsgid) && !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)) mode &= ~S_ISGID; inode->i_mode = mode; } and since the caller in the test is neither capable nor in the group of the inode the S_ISGID bit is stripped. But assume the file isn't suid then ATTR_KILL_SUID won't be raised which has the consequence that neither the setgid nor the suid bits are stripped even though it should be stripped because the inode isn't in the caller's groups and the caller isn't privileged over the inode. If overlayfs is in the mix things become a bit more complicated and the bug shows up more clearly. When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be raised but because the check in notify_change() is questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped the S_ISGID bit isn't removed even though it should be stripped: sys_fallocate() -> vfs_fallocate() -> ovl_fallocate() -> file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = ATTR_FORCE \| kill; -> notify_change() -> ovl_setattr() // TAKE ON MOUNTER'S CREDS -> ovl_do_notify_change() -> notify_change() // GIVE UP MOUNTER'S CREDS // TAKE ON MOUNTER'S CREDS -> vfs_fallocate() -> xfs_file_fallocate() -> file_modified() -> __file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = attr_force \| kill; -> notify_change() The fix for all of this is to make file_remove_privs()'s should_remove_suid() helper to perform the same checks as we already require in setattr_prepare() and setattr_copy() and have notify_change() not pointlessly requiring S_IXGRP again. It doesn't make any sense in the first place because the caller must calculate the flags via should_remove_suid() anyway which would raise ATTR_KILL_SGID. While we're at it we move should_remove_suid() from inode.c to attr.c where it belongs with the rest of the iattr helpers. Especially since it returns ATTR_KILL_S{G,U}ID flags. We also rename it to setattr_should_drop_suidgid() to better reflect that it indicates both setuid and setgid bit removal and also that it returns attr flags. Running xfstests with this doesn't report any regressions. We should really try and use consistent checks. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:34 +08:00
Ian Kent	b064d0b523	fs: move should_remove_suid() JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit e243e3f94c804ecca9a8241b5babe28f35258ef4 Author: Christian Brauner <brauner@kernel.org> Date: Mon Oct 17 17:06:35 2022 +0200 fs: move should_remove_suid() Move the helper from inode.c to attr.c. This keeps the the core of the set{g,u}id stripping logic in one place when we add follow-up changes. It is the better place anyway, since should_remove_suid() returns ATTR_KILL_S{G,U}ID flags. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:25 +08:00
Ian Kent	d8268a324b	attr: add in_group_or_capable() JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: CentOS Stream has commit `42bfe37a25` ("fs: add ctime accessors infrastructure") so adjust the context to match. commit 11c2a8700cdcabf9b639b7204a1e38e2a0b6798e Author: Christian Brauner <brauner@kernel.org> Date: Mon Oct 17 17:06:34 2022 +0200 attr: add in_group_or_capable() In setattr_{copy,prepare}() we need to perform the same permission checks to determine whether we need to drop the setgid bit or not. Instead of open-coding it twice add a simple helper the encapsulates the logic. We will reuse this helpers to make dropping the setgid bit during write operations more consistent in a follow up patch. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:24 +08:00
Rafael Aquini	9a083ffc2d	list_lru: allow explicit memcg and NUMA node selection JIRA: https://issues.redhat.com/browse/RHEL-40684 Conflicts: * include/linux/list_lru.h: minor context differences due to missing upstream v6.8 commit 7679e14098c9 ("mm: list_lru: Update kernel documentation to follow the requirements") This patch is a backport of the following upstream commit: commit 0a97c01cd20bb96359d8c9dedad92a061ed34e0b Author: Nhat Pham <nphamcs@gmail.com> Date: Thu Nov 30 11:40:18 2023 -0800 list_lru: allow explicit memcg and NUMA node selection Patch series "workload-specific and memory pressure-driven zswap writeback", v8. There are currently several issues with zswap writeback: 1. There is only a single global LRU for zswap, making it impossible to perform worload-specific shrinking - an memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs. This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking: https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u But this solution leaves a lot to be desired, as we still do not have an avenue for an memcg to free up its own memory locked up in the zswap pool. 2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages). This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis. As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improved the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds. This patch (of 6): The interface of list_lru is based on the assumption that the list node and the data it represents belong to the same allocated on the correct node/memcg. While this assumption is valid for existing slab objects LRU such as dentries and inodes, it is undocumented, and rather inflexible for certain potential list_lru users (such as the upcoming zswap shrinker and the THP shrinker). It has caused us a lot of issues during our development. This patch changes list_lru interface so that the caller must explicitly specify numa node and memcg when adding and removing objects. The old list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and list_lru_del_obj(), respectively. It also extends the list_lru API with a new function, list_lru_putback, which undoes a previous list_lru_isolate call. Unlike list_lru_add, it does not increment the LRU node count (as list_lru_isolate does not decrement the node count). list_lru_putback also allows for explicit memcg and NUMA node selection. Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2024-06-28 12:24:14 -04:00
Chris von Recklinghausen	41d58c77af	mm: vmscan: refactor updating current->reclaim_state Conflicts: mm/slob.c - We already have 6630e950d532 ("mm/slob: remove slob.c") so the file is gone. JIRA: https://issues.redhat.com/browse/RHEL-27741 commit c7b23b68e2aa93f86a206222d23ccd9a21f5982a Author: Yosry Ahmed <yosryahmed@google.com> Date: Thu Apr 13 10:40:34 2023 +0000 mm: vmscan: refactor updating current->reclaim_state During reclaim, we keep track of pages reclaimed from other means than LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab, which we stash a pointer to in current task_struct. However, we keep track of more than just reclaimed slab pages through this. We also use it for clean file pages dropped through pruned inodes, and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add a helper function that wraps updating it through current, so that future changes to this logic are contained within include/linux/swap.h. Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.c om Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2024-04-30 07:00:59 -04:00
Ming Lei	c3745bf638	fs: simplify invalidate_inodes JIRA: https://issues.redhat.com/browse/RHEL-29564 commit e127b9bccdb04e5fc4444431de37309a68aedafa Author: Christoph Hellwig <hch@lst.de> Date: Fri Aug 11 12:08:28 2023 +0200 fs: simplify invalidate_inodes kill_dirty has always been true for a long time, so hard code it and remove the unused return value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Message-Id: <20230811100828.1897174-18-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 09:46:43 +08:00
Lucas Zampieri	60e48163b8	Merge: ceph+fscrypt: add fscrypt support MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3502 JIRA: https://issues.redhat.com/browse/RHEL-19813 Add ceph fscrypt feature support. This will also backport the latest fs-crypt patches together. In upstream there still some patches not applied and need to wait for days. commit acfaf9453bde61be725c104ba29b7b0902bc246e Author: Xiubo Li <xiubli@redhat.com> Date: Tue Dec 12 12:11:41 2023 +0800 ceph: implement -o test_dummy_encryption mount option JIRA: https://issues.redhat.com/browse/RHEL-19542 commit 6b5717bd30ab7f35792d20b71211055bdb43e6de Author: Jeff Layton <jlayton@kernel.org> Date: Tue Sep 8 09:47:40 2020 -0400 ceph: implement -o test_dummy_encryption mount option Add support for the test_dummy_encryption mount option. This allows us to test the encrypted codepaths in ceph without having to manually set keys, etc. [ lhenriques: fix potential fsc->fsc_dummy_enc_policy memory leak in ceph_real_mount() ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Related to RHEL-19813 Approved-by: Venky Shankar <vshankar@redhat.com> Approved-by: Milind Changire <mchangir@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-04-12 14:43:30 -03:00
Lucas Zampieri	35d215e024	Merge: USB/TB code rebase of supported drivers to upstream v6.6 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3875 JIRA: https://issues.redhat.com/browse/RHEL-28809 BUILD INFO: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=59819606 FUNCTIONAL TESTING: QA SMOKE TESTING: On a HP EliteBook 840 G8, smoke testing didn't present any regressions between the regular x rebased kernel with the insertion of: * Generic 4 port USB HUB: - USB 2.0 Sandisk typeA 16G - USB Headset Gamer ELG Surround 7.1 * USB 2.0 Kingston TypeA 32G * USB 3.2 Gen2 Sandisk TypeC 32G DESCRIPTION: This rebases supported USB and Thunderbolt drivers to upstream kernel v6.6. By design, changes on this rebase are limited to supported usb and thunderbolt drivers. Changes which happen to touch the drivers but are tree-wide are selectively or partially pulled in, when relevant. Signed-off-by: Desnes Nunes <desnesn@redhat.com> Approved-by: Bastien Nocera <bnocera@redhat.com> Approved-by: Eric Chanudet <echanude@redhat.com> Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-04-10 15:35:27 -03:00
Xiubo Li	1f4cd8ccfb	fs: change test in inode_insert5 for adding to the sb list JIRA: https://issues.redhat.com/browse/RHEL-19813 commit 18cc912b8a2acaf32589241fbac47192ab90db14 Author: Jeff Layton <jlayton@kernel.org> Date: Thu Mar 31 16:29:00 2022 -0400 fs: change test in inode_insert5 for adding to the sb list inode_insert5 currently looks at I_CREATING to decide whether to insert the inode into the sb list. This test is a bit ambiguous, as I_CREATING state is not directly related to that list. This test is also problematic for some upcoming ceph changes to add fscrypt support. We need to be able to allocate an inode using new_inode and insert it into the hash later iff we end up using it, and doing that now means that we double add it and corrupt the list. What we really want to know in this test is whether the inode is already in its superblock list, and then add it if it isn't. Have it test for list_empty instead and ensure that we always initialize the list by doing it in inode_init_once. It's only ever removed from the list with list_del_init, so that should be sufficient. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Xiubo Li <xiubli@redhat.com>	2024-03-26 10:24:11 +08:00
Prarit Bhargava	c8a912b06d	locking: remove spin_lock_prefetch JIRA: https://issues.redhat.com/browse/RHEL-25415 commit c8afaa1b0f8bc93d013ab2ea6b9649958af3f1d3 Author: Mateusz Guzik <mjguzik@gmail.com> Date: Sat Aug 12 18:15:54 2023 +0200 locking: remove spin_lock_prefetch The only remaining consumer is new_inode, where it showed up in 2001 as commit c37fa164f793 ("v2.4.9.9 -> v2.4.9.10") in a historical repo [1] with a changelog which does not mention it. Since then the line got only touched up to keep compiling. While it may have been of benefit back in the day, it is guaranteed to at best not get in the way in the multicore setting -- as the code performs a lot of work between the prefetch and actual lock acquire, any contention means the cacheline is already invalid by the time the routine calls spin_lock(). It adds spurious traffic, for short. On top of it prefetch is notoriously tricky to use for single-threaded purposes, making it questionable from the get go. As such, remove it. I admit upfront I did not see value in benchmarking this change, but I can do it if that is deemed appropriate. Removal from new_inode and of the entire thing are in the same patch as requested by Linus, so whatever weird looks can be directed at that guy. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/fs/inode.c?id=c37fa164f793735b32aa3f53154ff1a7659e6442 [1] Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Prarit Bhargava <prarit@redhat.com>	2024-03-20 09:43:20 -04:00
Desnes Nunes	42bfe37a25	fs: add ctime accessors infrastructure JIRA: https://issues.redhat.com/browse/RHEL-28809 Conflicts: * Avoiding commit <11c2a8700cdc> ("attr: add in_group_or_capable()") commit 9b6304c1d53745c300b86f202d0dcff395e2d2db Author: Jeff Layton <jlayton@kernel.org> Date: Wed, 5 Jul 2023 14:58:10 -0400 struct timespec64 has unused bits in the tv_nsec field that can be used for other purposes. In future patches, we're going to change how the inode->i_ctime is accessed in certain inodes in order to make use of them. In order to do that safely though, we'll need to eradicate raw accesses of the inode->i_ctime field from the kernel. Add new accessor functions for the ctime that we use to replace them. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Message-Id: <20230705185812.579118-2-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Desnes Nunes <desnesn@redhat.com>	2024-03-18 15:42:25 -03:00
Ming Lei	f6a6eee21c	fs: remove the special !CONFIG_BLOCK def_blk_fops JIRA: https://issues.redhat.com/browse/RHEL-1516 Conflicts: context difference because direct-io.o in Makefile commit bda2795a630b2f6c417675bfbf4d90ef7503dfc7 Author: Christoph Hellwig <hch@lst.de> Date: Mon May 8 07:44:05 2023 -0700 fs: remove the special !CONFIG_BLOCK def_blk_fops def_blk_fops always returns -ENODEV, which dosn't match the return value of a non-existing block device with CONFIG_BLOCK, which is -ENXIO. Just remove the extra implementation and fall back to the default no_open_fops that always returns -ENXIO. Fixes: `9361401eb7` ("[PATCH] BLOCK: Make it possible to disable the block layer [try #6]") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230508144405.41792-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2023-09-18 15:59:29 +08:00
Jeff Moyer	bde319ceec	fs: Add async write file modification handling. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237 commit 66fa3cedf16abc82d19b943e3289c82e685419d5 Author: Stefan Roesch <shr@fb.com> Date: Thu Jun 23 10:51:53 2022 -0700 fs: Add async write file modification handling. This adds a file_modified_async() function to return -EAGAIN if the request either requires to remove privileges or needs to update the file modification time. This is required for async buffered writes, so the request gets handled in the io worker of io-uring. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220623175157.1715274-11-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-04-29 07:47:02 -04:00
Jeff Moyer	0ba1326a1a	fs: Split off inode_needs_update_time and __file_update_time Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237 Conflicts: We are missing commit e60feb445fce ("fs: export an inode_update_time helper"), so keep using update_time(). commit 6a2aa5d85de534471dd023773236f113eaef26f0 Author: Stefan Roesch <shr@fb.com> Date: Thu Jun 23 10:51:52 2022 -0700 fs: Split off inode_needs_update_time and __file_update_time This splits off the functions inode_needs_update_time() and __file_update_time() from the function file_update_time(). This is required to support async buffered writes. No intended functional changes in this patch. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220623175157.1715274-10-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-04-29 07:46:02 -04:00
Jeff Moyer	3ae1c2ccf9	fs: __file_remove_privs(): restore call to inode_has_no_xattr() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237 commit 41191cf6bf565f4139046d7be68ec30c290af92d Author: Stefan Roesch <shr@fb.com> Date: Tue Aug 16 08:31:58 2022 -0700 fs: __file_remove_privs(): restore call to inode_has_no_xattr() This restores the call to inode_has_no_xattr() in the function __file_remove_privs(). In case the dentry_meeds_remove_privs() returned 0, the function inode_has_no_xattr() was not called. Signed-off-by: Stefan Roesch <shr@fb.com> Fixes: faf99b563558 ("fs: add __remove_file_privs() with flags parameter") Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220816153158.1925040-1-shr@fb.com Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-04-29 07:45:02 -04:00
Jeff Moyer	bc26c4000b	fs: add __remove_file_privs() with flags parameter Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237 commit faf99b563558f74188b7ca34faae1c1da49a7261 Author: Stefan Roesch <shr@fb.com> Date: Thu Jun 23 10:51:51 2022 -0700 fs: add __remove_file_privs() with flags parameter This adds the function __remove_file_privs, which allows the caller to pass the kiocb flags parameter. No intended functional changes in this patch. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220623175157.1715274-9-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-04-29 07:44:02 -04:00
Chris von Recklinghausen	c991a31fce	mm: Remove __delete_from_page_cache() Bugzilla: https://bugzilla.redhat.com/2160210 commit 6ffcd825e7d0416d78fd41cd5b7856a78122cc8c Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Tue Jun 28 20:41:40 2022 -0400 mm: Remove __delete_from_page_cache() This wrapper is no longer used. Remove it and all references to it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2023-03-24 11:19:15 -04:00
Chris von Recklinghausen	eecd85274e	fs: move inode sysctls to its own file Bugzilla: https://bugzilla.redhat.com/2160210 commit 1d67fe585049d3e2448b997af78c68cbf90ada09 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Fri Jan 21 22:12:52 2022 -0800 fs: move inode sysctls to its own file Patch series "sysctl: 4th set of kernel/sysctl cleanups". This is slimming down the fs uses of kernel/sysctl.c to the point that the next step is to just get rid of the fs base directory for it and move that elsehwere, so that next patch series starts dealing with that to demo how we can end up cleaning up a full base directory from kernel/sysctl.c, one at a time. This patch (of 9): kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the inode sysctls to its own file. Since we are no longer using this outside of fs/ remove the extern declaration of its respective proc helper. We use early_initcall() as it is the earliest we can use. [arnd@arndb.de: avoid unused-variable warning] Link: https://lkml.kernel.org/r/20211203190123.874239-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20211129205548.605569-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211129205548.605569-2-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Antti Palosaari <crope@iki.fi> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2023-03-24 11:18:31 -04:00
Andrey Albershteyn	40322e8902	fs: move S_ISGID stripping into the vfs_() helpers Bugzilla: http://bugzilla.redhat.com/2128900 Bugzilla: http://bugzilla.redhat.com/2128898 Conflicts: Multiple mnt_userns arguments commit 1639a49ccdce58ea248841ed9b23babcce6dbb0b Author: Yang Xu <xuyang2018.jy@fujitsu.com> Date: Thu Jul 14 14:11:27 2022 +0800 fs: move S_ISGID stripping into the vfs_() helpers Move setgid handling out of individual filesystems and into the VFS itself to stop the proliferation of setgid inheritance bugs. Creating files that have both the S_IXGRP and S_ISGID bit raised in directories that themselves have the S_ISGID bit set requires additional privileges to avoid security issues. When a filesystem creates a new inode it needs to take care that the caller is either in the group of the newly created inode or they have CAP_FSETID in their current user namespace and are privileged over the parent directory of the new inode. If any of these two conditions is true then the S_ISGID bit can be raised for an S_IXGRP file and if not it needs to be stripped. However, there are several key issues with the current implementation: * S_ISGID stripping logic is entangled with umask stripping. If a filesystem doesn't support or enable POSIX ACLs then umask stripping is done directly in the vfs before calling into the filesystem. If the filesystem does support POSIX ACLs then unmask stripping may be done in the filesystem itself when calling posix_acl_create(). Since umask stripping has an effect on S_ISGID inheritance, e.g., by stripping the S_IXGRP bit from the file to be created and all relevant filesystems have to call posix_acl_create() before inode_init_owner() where we currently take care of S_ISGID handling S_ISGID handling is order dependent. IOW, whether or not you get a setgid bit depends on POSIX ACLs and umask and in what order they are called. Note that technically filesystems are free to impose their own ordering between posix_acl_create() and inode_init_owner() meaning that there's additional ordering issues that influence S_SIGID inheritance. * Filesystems that don't rely on inode_init_owner() don't get S_ISGID stripping logic. While that may be intentional (e.g. network filesystems might just defer setgid stripping to a server) it is often just a security issue. This is not just ugly it's unsustainably messy especially since we do still have bugs in this area years after the initial round of setgid bugfixes. So the current state is quite messy and while we won't be able to make it completely clean as posix_acl_create() is still a filesystem specific call we can improve the S_SIGD stripping situation quite a bit by hoisting it out of inode_init_owner() and into the vfs creation operations. This means we alleviate the burden for filesystems to handle S_ISGID stripping correctly and can standardize the ordering between S_ISGID and umask stripping in the vfs. We add a new helper vfs_prepare_mode() so S_ISGID handling is now done in the VFS before umask handling. This has S_ISGID handling is unaffected unaffected by whether umask stripping is done by the VFS itself (if no POSIX ACLs are supported or enabled) or in the filesystem in posix_acl_create() (if POSIX ACLs are supported). The vfs_prepare_mode() helper is called directly in vfs_() helpers that create new filesystem objects. We need to move them into there to make sure that filesystems like overlayfs hat have callchains like: sys_mknod() -> do_mknodat(mode) -> .mknod = ovl_mknod(mode) -> ovl_create(mode) -> vfs_mknod(mode) get S_ISGID stripping done when calling into lower filesystems via vfs_() creation helpers. Moving vfs_prepare_mode() into e.g. vfs_mknod() takes care of that. This is in any case semantically cleaner because S_ISGID stripping is VFS security requirement. Security hooks so far have seen the mode with the umask applied but without S_ISGID handling done. The relevant hooks are called outside of vfs_() creation helpers so by calling vfs_prepare_mode() from vfs_() helpers the security hooks would now see the mode without umask stripping applied. For now we fix this by passing the mode with umask settings applied to not risk any regressions for LSM hooks. IOW, nothing changes for LSM hooks. It is worth pointing out that security hooks never saw the mode that is seen by the filesystem when actually creating the file. They have always been completely misplaced for that to work. The following filesystems use inode_init_owner() and thus relied on S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus, hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs, reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs. All of the above filesystems end up calling inode_init_owner() when new filesystem objects are created through the ->mkdir(), ->mknod(), ->create(), ->tmpfile(), ->rename() inode operations. Since directories always inherit the S_ISGID bit with the exception of xfs when irix_sgid_inherit mode is turned on S_ISGID stripping doesn't apply. The ->symlink() and ->link() inode operations trivially inherit the mode from the target and the ->rename() inode operation inherits the mode from the source inode. All other creation inode operations will get S_ISGID handling via vfs_prepare_mode() when called from their relevant vfs_() helpers. In addition to this there are filesystems which allow the creation of filesystem objects through ioctl()s or - in the case of spufs - circumventing the vfs in other ways. If filesystem objects are created through ioctl()s the vfs doesn't know about it and can't apply regular permission checking including S_ISGID logic. Therfore, a filesystem relying on S_ISGID stripping in inode_init_owner() in their ioctl() callpath will be affected by moving this logic into the vfs. We audited those filesystems: btrfs allows the creation of filesystem objects through various ioctls(). Snapshot creation literally takes a snapshot and so the mode is fully preserved and S_ISGID stripping doesn't apply. Creating a new subvolum relies on inode_init_owner() in btrfs_new_subvol_inode() but only creates directories and doesn't raise S_ISGID. * ocfs2 has a peculiar implementation of reflinks. In contrast to e.g. xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with the actual extents ocfs2 uses a separate ioctl() that also creates the target file. Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on inode_init_owner() to strip the S_ISGID bit. This is the only place where a filesystem needs to call mode_strip_sgid() directly but this is self-inflicted pain. * spufs doesn't go through the vfs at all and doesn't use ioctl()s either. Instead it has a dedicated system call spufs_create() which allows the creation of filesystem objects. But spufs only creates directories and doesn't allo S_SIGID bits, i.e. it specifically only allows 0777 bits. * bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created. The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount option is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used, and on xfs when the XFS_FEAT_GRPID mount option is used. When any of these filesystems are mounted with their respective GRPID option then newly created files inherit the parent directories group unconditionally. In these cases non of the filesystems call inode_init_owner() and thus did never strip the S_ISGID bit for newly created files. Moving this logic into the VFS means that they now get the S_ISGID bit stripped. This is a user visible change. If this leads to regressions we will either need to figure out a better way or we need to revert. However, given the various setgid bugs that we found just in the last two years this is a regression risk we should take. Associated with this change is a new set of fstests to enforce the semantics for all new filesystems. Link: https://lore.kernel.org/ceph-devel/20220427092201.wvsdjbnc7b4dttaw@wittgenstein [1] Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [2] Link: `01ea173e10` ("xfs: fix up non-directory creation in SGID directories") [3] Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [4] Link: https://lore.kernel.org/r/1657779088-2242-3-git-send-email-xuyang2018.jy@fujitsu.com Suggested-by: Dave Chinner <david@fromorbit.com> Suggested-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> [<brauner@kernel.org>: rewrote commit message] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>	2023-01-04 14:44:19 +01:00
Andrey Albershteyn	18e579e600	fs: add mode_strip_sgid() helper Bugzilla: http://bugzilla.redhat.com/2128900 Bugzilla: http://bugzilla.redhat.com/2128898 commit 2b3416ceff5e6bd4922f6d1c61fb68113dd82302 Author: Yang Xu <xuyang2018.jy@fujitsu.com> Date: Thu Jul 14 14:11:25 2022 +0800 fs: add mode_strip_sgid() helper Add a dedicated helper to handle the setgid bit when creating a new file in a setgid directory. This is a preparatory patch for moving setgid stripping into the vfs. The patch contains no functional changes. Currently the setgid stripping logic is open-coded directly in inode_init_owner() and the individual filesystems are responsible for handling setgid inheritance. Since this has proven to be brittle as evidenced by old issues we uncovered over the last months (see [1] to [3] below) we will try to move this logic into the vfs. Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1] Link: `01ea173e10` ("xfs: fix up non-directory creation in SGID directories") [2] Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3] Link: https://lore.kernel.org/r/1657779088-2242-1-git-send-email-xuyang2018.jy@fujitsu.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>	2023-01-04 14:44:19 +01:00
Nico Pache	d0afd73f0f	writeback: Fix inode->i_io_list not be protected by inode->i_lock error commit 10e14073107dd0b6d97d9516a02845a8e501c2c9 Author: Jchao Sun <sunjunchao2870@gmail.com> Date: Tue May 24 08:05:40 2022 -0700 writeback: Fix inode->i_io_list not be protected by inode->i_lock error Commit `b35250c081` ("writeback: Protect inode->i_io_list with inode->i_lock") made inode->i_io_list not only protected by wb->list_lock but also inode->i_lock, but inode_io_list_move_locked() was missed. Add lock there and also update comment describing things protected by inode->i_lock. This also fixes a race where __mark_inode_dirty() could move inode under flush worker's hands and thus sync(2) could miss writing some inodes. Fixes: `b35250c081` ("writeback: Protect inode->i_io_list with inode->i_lock") Link: https://lore.kernel.org/r/20220524150540.12552-1-sunjunchao2870@gmail.com CC: stable@vger.kernel.org Signed-off-by: Jchao Sun <sunjunchao2870@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498 Signed-off-by: Nico Pache <npache@redhat.com>	2022-11-08 10:11:38 -07:00
Chris von Recklinghausen	a9a76f1477	mm,fs: split dump_mapping() out from dump_page() Bugzilla: https://bugzilla.redhat.com/2120352 commit 3e9d80a891df3b1a5d77db47fa7fdf33ba71e5cb Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Fri Jan 14 14:05:04 2022 -0800 mm,fs: split dump_mapping() out from dump_page() dump_mapping() is a big chunk of dump_page(), and it'd be handy to be able to call it when we don't have a struct page. Split it out and move it to fs/inode.c. Take the opportunity to simplify some of the debug messages a little. Link: https://lkml.kernel.org/r/20211121121056.2870061-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:38 -04:00
Chris von Recklinghausen	3a3ee2bede	vfs: keep inodes with page cache off the inode shrinker LRU Conflicts: mm/filemap.c - We already have 452e9e6992fe ("filemap: Add filemap_remove_folio and __filemap_remove_folio") so just add the spin_lock call. mm/truncate.c - The backport of 51dcbdac28d4 ("mm: Convert find_lock_entries() to use a folio_batch") listed the lack of this patch as a conflict. Keep the ''fbatch->nr = j;' line. mm/vmscan.c - We already have be7c07d60e13 ("mm/vmscan: Convert __remove_mapping() to take a folio") so change a couple of lines from 'if (!PageSwapCache(page))' to 'if (!folio_test_swapcache(folio))' Bugzilla: https://bugzilla.redhat.com/2120352 commit 51b8c1fe250d1bd70c1722dc3c414f5cff2d7cca Author: Johannes Weiner <hannes@cmpxchg.org> Date: Mon Nov 8 18:31:24 2021 -0800 vfs: keep inodes with page cache off the inode shrinker LRU Historically (pre-2.5), the inode shrinker used to reclaim only empty inodes and skip over those that still contained page cache. This caused problems on highmem hosts: struct inode could put fill lowmem zones before the cache was getting reclaimed in the highmem zones. To address this, the inode shrinker started to strip page cache to facilitate reclaiming lowmem. However, this comes with its own set of problems: the shrinkers may drop actively used page cache just because the inodes are not currently open or dirty - think working with a large git tree. It further doesn't respect cgroup memory protection settings and can cause priority inversions between containers. Nowadays, the page cache also holds non-resident info for evicted cache pages in order to detect refaults. We've come to rely heavily on this data inside reclaim for protecting the cache workingset and driving swap behavior. We also use it to quantify and report workload health through psi. The latter in turn is used for fleet health monitoring, as well as driving automated memory sizing of workloads and containers, proactive reclaim and memory offloading schemes. The consequences of dropping page cache prematurely is that we're seeing subtle and not-so-subtle failures in all of the above-mentioned scenarios, with the workload generally entering unexpected thrashing states while losing the ability to reliably detect it. To fix this on non-highmem systems at least, going back to rotating inodes on the LRU isn't feasible. We've tried (commit `a76cf1a474` ("mm: don't reclaim inodes with many attached pages")) and failed (commit `69056ee6a8` ("Revert "mm: don't reclaim inodes with many attached pages"")). The issue is mostly that shrinker pools attract pressure based on their size, and when objects get skipped the shrinkers remember this as deferred reclaim work. This accumulates excessive pressure on the remaining inodes, and we can quickly eat into heavily used ones, or dirty ones that require IO to reclaim, when there potentially is plenty of cold, clean cache around still. Instead, this patch keeps populated inodes off the inode LRU in the first place - just like an open file or dirty state would. An otherwise clean and unused inode then gets queued when the last cache entry disappears. This solves the problem without reintroducing the reclaim issues, and generally is a bit more scalable than having to wade through potentially hundreds of thousands of busy inodes. Locking is a bit tricky because the locks protecting the inode state (i_lock) and the inode LRU (lru_list.lock) don't nest inside the irq-safe page cache lock (i_pages.xa_lock). Page cache deletions are serialized through i_lock, taken before the i_pages lock, to make sure depopulated inodes are queued reliably. Additions may race with deletions, but we'll check again in the shrinker. If additions race with the shrinker itself, we're protected by the i_lock: if find_inode() or iput() win, the shrinker will bail on the elevated i_count or I_REFERENCED; if the shrinker wins and goes ahead with the inode, it will set I_FREEING and inhibit further igets(), which will cause the other side to create a new instance of the inode instead. Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:31 -04:00
Patrick Talbert	0e06ec8e0e	Merge: mm: Optimize list lru memory consumption MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/690 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2013413 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/690 Omitted-fix: b9663a6ff828 ("tools: Add kmem_cache_alloc_lru()") The tools/include/linux/slab.h and the radix-tree tests have not been merged into CS9 yet. This MR backports the upstream patch series "Optimize list lru memory consumption" to reduce memory consumption of kmalloc-32 slab cache on systems with a large number of memory cgroups (containers). In the extreme case, this patch series can save GBs of memory. Signed-off-by: Waiman Long <longman@redhat.com> ~~~ Waiman Long (26): Compiler Attributes: add __alloc_size() for better bounds checking slab: clean up function prototypes slab: add __alloc_size attributes for better bounds checking mm/list_lru.c: prefer struct_size over open coded arithmetic memcg, kmem: further deprecate kmem.limit_in_bytes mm: list_lru: remove holding lru lock mm: list_lru: fix the return value of list_lru_count_one() mm: list_lru: only add memcg-aware lrus to the global lru list memcg: add per-memcg vmalloc stat memcg: add per-memcg total kernel memory stat mm: list_lru: transpose the array of per-node per-memcg lru lists mm: introduce kmem_cache_alloc_lru fs: introduce alloc_inode_sb() to allocate filesystems specific inode fs: allocate inode by using alloc_inode_sb() mm: dcache: use kmem_cache_alloc_lru() to allocate dentry xarray: use kmem_cache_alloc_lru to allocate xa_node mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online() mm: list_lru: allocate list_lru_one only when needed mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus mm: list_lru: replace linear array with xarray mm: memcontrol: reuse memory cgroup ID for kmem ID mm: memcontrol: fix cannot alloc the maximum memcg ID mm: list_lru: rename list_lru_per_memcg to list_lru_memcg mm: memcontrol: rename memcg_cache_id to memcg_kmem_id slab: remove __alloc_size attribute from __kmalloc_track_caller NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation .../admin-guide/cgroup-v1/memory.rst \| 11 +- Documentation/admin-guide/cgroup-v2.rst \| 8 + Documentation/filesystems/porting.rst \| 6 + Makefile \| 15 + block/bdev.c \| 2 +- drivers/dax/super.c \| 2 +- fs/adfs/super.c \| 2 +- fs/affs/super.c \| 2 +- fs/afs/super.c \| 2 +- fs/befs/linuxvfs.c \| 2 +- fs/bfs/inode.c \| 2 +- fs/btrfs/inode.c \| 2 +- fs/ceph/inode.c \| 2 +- fs/cifs/cifsfs.c \| 2 +- fs/coda/inode.c \| 2 +- fs/dcache.c \| 3 +- fs/ecryptfs/super.c \| 2 +- fs/efs/super.c \| 2 +- fs/erofs/super.c \| 2 +- fs/exfat/super.c \| 2 +- fs/ext2/super.c \| 2 +- fs/ext4/super.c \| 2 +- fs/fat/inode.c \| 2 +- fs/freevxfs/vxfs_super.c \| 2 +- fs/fuse/inode.c \| 2 +- fs/gfs2/super.c \| 2 +- fs/hfs/super.c \| 2 +- fs/hfsplus/super.c \| 2 +- fs/hostfs/hostfs_kern.c \| 2 +- fs/hpfs/super.c \| 2 +- fs/hugetlbfs/inode.c \| 2 +- fs/inode.c \| 2 +- fs/isofs/inode.c \| 2 +- fs/jffs2/super.c \| 2 +- fs/jfs/super.c \| 2 +- fs/minix/inode.c \| 2 +- fs/nfs/inode.c \| 2 +- fs/nfs/nfs42xattr.c \| 2 +- fs/nilfs2/super.c \| 2 +- fs/ntfs/inode.c \| 2 +- fs/ocfs2/dlmfs/dlmfs.c \| 2 +- fs/ocfs2/super.c \| 2 +- fs/openpromfs/inode.c \| 2 +- fs/orangefs/super.c \| 2 +- fs/overlayfs/super.c \| 2 +- fs/proc/inode.c \| 2 +- fs/qnx4/inode.c \| 2 +- fs/qnx6/inode.c \| 2 +- fs/reiserfs/super.c \| 2 +- fs/romfs/super.c \| 2 +- fs/squashfs/super.c \| 2 +- fs/sysv/inode.c \| 2 +- fs/ubifs/super.c \| 2 +- fs/udf/super.c \| 2 +- fs/ufs/super.c \| 2 +- fs/vboxsf/super.c \| 2 +- fs/xfs/xfs_icache.c \| 2 +- fs/zonefs/super.c \| 2 +- include/linux/compiler-gcc.h \| 8 + include/linux/compiler_attributes.h \| 10 + include/linux/compiler_types.h \| 12 + include/linux/fs.h \| 11 + include/linux/list_lru.h \| 17 +- include/linux/memcontrol.h \| 63 ++- include/linux/slab.h \| 101 ++-- include/linux/swap.h \| 5 +- include/linux/xarray.h \| 9 +- ipc/mqueue.c \| 2 +- lib/xarray.c \| 10 +- mm/list_lru.c \| 464 ++++++++---------- mm/memcontrol.c \| 213 ++------ mm/shmem.c \| 2 +- mm/slab.c \| 39 +- mm/slab.h \| 25 +- mm/slob.c \| 6 + mm/slub.c \| 42 +- mm/vmalloc.c \| 13 +- mm/workingset.c \| 2 +- net/socket.c \| 2 +- net/sunrpc/rpc_pipe.c \| 2 +- scripts/checkpatch.pl \| 3 +- 81 files changed, 600 insertions(+), 610 deletions(-) Approved-by: Prarit Bhargava <prarit@redhat.com> Approved-by: Aristeu Rozanski <arozansk@redhat.com> Approved-by: Rafael Aquini <aquini@redhat.com> Approved-by: Phil Auld <pauld@redhat.com> Signed-off-by: Patrick Talbert <ptalbert@redhat.com>	2022-05-09 09:48:03 +02:00
Waiman Long	e4e08faece	fs: introduce alloc_inode_sb() to allocate filesystems specific inode Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2013413 Conflicts: A merge conflict for the insertion of <linux/slab.h> hunk into include/linux/fs.h due to missing upstream commit a793d79ea3e0 ("fs: move mapping helpers"). commit 8b9f3ac5b01db85c6cf74c2c3a71280cc3045c9c Author: Muchun Song <songmuchun@bytedance.com> Date: Tue, 22 Mar 2022 14:41:00 -0700 fs: introduce alloc_inode_sb() to allocate filesystems specific inode The allocated inode cache is supposed to be added to its memcg list_lru which should be allocated as well in advance. That can be done by kmem_cache_alloc_lru() which allocates object and list_lru. The file systems is main user of it. So introduce alloc_inode_sb() to allocate file system specific inodes and set up the inode reclaim context properly. The file system is supposed to use alloc_inode_sb() to allocate inodes. In later patches, we will convert all users to the new API. Link: https://lkml.kernel.org/r/20220228122126.37293-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com>	2022-04-07 14:11:12 -04:00
Aristeu Rozanski	cee75af7e2	fs: Remove FS_THP_SUPPORT Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485 Tested: ran Rafael's set of sanity tests, other than known issues, seems ok commit ff36da69bc90d80b0c73f47f4b2e270b3ff6da99 Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Sun Aug 29 06:07:03 2021 -0400 fs: Remove FS_THP_SUPPORT Instead of setting a bit in the fs_flags to set a bit in the address_space, set the bit in the address_space directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>	2022-04-07 09:58:32 -04:00
Rafael Aquini	cf23a735d5	mm: Fully initialize invalidate_lock, amend lock class later Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 This patch is a backport of the following upstream commit: commit 23ca067b3295d935835b71f743235f9e5ab31cc5 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Wed Sep 1 10:44:03 2021 +0200 mm: Fully initialize invalidate_lock, amend lock class later The function __init_rwsem() is not part of the official API, it just a helper function used by init_rwsem(). Changing the lock's class and name should be done by using lockdep_set_class_and_name() after the has been fully initialized. The overhead of the additional class struct and setting it twice is negligible and it works across all locks. Fully initialize the lock with init_rwsem() and then set the custom class and name for the lock. Fixes: 730633f0b7f95 ("mm: Protect operations adding pages to page cache with invalidate_lock") Link: https://lore.kernel.org/r/20210901084403.g4fezi23cixemlhh@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2021-11-29 11:43:50 -05:00
Rafael Aquini	5b5c28b990	fs: inode: count invalidated shadow pages in pginodesteal Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 This patch is a backport of the following upstream commit: commit 7ae12c809f6a31d3da7b96339dbefa141884c711 Author: Johannes Weiner <hannes@cmpxchg.org> Date: Thu Sep 2 14:53:24 2021 -0700 fs: inode: count invalidated shadow pages in pginodesteal pginodesteal is supposed to capture the impact that inode reclaim has on the page cache state. Currently, it doesn't consider shadow pages that get dropped this way, even though this can have a significant impact on paging behavior, memory pressure calculations etc. To improve visibility into these effects, make sure shadow pages get counted when they get dropped through inode reclaim. This changes the return value semantics of invalidate_mapping_pages() semantics slightly, but the only two users are the inode shrinker itsel and a usb driver that logs it for debugging purposes. Link: https://lkml.kernel.org/r/20210614211904.14420-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2021-11-29 11:40:53 -05:00
Rafael Aquini	4d9de0c3d3	mm: Protect operations adding pages to page cache with invalidate_lock Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 This patch is a backport of the following upstream commit: commit 730633f0b7f951726e87f912a6323641f674ae34 Author: Jan Kara <jack@suse.cz> Date: Thu Jan 28 19:19:45 2021 +0100 mm: Protect operations adding pages to page cache with invalidate_lock Currently, serializing operations such as page fault, read, or readahead against hole punching is rather difficult. The basic race scheme is like: fallocate(FALLOC_FL_PUNCH_HOLE) read / fault / .. truncate_inode_pages_range() <create pages in page cache here> <update fs block mapping and free blocks> Now the problem is in this way read / page fault / readahead can instantiate pages in page cache with potentially stale data (if blocks get quickly reused). Avoiding this race is not simple - page locks do not work because we want to make sure there are no pages in given range. inode->i_rwsem does not work because page fault happens under mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes the performance for mixed read-write workloads suffer. So create a new rw_semaphore in the address_space - invalidate_lock - that protects adding of pages to page cache for page faults / reads / readahead. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2021-11-29 11:40:22 -05:00
Hugh Dickins	786b31121a	mm: remove nrexceptional from inode: remove BUG_ON clear_inode()'s BUG_ON(!mapping_empty(&inode->i_data)) is unsafe: we know of two ways in which nodes can and do (on rare occasions) get left behind. Until those are fixed, do not BUG_ON() nor even WARN_ON(). Yes, this will then leak those nodes (or the next user of the struct inode may use them); but this has been happening for years, and the new BUG_ON(!mapping_empty) was only guilty of revealing that. A proper fix will follow, but no hurry. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104292229380.16080@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:20 -07:00
Matthew Wilcox (Oracle)	8bc3c481b3	mm: remove nrexceptional from inode We no longer track anything in nrexceptional, so remove it, saving 8 bytes per inode. Link: https://lkml.kernel.org/r/20201026151849.24232-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:20 -07:00
Linus Torvalds	34a456eb1f	fs.idmapped.helpers.v5.13 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYIfiiwAKCRCRxhvAZXjc ogtMAQC+MtgJZdcH5iDHNEyI36JaWUccKRV7PdvfF1YgnXO45gD+IYxR1c/EQQyD kh2AmqhET6jVhe9Nsob5yxduksI+ygo= =oh/d -----END PGP SIGNATURE----- Merge tag 'fs.idmapped.helpers.v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fs mapping helper updates from Christian Brauner: "This adds kernel-doc to all new idmapping helpers and improves their naming which was triggered by a discussion with some fs developers. Some of the names are based on suggestions by Vivek and Al. Also remove the open-coded permission checking in a few places with simple helpers. Overall this should lead to more clarity and make it easier to maintain" * tag 'fs.idmapped.helpers.v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fs: introduce two inode i_{u,g}id initialization helpers fs: introduce fsuidgid_has_mapping() helper fs: document and rename fsid helpers fs: document mapping helpers	2021-04-27 12:49:42 -07:00
Miklos Szeredi	51db776a43	vfs: remove unused ioctl helpers Remove vfs_ioc_setflags_prepare(), vfs_ioc_fssetxattr_check() and simple_fill_fsxattr(), which are no longer used. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-04-12 15:04:30 +02:00
Christian Brauner	db998553cf	fs: introduce two inode i_{u,g}id initialization helpers Give filesystem two little helpers that do the right thing when initializing the i_uid and i_gid fields on idmapped and non-idmapped mounts. Filesystems shouldn't have to be concerned with too many details. Link: https://lore.kernel.org/r/20210320122623.599086-5-christian.brauner@ubuntu.com Inspired-by: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-03-23 11:15:26 +01:00
Christian Brauner	a65e58e791	fs: document and rename fsid helpers Vivek pointed out that the fs{g,u}id_into_mnt() naming scheme can be misleading as it could be understood as implying they do the exact same thing as i_{g,u}id_into_mnt(). The original motivation for this naming scheme was to signal to callers that the helpers will always take care to map the k{g,u}id such that the ownership is expressed in terms of the mnt_users. Get rid of the confusion by renaming those helpers to something more sensible. Al suggested mapped_fs{g,u}id() which seems a really good fit. Usually filesystems don't need to bother with these helpers directly only in some cases where they allocate objects that carry {g,u}ids which are either filesystem specific (e.g. xfs quota objects) or don't have a clean set of helpers as inodes have. Link: https://lore.kernel.org/r/20210320122623.599086-3-christian.brauner@ubuntu.com Inspired-by: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-03-23 11:13:32 +01:00

1 2 3 4 5 ...

461 Commits