Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Jeff Moyer	5331d880be	Add do_ftruncate that truncates a struct file JIRA: https://issues.redhat.com/browse/RHEL-64867 Conflicts: Slight context difference due to out-of-order backport of commit abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap") commit 5f0d594c602f870e3a3872f7ea42bf846a1d26cf Author: Tony Solomonik <tony.solomonik@gmail.com> Date: Fri Feb 2 14:17:23 2024 +0200 Add do_ftruncate that truncates a struct file do_sys_ftruncate receives a file descriptor, fgets the struct file, and finally actually truncates the file. do_ftruncate allows for passing in a file directly, with the caller already holding a reference to it. Signed-off-by: Tony Solomonik <tony.solomonik@gmail.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240202121724.17461-2-tony.solomonik@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2024-11-28 15:34:44 -05:00
Ian Kent	9cd7658c81	nfs: use vfs setgid helper JIRA: https://issues.redhat.com/browse/RHEL-33888 Upstream status: Linus commit 4f704d9a8352f5c0a8fcdb6213b934630342bd44 Author: Christian Brauner <brauner@kernel.org> Date: Tue Mar 14 12:51:10 2023 +0100 nfs: use vfs setgid helper We've aligned setgid behavior over multiple kernel releases. The details can be found in the following two merge messages: cf619f891971 ("Merge tag 'fs.ovl.setgid.v6.2') 426b4ca2d6a5 ("Merge tag 'fs.setgid.v6.0') Consistent setgid stripping behavior is now encapsulated in the setattr_should_drop_sgid() helper which is used by all filesystems that strip setgid bits outside of vfs proper. Switch nfs to rely on this helper as well. Without this patch the setgid stripping tests in xfstests will fail. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230313-fs-nfs-setgid-v2-1-9a59f436cfc0@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 14:20:01 +08:00
Ian Kent	69f3621dc7	fs: move mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit 3707d84c13670bf09b4a9a4dc6733326d8344b31 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:33 2023 +0100 fs: move mnt_idmap Now that we converted everything to just rely on struct mnt_idmap move it all into a separate file. This ensure that no code can poke around in struct mnt_idmap without any dedicated helpers and makes it easier to extend it in the future. Filesystems will now not be able to conflate mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. We are now also able to extend struct mnt_idmap as we see fit. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 14:19:56 +08:00
Ian Kent	edf17476c7	fs: port privilege checking helpers to mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. Upstream merge commit 05e6295f7b5e0 ("Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") together with Upstream commit facd61053cff1 ("fuse: fixes after adapting to new posix acl api") results in a conflict in fs/fuse/acl.c, adjust to suit. commit 9452e93e6dae862d7aeff2b11236d79bde6f9b66 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:27 2023 +0100 fs: port privilege checking helpers to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:31 +08:00
Ian Kent	304ec491ee	fs: port ->permission() to pass mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: For consistency drop btrfs hunks because it isn't supported in CentOS Stream and other backports also drop such hunks. CentOS Stream commit `48fa94aacd` ("ceph: fscrypt_auth handling for ceph") is presnt which causes fuzz 2 in hunk #1 in fs/ceph/super.h. Upstream commit 427505ffeaa46 ("exportfs: use pr_debug for unreachable debug statements") is not present causing fuzz 2 in hunk #1 against fs/exportfs/expfs.c. Dropped hunks for ksmbd because the source is not present in the CentOS Stream source tree. Upstream commit 03fa86e9f79d8 ("namei: stash the sampled ->d_seq into nameidata") is not present causing a fuzz 1 for hunk #14 against fs/namei.c. CentOS Stream `c4f3dd0731` ("nfsd: handle failure to collect pre/post-op attrs more sanely") is present and causes a rejects for hunks #4 and #5 against fs/nfsd/vfs.c, apply manually. Dropped hunks for ntfs3 because the source is not present in the CentOS Stream source tree. CentOS Stream commit `98ba731fc7` ("ovl: Move xattr support to new xattrs.c file") moves ovl_xattr_set() and ovl_xattr_get() from fs/overlayfs/inode.c to fs/overlayfs/xattrs.c which causes hunks #4 and #5 to fail, manually apply to fs/overlayfs/xattrs.c. CentOS Stream commit `55177e4b83` ("ovl: mark xwhiteouts directory with overlay.opaque='x'") and commit `d17b324bb6` ("ovl: use ovl_numlower() and ovl_lowerstack() accessors") change the first and third hunks of fs/overlayfs/namei.c causing them to fail, manually apply. CentOS Stream commit `98ba731fc7` ("ovl: Move xattr support to new xattrs.c file") causes fuzz 2 in hunk #5 of fs/overlayfs/overlayfs.h CentOS Stream commit `355a9c490a` ("ovl: Add an alternative type of whiteout") changes ovl_cache_update_ino() to ovl_cache_update() in fs/overlayfs/readdir.c, make the change manually. Upstream commit 217af7e2f4deb ("apparmor: refactor profile rules and attachments") is not in CentOS Stream causing hunk #1 to fail to apply so manually apply the change. commit 4609e1f18e19c3b302e1eb4858334bca1532f780 Author: Christian Brauner <brauner@kernel.org> Date: Fri Jan 13 12:49:22 2023 +0100 fs: port ->permission() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:20 +08:00
Ian Kent	a050a48e12	may_linkat(): constify path JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit 8996682b10ff4de4f6f36fc81211f0a1c0437495 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Thu Aug 4 12:53:46 2022 -0400 may_linkat(): constify path Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 10:45:19 +08:00
Ian Kent	d584d976a2	acl: conver higher-level helpers to rely on mnt_idmap JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit 5a6f52d20ce3cd6d30103a27f18edff337da191b Author: Christian Brauner <brauner@kernel.org> Date: Fri Oct 28 09:56:20 2022 +0200 acl: conver higher-level helpers to rely on mnt_idmap Convert an initial portion to rely on struct mnt_idmap by converting the high level xattr helpers. Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-16 08:29:49 +08:00
Ian Kent	dc1f3bea48	attr: use consistent sgid stripping checks JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit ed5a7047d2011cb6b2bf84ceb6680124cc6a7d95 Author: Christian Brauner <brauner@kernel.org> Date: Mon Oct 17 17:06:37 2022 +0200 attr: use consistent sgid stripping checks Currently setgid stripping in file_remove_privs()'s should_remove_suid() helper is inconsistent with other parts of the vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups and the caller isn't privileged over the inode although we require this already in setattr_prepare() and setattr_copy() and so all filesystem implement this requirement implicitly because they have to use setattr_{prepare,copy}() anyway. But the inconsistency shows up in setgid stripping bugs for overlayfs in xfstests (e.g., generic/673, generic/683, generic/685, generic/686, generic/687). For example, we test whether suid and setgid stripping works correctly when performing various write-like operations as an unprivileged user (fallocate, reflink, write, etc.): echo "Test 1 - qa_user, non-exec file $verb" setup_testfile chmod a+rws $junk_file commit_and_check "$qa_user" "$verb" 64k 64k The test basically creates a file with 6666 permissions. While the file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a regular filesystem like xfs what will happen is: sys_fallocate() -> vfs_fallocate() -> xfs_file_fallocate() -> file_modified() -> __file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = ATTR_FORCE | kill; -> notify_change() -> setattr_copy() In should_remove_suid() we can see that ATTR_KILL_SUID is raised unconditionally because the file in the test has S_ISUID set. But we also see that ATTR_KILL_SGID won't be set because while the file is S_ISGID it is not S_IXGRP (see above) which is a condition for ATTR_KILL_SGID being raised. So by the time we call notify_change() we have attr->ia_valid set to ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that ATTR_KILL_SUID is set and does: ia_valid = attr->ia_valid |= ATTR_MODE attr->ia_mode = (inode->i_mode & ~S_ISUID); which means that when we call setattr_copy() later we will definitely update inode->i_mode. Note that attr->ia_mode still contains S_ISGID. Now we call into the filesystem's ->setattr() inode operation which will end up calling setattr_copy(). Since ATTR_MODE is set we will hit: if (ia_valid & ATTR_MODE) { umode_t mode = attr->ia_mode; vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode); if (!vfsgid_in_group_p(vfsgid) && !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)) mode &= ~S_ISGID; inode->i_mode = mode; } and since the caller in the test is neither capable nor in the group of the inode the S_ISGID bit is stripped. But assume the file isn't suid then ATTR_KILL_SUID won't be raised which has the consequence that neither the setgid nor the suid bits are stripped even though it should be stripped because the inode isn't in the caller's groups and the caller isn't privileged over the inode. If overlayfs is in the mix things become a bit more complicated and the bug shows up more clearly. When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be raised but because the check in notify_change() is questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped the S_ISGID bit isn't removed even though it should be stripped: sys_fallocate() -> vfs_fallocate() -> ovl_fallocate() -> file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = ATTR_FORCE | kill; -> notify_change() -> ovl_setattr() // TAKE ON MOUNTER'S CREDS -> ovl_do_notify_change() -> notify_change() // GIVE UP MOUNTER'S CREDS // TAKE ON MOUNTER'S CREDS -> vfs_fallocate() -> xfs_file_fallocate() -> file_modified() -> __file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = attr_force | kill; -> notify_change() The fix for all of this is to make file_remove_privs()'s should_remove_suid() helper to perform the same checks as we already require in setattr_prepare() and setattr_copy() and have notify_change() not pointlessly requiring S_IXGRP again. It doesn't make any sense in the first place because the caller must calculate the flags via should_remove_suid() anyway which would raise ATTR_KILL_SGID. While we're at it we move should_remove_suid() from inode.c to attr.c where it belongs with the rest of the iattr helpers. Especially since it returns ATTR_KILL_S{G,U}ID flags. We also rename it to setattr_should_drop_suidgid() to better reflect that it indicates both setuid and setgid bit removal and also that it returns attr flags. Running xfstests with this doesn't report any regressions. We should really try and use consistent checks. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:34 +08:00
Ian Kent	94eb87da65	attr: add setattr_should_drop_sgid() JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit 72ae017c5451860443a16fb2a8c243bff3e396b8 Author: Christian Brauner <brauner@kernel.org> Date: Mon Oct 17 17:06:36 2022 +0200 attr: add setattr_should_drop_sgid() The current setgid stripping logic during write and ownership change operations is inconsistent and strewn over multiple places. In order to consolidate it and make more consistent we'll add a new helper setattr_should_drop_sgid(). The function retains the old behavior where we remove the S_ISGID bit unconditionally when S_IXGRP is set but also when it isn't set and the caller is neither in the group of the inode nor privileged over the inode. We will use this helper both in write operation permission removal such as file_remove_privs() as well as in ownership change operations. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:26 +08:00
Ian Kent	d8268a324b	attr: add in_group_or_capable() JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus Conflicts: CentOS Stream has commit `42bfe37a25` ("fs: add ctime accessors infrastructure") so adjust the context to match. commit 11c2a8700cdcabf9b639b7204a1e38e2a0b6798e Author: Christian Brauner <brauner@kernel.org> Date: Mon Oct 17 17:06:34 2022 +0200 attr: add in_group_or_capable() In setattr_{copy,prepare}() we need to perform the same permission checks to determine whether we need to drop the setgid bit or not. Instead of open-coding it twice add a simple helper the encapsulates the logic. We will reuse this helpers to make dropping the setgid bit during write operations more consistent in a follow up patch. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:24 +08:00
Ian Kent	8c7e81cebd	xattr: use posix acl api JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit 318e66856ddec05384f32d60b5598128289f4e7b Author: Christian Brauner <brauner@kernel.org> Date: Thu Sep 22 17:17:22 2022 +0200 xattr: use posix acl api In previous patches we built a new posix api solely around get and set inode operations. Now that we have all the pieces in place we can switch the system calls and the vfs over to only rely on this api when interacting with posix acls. This finally removes all type unsafety and type conversion issues explained in detail in [1] that we aim to get rid of. With the new posix acl api we immediately translate into an appropriate kernel internal struct posix_acl format both when getting and setting posix acls. This is a stark contrast to before were we hacked unsafe raw values into the uapi struct that was stored in a void pointer relying and having filesystems and security modules hack around in the uapi struct as well. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:12:09 +08:00
Ian Kent	1567cdbcd9	internal: add may_write_xattr() JIRA: https://issues.redhat.com/browse/RHEL-33888 Status: Linus commit 56851bc9b9f072dd738f25ed29c0d5abe9f2908b Author: Christian Brauner <brauner@kernel.org> Date: Thu Sep 29 10:47:36 2022 +0200 internal: add may_write_xattr() Split out the generic checks whether an inode allows writing xattrs. Since security.* and system.* xattrs don't have any restrictions and we're going to split out posix acls into a dedicated api we will use this helper to check whether we can write posix acls. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ian Kent <ikent@redhat.com>	2024-10-15 16:11:53 +08:00
Jeff Moyer	6625cdcc16	file: remove pointless wrapper JIRA: https://issues.redhat.com/browse/RHEL-27755 Conflicts: RHEL is missing commit ed192c59f869 ("file: mostly eliminate spurious relocking in __range_close"), which causes some context differences. commit 24fa3ae9467f49dd9698fd884f2c6b13cc8ea12d Author: Christian Brauner <brauner@kernel.org> Date: Thu Nov 30 13:49:08 2023 +0100 file: remove pointless wrapper Only io_uring uses __close_fd_get_file(). All it does is hide current->files but io_uring accesses files_struct directly right now anyway so it's a bit pointless. Just rename pick_file() to file_close_fd_locked() and let io_uring use it. Add a lockdep assert in there that we expect the caller to hold file_lock while we're at it. Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-2-e73ca6f4ea83@kernel.org Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2024-07-02 10:12:34 -04:00
Ming Lei	5c2c363ba8	fs: move sb_init_dio_done_wq out of direct-io.c JIRA: https://issues.redhat.com/browse/RHEL-29564 commit 439bc39b3cf0014b1b75075812f7ef0f8baa9674 Author: Christoph Hellwig <hch@lst.de> Date: Wed Jan 25 07:58:38 2023 +0100 fs: move sb_init_dio_done_wq out of direct-io.c sb_init_dio_done_wq is also used by the iomap code, so move it to super.c in preparation for building direct-io.c conditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230125065839.191256-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 10:06:22 +08:00
Ming Lei	e749b02780	fs: remove emergency_thaw_bdev JIRA: https://issues.redhat.com/browse/RHEL-29564 commit 4a8b719f95c0dcd15fb7a04b806ad8139fa7c850 Author: Christoph Hellwig <hch@lst.de> Date: Tue Aug 1 19:21:56 2023 +0200 fs: remove emergency_thaw_bdev Fold emergency_thaw_bdev into it's only caller, to prepare for buffer.c to be built only when buffer_head support is enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 09:52:44 +08:00
Ming Lei	46e90a4fe5	super: make locking naming consistent JIRA: https://issues.redhat.com/browse/RHEL-29564 commit d8ce82efdece373b570f35acc8a29487b2087b84 Author: Christian Brauner <brauner@kernel.org> Date: Fri Aug 18 16:00:49 2023 +0200 super: make locking naming consistent Make the naming consistent with the earlier introduced super_lock_{read,write}() helpers. Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230818-vfs-super-fixes-v3-v3-2-9f0b1876e46b@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 09:46:43 +08:00
Ming Lei	c3745bf638	fs: simplify invalidate_inodes JIRA: https://issues.redhat.com/browse/RHEL-29564 commit e127b9bccdb04e5fc4444431de37309a68aedafa Author: Christoph Hellwig <hch@lst.de> Date: Fri Aug 11 12:08:28 2023 +0200 fs: simplify invalidate_inodes kill_dirty has always been true for a long time, so hard code it and remove the unused return value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Message-Id: <20230811100828.1897174-18-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 09:46:43 +08:00
Ming Lei	58e6b9cfbf	dentry.h: trim externs JIRA: https://issues.redhat.com/browse/RHEL-29564 commit 0d486510f86eb8162022ed61e6dc424a10909a10 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Fri Nov 10 15:22:40 2023 -0500 dentry.h: trim externs d_instantiate_unique() had been gone for 7 years; __d_lookup...() and shrink_dcache_for_umount() are fs/internal.h fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2024-04-17 09:46:41 +08:00
Chris von Recklinghausen	a62c96b4b2	don't use __kernel_write() on kmap_local_page() JIRA: https://issues.redhat.com/browse/RHEL-1848 commit 06bbaa6dc53cb72040db952053432541acb9adc7 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Mon Sep 26 11:59:14 2022 -0400 [coredump] don't use __kernel_write() on kmap_local_page() passing kmap_local_page() result to __kernel_write() is unsafe - random ->write_iter() might (and 9p one does) get unhappy when passed ITER_KVEC with pointer that came from kmap_local_page(). Fix by providing a variant of __kernel_write() that takes an iov_iter from caller (__kernel_write() becomes a trivial wrapper) and adding dump_emit_page() that parallels dump_emit(), except that instead of __kernel_write() it uses __kernel_write_iter() with ITER_BVEC source. Fixes: `3159ed5779` "fs/coredump: use kmap_local_page()" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2023-10-20 06:13:08 -04:00
Jeff Moyer	8a92fcb818	fs: export rw_verify_area() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237 commit 871129332d74c9e94bd110932ac4445833995639 Author: Omar Sandoval <osandov@fb.com> Date: Wed Sep 4 12:13:25 2019 -0700 fs: export rw_verify_area() I'm adding btrfs ioctls to read and write compressed data, and rather than duplicating the checks in rw_verify_area(), let's just export it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-04-29 04:47:02 -04:00
Jan Stancek	f302196b1b	Merge: io_uring: update to v5.19 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2190 Sync up our io_uring code with upstream v5.19, but do not enable it. The goal is to be bug-for-bug compatible with this version of the code. I'll post further MRs that will sync to later releases, and then a final MR with remaining fixes. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490 Omitted-fix: df6d3422d3ee ("io_uring/kbuf: fix not advancing READV kbuf ring") Fixes will be pulled in by later merge requests Omitted-fix: 9d94c04c0db0 ("io_uring/filetable: fix file reference underflow") Fixes will be pulled in by later merge requests Omitted-fix: 48ba08374e77 ("io_uring: fix size calculation when registering buf ring") Fixes will be pulled in by later merge requests Omitted-fix: 36632d062975 ("io_uring: Replace 0-length array with flexible array") Fixes will be pulled in by later merge requests Omitted-fix: 336d28a8f380 ("io_uring: recycle kbuf recycle on tw requeue") Fixes will be pulled in by later merge requests Omitted-fix: 91482864768a ("io_uring: fix multishot accept request leaks") Fixes will be pulled in by later merge requests Omitted-fix: dd9373402280 ("Smack: Provide read control for io_uring_cmd") Fixes will be pulled in by later merge requests Omitted-fix: f4d653dcaa4e ("selinux: implement the security_uring_cmd() LSM hook") Fixes will be pulled in by later merge requests Omitted-fix: 2a5840124009 ("lsm,io_uring: add LSM hooks for the new uring_cmd file op") Fixes will be pulled in by later merge requests Omitted-fix: 3b8fdd1dc35e ("io_uring/fdinfo: fix sqe dumping for IORING_SETUP_SQE128") Fixes will be pulled in by later merge requests Omitted-fix: 00927931cb63 ("io_uring: fix fdinfo sqe offsets calculation") Fixes will be pulled in by later merge requests Omitted-fix: 9d2789ac9d60 ("block/io_uring: pass in issue_flags for uring_cmd task_work handling") Fixes will be pulled in by later merge requests Omitted-fix: 02a4d923e440 ("io_uring/rsrc: fix null-ptr-deref in io_file_bitmap_get()") Fixes will be pulled in by later merge requests Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Approved-by: Brian Foster <bfoster@redhat.com> Approved-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com>	2023-04-13 07:46:41 +02:00
Chris von Recklinghausen	0fd18f7da9	fs/buffer: Convert __block_write_begin_int() to take a folio Bugzilla: https://bugzilla.redhat.com/2160210 commit d1bd0b4ebfe0521964e6937195bd2f76866660c7 Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Wed Nov 3 14:05:47 2021 -0400 fs/buffer: Convert __block_write_begin_int() to take a folio There are no plans to convert buffer_head infrastructure to use large folios, but __block_write_begin_int() is called from iomap, and it's more convenient and less error-prone if we pass in a folio from iomap. It also has a nice saving of almost 200 bytes of code from removing repeated calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2023-03-24 11:18:43 -04:00
Jeff Moyer	38f8af8bb4	Unify the primitives for file descriptor closing Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490 commit 6319194ec57b0452dcda4589d24c4e7db299c5bf Author: Al Viro <viro@zeniv.linux.org.uk> Date: Thu May 12 17:08:03 2022 -0400 Unify the primitives for file descriptor closing Currently we have 3 primitives for removing an opened file from descriptor table - pick_file(), __close_fd_get_file() and close_fd_get_file(). Their calling conventions are rather odd and there's a code duplication for no good reason. They can be unified - 1) have __range_close() cap max_fd in the very beginning; that way we don't need separate way for pick_file() to report being past the end of descriptor table. 2) make {__,}close_fd_get_file() return file (or NULL) directly, rather than returning it via struct file ** argument. Don't bother with (bogus) return value - nobody wants that -ENOENT. 3) make pick_file() return NULL on unopened descriptor - the only caller that used to care about the distinction between descriptor past the end of descriptor table and finding NULL in descriptor table doesn't give a damn after (1). 4) lift ->files_lock out of pick_file() That actually simplifies the callers, as well as the primitives themselves. Code duplication is also gone... Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-03-16 16:32:34 -04:00
Jeff Moyer	88e5e1f8d2	fs: split off do_getxattr from getxattr Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490 commit c975cad931570004b5f51248424a2a696fb65630 Author: Stefan Roesch <shr@fb.com> Date: Sun Apr 24 18:13:50 2022 -0600 fs: split off do_getxattr from getxattr This splits off do_getxattr function from the getxattr function. This will allow io_uring to call it from its io worker. Signed-off-by: Stefan Roesch <shr@fb.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20220323154420.3301504-3-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-03-16 08:25:43 -04:00
Jeff Moyer	c38214410c	fs: split off setxattr_copy and do_setxattr function from setxattr Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123490 Conflicts: RHEL does not have 705191b03d50 ("fs: fix acl translation"), which added an argument to posix_acl_fix_xattr_from_user. Keep the old calling convention. commit 1a91794ce8481a293c5ef432feb440aee1455619 Author: Stefan Roesch <shr@fb.com> Date: Sun Apr 24 18:10:46 2022 -0600 fs: split off setxattr_copy and do_setxattr function from setxattr This splits of the setup part of the function setxattr in its own dedicated function called setxattr_copy. In addition it also exposes a new function called do_setxattr for making the setxattr call. This makes it possible to call these two functions from io_uring in the processing of an xattr request. Signed-off-by: Stefan Roesch <shr@fb.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20220323154420.3301504-2-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2023-03-16 08:24:43 -04:00
Jeff Moyer	7758b47b36	io-uring: Make statx API stable Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2113073 commit 1b6fe6e0dfecf8c82a64fb87148ad9333fa2f62e Author: Stefan Roesch <shr@fb.com> Date: Fri Feb 25 10:53:26 2022 -0800 io-uring: Make statx API stable One of the key architectual tenets is to keep the parameters for io-uring stable. After the call has been submitted, its value can be changed. Unfortunaltely this is not the case for the current statx implementation. IO-Uring change: This changes replaces the const char * filename pointer in the io_statx structure with a struct filename . In addition it also creates the filename object during the prepare phase. With this change, the opcode also needs to invoke cleanup, so the filename object gets freed after processing the request. fs change: This replaces the const char __user filename parameter in the two functions do_statx and vfs_statx with a struct filename *. In addition to be able to correctly construct a filename object a new helper function getname_statx_lookup_flags is introduced. The function makes sure that do_statx and vfs_statx is invoked with the correct lookup flags. Signed-off-by: Stefan Roesch <shr@fb.com> Link: https://lore.kernel.org/r/20220225185326.1373304-2-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2022-11-08 09:31:48 -05:00
Frantisek Hrbata	49fc72059c	Merge: iomap update to v5.16 Merge conflicts: ----------------- fs/iomap/buffered-io.c - to_iomap_page() HEAD(!1370) contains `4b86405d81` ("iomap: Convert to_iomap_page to take a folio") which is missing in !1417. Resolved in favor of HEAD(!1370) fs/iomap/direct-io.c - iomap_dio_bio_iter() Keep changes from !1417, but remove definition of align variable, because this was removed in HEAD(!1407) by `73d48cec18` ("iomap: add support for dma aligned direct-io") MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1417 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933 Update iomap code to upstream v5.16 Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Approved-by: Andreas Gruenbacher <agruenba@redhat.com> Approved-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-11-08 02:23:49 -05:00
Frantisek Hrbata	e9e9bc8da2	Merge: mm changes through v5.18 for 9.2 Merge conflicts: ----------------- Conflicts with !1142(merged) "io_uring: update to v5.15" fs/io-wq.c - static bool io_wqe_create_worker(struct io_wqe wqe, struct io_wqe_acct acct) !1142 already contains backport of 3146cba99aa2 ("io-wq: make worker creation resilient against signals") along with other commits which are not present in !1370. Resolved in favor of HEAD(!1142) - static int io_wqe_worker(void data) !1370 does not contain 767a65e9f317 ("io-wq: fix potential race of acct->nr_workers") Resolved in favor of HEAD(!1142) - static void io_init_new_worker(struct io_wqe wqe, struct io_worker worker, HEAD(!1142) does not contain e32cf5dfbe22 ("kthread: Generalize pf_io_worker so it can point to struct kthread") Resolved in favor of !1370 - static void create_worker_cont(struct callback_head cb) !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()") Resolved in favor of HEAD(!1142) - static void io_workqueue_create(struct work_struct work) !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()") Resolved in favor of HEAD(!1142) - static bool create_io_worker(struct io_wq wq, struct io_wqe wqe, int index) !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()") Resolved in favor of HEAD(!1142) - static bool io_wq_work_match_item(struct io_wq_work work, void data) !1370 does not contain 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure") Resolved in favor of HEAD(!1142) - static void io_wqe_enqueue(struct io_wqe wqe, struct io_wq_work work) !1370 is missing 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure") removed wrongly merged run_cancel label Resolved in favor of HEAD(!1142) - static bool io_task_work_match(struct callback_head cb, void data) !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()") Resolved in favor of HEAD(!1142) - static void 
io_wq_exit_workers(struct io_wq wq) !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()") Resolved in favor of HEAD(!1142) - int io_wq_max_workers(struct io_wq wq, int new_count) !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()") fs/io_uring.c - static int io_register_iowq_max_workers(struct io_ring_ctx ctx, !1370 is missing bunch of commits after 2e480058ddc2 ("io-wq: provide a way to limit max number of workers") Resolved in favor of HEAD(!1142) include/uapi/linux/io_uring.h - !1370 is missing dd47c104533d ("io-wq: provide IO_WQ_ constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items") just a comment conflict Resolved in favor of HEAD(!1142) kernel/exit.c - void __noreturn do_exit(long code) - !1370 contains bunch of commits after f552a27afe67 ("io_uring: remove files pointer in cancellation functions") Resolved in favor of !1370 Conflicts with !1357(merged) "NFS refresh for RHEL-9.2" fs/nfs/callback.c - nfs4_callback_svc(void vrqstp) !1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") where the module_put_and_kthread_exit() was removed Resolved in favor of HEAD(!1357) fs/nfs/file.c !1357 is missing 187c82cb0380 ("fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio") Resolved in favor of HEAD(!1370) fs/nfsd/nfssvc.c - nfsd(void vrqstp) !1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") Resolved in favor of HEAD(!1357) ----------------- MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1370 Bugzilla: https://bugzilla.redhat.com/2120352 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2099722 Patches 1-9 are changes to selftests Patches 10-31 are reverts of RHEL-only patches to address COR CVE Patches 32-320 are the machine dependent mm changes ported by Rafael Patch 321 reverts the backport of 6692c98c7df5. See below. 
Patches 322-981 are the machine independent mm changes Patches 982-1016 are David Hildebrand's upstream changes to address the COR CVE RHEL commit `b23c298982` fork: Stop protecting back_fork_cleanup_cgroup_lock with CONFIG_NUMA which is a backport of upstream 6692c98c7df5 and is reverted early in this series. 6692c98c7df5 is a fix for upstream 40966e316f86 which was not in RHEL until this series. 6692c98c7df5 is re-added after 40966e316f86. Omitted-fix: 310d1344e3c5 ("Revert "powerpc: Remove unused FW_FEATURE_NATIVE references"") to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716 Omitted-fix: 465d0eb0dc31 ("Docs/admin-guide/mm/damon/usage: fix the example code snip") to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716 Omitted-fix: 317314527d17 ("mm/hugetlb: correct demote page offset logic") to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716 Omitted-fix: 37dcc673d065 ("frontswap: don't call ->init if no ops are registered") to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716 Omitted-fix: 30c19366636f ("mm: fix BUG splat with kvmalloc + GFP_ATOMIC") to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716 Omitted: fix: fa84693b3c89 io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: 009ad9f0c6ee io_uring: drop ctx->uring_lock before acquiring sqd->lock fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: bc369921d670 io-wq: max_worker fixes fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743 Omitted-fix: e139a1ec92f8 io_uring: apply max_workers limit to all future users fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743 Omitted-fix: 71c9ce27bb57 io-wq: fix max-workers not correctly set on multi-node system fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743 Omitted-fix: 41d3a6bd1d37 io_uring: pin SQPOLL data before unlocking 
ring lock fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: bad119b9a000 io_uring: honour zeroes as io-wq worker limits fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743 Omitted-fix: 08bdbd39b584 io-wq: ensure that hash wait lock is IRQ disabling fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: 713b9825a4c4 io-wq: fix cancellation on create-worker failure fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: 3b33e3f4a6c0 io-wq: fix silly logic error in io_task_work_match() fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: 71e1cef2d794 io-wq: Remove duplicate code in io_workqueue_create() fixed under https://bugzilla.redhat.com/show_bug.cgi?id=210774 Omitted-fix: a226abcd5d42 io-wq: don't retry task_work creation failure on fatal conditions fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743 Omitted-fix: fa84693b3c89 io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: dd47c104533d io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656 Omitted-fix: 4f0712ccec09 hexagon: Fix function name in die() unsupported arch Omitted-fix: 751971af2e36 csky: Fix function name in csky_alignment() and die() unsupported arch Omitted-fix: dcbc65aac283 ptrace: Remove duplicated include in ptrace.c unsupported arch Omitted-fix: eb48d4219879 drm/i915: Fix oops due to missing stack depot fixed in RHEL commit `105d2d4832` Merge DRM changes from upstream v5.16..v5.17 Omitted-fix: 751a9d69b197 drm/i915: Fix oops due to missing stack depot fixed in RHEL commit `99fc716fc4` Merge DRM changes from upstream v5.17..v5.18 Omitted-fix: eb48d4219879 drm/i915: Fix oops due to missing stack depot fixed in RHEL commit `105d2d4832` Merge DRM changes from upstream v5.16..v5.17 Omitted-fix: 
751a9d69b197 drm/i915: Fix oops due to missing stack depot fixed in RHEL commit `99fc716fc4` Merge DRM changes from upstream v5.17..v5.18 Omitted-fix: b95dc06af3e6 drm/amdgpu: disable runpm if we are the primary adapter reverted later Omitted-fix: 5a90c24ad028 Revert "drm/amdgpu: disable runpm if we are the primary adapter" revert of above omitted fix Omitted-fix: 724bbe49c5e4 fs/ntfs3: provide block_invalidate_folio to fix memory leak unsupported fs Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: John W. Linville <linville@redhat.com> Approved-by: Jiri Benc <jbenc@redhat.com> Approved-by: Jarod Wilson <jarod@redhat.com> Approved-by: Prarit Bhargava <prarit@redhat.com> Approved-by: Lyude Paul <lyude@redhat.com> Approved-by: Donald Dutile <ddutile@redhat.com> Approved-by: Rafael Aquini <aquini@redhat.com> Approved-by: Phil Auld <pauld@redhat.com> Approved-by: Waiman Long <longman@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-10-23 19:49:41 +02:00
Chris von Recklinghausen	3a3ee2bede	vfs: keep inodes with page cache off the inode shrinker LRU Conflicts: mm/filemap.c - We already have 452e9e6992fe ("filemap: Add filemap_remove_folio and __filemap_remove_folio") so just add the spin_lock call. mm/truncate.c - The backport of 51dcbdac28d4 ("mm: Convert find_lock_entries() to use a folio_batch") listed the lack of this patch as a conflict. Keep the 'fbatch->nr = j;' line. mm/vmscan.c - We already have be7c07d60e13 ("mm/vmscan: Convert __remove_mapping() to take a folio") so change a couple of lines from 'if (!PageSwapCache(page))' to 'if (!folio_test_swapcache(folio))' Bugzilla: https://bugzilla.redhat.com/2120352 commit 51b8c1fe250d1bd70c1722dc3c414f5cff2d7cca Author: Johannes Weiner <hannes@cmpxchg.org> Date: Mon Nov 8 18:31:24 2021 -0800 vfs: keep inodes with page cache off the inode shrinker LRU Historically (pre-2.5), the inode shrinker used to reclaim only empty inodes and skip over those that still contained page cache. This caused problems on highmem hosts: struct inode could fill lowmem zones before the cache was getting reclaimed in the highmem zones. To address this, the inode shrinker started to strip page cache to facilitate reclaiming lowmem. However, this comes with its own set of problems: the shrinkers may drop actively used page cache just because the inodes are not currently open or dirty - think working with a large git tree. It further doesn't respect cgroup memory protection settings and can cause priority inversions between containers. Nowadays, the page cache also holds non-resident info for evicted cache pages in order to detect refaults. We've come to rely heavily on this data inside reclaim for protecting the cache workingset and driving swap behavior. We also use it to quantify and report workload health through psi. 
The latter in turn is used for fleet health monitoring, as well as driving automated memory sizing of workloads and containers, proactive reclaim and memory offloading schemes. The consequences of dropping page cache prematurely is that we're seeing subtle and not-so-subtle failures in all of the above-mentioned scenarios, with the workload generally entering unexpected thrashing states while losing the ability to reliably detect it. To fix this on non-highmem systems at least, going back to rotating inodes on the LRU isn't feasible. We've tried (commit `a76cf1a474` ("mm: don't reclaim inodes with many attached pages")) and failed (commit `69056ee6a8` ("Revert "mm: don't reclaim inodes with many attached pages"")). The issue is mostly that shrinker pools attract pressure based on their size, and when objects get skipped the shrinkers remember this as deferred reclaim work. This accumulates excessive pressure on the remaining inodes, and we can quickly eat into heavily used ones, or dirty ones that require IO to reclaim, when there potentially is plenty of cold, clean cache around still. Instead, this patch keeps populated inodes off the inode LRU in the first place - just like an open file or dirty state would. An otherwise clean and unused inode then gets queued when the last cache entry disappears. This solves the problem without reintroducing the reclaim issues, and generally is a bit more scalable than having to wade through potentially hundreds of thousands of busy inodes. Locking is a bit tricky because the locks protecting the inode state (i_lock) and the inode LRU (lru_list.lock) don't nest inside the irq-safe page cache lock (i_pages.xa_lock). Page cache deletions are serialized through i_lock, taken before the i_pages lock, to make sure depopulated inodes are queued reliably. Additions may race with deletions, but we'll check again in the shrinker. 
If additions race with the shrinker itself, we're protected by the i_lock: if find_inode() or iput() win, the shrinker will bail on the elevated i_count or I_REFERENCED; if the shrinker wins and goes ahead with the inode, it will set I_FREEING and inhibit further igets(), which will cause the other side to create a new instance of the inode instead. Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:31 -04:00
Carlos Maiolino	06017b9485	fs: mark the iomap argument to __block_write_begin_int const Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2130933 __block_write_begin_int never modifies the passed in iomap, so mark it const. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> (cherry picked from commit 6d49cc8545e9e9e9e5a14e75fd044f049bd6077e) Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>	2022-10-03 15:41:31 +02:00
Jeff Moyer	1fbefb3b5d	io_uring: add support for IORING_OP_LINKAT Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656 commit cf30da90bc3a26911d369f199411f38b701394de Author: Dmitry Kadashev <dkadashev@gmail.com> Date: Thu Jul 8 13:34:47 2021 +0700 io_uring: add support for IORING_OP_LINKAT IORING_OP_LINKAT behaves like linkat(2) and takes the same flags and arguments. In some internal places 'hardlink' is used instead of 'link' to avoid confusion with the SQE links. Name 'link' conflicts with the existing 'link' member of io_kiocb. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-12-dkadashev@gmail.com [axboe: add splice_fd_in check] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2022-07-15 14:58:36 -04:00
Jeff Moyer	be79cf9740	io_uring: add support for IORING_OP_SYMLINKAT Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656 commit 7a8721f84fcb3b2946a92380b6fc311e017ff02c Author: Dmitry Kadashev <dkadashev@gmail.com> Date: Thu Jul 8 13:34:46 2021 +0700 io_uring: add support for IORING_OP_SYMLINKAT IORING_OP_SYMLINKAT behaves like symlinkat(2) and takes the same flags and arguments. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-11-dkadashev@gmail.com [axboe: add splice_fd_in check] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2022-07-15 14:58:36 -04:00
Jeff Moyer	0d9e019bc5	namei: update do_() helpers to return ints Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656 commit 45f30dab395730aa3b3da14d9f19ea0d7d43db53 Author: Dmitry Kadashev <dkadashev@gmail.com> Date: Thu Jul 8 13:34:44 2021 +0700 namei: update do_() helpers to return ints Update the following to return int rather than long, for uniformity with the rest of the do_* helpers in namei.c: * do_rmdir() * do_unlinkat() * do_mkdirat() * do_mknodat() * do_symlinkat() Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/20210514143202.dmzfcgz5hnauy7ze@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-9-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2022-07-15 14:58:36 -04:00
Jeff Moyer	5c9ba391d7	namei: make do_mkdirat() take struct filename Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107656 commit 584d3226d665214dc1c498045c253529acdd3134 Author: Dmitry Kadashev <dkadashev@gmail.com> Date: Thu Jul 8 13:34:39 2021 +0700 namei: make do_mkdirat() take struct filename Pass in the struct filename pointers instead of the user string, and update the three callers to do the same. This is heavily based on commit dbea8d345177 ("fs: make do_renameat2() take struct filename"). This behaves like do_unlinkat() and do_renameat2(). Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-4-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>	2022-07-15 14:58:35 -04:00
Ming Lei	204b92f755	block: simplify the block device syncing code Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403 commit 1e03a36bdff4709c1bbf0f57f60ae3f776d51adf Author: Christoph Hellwig <hch@lst.de> Date: Tue Oct 19 08:25:30 2021 +0200 block: simplify the block device syncing code Get rid of the indirections and just provide a sync_bdevs helper for the generic sync code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2021-12-06 16:45:28 +08:00
Ming Lei	93522f6059	block: remove __sync_blockdev Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403 commit 70164eb6ccb76ab679b016b4b60123bf4ec6c162 Author: Christoph Hellwig <hch@lst.de> Date: Tue Oct 19 08:25:25 2021 +0200 block: remove __sync_blockdev Instead offer a new sync_blockdev_nowait helper for the !wait case. This new helper is exported as it will grow modular callers in a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2021-12-06 16:45:28 +08:00
Ming Lei	f6352118db	block: move fs/block_dev.c to block/bdev.c Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403 commit 0dca4462ed0681649fdcd5700a6ddfbaa65fa178 Author: Christoph Hellwig <hch@lst.de> Date: Tue Sep 7 16:13:03 2021 +0200 block: move fs/block_dev.c to block/bdev.c Move it together with the rest of the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210907141303.1371844-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com>	2021-12-06 16:38:56 +08:00
Paul Gortmaker	1e7107c5ef	cgroup1: fix leaked context root causing sporadic NULL deref in LTP Richard reported sporadic (roughly one in 10 or so) null dereferences and other strange behaviour for a set of automated LTP tests. Things like: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:kernfs_sop_show_path+0x1b/0x60 ...or these others: RIP: 0010:do_mkdirat+0x6a/0xf0 RIP: 0010:d_alloc_parallel+0x98/0x510 RIP: 0010:do_readlinkat+0x86/0x120 There were other less common instances of some kind of a general scribble but the common theme was mount and cgroup and a dubious dentry triggering the NULL dereference. I was only able to reproduce it under qemu by replicating Richard's setup as closely as possible - I never did get it to happen on bare metal, even while keeping everything else the same. In commit `71d883c37e` ("cgroup_do_mount(): massage calling conventions") we see this as a part of the overall change: -------------- struct cgroup_subsys ss; - struct dentry dentry; [...] - dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root, - CGROUP_SUPER_MAGIC, ns); [...] - if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) { - struct super_block sb = dentry->d_sb; - dput(dentry); + ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns); + if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) { + struct super_block sb = fc->root->d_sb; + dput(fc->root); deactivate_locked_super(sb); msleep(10); return restart_syscall(); } -------------- In changing from the local "*dentry" variable to using fc->root, we now export/leave that dentry pointer in the file context after doing the dput() in the unlikely "is_dying" case. 
With LTP doing a crazy amount of back to back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely becomes slightly likely and then bad things happen. A fix would be to not leave the stale reference in fc->root as follows: -------------- dput(fc->root); + fc->root = NULL; deactivate_locked_super(sb); -------------- ...but then we are just open-coding a duplicate of fc_drop_locked() so we simply use that instead. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org # v5.1+ Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org> Fixes: `71d883c37e` ("cgroup_do_mount(): massage calling conventions") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-07-21 06:39:20 -10:00
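The stale-reference problem in the cgroup1 commit above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: `struct obj`, `struct ctx`, `obj_put`, `ctx_drop_root` and `ctx_release` are hypothetical stand-ins for the dentry, the file context, `dput()`, the drop-and-clear step the commit describes, and a later generic teardown of the context.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a refcounted dentry and a file context. */
struct obj { int refcount; };
struct ctx { struct obj *root; };

/* Drop one reference; models dput() in this sketch. NULL is ignored. */
static void obj_put(struct obj *o)
{
    if (o)
        o->refcount--;
}

/* Drop the context's reference AND clear the pointer, so that nothing
 * reachable from the context still points at the released object. */
void ctx_drop_root(struct ctx *c)
{
    struct obj *o = c->root;

    c->root = NULL;
    obj_put(o);
}

/* Later generic teardown: drops whatever the context still holds. */
void ctx_release(struct ctx *c)
{
    obj_put(c->root);
    c->root = NULL;
}
```

Because `ctx_drop_root` clears the pointer, a subsequent `ctx_release` is a no-op for that object. If the pointer were left in place, the teardown would drop the same reference a second time, which is the kind of stale state the commit message attributes to the leftover `fc->root`.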
Al Viro	ffb37ca3bd	switch file_open_root() to struct path ... and provide file_open_root_mnt(), using the root of given mount. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-04-07 13:56:43 -04:00
Linus Torvalds	7d6beb71da	idmapped-mounts-v5.12 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg= =yPaw -----END PGP SIGNATURE----- Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. 
- They allow users to efficiently change ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown(2), changing ownership of large sets of files is instantaneous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is common with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD: To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime, has been up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. 
I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. 
The user namespace the mount will be marked with can be specified by passing a file descriptor referring to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: `1d7b902e28` In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystems are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that a port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. 
Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. 
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...	
2021-02-23 13:39:45 -08:00
Linus Torvalds	5bbb336ba7	for-5.12/io_uring-2021-02-17 Merge tag 'for-5.12/io_uring-2021-02-17' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: "Highlights from this cycle are things like request recycling and task_work optimizations, which net us anywhere from 10-20% of speedups on workloads that mostly are inline. This work was originally done to put io_uring under memcg, which adds considerable overhead. But it's a really nice win as well. Also worth highlighting is the LOOKUP_CACHED work in the VFS, and using it in io_uring. Greatly speeds up the fast path for file opens. Summary: - Put io_uring under memcg protection. We accounted just the rings themselves under rlimit memlock before, now we account everything. 
- Request cache recycling, persistent across invocations (Pavel, me) - First part of a cleanup/improvement to buffer registration (Bijan) - SQPOLL fixes (Hao) - File registration NULL pointer fixup (Dan) - LOOKUP_CACHED support for io_uring - Disable /proc/thread-self/ for io_uring, like we do for /proc/self - Add Pavel to the io_uring MAINTAINERS entry - Tons of code cleanups and optimizations (Pavel) - Support for skip entries in file registration (Noah)" * tag 'for-5.12/io_uring-2021-02-17' of git://git.kernel.dk/linux-block: (103 commits) io_uring: tctx->task_lock should be IRQ safe proc: don't allow async path resolution of /proc/thread-self components io_uring: kill cached requests from exiting task closing the ring io_uring: add helper to free all request caches io_uring: allow task match to be passed to io_req_cache_free() io-wq: clear out worker ->fs and ->files io_uring: optimise io_init_req() flags setting io_uring: clean io_req_find_next() fast check io_uring: don't check PF_EXITING from syscall io_uring: don't split out consume out of SQE get io_uring: save ctx put/get for task_work submit io_uring: don't duplicate io_req_task_queue() io_uring: optimise SQPOLL mm/files grabbing io_uring: optimise out unlikely link queue io_uring: take compl state from submit state io_uring: inline io_complete_rw_common() io_uring: move res check out of io_rw_reissue() io_uring: simplify iopoll reissuing io_uring: clean up io_req_free_batch_finish() io_uring: move submit side state closer in the ring ...	2021-02-21 11:10:39 -08:00
Jens Axboe	53dec2ea74	fs: provide locked helper variant of close_fd_get_file() Assumes current->files->file_lock is already held on invocation. Helps the caller check the file before removing the fd, if it needs to. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-02-01 10:02:42 -07:00
Al Viro	b964bf53e5	teach sendfile(2) to handle send-to-pipe directly no point going through the intermediate pipe Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-01-25 23:29:36 -05:00
Christian Brauner	ba73d98745	namei: handle idmapped mounts in may_*() helpers The may_follow_link(), may_linkat(), may_lookup(), may_open(), may_o_create(), may_create_in_sticky(), may_delete(), and may_create() helpers determine whether the caller is privileged enough to perform the associated operations. Let them handle idmapped mounts by mapping the inode or fsids according to the mount's user namespace. Afterwards the checks are identical to non-idmapped inodes. The patch takes care to retrieve the mount's user namespace right before performing permission checks and passing it down into the filesystem so the user namespace can't change in between by someone idmapping a mount that is currently not idmapped. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-13-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:27:17 +01:00
Linus Torvalds	ac7ac4618c	for-5.11/block-2020-12-14 Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "Another series of killing more code than what is being added, again thanks to Christoph's relentless cleanups and tech debt tackling. 
This contains: - blk-iocost improvements (Baolin Wang) - part0 iostat fix (Jeffle Xu) - Disable iopoll for split bios (Jeffle Xu) - block tracepoint cleanups (Christoph Hellwig) - Merging of struct block_device and hd_struct (Christoph Hellwig) - Rework/cleanup of how block device sizes are updated (Christoph Hellwig) - Simplification of gendisk lookup and removal of block device aliasing (Christoph Hellwig) - Block device ioctl cleanups (Christoph Hellwig) - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig) - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig) - sbitmap improvements (Pavel Begunkov) - Hybrid polling fix (Pavel Begunkov) - bvec iteration improvements (Pavel Begunkov) - Zone revalidation fixes (Damien Le Moal) - blk-throttle limit fix (Yu Kuai) - Various little fixes" * tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits) blk-mq: fix msec comment from micro to milli seconds blk-mq: update arg in comment of blk_mq_map_queue blk-mq: add helper allocating tagset->tags Revert "block: Fix a lockdep complaint triggered by request queue flushing" nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class blk-mq: add new API of blk_mq_hctx_set_fq_lock_class block: disable iopoll for split bio block: Improve blk_revalidate_disk_zones() checks sbitmap: simplify wrap check sbitmap: replace CAS with atomic and sbitmap: remove swap_lock sbitmap: optimise sbitmap_deferred_clear() blk-mq: skip hybrid polling if iopoll doesn't spin blk-iocost: Factor out the base vrate change into a separate function blk-iocost: Factor out the active iocgs' state check into a separate function blk-iocost: Move the usage ratio calculation to the correct place blk-iocost: Remove unnecessary advance declaration blk-iocost: Fix some typos in comments blktrace: fix up a kerneldoc comment block: remove the request_queue to argument request based tracepoints ...	2020-12-16 12:57:51 -08:00
Jens Axboe	e886663cfd	fs: make do_renameat2() take struct filename Pass in the struct filename pointers instead of the user string, and update the three callers to do the same. This behaves like do_unlinkat(), which also takes a filename struct and puts it when it is done. Converting callers is then trivial. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-09 12:03:59 -07:00
Christoph Hellwig	4e7b5671c6	block: remove i_bdev Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup special case). This means that we now don't need an extra reference in the inode and can generally simplify handling of struct block_device to keep the lookups contained in the core block layer code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:39 -07:00
Christoph Hellwig	60b498852b	fs: remove get_super_thawed and get_super_exclusive_thawed Just open code the wait in the only caller of both functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:38 -07:00
Christoph Hellwig	028abd9222	fs: remove compat_sys_mount compat_sys_mount is identical to the regular sys_mount now, so remove it and use the native version everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-22 23:45:57 -04:00
Linus Torvalds	e1ec517e18	Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull init and set_fs() cleanups from Al Viro: "Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series" * 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits) init: add an init_dup helper init: add an init_utimes helper init: add an init_stat helper init: add an init_mknod helper init: add an init_mkdir helper init: add an init_symlink helper init: add an init_link helper init: add an init_eaccess helper init: add an init_chmod helper init: add an init_chown helper init: add an init_chroot helper init: add an init_chdir helper init: add an init_rmdir helper init: add an init_unlink helper init: add an init_umount helper init: add an init_mount helper init: mark create_dev as __init init: mark console_on_rootfs as __init init: initialize ramdisk_execute_command at compile time devtmpfs: refactor devtmpfsd() ...	2020-08-07 09:40:34 -07:00


235 Commits