JIRA: https://issues.redhat.com/browse/RHEL-77720
The last user of this flag was removed in commit b77b4a4815a9 ("gfs2:
Rework freeze / thaw logic").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 0b93bac2271e11beb980fca037a34a9819c7dc37)
JIRA: https://issues.redhat.com/browse/RHEL-31991
Upstream status: Linus
commit d3c4bdcc756e60b95365c66ff58844ce75d1c8f8
Author: Jingbo Xu <jefflexu@linux.alibaba.com>
Date: Mon Mar 13 21:53:09 2023 +0800
erofs: set block size to the on-disk block size
Set the block size to that specified in on-disk superblock.
Also remove the hard constraint of PAGE_SIZE block size for the
uncompressed device backend. This constraint is temporarily remained
for compressed device and fscache backend, as there is more work needed
to handle the condition where the block size is not equal to PAGE_SIZE.
It is worth noting that the on-disk block size is read prior to
erofs_superblock_csum_verify(), as the read block size is needed in the
latter.
Besides, later we are going to make erofs refer to tar data blobs (which
is 512-byte aligned) for OCI containers, where the block size is 512
bytes. In this case, the 512-byte block size may not be adequate for a
directory to contain enough dirents. To fix this, we are also going to
introduce directory block size independent on the block size.
Due to we have already supported block size smaller than PAGE_SIZE now,
disable all these images with such separated directory block size until
we supported this feature later.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230313135309.75269-3-jefflexu@linux.alibaba.com
[ Gao Xiang: update documentation. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31991
Upstream status: Linus
commit b22c7b97189d461d7143052da83b36390c623b54
Author: Jingbo Xu <jefflexu@linux.alibaba.com>
Date: Thu Jan 12 14:54:30 2023 +0800
erofs: add documentation for 'domain_id' mount option
Since the EROFS share domain feature for fscache mode has been available
since Linux v6.1, let's add documentation for 'domain_id' mount option.
Cc: linux-doc@vger.kernel.org
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230112065431.124926-2-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31991
Upstream status: Linus
commit e6687b89225ee9c817e6dcbadc873f6a4691e5c2
Author: Jingbo Xu <jefflexu@linux.alibaba.com>
Date: Thu Dec 1 15:42:56 2022 +0800
erofs: enable large folios for fscache mode
Enable large folios for fscache mode. Enable this feature for
non-compressed format for now, until the compression part supports large
folios later.
One thing worth noting is that, the feature is not enabled for the meta
data routine since meta inodes don't need large folios for now, nor do
they support readahead yet.
Also document this new feature.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20221201074256.16639-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31991
Upstream status: Linus
commit 6e95d0a01899ed176b3450db057c3c0a9609cf47
Author: Gao Xiang <xiang@kernel.org>
Date: Fri May 27 15:01:33 2022 +0800
erofs: update documentation
- refine the filesystem overview for better description of recent
new features like FSDAX and Fscache;
- add the new `fsid' mount option;
- fix some typos.
Link: https://lore.kernel.org/r/20220527070133.77962-1-hsiangkao@linux.alibaba.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31991
Upstream status: Linus
commit a1108dcd9373a98f7018aa4310076260b8ecfc0b
Author: David Anderson <dvander@google.com>
Date: Thu Mar 17 19:49:59 2022 +0800
erofs: rename ctime to mtime
EROFS images should inherit modification time rather than change time,
since users and host tooling have no easy way to control change time.
To reflect the new timestamp meaning, i_ctime and i_ctime_nsec are
renamed to i_mtime and i_mtime_nsec.
Link: https://lore.kernel.org/r/20220311041829.3109511-1-dvander@google.com # v1
Signed-off-by: David Anderson <dvander@google.com>
[ Gao Xiang: update document as well. ]
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220317114959.106787-1-hsiangkao@linux.alibaba.com # v2
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31991
Upstream status: Linus
Conflict: Include hunks dropped by CentOS Stream commits 7f25e45b02
and 20ce3bbee1. (upstream commits cd913c76f489 and de2051147771).
CentOS Stream commit 794cb59448 alters z_erofs_submit_queue() and
has been applied later upstream, make a best effort to follow
the requirements of upstream commit 07888c665b405 ("block: pass
a block_device and opf to bio_alloc") at around line 1305 in
fs/erofs/zdata.c
commit dfeab2e95a75a424adf39992ac62dcb9e9517d4a
Author: Gao Xiang <xiang@kernel.org>
Date: Thu Oct 14 16:10:10 2021 +0800
erofs: add multiple device support
In order to support multi-layer container images, add multiple
device feature to EROFS. Two ways are available to use for now:
- Devices can be mapped into 32-bit global block address space;
- Device ID can be specified with the chunk indexes format.
Note that it assumes no extent would cross device boundary and mkfs
should take care of it seriously.
In the future, a dedicated device manager could be introduced then
thus extra devices can be automatically scanned by UUID as well.
Link: https://lore.kernel.org/r/20211014081010.43485-1-hsiangkao@linux.alibaba.com
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31991
Upstream status: Linus
Conflicts: The hunk that includes <linux/dax.h> has been removed because
it was added in CentOS Stream backport commit c70fa990f3.
Include hunks dropped by CentOS Stream commits 7f25e45b02,
f4673910d2 and 687ee3abd4 (upstream commits cd913c76f489d,
8012b86608552 and 7b0800d00dae8).
Update to accomodate existing CentOS Stream commit 930d4bbabf
("mm: remove enum page_entry_size").
commit 06252e9ce05b94b587e522667b85848a30197b15
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: 2021-08-05 08:36:00 +0800
erofs: dax support for non-tailpacking regular file
DAX is quite useful for some VM use cases in order to save guest
memory extremely with minimal lightweight EROFS.
In order to prepare for such use cases, add preliminary dax support
for non-tailpacking regular files for now.
Tested with the DRAM-emulated PMEM and the EROFS image generated by
"mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9"
Link: https://lore.kernel.org/r/20210805003601.183063-3-hsiangkao@linux.alibaba.com
Cc: nvdimm@lists.linux.dev
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-67675
The demote_ok glock operation is only still used to prevent the inode
glocks of the "jindex" and "rindex" directories from getting recycled
while they are still referenced by sdp->sd_jindex and sdp->sd_rindex.
However, the LRU walking code will no longer recycle glocks which are
referenced, so the demote_ok glock operation is obsolete and can be
removed.
Each of a glock's holders in the gl_holders list is holding a reference
on the glock, so when the list of holders isn't empty in demote_ok(),
the existing reference count check will already prevent the glock from
getting released. This means that demote_ok() is obsolete as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 713f8834389f4b34bc8b449412202543c8b32214)
JIRA: https://issues.redhat.com/browse/RHEL-67675
Rearrange the table of locking modes and associated caching capability
to be in order of increasing caching capability.
Update the description of the glock operations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 97d6fdcd79752af8686ab58a0b9389ba80ae0fae)
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4324
JIRA: https://issues.redhat.com/browse/RHEL-33888
This MR back ports idmapping changes to sync. our RHEL-9 kernel with the
upstream kernel to version 6.3.
Our current kernel has idmapped mounts support but there have been many
changes since this initial implementation in the base kernel. In
particular we need the type safety changes and we have seen difficulty
back porting other requested changes on more than one occassion.
The Jira this MR has been raised for is arother example of such a request.
It is needed for a back port of a BPF feature to RHEL 9 which allows BPF
programs to do file verification with LSM and fsverity. To satisfy this
request changes made in the upstream 6.3 kernel are needed which is the
reason we have chosen upstream 6.3 as the target release for the MR.
The first fix has been omitted because it appears to be the same as
24b5308cf5ee ("selftests/filesystems: grant executable permission to
run_fat_tests.sh"). In any case the requirement is to make the path
tools/testing/selftests/filesystems/fat/run_fat_tests.sh executable which
is done.
The second and third Omitted patches are a straight apply and revert leaving
the source unchanged.
Omitted-Fix: 1d4beeb4edc7 ("selftests/filesystems: grant executable permission to run_fat_tests.sh")
Omitted-Fix: 4a47c6385bb4 ovl: turn of SB_POSIXACL with idmapped layers temporarily
Omitted-Fix: 7c4d37c269ac Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily"
Signed-off-by: Ian Kent <ikent@redhat.com>
Approved-by: Scott Mayhew <smayhew@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Xin Long <lxin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5252
JIRA: https://issues.redhat.com/browse/RHEL-27743
JIRA: https://issues.redhat.com/browse/RHEL-59459
CVE: CVE-2024-46787
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4961
This MR brings RHEL9 core MM code up to upstream's v6.6 LTS level.
This work follows up on the previous v6.5 update (RHEL-27742) and as such,
the bulk of this changeset is comprised of refactoring and clean-ups of
the internal implementation of several APIs as it further advances the
conversion to FOLIOS, and follow up on the per-VMA locking changes.
Also, with the rebase to v6.6 LTS, we complete the infrastructure to allow
Control-flow Enforcement Technology, a.k.a. Shadow Stacks, for x86 builds,
and we add a potential extra level of protection (assessment pending) to help
on mitigating kernel heap exploits dubbed as "SlubStick".
Follow-up fixes are omitted from this series either because they are irrelevant to
the bits we support on RHEL or because they depend on bigger changesets introduced
upstream more recently. A follow-up ticket (RHEL-27745) will deal with these and other cases separately.
Omitted-fix: e540b8c5da04 ("mips: mm: add slab availability checking in ioremap_prot")
Omitted-fix: f7875966dc0c ("tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources")
Omitted-fix: df39038cd895 ("s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()")
Omitted-fix: 12bbaae7635a ("mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros")
Omitted-fix: fd1a745ce03e ("mm: support page_mapcount() on page_has_type() pages")
Omitted-fix: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")
Omitted-fix: fa2690af573d ("mm: page_ref: remove folio_try_get_rcu()")
Omitted-fix: f442fa614137 ("mm: gup: stop abusing try_grab_folio")
Omitted-fix: cb0f01beb166 ("mm/mprotect: fix dax pud handling")
Signed-off-by: Rafael Aquini <raquini@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: David Airlie <airlied@redhat.com>
Approved-by: Michal Schmidt <mschmidt@redhat.com>
Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Upstream status: Linus
commit e538b0985a05cfe245ada0bb92f177efec6b8a88
Author: Eric Biggers <ebiggers@google.com>
Date: Thu Jul 1 23:53:50 2021 -0700
fscrypt: remove mention of symlink st_size quirk from documentation
Now that the correct st_size is reported for encrypted symlinks on all
filesystems, update the documentation accordingly.
Link: https://lore.kernel.org/r/20210702065350.209646-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: The cifs source has been moved in CentOS Stream so manually
apply rejected hunk to fs/smb/client/xattr.c.
Dropped hunks for ntfs3 because the source is not present in
the CentOS Stream source tree.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
to new xattrs.c file") moved ovl_own_xattr_set(), manually apply
changes.
CentOS Stream commit 67e2fcb2f3 ("evm: don't copy up
'security.evm' xattr") is present causing hunk #1 against
include/linux/evm.h to be rejected, manually apply.
Upstream commit 5d1ef2ce13a90 ("ima: Introduce
ima_get_current_hash_algo()") is not present in CentOS Stream
which causes fuzz 1 for hunk #1 against include/linux/ima.h.
There's a reject of hunk #1 for include/linux/lsm_hooks.h but
I can't see any reason for it, manually applied the hunk.
CentOS Stream does not have upstream commit ce5bb5a86e5eb
("ima: Return int in the functions to measure a buffer") which
results in a reject of hunk #2 against security/integrity/ima/ima.h
and hunks #8 and #11 against security/integrity/ima/ima_main.c, so
manually apply hunks. There also appears to be a whitespace
mismatch causing hunk #7 to report fuzz 2 on application.
CentOS Stream does not have upstream commit c7423dbdbc9ec
("ima: Handle -ESTALE returned by ima_filter_rule_match()")
which results in a reject of hunk #3 against
security/integrity/ima/ima_policy.c, so manually apply hunk.
commit 39f60c1ccee72caa0104145b5dbf5d37cce1ea39
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:23 2023 +0100
fs: port xattr to mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
for ceph") is presnt which causes fuzz 2 in hunk #1 in
fs/ceph/super.h.
Upstream commit 427505ffeaa46 ("exportfs: use pr_debug for
unreachable debug statements") is not present causing fuzz 2
in hunk #1 against fs/exportfs/expfs.c.
Dropped hunks for ksmbd because the source is not present in the
CentOS Stream source tree.
Upstream commit 03fa86e9f79d8 ("namei: stash the sampled ->d_seq
into nameidata") is not present causing a fuzz 1 for hunk #14
against fs/namei.c.
CentOS Stream c4f3dd0731 ("nfsd: handle failure to collect
pre/post-op attrs more sanely") is present and causes a rejects
for hunks #4 and #5 against fs/nfsd/vfs.c, apply manually.
Dropped hunks for ntfs3 because the source is not present in the
CentOS Stream source tree.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support to
new xattrs.c file") moves ovl_xattr_set() and ovl_xattr_get()
from fs/overlayfs/inode.c to fs/overlayfs/xattrs.c which causes
hunks #4 and #5 to fail, manually apply to fs/overlayfs/xattrs.c.
CentOS Stream commit 55177e4b83 ("ovl: mark xwhiteouts directory
with overlay.opaque='x'") and commit d17b324bb6 ("ovl: use
ovl_numlower() and ovl_lowerstack() accessors") change the first
and third hunks of fs/overlayfs/namei.c causing them to fail,
manually apply.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support to
new xattrs.c file") causes fuzz 2 in hunk #5 of
fs/overlayfs/overlayfs.h
CentOS Stream commit 355a9c490a ("ovl: Add an alternative
type of whiteout") changes ovl_cache_update_ino() to
ovl_cache_update() in fs/overlayfs/readdir.c, make the change
manually.
Upstream commit 217af7e2f4deb ("apparmor: refactor profile
rules and attachments") is not in CentOS Stream causing hunk #1
to fail to apply so manually apply the change.
commit 4609e1f18e19c3b302e1eb4858334bca1532f780
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:22 2023 +0100
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
commit 8782a9aea3ab4d697ad67d1f8ebca38a4e1c24ab
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:21 2023 +0100
fs: port ->fileattr_set() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsacl.c and
fs/smb/client/cifsproto.h.
Dropped hunks for ntfs3 and ksmbd because the source is not
present in the CentOS Stream source tree.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present, which cuases hunk #1 against
mm/shmem.c to be rejected, manually apply the hunk.
CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
for ceph") is present which causes fuzz 1 of hunk #1 against
fs/ceph/inode.c.
commit 13e83a4923bea7c4f2f6714030cb7e56d20ef7e5
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:20 2023 +0100
fs: port ->set_acl() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsacl.c and
fs/smb/client/cifsproto.h.
Upstream merge commit 05e6295f7b5e0 ("Merge tag 'fs.idmapped.v6.3'
of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping")
has changes already applied in Upstream commit facd61053cff1 (
"fuse: fixes after adapting to new posix acl api") so just apply
the additional changes (those that relate to the ->get_act() port).
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
to new xattrs.c file") resulted in fuzz for its hunk against
fs/overlayfs/overlayfs.h.
commit 77435322777d8a8a08264a39111bef94e32b871b
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:19 2023 +0100
fs: port ->get_acl() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
Upstream commit 863f144f12add ("vfs: open inside ->tmpfile()") is
not present which caused a reject in fs/f2fs/namei.c for hunk #1,
applied manually.
The hunk of the patch against fs/minix/namei.c was rejected but I
can't see any reason for it, applied manually.
CentOS Stream has commit 9e0a1fff8d ("ubifs: Implement
RENAME_WHITEOUT") which caused a reject in the hunk against
fs/ubifs/dir.c, manually applied.
commit 011e2b717b1b921d3706a9d48ff83a025563e826
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:18 2023 +0100
fs: port ->tmpfile() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/dir.c.
Dropped hunks for ntfs3 because the source is not present in the
CentOS Stream source tree.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present, which cuases hunks #2-#4 to be
rejected, manually apply the hunks.
CentOS Stream commit f0f830cd7e ("ceph: create symlinks with
encrypted and base64-encoded targets") is present and resulted
in fuzz against fs/ceph/dir.c hunk #2.
Upstream commit 863f144f12add ("vfs: open inside ->tmpfile()")
is missing causing fuzz against fs/ext2/namei.c.
Upstream commit 7d37539037c2f ("fuse: implement ->tmpfile()")
is missing causing fuzz in hunk #4 against fs/fuse/dir.c.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present, so a patch reorder was needed
with appropriate adjustments.
commit 5ebb29bee8d5fc173b774e0755be8cb335503ee3
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:16 2023 +0100
fs: port ->mknod() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/inode.c.
Dropped hunks for ntfs3 because the source is not present in the
CentOS Stream source tree.
Upstream commit cc14d24026704 ("hpfs: Convert symlinks to
read_folio") is not present which causes fuzz 1 for hunk #1.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present, so a patch reorder was needed
with appropriate adjustments.
commit e18275ae55e07a2937e48134589c2f4c1d99a369
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:17 2023 +0100
fs: port ->rename() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/inode.c.
Dropped hunks for ntfs3 because the source is not present in the
CentOS Stream source tree.
commit c54bd91e9eaba43f09aadc25b52ea869ff3b5587
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:15 2023 +0100
fs: port ->mkdir() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/link.c.
Dropped hunks for ntfs3 because the source is not present in the
CentOS Stream source tree.
CentOS Stream commit f0f830cd7e ("ceph: create symlinks with
encrypted and base64-encoded targets") is present and resulted
in fuzz against fs/ceph/dir.c.
commit 7a77db95511c39be4b2db2ceca152ef589adc2dc
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:14 2023 +0100
fs: port ->symlink() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/dir.c.
Dropped hunks for ntfs3 because the source is not present in the
CentOS Stream source tree.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present, which cuases fuzz in mm/shmem.c.
commit 6c960e68aaed335a0040f16654f3c5e5bfcf9249
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:13 2023 +0100
fs: port ->create() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: CentOS Stream has commit 3e0b6f1fa9 ("afs: use
read_seqbegin() in afs_check_validity() and afs_getattr()"),
manually apply hunk #2 to fs/afs/inode.c.
CentOS Stream commit 3b06927229 {"afs: split
afs_pagecache_valid() out of afs_validate()") is present which
causes a reject in fs/afs/internal.h, manually apply hunk to
fs/afs/internal.h.
For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
for ceph") alters the definition of _ceph_setattr() causing fuzz.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/inode.c.
Upstream commit 2e1d66379e ("staging: erofs: drop the extern
prefix for function definitions") caused strange behaviour when
applying this patch, there was a conflict in fs/erofs/internal.h but
after a refresh the hunk and context looked ok. The hunk had to be
manually applied.
Upstream commit 2db0487faa211 ("f2fs: move f2fs_force_buffered_io()
into file.c") is not present in CentOS Stream which causes fuzz
when applying the first hunk to fs/f2fs/file.c.
Upstream commit 30abce053f811 ("fat: report creation time in statx")
is not present in CentOS Stream which caused a reject so apply change
manually.
Dropped hunks for ksmbd because the source is not present in the
CentOS Stream source tree.
Dropped hunks for ntfs3 because the source is not present in the
CentOS Stream source tree.
There was fuzz with hunk #2 against fs/nfs/inode.c but I was
unable to see any difference.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
to new xattrs.c file") is present which caused fuzz in
fs/overlayfs/overlayfs.h.
Upstream commit d919a1e79bac8 ("proc: fix a dentry lock race
between release_task and lookup") is not present in CentOS
Stream causing fuzz applying hunk #1 against fs/proc/base.c.
CentOS Stream commit 20c470188c ("vfs: plumb i_version
handling into struct kstat") is present causing fuzz in hunk
#2 against fs/stat.c.
Upstream commit e0c49bd2b4d3c ("fs: sysv: Fix sysv_nblocks()
returns wrong value") is not present in CentOS Stream causing
fuzz applying hunk#1 against fs/sysv/itree.c.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present so it's ok to pass idmap to
generic_fillattr().
CentOS Stream commit f0f830cd7e {"ceph: create symlinks
with encrypted and base64-encoded targets") uses the old
struct user_namespace and so leaves those changes out, make
those getattr() changes here.
Allow for CentOS Stream commit 6c3396a0d8 ("kernfs: Introduce
separate rwsem to protect inode attributes") which is already
present.
CentOS Stream commit f5219db0c0 ("KVM: fix Add KVM_CREATE_GUEST_MEMFD
ioctl() for guest-specific backing memory") updated the upstream commit
a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific
backing memory") to account for missing idmapping commits. Now we have
updated the second and final place these changes were made make the final
needed adjustment to match the original upstream patch.
commit b74d24f7a74ffd2d42ca883d84b7422b8d545901
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:12 2023 +0100
fs: port ->getattr() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
commit f1f1f2569901ec5b9d425f2e91c09a0e320768f3
Author: Ivan Babrou <ivan@cloudflare.com>
Date: Thu Sep 22 15:40:26 2022 -0700
proc: report open files as size in stat() for /proc/pid/fd
Many monitoring tools include open file count as a metric. Currently the
only way to get this number is to enumerate the files in /proc/pid/fd.
The problem with the current approach is that it does many things people
generally don't care about when they need one number for a metric. In our
tests for cadvisor, which reports open file counts per cgroup, we observed
that reading the number of open files is slow. Out of 35.23% of CPU time
spent in `proc_readfd_common`, we see 29.43% spent in `proc_fill_cache`,
which is responsible for filling dentry info. Some of this extra time is
spinlock contention, but it's a contention for the lock we don't want to
take to begin with.
We considered putting the number of open files in /proc/pid/status.
Unfortunately, counting the number of fds involves iterating the
open_files bitmap, which has a linear complexity in proportion with the
number of open files (bitmap slots really, but it's close). We don't want
to make /proc/pid/status any slower, so instead we put this info in
/proc/pid/fd as a size member of the stat syscall result. Previously the
reported number was zero, so there's very little risk of breaking
anything, while still providing a somewhat logical way to count the open
files with a fallback if it's zero.
RFC for this patch included iterating open fds under RCU. Thanks to Frank
Hofmann for the suggestion to use the bitmap instead.
Previously:
```
$ sudo stat /proc/1/fd | head -n2
File: /proc/1/fd
Size: 0 Blocks: 0 IO Block: 1024 directory
```
With this patch:
```
$ sudo stat /proc/1/fd | head -n2
File: /proc/1/fd
Size: 65 Blocks: 0 IO Block: 1024 directory
```
Correctness check:
```
$ sudo ls /proc/1/fd | wc -l
65
```
I added the docs for /proc/<pid>/fd while I'm at it.
[ivan@cloudflare.com: use bitmap_weight() to count the bits]
Link: https://lkml.kernel.org/r/20221018045844.37697-1-ivan@cloudflare.com
[akpm@linux-foundation.org: include linux/bitmap.h for bitmap_weight()]
[ivan@cloudflare.com: return errno from proc_fd_getattr() instead of setting negative size]
Link: https://lkml.kernel.org/r/20221024173140.30673-1-ivan@cloudflare.com
Link: https://lkml.kernel.org/r/20220922224027.59266-1-ivan@cloudflare.com
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: CentOS Stream commit 3c29fadfb1 ("afs: split
afs_pagecache_valid() out of afs_validate()") is present, manually
adjust hunk #1 of fs/afs/internal.h.
For consistency drop btrfs hunks because it isn't supported in
CentOS Stream and other backports also drop such hunks.
CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling
for ceph") alters the definition of _ceph_setattr(), adjust
manually.
CentOS Stream commit 34b2a2b5a3 {"ceph: add some fscrypt
guardrails") introduces a call to fscrypt_prepare_setattr() which
causes fuzz when applying.
The cifs source has been moved in CentOS Stream so manually
apply rejected hunks to fs/smb/client/cifsfs.h and
fs/smb/client/inode.c.
Upstream commit 5a646fb3a3e2d ("coda: avoid doing bad things on
inode type changes during revalidation") is not present which
causes fuzz in fs/coda/coda_linux.h.
Dropped hunks for ntfs3 because the source is not present in
the CentOS Stream source tree.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
to new xattrs.c file") is presnt so manually apply hunk.
CentOS Stream commit 892da692fa ("shmem: support idmapped
mounts for tmpfs") is present so it's ok to pass idmap to
setattr_prepare() and setattr_copy().
Update to add incremental changes needed due to CentOS Stream
commit 469e1d13f6 ("shmem: quota support").
Allow for CentOS Stream commit 6c3396a0d8 ("kernfs: Introduce
separate rwsem to protect inode attributes") which is already
present.
CentOS Stream commit f5219db0c0 ("KVM: fix Add KVM_CREATE_GUEST_MEMFD
ioctl() for guest-specific backing memory") updated the upstream commit
a7800aa80ea4d ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific
backing memory") to account for missing idmapping commits. Now we have
updated one of the two places these changes were made make one of the
needed adjustments to match the original upstream patch.
commit c1632a0f11209338fc300c66252bcc4686e609e8
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Jan 13 12:49:11 2023 +0100
fs: port ->setattr() to pass mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
commit 7420332a6ff407ba2d3d25f5e8430bf426131d1d
Author: Christian Brauner <brauner@kernel.org>
Date: Thu Sep 22 17:17:01 2022 +0200
fs: add new get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
Since some filesystem rely on the dentry being available to them when
setting posix acls (e.g., 9p and cifs) they cannot rely on the old get
acl inode operation to retrieve posix acl and need to implement their
own custom handlers because of that.
In a previous patch we renamed the old get acl inode operation to
->get_inode_acl(). We decided to rename it and implement a new one since
->get_inode_acl() is called generic_permission() and inode_permission()
both of which can be called during an filesystem's ->permission()
handler. So simply passing a dentry argument to ->get_acl() would have
amounted to also having to pass a dentry argument to ->permission(). We
avoided that change.
This adds a new ->get_acl() inode operations which takes a dentry
argument which filesystems such as 9p, cifs, and overlayfs can implement
to get posix acls.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: Upstream commit eadcd6b5a1eb3 ("erofs: add fiemap support
with iomap") is not (yet) present in CentOS Stream.
The changes for fs/ksmbd/* were dropped as the directory doesn't
exist in CentOS Stream.
The changes for fs/ntfs3/* were dropped as the directory doesn't
exist in CentOS Stream.
commit cac2f8b8d8b50ef32b3e34f6dcbbf08937e4f616
Author: Christian Brauner <brauner@kernel.org>
Date: Thu Sep 22 17:17:00 2022 +0200
fs: rename current get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
The current inode operation for getting posix acls takes an inode
argument but various filesystems (e.g., 9p, cifs, overlayfs) need access
to the dentry. In contrast to the ->set_acl() inode operation we cannot
simply extend ->get_acl() to take a dentry argument. The ->get_acl()
inode operation is called from:
acl_permission_check()
-> check_acl()
-> get_acl()
which is part of generic_permission() which in turn is part of
inode_permission(). Both generic_permission() and inode_permission() are
called in the ->permission() handler of various filesystems (e.g.,
overlayfs). So simply passing a dentry argument to ->get_acl() would
amount to also having to pass a dentry argument to ->permission(). We
should avoid this unnecessary change.
So instead of extending the existing inode operation rename it from
->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that
passes a dentry argument and which filesystems that need access to the
dentry can implement instead of ->get_inode_acl(). Filesystems like cifs
which allow setting and getting posix acls but not using them for
permission checking during lookup can simply not implement
->get_inode_acl().
This is intended to be a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: I didn't want to just drop the btrfs hunks so I made the
change to btrfs_setattr() init_user_ns instead of the expected
mnt_userns. That should at least cause a conflict if btrfs changes
to a supported fs in the future.
CentOS Stream commit 48fa94aacd ("ceph: fscrypt_auth handling for
ceph") is present, make necessary adjustment.
CentOS Stream commit 892da692fa ("shmem: support idmapped mounts
for tmpfs") is present, make necessary adjustment.
The changes for fs/ksmbd/* were dropped as the directory doesn't
exist in CentOS Stream.
The changes for fs/ntfs3/* were dropped as the directory doesn't
exist in CentOS Stream.
commit 138060ba92b3b0d77c8e6818d0f33398b23ea42e
Author: Christian Brauner <brauner@kernel.org>
Date: Fri Sep 23 10:29:39 2022 +0200
fs: pass dentry to set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
Since some filesystem rely on the dentry being available to them when
setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode
operation. But since ->set_acl() is required in order to use the generic
posix acl xattr handlers filesystems that do not implement this inode
operation cannot use the handler and need to implement their own
dedicated posix acl handlers.
Update the ->set_acl() inode method to take a dentry argument. This
allows all filesystems to rely on ->set_acl().
As far as I can tell all codepaths can be switched to rely on the dentry
instead of just the inode. Note that the original motivation for passing
the dentry separate from the inode instead of just the dentry in the
xattr handlers was because of security modules that call
security_d_instantiate(). This hook is called during
d_instantiate_new(), d_add(), __d_instantiate_anon(), and
d_splice_alias() to initialize the inode's security context and possibly
to set security.* xattrs. Since this only affects security.* xattrs this
is completely irrelevant for posix acls.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
Conflicts: CentOS Stream commit d592b7f96f ("9p: fix a bunch of
checkpatch warnings") removes extern from declarations in
fs/9p/acl.h.
CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support to
new xattrs.c file") moved the declaration of ovl_xattr_get() and
ovl_listxattr() to fs/overlayfs/xattr.c.
CentOS Stream commit fdb679f7a3 ("xfs: improve __xfs_set_acl")
changes the declarations in fs/xfs/xfs_acl.h.
commit 0cad6246621b5887d5b33fea84219d2a71f2f99a
Author: Miklos Szeredi <mszeredi@redhat.com>
Date: Wed Aug 18 22:08:24 2021 +0200
vfs: add rcu argument to ->get_acl() callback
Add a rcu argument to the ->get_acl() callback to allow
get_cached_acl_rcu() to call the ->get_acl() method in the next patch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
commit ccbd0c991985afc53670e2b01840517922fc30e4
Author: Rodrigo Campos <rodrigo@sdfg.com.ar>
Date: Fri Apr 29 15:57:48 2022 +0200
docs: Add small intro to idmap examples
When reading the documentation, I didn't understand why this list
examples of things that fail without using the mount idmap feature.
It seems pretty pointless and I doubted if I was missing something,
until I finished the examples, the next section and saw the examples
revisited. After that, it all made sense.
Let's add one small sentence before, so the reader knows where this is
going and why examples that don't might seem relevant are used.
Link: https://lore.kernel.org/r/20220429135748.481301-1-rodrigo@sdfg.com.ar
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus
commit ad19607a90b29eef044660aba92a2a2d63b1e977
Author: Christian Brauner <brauner@kernel.org>
Date: Tue Jul 27 12:44:16 2021 +0200
doc: give a more thorough id handling explanation
Currently there's no document explaining how idmappings work at all.
Add a document that gives an introduction and also goes into a bit more
detail for more advanced use-cases.
Link: https://lore.kernel.org/r/20210727104416.828293-1-brauner@kernel.org
Cc: Seth Forshee <seth.forshee@digitalocean.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27743
This patch is a backport of the following upstream commit:
commit 40d49a3c9e4a0e5cf7a6fcebc8d4d7d63d1f3f1b
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Fri Aug 18 21:23:34 2023 +0100
mm: allow ->huge_fault() to be called without the mmap_lock held
Remove the checks for the VMA lock being held, allowing the page fault
path to call into the filesystem instead of retrying with the mmap_lock
held. This will improve scalability for DAX page faults. Also update the
documentation to match (and fix some other changes that have happened
recently).
Link: https://lkml.kernel.org/r/20230818202335.2739663-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27743
This patch is a backport of the following upstream commit:
commit 3bd786f76de2e01745f462844fd1a206052ee8b8
Author: Yin Fengwei <fengwei.yin@intel.com>
Date: Wed Aug 2 16:14:04 2023 +0100
mm: convert do_set_pte() to set_pte_range()
set_pte_range() allows to setup page table entries for a specific
range. It takes advantage of batched rmap update for large folio.
It now takes care of calling update_mmu_cache_range().
Link: https://lkml.kernel.org/r/20230802151406.3735276-37-willy@infradead.org
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27743
This patch is a backport of the following upstream commit:
commit 54007f818206dc27309ca423df4c87dd160a7208
Author: Yu-cheng Yu <yu-cheng.yu@intel.com>
Date: Mon Jun 12 17:10:40 2023 -0700
mm: Introduce VM_SHADOW_STACK for shadow stack memory
New hardware extensions implement support for shadow stack memory, such
as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to
identify these areas, for example, to be used to properly indicate shadow
stack PTEs to the hardware.
Shadow stack VMA creation will be tightly controlled and limited to
anonymous memory to make the implementation simpler and since that is all
that is required. The solution will rely on pte_mkwrite() to create the
shadow stack PTEs, so it will not be required for vm_get_page_prot() to
learn how to create shadow stack memory. For this reason document that
VM_SHADOW_STACK should not be mixed with VM_SHARED.
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-15-rick.p.edgecombe%40intel.com
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-54186
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit 212c5c078d83d780cf2873ca931df135771e8bb7
Author: Pasha Tatashin <pasha.tatashin@soleen.com>
Date: Sat Apr 13 00:25:22 2024 +0000
iommu: account IOMMU allocated memory
In order to be able to limit the amount of memory that is allocated
by IOMMU subsystem, the memory must be accounted.
Account IOMMU as part of the secondary pagetables as it was discussed
at LPC.
The value of SecPageTables now contains mmeory allocation by IOMMU
and KVM.
There is a difference between GFP_ACCOUNT and what NR_IOMMU_PAGES shows.
GFP_ACCOUNT is set only where it makes sense to charge to user
processes, i.e. IOMMU Page Tables, but there more IOMMU shared data
that should not really be charged to a specific process.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20240413002522.1101315-12-pasha.tatashin@soleen.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 212c5c078d83d780cf2873ca931df135771e8bb7)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4749
JIRA: https://issues.redhat.com/browse/RHEL-48221
It was identified that our process to bring in code-base updates
has been unwittingly missing some of the peripheric commits not
touching directly the core code under mm/ the directory.
While most of these identified peripheric commits are simple
and basic clean-ups, some are relevant changesets that might end
up causing real(and subtle) issues for RHEL deployments if they
remain missing.
The intent of this patchset is to close the aforementioned GAP
by bringing in the missing peripheric commits from v5.14 up to
v6.4, which is the level we're parking our codebase for RHEL-9.5.
A secondary intent of this patchset is to bring in upstream's
v6.5 commit that disables the PER_VMA_LOCK feature which was
recently introduced (to RHEL-9.5) but was marked BROKEN upstream
circa release v6.5, in order to avoid the reported issues with
memory corruptions in the upstream builds.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7768
Tested: with xfstests
Allow system administrator to set default global quota limits at tmpfs
mount time.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230725144510.253763-7-cem@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit de4c0e7ca8b526a82ff7e5ee5533787bb6d01724)
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7768
Tested: with xfstests
Conflicts:
- We have no support for iattr vfs{g,u}ids, so open core calls
to i_{ug}id_needs_update()
- We don't have support for struct idmap, so use user_namespace
struct wherever appropriate.
Now the basic infra-structure is in place, enable quota support for tmpfs.
This offers user and group quotas to tmpfs (project quotas will be added
later). Also, as other filesystems, the tmpfs quota is not supported
within user namespaces yet, so idmapping is not translated.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230725144510.253763-6-cem@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit e09764cff44b5d31c2ca5477444565e3080637d2)
JIRA: https://issues.redhat.com/browse/RHEL-48221
This patch is a backport of the following upstream commit:
commit bd23024b9774e681cbe6cc3afcb24244dfcb2390
Author: Tomas Mudrunka <tomas.mudrunka@gmail.com>
Date: Tue Mar 21 11:34:30 2023 +0100
mm/memtest: add results of early memtest to /proc/meminfo
Currently the memtest results were only presented in dmesg.
When running a large fleet of devices without ECC RAM it's currently not
easy to do bulk monitoring for memory corruption. You have to parse
dmesg, but that's a ring buffer so the error might disappear after some
time. In general I do not consider dmesg to be a great API to query RAM
status.
In several companies I've seen such errors remain undetected and cause
issues for way too long. So I think it makes sense to provide a
monitoring API, so that we can safely detect and act upon them.
This adds /proc/meminfo entry which can be easily used by scripts.
Link: https://lkml.kernel.org/r/20230321103430.7130-1-tomas.mudrunka@gmail.com
Signed-off-by: Tomas Mudrunka <tomas.mudrunka@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34875
commit 2dd10de8e6bcbacf85ad758b904543c294820c63
Author: Alexander Aring <aahringo@redhat.com>
Date: Tue Sep 12 17:53:18 2023 -0400
lockd: introduce safe async lock op
This patch reverts mostly commit 40595cdc93ed ("nfs: block notification
on fs with its own ->lock") and introduces an EXPORT_OP_ASYNC_LOCK
export flag to signal that the "own ->lock" implementation supports
async lock requests. The only main user is DLM that is used by GFS2 and
OCFS2 filesystem. Those implement their own lock() implementation and
return FILE_LOCK_DEFERRED as return value. Since commit 40595cdc93ed
("nfs: block notification on fs with its own ->lock") the DLM
implementation were never updated. This patch should prepare for DLM
to set the EXPORT_OP_ASYNC_LOCK export flag and update the DLM
plock implementation regarding to it.
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34875
commit b38a6023da6a12b561f0421c6a5a1f7624a1529c
Author: Chuck Lever <chuck.lever@oracle.com>
Date: Fri Aug 25 15:04:23 2023 -0400
Documentation: Add missing documentation for EXPORT_OP flags
The commits that introduced these flags neglected to update the
Documentation/filesystems/nfs/exporting.rst file.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
commit 253e5df8b8f0145adb090f57c6f4e6efa52d738e
Author: Hugh Dickins <hughd@google.com>
Date: Sun Jul 23 13:55:00 2023 -0700
tmpfs: fix Documentation of noswap and huge mount options
The noswap mount option is surely not one of the three options for sizing:
move its description down.
The huge= mount option does not accept numeric values: those are just in
an internal enum. Delete those numbers, and follow the manpage text more
closely (but there's not yet any fadvise() or fcntl() which applies here).
/sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and
barely relevant to mounting a tmpfs: just refer to transhuge.rst (while
still using the words deny and force, to help as informal reminders).
[rdunlap@infradead.org: fixup Docs table for huge mount options]
Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: d0f5a85442d1 ("shmem: update documentation")
Fixes: 2c6efe9cf2d7 ("shmem: add support to ignore swap")
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>