linux-kernelorg-stable/kernel
Lorenzo Stoakes e8e17ee90e mm: drop the assumption that VM_SHARED always implies writable
Patch series "permit write-sealed memfd read-only shared mappings", v4.

The man page for fcntl() describing memfd file seals states the following
about F_SEAL_WRITE:-

    Furthermore, trying to create new shared, writable memory-mappings via
    mmap(2) will also fail with EPERM.

With emphasis on 'writable'.  In turns out in fact that currently the
kernel simply disallows all new shared memory mappings for a memfd with
F_SEAL_WRITE applied, rendering this documentation inaccurate.

This matters because users are therefore unable to obtain a shared mapping
to a memfd after write sealing altogether, which limits their usefulness. 
This was reported in the discussion thread [1] originating from a bug
report [2].

This is a product of both using the struct address_space->i_mmap_writable
atomic counter to determine whether writing may be permitted, and the
kernel adjusting this counter when any VM_SHARED mapping is performed and
more generally implicitly assuming VM_SHARED implies writable.

It seems sensible that we should only update this mapping if VM_MAYWRITE
is specified, i.e.  whether it is possible that this mapping could at any
point be written to.

If we do so then all we need to do to permit write seals to function as
documented is to clear VM_MAYWRITE when mapping read-only.  It turns out
this functionality already exists for F_SEAL_FUTURE_WRITE - we can
therefore simply adapt this logic to do the same for F_SEAL_WRITE.

We then hit a chicken and egg situation in mmap_region() where the check
for VM_MAYWRITE occurs before we are able to clear this flag.  To work
around this, perform this check after we invoke call_mmap(), with careful
consideration of error paths.

Thanks to Andy Lutomirski for the suggestion!

[1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
[2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238


This patch (of 3):

There is a general assumption that VMAs with the VM_SHARED flag set are
writable.  If the VM_MAYWRITE flag is not set, then this is simply not the
case.

Update those checks which affect the struct address_space->i_mmap_writable
field to explicitly test for this by introducing
[vma_]is_shared_maywrite() helper functions.

This remains entirely conservative, as the lack of VM_MAYWRITE guarantees
that the VMA cannot be written to.

Link: https://lkml.kernel.org/r/cover.1697116581.git.lstoakes@gmail.com
Link: https://lkml.kernel.org/r/d978aefefa83ec42d18dfa964ad180dbcde34795.1697116581.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:19 -07:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2023-09-16 11:16:00 +01:00
cgroup hugetlb: memcg: account hugetlb-backed memory in memory controller 2023-10-18 14:34:17 -07:00
configs Kbuild updates for v6.6 2023-09-05 11:01:47 -07:00
debug printk changes for 6.6 2023-09-04 13:20:19 -07:00
dma swiotlb: fix the check whether a device has used software IO TLB 2023-09-27 11:19:15 +02:00
entry entry: Remove empty addr_limit_user_check() 2023-08-23 10:32:39 +02:00
events mm/gup: adapt get_user_page_vma_remote() to never return NULL 2023-10-18 14:34:15 -07:00
futex
gcov gcov: shut up missing prototype warnings for internal stubs 2023-08-18 10:18:58 -07:00
irq Boring updates for the interrupt subsystem: 2023-08-28 14:33:11 -07:00
kcsan mm: delete checks for xor_unlock_is_negative_byte() 2023-10-18 14:34:17 -07:00
livepatch
locking - An extensive rework of kexec and crash Kconfig from Eric DeVolder 2023-08-29 14:53:51 -07:00
module module/decompress: use vmalloc() for zstd decompression workspace 2023-08-29 09:39:08 -07:00
power PM: hibernate: Fix the exclusive get block device in test_resume mode 2023-09-12 11:45:15 +02:00
printk Revert "printk: export symbols for debug modules" 2023-09-07 14:19:42 +02:00
rcu rcu: dynamically allocate the rcu-kfree shrinker 2023-10-04 10:32:24 -07:00
sched sched: remove wait bookmarks 2023-10-18 14:34:18 -07:00
time Fix false positive "softirq work is pending" messages on -rt 2023-09-02 09:01:48 -07:00
trace tracing/user_events: Align set_bit() address for all archs 2023-09-30 16:25:41 -04:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.kexec crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
Kconfig.locks
Kconfig.preempt
Makefile
acct.c audit/stable-6.6 PR 20230829 2023-08-30 08:17:35 -07:00
async.c
audit.c
audit.h
audit_fsnotify.c
audit_tree.c
audit_watch.c
auditfilter.c
auditsc.c Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
backtracetest.c
bounds.c
capability.c
cfi.c
compat.c
configs.c
context_tracking.c
cpu.c cpu/hotplug: Prevent self deadlock on CPU hot-unplug 2023-08-30 12:24:22 +02:00
cpu_pm.c
crash_core.c Crash: add lock to serialize crash hotplug handling 2023-09-29 17:20:48 -07:00
crash_dump.c
cred.c cred: convert printks to pr_<level> 2023-08-18 10:18:49 -07:00
delayacct.c
dma.c
exec_domain.c
exit.c mm: remove remnants of SPLIT_RSS_COUNTING 2023-10-04 10:32:20 -07:00
extable.c
fail_function.c
fork.c mm: drop the assumption that VM_SHARED always implies writable 2023-10-18 14:34:19 -07:00
freezer.c
gen_kheaders.sh
groups.c
hung_task.c
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c
jump_label.c
kallsyms.c kallsyms: Change func signature for cleanup_symbol_name() 2023-08-25 15:00:36 -07:00
kallsyms_internal.h
kallsyms_selftest.c Modules changes for v6.6-rc1 2023-08-29 17:32:32 -07:00
kallsyms_selftest.h
kcmp.c
kcov.c
kexec.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kexec_core.c crash: add generic infrastructure for crash hotplug support 2023-08-24 16:25:13 -07:00
kexec_elf.c
kexec_file.c integrity-v6.6 2023-08-30 09:16:56 -07:00
kexec_internal.h
kheaders.c
kprobes.c kernel: kprobes: Use struct_size() 2023-08-23 09:38:17 +09:00
ksyms_common.c
ksysfs.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kthread.c mm: remove remnants of SPLIT_RSS_COUNTING 2023-10-04 10:32:20 -07:00
latencytop.c
module_signature.c
notifier.c
nsproxy.c nsproxy: Convert nsproxy.count to refcount_t 2023-08-21 11:29:12 -07:00
padata.c
panic.c panic: Reenable preemption in WARN slowpath 2023-09-15 11:28:08 +02:00
params.c
pid.c pidfd: prevent a kernel-doc warning 2023-09-19 13:21:33 -07:00
pid_namespace.c memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
pid_sysctl.h memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
profile.c
ptrace.c mm: make __access_remote_vm() static 2023-10-18 14:34:15 -07:00
range.c
reboot.c
regset.c
relay.c kernel: relay: remove unnecessary NULL values from relay_open_buf 2023-08-18 10:18:55 -07:00
resource.c
resource_kunit.c
rseq.c
scftorture.c
scs.c
seccomp.c
signal.c signal: print comm and exe name on fatal signals 2023-08-18 10:18:50 -07:00
smp.c
smpboot.c
smpboot.h
softirq.c
stackleak.c
stacktrace.c
static_call.c
static_call_inline.c
stop_machine.c
sys.c mm: add a NO_INHERIT flag to the PR_SET_MDWE prctl 2023-10-06 14:44:11 -07:00
sys_ni.c
sysctl-test.c
sysctl.c
task_work.c task_work: add kerneldoc annotation for 'data' argument 2023-09-19 13:21:32 -07:00
taskstats.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c
user_namespace.c
usermode_driver.c
utsname.c
utsname_sysctl.c
vhost_task.c
watch_queue.c
watchdog.c watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check() 2023-08-18 10:19:00 -07:00
watchdog_buddy.c
watchdog_perf.c
workqueue.c workqueue: Fix missed pwq_release_worker creation in wq_cpu_intensive_thresh_init() 2023-09-18 08:50:31 -10:00
workqueue_internal.h