Commit Graph

4 Commits

Author SHA1 Message Date
Chris von Recklinghausen 56644a77a7 signal: Replace force_fatal_sig with force_exit_sig when in doubt
Conflicts: drop changes to arch/m68k/kernel/traps.c,
	arch/sparc/kernel/signal_32.c, arch/sparc/kernel/windows.c -
		unsupported arches

Bugzilla: https://bugzilla.redhat.com/2120352

commit fcb116bc43c8c37c052530ead79872f8b2615711
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu Nov 18 14:23:21 2021 -0600

    signal: Replace force_fatal_sig with force_exit_sig when in doubt

    Recently to prevent issues with SECCOMP_RET_KILL and similar signals
    being changed before they are delivered SA_IMMUTABLE was added.

    Unfortunately this broke debuggers[1][2] which reasonably expect
    to be able to trap synchronous SIGTRAP and SIGSEGV even when
    the target process is not configured to handle those signals.

    Add force_exit_sig and use it instead of force_fatal_sig where
    historically the code has directly called do_exit.  This has the
    implementation benefits of going through the signal exit path
    (including generating core dumps) without the danger of allowing
    userspace to ignore or change these signals.

    This avoids userspace regressions as older kernels exited with do_exit
    which debuggers also can not intercept.

    In the future it should be possible to improve the quality of
    implementation of the kernel by changing some of these force_exit_sig
    calls to force_fatal_sig.  That can be done where it matters on
    a case-by-case basis with careful analysis.

    Reported-by: Kyle Huey <me@kylehuey.com>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
    [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
    Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
    Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
    Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV")
    Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler")
    Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
    Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
    Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
    Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
    Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
    Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Tested-by: Kees Cook <keescook@chromium.org>
    Tested-by: Kyle Huey <khuey@kylehuey.com>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:31 -04:00
Chris von Recklinghausen 9aa718d1c7 exit/syscall_user_dispatch: Send ordinary signals on failure
Bugzilla: https://bugzilla.redhat.com/2120352

commit 941edc5bf174b67f94db19817cbeab0a93e0c32a
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Wed Oct 20 12:44:00 2021 -0500

    exit/syscall_user_dispatch: Send ordinary signals on failure

    Use force_fatal_sig instead of calling do_exit directly.  This ensures
    the ordinary signal handling path gets invoked, core dumps as
    appropriate get created, and for multi-threaded processes all of the
    threads are terminated not just a single thread.

    When asked, Gabriel Krisman Bertazi <krisman@collabora.com> said [1]:
    > ebiederm@xmission.com (Eric W. Biederman) asked:
    >
    > > Why does do_syscal_user_dispatch call do_exit(SIGSEGV) and
    > > do_exit(SIGSYS) instead of force_sig(SIGSEGV) and force_sig(SIGSYS)?
    > >
    > > Looking at the code these cases are not expected to happen, so I would
    > > be surprised if userspace depends on any particular behaviour on the
    > > failure path so I think we can change this.
    >
    > Hi Eric,
    >
    > There is not really a good reason, and the use case that originated the
    > feature doesn't rely on it.
    >
    > Unless I'm missing yet another problem and others correct me, I think
    > it makes sense to change it as you described.
    >
    > > Is using do_exit in this way something you copied from seccomp?
    >
    > I'm not sure, its been a while, but I think it might be just that.  The
    > first prototype of SUD was implemented as a seccomp mode.

    If at some point it becomes interesting we could relax
    "force_fatal_sig(SIGSEGV)" to instead say
    "force_sig_fault(SIGSEGV, SEGV_MAPERR, sd->selector)".

    I avoid doing that in this patch to avoid making it possible
    to catch currently uncatchable signals.

    Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    [1] https://lkml.kernel.org/r/87mtr6gdvi.fsf@collabora.com
    Link: https://lkml.kernel.org/r/20211020174406.17889-14-ebiederm@xmission.com
    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:26 -04:00
Gabriel Krisman Bertazi 36a6c843fd entry: Use different define for selector variable in SUD
Michael Kerrisk suggested that, from an API perspective, it is a bad
idea to share the PR_SYS_DISPATCH_ defines between the prctl operation
and the selector variable.

Therefore, define two new constants to be used by SUD's selector variable
and update the corresponding documentation and test cases.

While this changes the API, syscall user dispatch has never been part of a
Linux release; it will show up for the first time in 5.11.
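
The commit text does not name the two new constants. The sketch below
assumes they are SYSCALL_DISPATCH_FILTER_ALLOW and
SYSCALL_DISPATCH_FILTER_BLOCK with values 0 and 1, as I understand the
upstream include/uapi/linux/prctl.h; the fallback definitions, the
sud_selector variable and the two helpers are hypothetical names used
only for illustration. It shows how userspace might flip the selector
byte with the new names, without any prctl call:

```c
#include <assert.h>

/* Assumed values; fallbacks so the sketch builds without kernel headers. */
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#endif
#ifndef SYSCALL_DISPATCH_FILTER_BLOCK
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/*
 * Hypothetical selector byte. In a real program its address would be
 * passed as the selector argument of the prctl that enables dispatch.
 */
static volatile char sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;

/* Ask the kernel to redirect syscalls issued outside the allowed range. */
static inline void sud_block(void)
{
	sud_selector = SYSCALL_DISPATCH_FILTER_BLOCK;
}

/* Let syscalls run natively again. */
static inline void sud_allow(void)
{
	sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;
}
```

The prctl operation itself would keep using the PR_SYS_DISPATCH_ defines;
only the values stored in the selector byte use the new names.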

Suggested-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210205184321.2062251-1-krisman@collabora.com
2021-02-06 00:21:42 +01:00
Gabriel Krisman Bertazi 1446e1df9e kernel: Implement selective syscall userspace redirection
Introduce a mechanism to quickly disable/enable syscall handling for a
specific process and redirect to userspace via SIGSYS.  This is useful
for processes with parts that require syscall redirection and parts that
don't, but that need to perform this boundary crossing very quickly,
without paying the cost of a system call to reconfigure syscall handling
on each boundary transition.  This is particularly important for Windows
games running over Wine.

The proposed interface looks like this:

  prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <offset>, <length>, [selector])

The range [<offset>,<offset>+<length>) is a part of the process memory
map that is allowed to by-pass the redirection code and dispatch
syscalls directly, so that in fast paths the process does not need to
disable the trap and the kernel does not have to check the selector.  This is
essential to return from SIGSYS to a blocked area without triggering
another SIGSYS from rt_sigreturn.

selector is an optional pointer to a char-sized userspace memory region
that has a key switch for the mechanism. This key switch is set to
either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF to enable or disable the
redirection without calling the kernel.
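
The interplay of the allowed range and the selector can be modeled in
plain C. This is a userspace sketch of the decision described above, not
the kernel code: struct sud_config, enum sud_action, sud_decide() and the
SEL_OFF/SEL_ON values are hypothetical names, and treating an
unrecognized selector value as fatal is an assumption based on the
do_exit() calls discussed in the later commits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Possible outcomes for one syscall attempt in this model. */
enum sud_action { SUD_RUN_NATIVE, SUD_RAISE_SIGSYS, SUD_FATAL };

/* Hypothetical selector states (off = do not redirect, on = redirect). */
enum { SEL_OFF = 0, SEL_ON = 1 };

/* Per-thread configuration as set up by the prctl in this model. */
struct sud_config {
	uintptr_t offset;                       /* start of the allowed range */
	uintptr_t len;                          /* length of the allowed range */
	const volatile unsigned char *selector; /* optional, may be NULL */
};

/* Decide what happens to a syscall issued from instruction pointer ip. */
static enum sud_action sud_decide(const struct sud_config *cfg, uintptr_t ip)
{
	/*
	 * Unsigned subtraction wraps when ip < offset, so this single
	 * comparison tests offset <= ip < offset + len.
	 */
	if (ip - cfg->offset < cfg->len)
		return SUD_RUN_NATIVE;

	if (cfg->selector) {
		unsigned char state = *cfg->selector;

		if (state == SEL_OFF)
			return SUD_RUN_NATIVE;
		if (state != SEL_ON)
			return SUD_FATAL;   /* assumed: invalid selector is fatal */
	}

	/* Outside the range and not switched off: redirect via SIGSYS. */
	return SUD_RAISE_SIGSYS;
}
```

In this model a syscall issued from inside the range never reads the
selector, which is why a SIGSYS handler placed there can return through
rt_sigreturn without being trapped again.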

The feature is meant to be set per-thread and it is disabled on
fork/clone/execv.

Internally, this doesn't add overhead to the syscall hot path, and it
requires very little per-architecture support.  I avoided using seccomp,
even though it duplicates some functionality, due to previous feedback
that maybe it shouldn't mix with seccomp since it is not a security
mechanism.  And obviously, this should never be considered a security
mechanism, since any part of the program can by-pass it by using the
syscall dispatcher.

For the sysinfo benchmark, which measures the overhead added to
executing a native syscall that doesn't require interception, the
overhead using only the direct dispatcher region to issue syscalls is
pretty much irrelevant.  The overhead of using the selector is around
40ns for a native (unredirected) syscall on my system, and it is (as
expected) dominated by the supervisor-mode user-address access.  In
fact, with SMAP off, the overhead is consistently less than 5ns on my
test box.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201127193238.821364-4-krisman@collabora.com
2020-12-02 15:07:56 +01:00