Commit Graph

11 Commits

Author SHA1 Message Date
Julia Denham f95c973004 entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set
JIRA: https://issues.redhat.com/browse/RHEL-257

commit 3e684903a8574ffc9475fdf13c4780a7adb506ad
Author: Seth Forshee <sforshee@digitalocean.com>
Date:   Wed May 4 13:08:40 2022 -0500

entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set

A livepatch transition may stall indefinitely when a kvm vCPU is heavily
loaded. To the host, the vCPU task is a user thread which is spending a
very long time in the ioctl(KVM_RUN) syscall. During livepatch
transition, set_notify_signal() will be called on such tasks to
interrupt the syscall so that the task can be transitioned. This
interrupts guest execution, but when xfer_to_guest_mode_work() sees that
TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an
exit to user mode is unnecessary, and guest execution is resumed without
transitioning the task for the livepatch.

This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal()
is expected to break tasks out of interruptible kernel loops and cause
them to return to userspace. Change xfer_to_guest_mode_work() to handle
TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run
loop that an exit to userpsace is needed. Any pending task_work will be
run when get_signal() is called from exit_to_user_mode_loop(), so there
is no longer any need to run task work from xfer_to_guest_mode_work().

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3e684903a8574ffc9475fdf13c4780a7adb506ad)

Signed-off-by: Julia Denham <jdenham@redhat.com>
2023-04-10 11:52:39 -04:00
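The behavior change described in the commit above can be sketched in plain user-space C. This is a minimal model, not the kernel code: the SK_ flag values and both function names are hypothetical, and the real xfer_to_guest_mode_work() operates on the current task and a vCPU rather than on a bare bitmask.

```c
#include <errno.h>

/* Hypothetical bit values for this sketch; real values are per-architecture. */
#define SK_TIF_SIGPENDING     (1UL << 0)
#define SK_TIF_NOTIFY_SIGNAL  (1UL << 1)

/* Before the fix: TIF_NOTIFY_SIGNAL alone did not force an exit to user mode. */
static int guest_work_before_fix(unsigned long ti_work)
{
	if (ti_work & SK_TIF_SIGPENDING)
		return -EINTR;	/* leave the vCPU run loop */
	return 0;		/* resume guest execution */
}

/* After the fix: either flag ends the run loop and returns to user mode. */
static int guest_work_after_fix(unsigned long ti_work)
{
	if (ti_work & (SK_TIF_SIGPENDING | SK_TIF_NOTIFY_SIGNAL))
		return -EINTR;
	return 0;
}
```

With only the notify flag set, the first function lets the guest resume (the livepatch stall), while the second reports -EINTR so the task can reach the exit-to-user path.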
Chris von Recklinghausen 3b8acb1eac resume_user_mode: Move to resume_user_mode.h
Conflicts: block/blk-cgroup.c - We already have
	672fdcf0e7de block: partition include/linux/blk-cgroup.h
	so keep include of linux/blk-cgroup.h

Bugzilla: https://bugzilla.redhat.com/2120352

commit 03248addadf1a5ef0a03cbcd5ec905b49adb9658
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Wed Feb 9 12:20:45 2022 -0600

    resume_user_mode: Move to resume_user_mode.h

    Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h.
    While doing that rename tracehook_notify_resume to resume_user_mode_work.

    Update all of the places that included tracehook.h for these functions to
    include resume_user_mode.h instead.

    Update all of the callers of tracehook_notify_resume to call
    resume_user_mode_work.

    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:47 -04:00
Chris von Recklinghausen 5a3ee243fe task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
Bugzilla: https://bugzilla.redhat.com/2120352

commit 7c5d8fa6fbb12a3f0eefe8762bfede508e147cb3
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Wed Feb 9 11:18:54 2022 -0600

    task_work: Decouple TIF_NOTIFY_SIGNAL and task_work

    There are a small handful of reasons besides pending signals that the
    kernel might want to break out of interruptible sleeps.  The flag
    TIF_NOTIFY_SIGNAL and the helpers that set and clear TIF_NOTIFY_SIGNAL
    provide the infrastructure for breaking out of interruptible
    sleeps and entering the return to user space slow path for those
    cases.

    Expand tracehook_notify_signal inline in its callers and remove it,
    which makes clear that TIF_NOTIFY_SIGNAL and task_work are separate
    concepts.

    Update the comment on set_notify_signal to more accurately describe
    its purpose.

    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lkml.kernel.org/r/20220309162454.123006-9-ebiederm@xmission.com
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:47 -04:00
Chris von Recklinghausen 6fb7c30612 task_work: Call tracehook_notify_signal from get_signal on all architectures
Bugzilla: https://bugzilla.redhat.com/2120352

commit 8ba62d37949e248c698c26e0d82d72fda5d33ebf
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Wed Feb 9 09:51:14 2022 -0600

    task_work: Call tracehook_notify_signal from get_signal on all architectures

    Always handle TIF_NOTIFY_SIGNAL in get_signal.  With commit 35d0b389f3
    ("task_work: unconditionally run task_work from get_signal()") always
    calling task_work_run all of the work of tracehook_notify_signal is
    already happening except clearing TIF_NOTIFY_SIGNAL.

    Factor clear_notify_signal out of tracehook_notify_signal and use it in
    get_signal so that get_signal only needs one call of task_work_run.

    To keep the semantics in sync update xfer_to_guest_mode_work (which
    does not call get_signal) to call tracehook_notify_signal if either
    _TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL.

    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lkml.kernel.org/r/20220309162454.123006-8-ebiederm@xmission.com
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:47 -04:00
Chris von Recklinghausen 38e6b77fa9 entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
Bugzilla: https://bugzilla.redhat.com/2120352

commit a68de80f61f6af397bc06fb391ff2e571c9c4d80
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Sep 1 13:30:27 2021 -0700

    entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()

    Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
    that the two functions are always called back-to-back by architectures
    that have rseq.  The rseq helper is stubbed out for architectures that
    don't support rseq, i.e. this is a nop across the board.

    Note, tracehook_notify_resume() is horribly named and arguably does not
    belong in tracehook.h as literally every line of code in it has nothing
    to do with tracing.  But, that's been true since commit a42c6ded82
    ("move key_repace_session_keyring() into tracehook_notify_resume()")
    first usurped tracehook_notify_resume() back in 2012.  Punt cleaning that
    mess up to future patches.

    No functional change intended.

    Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210901203030.1292304-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:24 -04:00
Vitaly Kuznetsov df89611134 entry: Snapshot thread flags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2074832

commit 6ce895128b3bff738fe8d9dd74747a03e319e466
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Mon Nov 29 13:06:44 2021 +0000

    entry: Snapshot thread flags

    Some thread flags can be set remotely, and so even when IRQs are disabled,
    the flags can change under our feet. Generally this is unlikely to cause a
    problem in practice, but it is somewhat unsound, and KCSAN will
    legitimately warn that there is a data race.

    To avoid such issues, a snapshot of the flags has to be taken prior to
    using them. Some places already use READ_ONCE() for that, others do not.

    Convert them all to the new flag accessor helpers.

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lore.kernel.org/r/20211129130653.2037928-3-mark.rutland@arm.com

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
2022-05-30 16:46:29 +02:00
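The snapshot idea in the commit above can be illustrated in user space with C11 atomics standing in for the kernel's READ_ONCE(). This is a sketch under assumptions: the struct, the flag values, and the function names are hypothetical and are not the kernel's accessor helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch. */
#define SK_FLAG_SIGPENDING    (1UL << 0)
#define SK_FLAG_NEED_RESCHED  (1UL << 1)

/* Stand-in for thread_info: the flags word may be written by other CPUs. */
struct sk_thread_info {
	_Atomic unsigned long flags;
};

/* Read the flags exactly once, so every later test sees the same value. */
static unsigned long sk_read_flags(struct sk_thread_info *ti)
{
	return atomic_load_explicit(&ti->flags, memory_order_relaxed);
}

/* All decisions are made on the local snapshot, never on a second read. */
static bool sk_has_pending_work(struct sk_thread_info *ti, unsigned long mask)
{
	unsigned long snap = sk_read_flags(ti);

	return (snap & mask) != 0;
}
```

The point of the pattern is that a remote update arriving between two tests cannot make them disagree, because both tests use the same local copy.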
Vitaly Kuznetsov e53893a0fd KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2009338

commit 8646e53633f314e4d746a988240d3b951a92f94a
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Sep 1 13:30:26 2021 -0700

    KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest

    Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
    transferring to a KVM guest, which is roughly equivalent to an exit to
    userspace and processes many of the same pending actions.  While the task
    cannot be in an rseq critical section as the KVM path is reachable only
    via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
    critical section still apply, e.g. the current CPU needs to be updated if
    the task is migrated.

    Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
    and other badness in userspace VMMs that use rseq in combination with KVM,
    e.g. due to the CPU ID being stale after task migration.

    Fixes: 72c3c0fe54 ("x86/kvm: Use generic xfer to guest work function")
    Reported-by: Peter Foley <pefoley@google.com>
    Bisected-by: Doug Evans <dje@google.com>
    Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210901203030.1292304-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
2021-12-08 10:43:17 +01:00
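The stale CPU ID problem described in the commit above can be modelled with a small user-space sketch. Everything here is hypothetical (the struct, the sk_ names, the flag value); it only shows why clearing the notify flag without running the rseq handler leaves the published CPU ID out of date after a migration.

```c
/* Hypothetical flag bit for this sketch. */
#define SK_TIF_NOTIFY_RESUME (1UL << 0)

struct sk_task {
	unsigned long flags;
	int cpu;		/* CPU the task currently runs on */
	int rseq_cpu_id;	/* CPU id last published to userspace */
};

/* Models a migration: the task moves and resume work is requested. */
static void sk_migrate(struct sk_task *t, int new_cpu)
{
	t->cpu = new_cpu;
	t->flags |= SK_TIF_NOTIFY_RESUME;
}

/* Buggy guest-entry path: the flag is consumed, rseq is never told. */
static void sk_guest_notify_resume_buggy(struct sk_task *t)
{
	t->flags &= ~SK_TIF_NOTIFY_RESUME;
}

/* Fixed guest-entry path: consuming the flag also refreshes the rseq state. */
static void sk_guest_notify_resume_fixed(struct sk_task *t)
{
	t->flags &= ~SK_TIF_NOTIFY_RESUME;
	t->rseq_cpu_id = t->cpu;
}
```

In the buggy variant the request is gone by the time the task eventually returns to userspace, so nothing remains to trigger the update.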
Thomas Gleixner 01be83eea0 Merge branch 'core/urgent' into core/entry
Pick up the entry fix before further modifications.
2020-11-04 18:14:52 +01:00
Jens Axboe 12db8b6900 entry: Add support for TIF_NOTIFY_SIGNAL
Add TIF_NOTIFY_SIGNAL handling in the generic entry code, which if set,
will return true if signal_pending() is used in a wait loop. That causes an
exit of the loop so that notify_signal tracehooks can be run. If the wait
loop is currently inside a system call, the system call is restarted once
task_work has been processed.

In preparation for only having arch_do_signal() handle syscall restarts if
_TIF_SIGPENDING isn't set, rename it to arch_do_signal_or_restart().  Pass
in a boolean that tells the architecture specific signal handler if it
should attempt to get a signal, or just process a potential syscall
restart.

For !CONFIG_GENERIC_ENTRY archs, add the TIF_NOTIFY_SIGNAL handling to
get_signal(). This is done to minimize the needed architecture changes to
support this feature.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20201026203230.386348-3-axboe@kernel.dk
2020-10-29 09:37:36 +01:00
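A rough user-space model of the commit above: signal_pending() reports true for either flag, and the signal work step passes a boolean telling the architecture handler whether to fetch a signal or only process a possible syscall restart. The sk_ names, flag values, and result struct are assumptions made for this sketch, not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch. */
#define SK_TIF_SIGPENDING    (1UL << 0)
#define SK_TIF_NOTIFY_SIGNAL (1UL << 1)

/* Wait loops break out when either flag is set. */
static bool sk_signal_pending(unsigned long flags)
{
	return (flags & (SK_TIF_SIGPENDING | SK_TIF_NOTIFY_SIGNAL)) != 0;
}

/* What the signal work step decided to do, recorded for inspection. */
struct sk_signal_work {
	bool ran_notify_hooks;	/* notify_signal hooks were run */
	bool has_signal;	/* handler should try to get a signal */
};

static struct sk_signal_work sk_handle_signal_work(unsigned long ti_work)
{
	struct sk_signal_work w = { false, false };

	if (ti_work & SK_TIF_NOTIFY_SIGNAL)
		w.ran_notify_hooks = true;

	/* Without a real signal, only syscall restart handling remains. */
	w.has_signal = (ti_work & SK_TIF_SIGPENDING) != 0;
	return w;
}
```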
Jens Axboe 3c532798ec tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 15:04:36 -06:00
Thomas Gleixner 935ace2fb5 entry: Provide infrastructure for work before transitioning to guest mode
Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.

Provide generic infrastructure to avoid duplication of the same handling
code all over the place.

The transfer to guest mode handling is different from the exit to usermode
handling, e.g. vs. rseq and live patching, so a separate function is used.

The initial list of work items handled is:

    TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME

Architecture specific TIF flags can be added via defines in the
architecture specific include files.

The calling convention is also different from the syscall/interrupt entry
functions as KVM invokes this from the outer vcpu_run() loop with
interrupts and preemption enabled. To prevent missing a pending work item
it invokes a check for pending TIF work from interrupt disabled code right
before transitioning to guest mode. The lockdep, RCU and tracing state
handling is also done directly around the switch to and from guest mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220519.833296398@linutronix.de
2020-07-24 15:03:42 +02:00
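The work loop described in the commit above might look roughly like the following user-space simulation. It assumes hypothetical flag values and a hypothetical task struct with counters in place of schedule() and the resume hook; the real function also calls into architecture-specific handlers, which are omitted here.

```c
#include <errno.h>

/* Hypothetical flag bits for this sketch. */
#define SK_TIF_SIGPENDING    (1UL << 0)
#define SK_TIF_NEED_RESCHED  (1UL << 1)
#define SK_TIF_NOTIFY_RESUME (1UL << 2)
#define SK_GUEST_WORK_MASK \
	(SK_TIF_SIGPENDING | SK_TIF_NEED_RESCHED | SK_TIF_NOTIFY_RESUME)

struct sk_vcpu_task {
	unsigned long flags;
	int resched_calls;	/* stands in for schedule() */
	int resume_calls;	/* stands in for the notify-resume hook */
};

/* Returns 0 when the guest may be entered, -EINTR when userspace must run. */
static int sk_xfer_to_guest_mode_work(struct sk_vcpu_task *t)
{
	unsigned long ti_work = t->flags;

	while (ti_work & SK_GUEST_WORK_MASK) {
		if (ti_work & SK_TIF_SIGPENDING)
			return -EINTR;	/* signals are delivered in user mode */

		if (ti_work & SK_TIF_NEED_RESCHED) {
			t->flags &= ~SK_TIF_NEED_RESCHED;
			t->resched_calls++;
		}

		if (ti_work & SK_TIF_NOTIFY_RESUME) {
			t->flags &= ~SK_TIF_NOTIFY_RESUME;
			t->resume_calls++;
		}

		/* Handling work may have raised new work, so look again. */
		ti_work = t->flags;
	}
	return 0;
}
```

A caller in the style of a vCPU run loop would enter the guest only when this returns 0 and would otherwise propagate the error back to the ioctl caller.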