linux-kernelorg-stable/rust/kernel/task.rs

394 lines
15 KiB
Rust
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0
//! Tasks (threads and processes).
//!
//! C header: [`include/linux/sched.h`](srctree/include/linux/sched.h).
rust: types: add `NotThreadSafe` This introduces a new marker type for types that shouldn't be thread safe. By adding a field of this type to a struct, it becomes non-Send and non-Sync, which means that it cannot be accessed in any way from threads other than the one it was created on. This is useful for APIs that require globals such as `current` to remain constant while the value exists. We update two existing users in the Kernel to use this helper: * `Task::current()` - moving the return type of this value to a different thread would not be safe as you can no longer be guaranteed that the `current` pointer remains valid. * Lock guards. Mutexes and spinlocks should be unlocked on the same thread as where they were locked, so we enforce this using the Send trait. There are also additional users in later patches of this patchset. See [1] and [2] for the discussion that led to the introduction of this patch. Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [1] Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [2] Suggested-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Trevor Gross <tmgross@umich.edu> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Björn Roy Baron <bjorn3_gh@protonmail.com> Reviewed-by: Gary Guo <gary@garyguo.net> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-1-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:27 +00:00
use crate::{
bindings,
ffi::{c_int, c_long, c_uint},
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
mm::MmWithUser,
rust: add PidNamespace The lifetime of `PidNamespace` is bound to `Task` and `struct pid`. The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the calling `Task`'s pid namespace. It will only effect the pid namespace of children created by the calling `Task`. This invariant guarantees that after having acquired a reference to a `Task`'s pid namespace it will remain unchanged. When a task has exited and been reaped `release_task()` will be called. This will set the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a referencing count to the `Task` will prevent `release_task()` being called. In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be used. There are two cases to consider: (1) retrieving the `PidNamespace` of the `current` task (2) retrieving the `PidNamespace` of a non-`current` task From system call context retrieving the `PidNamespace` for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust. Retrieving the `PidNamespace` from system call context for (2) requires RCU protection. Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can be returned as the non-`current` task could have already passed through `release_task()`. To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the returned `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function should not be called directly as it could be abused to created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a reference count. For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes against putting the last reference of the associated `struct pid` of `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired from `task->thread_pid` to finish. Link: https://lore.kernel.org/r/20241002-brauner-rust-pid_namespace-v5-1-a90e70d44fde@kernel.org Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 11:38:10 +00:00
pid_namespace::PidNamespace,
types::{ARef, NotThreadSafe, Opaque},
rust: types: add `NotThreadSafe` This introduces a new marker type for types that shouldn't be thread safe. By adding a field of this type to a struct, it becomes non-Send and non-Sync, which means that it cannot be accessed in any way from threads other than the one it was created on. This is useful for APIs that require globals such as `current` to remain constant while the value exists. We update two existing users in the Kernel to use this helper: * `Task::current()` - moving the return type of this value to a different thread would not be safe as you can no longer be guaranteed that the `current` pointer remains valid. * Lock guards. Mutexes and spinlocks should be unlocked on the same thread as where they were locked, so we enforce this using the Send trait. There are also additional users in later patches of this patchset. See [1] and [2] for the discussion that led to the introduction of this patch. Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [1] Link: https://lore.kernel.org/all/nFDPJFnzE9Q5cqY7FwSMByRH2OAn_BpI4H53NQfWIlN6I2qfmAqnkp2wRqn0XjMO65OyZY4h6P4K2nAGKJpAOSzksYXaiAK_FoH_8QbgBI4=@proton.me/ [2] Suggested-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Trevor Gross <tmgross@umich.edu> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Björn Roy Baron <bjorn3_gh@protonmail.com> Reviewed-by: Gary Guo <gary@garyguo.net> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-1-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:27 +00:00
};
use core::{
cmp::{Eq, PartialEq},
ops::Deref,
ptr,
};
/// A sentinel value used for infinite timeouts.
pub const MAX_SCHEDULE_TIMEOUT: c_long = c_long::MAX;
rust: sync: update integer types in CondVar Reduce the chances of compilation failures due to integer type mismatches in `CondVar`. When an integer is defined using a #define in C, bindgen doesn't know which integer type it is supposed to be, so it will just use `u32` by default (if it fits in an u32). Whenever the right type is something else, we insert a cast in Rust. However, this means that the code has a lot of extra casts, and sometimes the code will be missing casts if u32 happens to be correct on the developer's machine, even though the type might be something else on a different platform. This patch updates all uses of such constants in `rust/kernel/sync/condvar.rs` to use constants defined with the right type. This allows us to remove various unnecessary casts, while also future-proofing for the case where `unsigned int != u32` (even though that is unlikely to ever happen in the kernel). I wrote this patch at the suggestion of Benno in [1]. Link: https://lore.kernel.org/all/nAEg-6vbtX72ZY3oirDhrSEf06TBWmMiTt73EklMzEAzN4FD4mF3TPEyAOxBZgZtjzoiaBYtYr3s8sa9wp1uYH9vEWRf2M-Lf4I0BY9rAgk=@proton.me/ [1] Suggested-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Tiago Lam <tiagolam@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-4-88e0c871cc05@google.com [ Added note on the unlikeliness of `sizeof(int)` changing. ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-08 14:50:00 +00:00
/// Bitmask for tasks that are sleeping in an interruptible state.
pub const TASK_INTERRUPTIBLE: c_int = bindings::TASK_INTERRUPTIBLE as c_int;
/// Bitmask for tasks that are sleeping in an uninterruptible state.
pub const TASK_UNINTERRUPTIBLE: c_int = bindings::TASK_UNINTERRUPTIBLE as c_int;
/// Bitmask for tasks that are sleeping in a freezable state.
pub const TASK_FREEZABLE: c_int = bindings::TASK_FREEZABLE as c_int;
rust: sync: update integer types in CondVar Reduce the chances of compilation failures due to integer type mismatches in `CondVar`. When an integer is defined using a #define in C, bindgen doesn't know which integer type it is supposed to be, so it will just use `u32` by default (if it fits in an u32). Whenever the right type is something else, we insert a cast in Rust. However, this means that the code has a lot of extra casts, and sometimes the code will be missing casts if u32 happens to be correct on the developer's machine, even though the type might be something else on a different platform. This patch updates all uses of such constants in `rust/kernel/sync/condvar.rs` to use constants defined with the right type. This allows us to remove various unnecessary casts, while also future-proofing for the case where `unsigned int != u32` (even though that is unlikely to ever happen in the kernel). I wrote this patch at the suggestion of Benno in [1]. Link: https://lore.kernel.org/all/nAEg-6vbtX72ZY3oirDhrSEf06TBWmMiTt73EklMzEAzN4FD4mF3TPEyAOxBZgZtjzoiaBYtYr3s8sa9wp1uYH9vEWRf2M-Lf4I0BY9rAgk=@proton.me/ [1] Suggested-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Tiago Lam <tiagolam@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-4-88e0c871cc05@google.com [ Added note on the unlikeliness of `sizeof(int)` changing. ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-08 14:50:00 +00:00
/// Convenience constant for waking up tasks regardless of whether they are in interruptible or
/// uninterruptible sleep.
pub const TASK_NORMAL: c_uint = bindings::TASK_NORMAL as c_uint;
/// Returns the currently running task.
#[macro_export]
macro_rules! current {
() => {
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
// SAFETY: This expression creates a temporary value that is dropped at the end of the
// caller's scope. The following mechanisms ensure that the resulting `&CurrentTask` cannot
// leave current task context:
//
// * To return to userspace, the caller must leave the current scope.
// * Operations such as `begin_new_exec()` are necessarily unsafe and the caller of
// `begin_new_exec()` is responsible for safety.
// * Rust abstractions for things such as a `kthread_use_mm()` scope must require the
// closure to be `Send`, so the `NotThreadSafe` field of `CurrentTask` ensures that the
// `&CurrentTask` cannot cross the scope in either direction.
unsafe { &*$crate::task::Task::current() }
};
}
/// Wraps the kernel's `struct task_struct`.
///
/// # Invariants
///
/// All instances are valid tasks created by the C portion of the kernel.
///
/// Instances of this type are always refcounted, that is, a call to `get_task_struct` ensures
/// that the allocation remains valid at least until the matching call to `put_task_struct`.
///
/// # Examples
///
/// The following is an example of getting the PID of the current thread with zero additional cost
/// when compared to the C version:
///
/// ```
/// let pid = current!().pid();
/// ```
///
/// Getting the PID of the current process, also zero additional cost:
///
/// ```
/// let pid = current!().group_leader().pid();
/// ```
///
/// Getting the current task and storing it in some struct. The reference count is automatically
/// incremented when creating `State` and decremented when it is dropped:
///
/// ```
/// use kernel::{task::Task, types::ARef};
///
/// struct State {
/// creator: ARef<Task>,
/// index: u32,
/// }
///
/// impl State {
/// fn new() -> Self {
/// Self {
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
/// creator: ARef::from(&**current!()),
/// index: 0,
/// }
/// }
/// }
/// ```
#[repr(transparent)]
pub struct Task(pub(crate) Opaque<bindings::task_struct>);
// SAFETY: By design, the only way to access a `Task` is via the `current` function or via an
// `ARef<Task>` obtained through the `AlwaysRefCounted` impl. This means that the only situation in
// which a `Task` can be accessed mutably is when the refcount drops to zero and the destructor
// runs. It is safe for that to happen on any thread, so it is ok for this type to be `Send`.
unsafe impl Send for Task {}
// SAFETY: It's OK to access `Task` through shared references from other threads because we're
// either accessing properties that don't change (e.g., `pid`, `group_leader`) or that are properly
// synchronised by C code (e.g., `signal_pending`).
unsafe impl Sync for Task {}
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
/// Represents the [`Task`] in the `current` global.
///
/// This type exists to provide more efficient operations that are only valid on the current task.
/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
/// the current task.
///
/// # Invariants
///
/// Each value of this type must only be accessed from the task context it was created within.
///
/// Of course, every thread is in a different task context, but for the purposes of this invariant,
/// these operations also permanently leave the task context:
///
/// * Returning to userspace from system call context.
/// * Calling `release_task()`.
/// * Calling `begin_new_exec()` in a binary format loader.
///
/// Other operations temporarily create a new sub-context:
///
/// * Calling `kthread_use_mm()` creates a new context, and `kthread_unuse_mm()` returns to the
/// old context.
///
/// This means that a `CurrentTask` obtained before a `kthread_use_mm()` call may be used again
/// once `kthread_unuse_mm()` is called, but it must not be used between these two calls.
/// Conversely, a `CurrentTask` obtained between a `kthread_use_mm()`/`kthread_unuse_mm()` pair
/// must not be used after `kthread_unuse_mm()`.
#[repr(transparent)]
pub struct CurrentTask(Task, NotThreadSafe);
// Make all `Task` methods available on `CurrentTask`.
impl Deref for CurrentTask {
type Target = Task;
#[inline]
fn deref(&self) -> &Task {
&self.0
}
}
/// The type of process identifiers (PIDs).
pub type Pid = bindings::pid_t;
rust: file: add `Kuid` wrapper Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define various operations on kuids such as equality and current_euid. It also lets us provide conversions from kuid into userspace values. Rust Binder needs these operations because it needs to compare kuids for equality, and it needs to tell userspace about the pid and uid of incoming transactions. To read kuids from a `struct task_struct`, you must currently use various #defines that perform the appropriate field access under an RCU read lock. Currently, we do not have a Rust wrapper for rcu_read_lock, which means that for this patch, there are two ways forward: 1. Inline the methods into Rust code, and use __rcu_read_lock directly rather than the rcu_read_lock wrapper. This gives up lockdep for these usages of RCU. 2. Wrap the various #defines in helpers and call the helpers from Rust. This patch uses the second option. One possible disadvantage of the second option is the possible introduction of speculation gadgets, but as discussed in [1], the risk appears to be acceptable. Of course, once a wrapper for rcu_read_lock is available, it is preferable to use that over either of the two above approaches. Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1] Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-7-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:33 +00:00
/// The type of user identifiers (UIDs).
#[derive(Copy, Clone)]
pub struct Kuid {
kuid: bindings::kuid_t,
}
impl Task {
/// Returns a raw pointer to the current task.
///
/// It is up to the user to use the pointer correctly.
#[inline]
pub fn current_raw() -> *mut bindings::task_struct {
// SAFETY: Getting the current pointer is always safe.
unsafe { bindings::get_current() }
}
/// Returns a task reference for the currently executing task/thread.
///
/// The recommended way to get the current task/thread is to use the
/// [`current`] macro because it is safe.
///
/// # Safety
///
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
/// Callers must ensure that the returned object is only used to access a [`CurrentTask`]
/// within the task context that was active when this function was called. For more details,
/// see the invariants section for [`CurrentTask`].
pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
struct TaskRef {
task: *const CurrentTask,
}
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
impl Deref for TaskRef {
type Target = CurrentTask;
fn deref(&self) -> &Self::Target {
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
// SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
// the `TaskRef`, which the caller of `Task::current()` has promised will not
// outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
// is okay to return a `CurrentTask` reference here.
unsafe { &*self.task }
}
}
TaskRef {
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
// CAST: The layout of `struct task_struct` and `CurrentTask` is identical.
task: Task::current_raw().cast(),
rust: add PidNamespace The lifetime of `PidNamespace` is bound to `Task` and `struct pid`. The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the calling `Task`'s pid namespace. It will only effect the pid namespace of children created by the calling `Task`. This invariant guarantees that after having acquired a reference to a `Task`'s pid namespace it will remain unchanged. When a task has exited and been reaped `release_task()` will be called. This will set the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a referencing count to the `Task` will prevent `release_task()` being called. In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be used. There are two cases to consider: (1) retrieving the `PidNamespace` of the `current` task (2) retrieving the `PidNamespace` of a non-`current` task From system call context retrieving the `PidNamespace` for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust. Retrieving the `PidNamespace` from system call context for (2) requires RCU protection. Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can be returned as the non-`current` task could have already passed through `release_task()`. To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the returned `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function should not be called directly as it could be abused to created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a reference count. For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes against putting the last reference of the associated `struct pid` of `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired from `task->thread_pid` to finish. Link: https://lore.kernel.org/r/20241002-brauner-rust-pid_namespace-v5-1-a90e70d44fde@kernel.org Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 11:38:10 +00:00
}
}
/// Returns a raw pointer to the task.
#[inline]
pub fn as_ptr(&self) -> *mut bindings::task_struct {
self.0.get()
}
/// Returns the group leader of the given task.
pub fn group_leader(&self) -> &Task {
// SAFETY: The group leader of a task never changes after initialization, so reading this
// field is not a data race.
let ptr = unsafe { *ptr::addr_of!((*self.as_ptr()).group_leader) };
// SAFETY: The lifetime of the returned task reference is tied to the lifetime of `self`,
// and given that a task has a reference to its group leader, we know it must be valid for
// the lifetime of the returned task reference.
unsafe { &*ptr.cast() }
}
/// Returns the PID of the given task.
pub fn pid(&self) -> Pid {
// SAFETY: The pid of a task never changes after initialization, so reading this field is
// not a data race.
unsafe { *ptr::addr_of!((*self.as_ptr()).pid) }
}
rust: file: add `Kuid` wrapper Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define various operations on kuids such as equality and current_euid. It also lets us provide conversions from kuid into userspace values. Rust Binder needs these operations because it needs to compare kuids for equality, and it needs to tell userspace about the pid and uid of incoming transactions. To read kuids from a `struct task_struct`, you must currently use various #defines that perform the appropriate field access under an RCU read lock. Currently, we do not have a Rust wrapper for rcu_read_lock, which means that for this patch, there are two ways forward: 1. Inline the methods into Rust code, and use __rcu_read_lock directly rather than the rcu_read_lock wrapper. This gives up lockdep for these usages of RCU. 2. Wrap the various #defines in helpers and call the helpers from Rust. This patch uses the second option. One possible disadvantage of the second option is the possible introduction of speculation gadgets, but as discussed in [1], the risk appears to be acceptable. Of course, once a wrapper for rcu_read_lock is available, it is preferable to use that over either of the two above approaches. Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1] Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-7-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:33 +00:00
/// Returns the UID of the given task.
pub fn uid(&self) -> Kuid {
// SAFETY: It's always safe to call `task_uid` on a valid task.
Kuid::from_raw(unsafe { bindings::task_uid(self.as_ptr()) })
rust: file: add `Kuid` wrapper Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define various operations on kuids such as equality and current_euid. It also lets us provide conversions from kuid into userspace values. Rust Binder needs these operations because it needs to compare kuids for equality, and it needs to tell userspace about the pid and uid of incoming transactions. To read kuids from a `struct task_struct`, you must currently use various #defines that perform the appropriate field access under an RCU read lock. Currently, we do not have a Rust wrapper for rcu_read_lock, which means that for this patch, there are two ways forward: 1. Inline the methods into Rust code, and use __rcu_read_lock directly rather than the rcu_read_lock wrapper. This gives up lockdep for these usages of RCU. 2. Wrap the various #defines in helpers and call the helpers from Rust. This patch uses the second option. One possible disadvantage of the second option is the possible introduction of speculation gadgets, but as discussed in [1], the risk appears to be acceptable. Of course, once a wrapper for rcu_read_lock is available, it is preferable to use that over either of the two above approaches. Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1] Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-7-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:33 +00:00
}
/// Returns the effective UID of the given task.
pub fn euid(&self) -> Kuid {
// SAFETY: It's always safe to call `task_euid` on a valid task.
Kuid::from_raw(unsafe { bindings::task_euid(self.as_ptr()) })
rust: file: add `Kuid` wrapper Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define various operations on kuids such as equality and current_euid. It also lets us provide conversions from kuid into userspace values. Rust Binder needs these operations because it needs to compare kuids for equality, and it needs to tell userspace about the pid and uid of incoming transactions. To read kuids from a `struct task_struct`, you must currently use various #defines that perform the appropriate field access under an RCU read lock. Currently, we do not have a Rust wrapper for rcu_read_lock, which means that for this patch, there are two ways forward: 1. Inline the methods into Rust code, and use __rcu_read_lock directly rather than the rcu_read_lock wrapper. This gives up lockdep for these usages of RCU. 2. Wrap the various #defines in helpers and call the helpers from Rust. This patch uses the second option. One possible disadvantage of the second option is the possible introduction of speculation gadgets, but as discussed in [1], the risk appears to be acceptable. Of course, once a wrapper for rcu_read_lock is available, it is preferable to use that over either of the two above approaches. Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1] Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-7-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:33 +00:00
}
/// Determines whether the given task has pending signals.
pub fn signal_pending(&self) -> bool {
// SAFETY: It's always safe to call `signal_pending` on a valid task.
unsafe { bindings::signal_pending(self.as_ptr()) != 0 }
}
rust: add PidNamespace The lifetime of `PidNamespace` is bound to `Task` and `struct pid`. The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the calling `Task`'s pid namespace. It will only effect the pid namespace of children created by the calling `Task`. This invariant guarantees that after having acquired a reference to a `Task`'s pid namespace it will remain unchanged. When a task has exited and been reaped `release_task()` will be called. This will set the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a referencing count to the `Task` will prevent `release_task()` being called. In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be used. There are two cases to consider: (1) retrieving the `PidNamespace` of the `current` task (2) retrieving the `PidNamespace` of a non-`current` task From system call context retrieving the `PidNamespace` for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust. Retrieving the `PidNamespace` from system call context for (2) requires RCU protection. Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can be returned as the non-`current` task could have already passed through `release_task()`. To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the returned `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function should not be called directly as it could be abused to created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a reference count. For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes against putting the last reference of the associated `struct pid` of `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired from `task->thread_pid` to finish. Link: https://lore.kernel.org/r/20241002-brauner-rust-pid_namespace-v5-1-a90e70d44fde@kernel.org Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 11:38:10 +00:00
/// Returns task's pid namespace with elevated reference count
pub fn get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
// SAFETY: By the type invariant, we know that `self.0` is valid.
vfs-6.13.rust.pid_namespace -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcWKgAKCRCRxhvAZXjc osPAAP9bLzOPIF51IgP9mQTBlKKrpCWCMQVss5xRDseyNEfCEQD/fR9TSSnX9Suw iad9oBkxkzCjyxWIH46rvbdnc38lRwo= =aawA -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pid_namespace rust bindings from Christian Brauner: "This contains my Rust bindings for pid namespaces needed for various rust drivers. Here's a description of the basic C semantics and how they are mapped to Rust. The pid namespace of a task doesn't ever change once the task is alive. A unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not have an effect on the calling task's pid namespace. It will only effect the pid namespace of children created by the calling task. This invariant guarantees that after having acquired a reference to a task's pid namespace it will remain unchanged. When a task has exited and been reaped release_task() will be called. This will set the pid namespace of the task to NULL. So retrieving the pid namespace of a task that is dead will return NULL. Note, that neither holding the RCU lock nor holding a reference count to the task will prevent release_task() from being called. In order to retrieve the pid namespace of a task the task_active_pid_ns() function can be used. There are two cases to consider: (1) retrieving the pid namespace of the current task (2) retrieving the pid namespace of a non-current task From system call context retrieving the pid namespace for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the pid namespace after release_task() for current will return NULL but no codepath like that is exposed to Rust. Retrieving the pid namespace from system call context for (2) requires RCU protection. Accessing a pid namespace outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-current task means NULL can be returned as the non-current task could have already passed through release_task(). To retrieve (1) the current_pid_ns!() macro should be used. It ensures that the returned pid namespace cannot outlive the calling scope. The associated current_pid_ns() function should not be called directly as it could be abused to created an unbounded lifetime for the pid namespace. The current_pid_ns!() macro allows Rust to handle the common case of accessing current's pid namespace without RCU protection and without having to acquire a reference count. For (2) the task_get_pid_ns() method must be used. This will always acquire a reference on the pid namespace and will return an Option to force the caller to explicitly handle the case where pid namespace is None. Something that tends to be forgotten when doing the equivalent operation in C. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note that for (2) the required RCU protection around calling task_active_pid_ns() synchronizes against putting the last reference of the associated struct pid of task->thread_pid. The struct pid stored in that field is used to retrieve the pid namespace of the caller. When release_task() is called task->thread_pid will be NULLed and put_pid() on said struct pid will be delayed in free_pid() via call_rcu() allowing everyone with an RCU protected access to the struct pid acquired from task->thread_pid to finish" * tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: rust: add PidNamespace
2024-11-26 21:18:00 +00:00
let ptr = unsafe { bindings::task_get_pid_ns(self.as_ptr()) };
rust: add PidNamespace The lifetime of `PidNamespace` is bound to `Task` and `struct pid`. The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect on the calling `Task`'s pid namespace. It will only effect the pid namespace of children created by the calling `Task`. This invariant guarantees that after having acquired a reference to a `Task`'s pid namespace it will remain unchanged. When a task has exited and been reaped `release_task()` will be called. This will set the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a referencing count to the `Task` will prevent `release_task()` being called. In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function can be used. There are two cases to consider: (1) retrieving the `PidNamespace` of the `current` task (2) retrieving the `PidNamespace` of a non-`current` task From system call context retrieving the `PidNamespace` for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the `PidNamespace` after `release_task()` for current will return `NULL` but no codepath like that is exposed to Rust. Retrieving the `PidNamespace` from system call context for (2) requires RCU protection. Accessing `PidNamespace` outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-`current` task means `NULL` can be returned as the non-`current` task could have already passed through `release_task()`. To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the returned `PidNamespace` cannot outlive the calling scope. The associated `current_pid_ns()` function should not be called directly as it could be abused to created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU protection and without having to acquire a reference count. For (2) the `task_get_pid_ns()` method must be used. This will always acquire a reference on `PidNamespace` and will return an `Option` to force the caller to explicitly handle the case where `PidNamespace` is `None`, something that tends to be forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note for (2) the required RCU protection around calling `task_active_pid_ns()` synchronizes against putting the last reference of the associated `struct pid` of `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired from `task->thread_pid` to finish. Link: https://lore.kernel.org/r/20241002-brauner-rust-pid_namespace-v5-1-a90e70d44fde@kernel.org Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 11:38:10 +00:00
if ptr.is_null() {
None
} else {
// SAFETY: `ptr` is valid by the safety requirements of this function. And we own a
// reference count via `task_get_pid_ns()`.
// CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
}
}
/// Returns the given task's pid in the provided pid namespace.
#[doc(alias = "task_tgid_nr_ns")]
pub fn tgid_nr_ns(&self, pidns: Option<&PidNamespace>) -> Pid {
let pidns = match pidns {
Some(pidns) => pidns.as_ptr(),
None => core::ptr::null_mut(),
};
// SAFETY: By the type invariant, we know that `self.0` is valid. We received a valid
// PidNamespace that we can use as a pointer or we received an empty PidNamespace and
// thus pass a null pointer. The underlying C function is safe to be used with NULL
// pointers.
vfs-6.13.rust.pid_namespace -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcWKgAKCRCRxhvAZXjc osPAAP9bLzOPIF51IgP9mQTBlKKrpCWCMQVss5xRDseyNEfCEQD/fR9TSSnX9Suw iad9oBkxkzCjyxWIH46rvbdnc38lRwo= =aawA -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pid_namespace rust bindings from Christian Brauner: "This contains my Rust bindings for pid namespaces needed for various rust drivers. Here's a description of the basic C semantics and how they are mapped to Rust. The pid namespace of a task doesn't ever change once the task is alive. A unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not have an effect on the calling task's pid namespace. It will only effect the pid namespace of children created by the calling task. This invariant guarantees that after having acquired a reference to a task's pid namespace it will remain unchanged. When a task has exited and been reaped release_task() will be called. This will set the pid namespace of the task to NULL. So retrieving the pid namespace of a task that is dead will return NULL. Note, that neither holding the RCU lock nor holding a reference count to the task will prevent release_task() from being called. In order to retrieve the pid namespace of a task the task_active_pid_ns() function can be used. There are two cases to consider: (1) retrieving the pid namespace of the current task (2) retrieving the pid namespace of a non-current task From system call context retrieving the pid namespace for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the pid namespace after release_task() for current will return NULL but no codepath like that is exposed to Rust. Retrieving the pid namespace from system call context for (2) requires RCU protection. Accessing a pid namespace outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-current task means NULL can be returned as the non-current task could have already passed through release_task(). To retrieve (1) the current_pid_ns!() macro should be used. It ensures that the returned pid namespace cannot outlive the calling scope. The associated current_pid_ns() function should not be called directly as it could be abused to created an unbounded lifetime for the pid namespace. The current_pid_ns!() macro allows Rust to handle the common case of accessing current's pid namespace without RCU protection and without having to acquire a reference count. For (2) the task_get_pid_ns() method must be used. This will always acquire a reference on the pid namespace and will return an Option to force the caller to explicitly handle the case where pid namespace is None. Something that tends to be forgotten when doing the equivalent operation in C. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note that for (2) the required RCU protection around calling task_active_pid_ns() synchronizes against putting the last reference of the associated struct pid of task->thread_pid. The struct pid stored in that field is used to retrieve the pid namespace of the caller. When release_task() is called task->thread_pid will be NULLed and put_pid() on said struct pid will be delayed in free_pid() via call_rcu() allowing everyone with an RCU protected access to the struct pid acquired from task->thread_pid to finish" * tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: rust: add PidNamespace
2024-11-26 21:18:00 +00:00
unsafe { bindings::task_tgid_nr_ns(self.as_ptr(), pidns) }
rust: file: add `Kuid` wrapper Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define various operations on kuids such as equality and current_euid. It also lets us provide conversions from kuid into userspace values. Rust Binder needs these operations because it needs to compare kuids for equality, and it needs to tell userspace about the pid and uid of incoming transactions. To read kuids from a `struct task_struct`, you must currently use various #defines that perform the appropriate field access under an RCU read lock. Currently, we do not have a Rust wrapper for rcu_read_lock, which means that for this patch, there are two ways forward: 1. Inline the methods into Rust code, and use __rcu_read_lock directly rather than the rcu_read_lock wrapper. This gives up lockdep for these usages of RCU. 2. Wrap the various #defines in helpers and call the helpers from Rust. This patch uses the second option. One possible disadvantage of the second option is the possible introduction of speculation gadgets, but as discussed in [1], the risk appears to be acceptable. Of course, once a wrapper for rcu_read_lock is available, it is preferable to use that over either of the two above approaches. Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1] Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-7-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:33 +00:00
}
/// Wakes up the task.
pub fn wake_up(&self) {
// SAFETY: It's always safe to call `wake_up_process` on a valid task, even if the task
// running.
unsafe { bindings::wake_up_process(self.as_ptr()) };
}
}
task: rust: rework how current is accessed Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma = current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and a unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust. To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm = current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task = current!(); kthread_use_mm(some_mm, || { mm = task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on an different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Link: https://lkml.kernel.org/r/20250408-vma-v16-8-d8b446e885d9@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-08 09:22:45 +00:00
impl CurrentTask {
/// Access the address space of the current task.
///
/// This function does not touch the refcount of the mm.
#[inline]
pub fn mm(&self) -> Option<&MmWithUser> {
// SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
// not a data race.
let mm = unsafe { (*self.as_ptr()).mm };
if mm.is_null() {
return None;
}
// SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
// value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
// `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
//
// This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
// relevant cases:
// * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
// accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
// `NotThreadSafe` field of `CurrentTask`.
// * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
// scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
// also cannot escape that scope.
// In either case, it's not possible to read `current->mm` and keep using it after the
// scope is ended with `kthread_unuse_mm()`.
Some(unsafe { MmWithUser::from_raw(mm) })
}
/// Access the pid namespace of the current task.
///
/// This function does not touch the refcount of the namespace or use RCU protection.
///
/// To access the pid namespace of another task, see [`Task::get_pid_ns`].
#[doc(alias = "task_active_pid_ns")]
#[inline]
pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
// SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
// on the current task.
let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
if active_ns.is_null() {
return None;
}
// The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
//
// The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive.
//
// From system call context retrieving the `PidNamespace` for the current task is always
// safe and requires neither RCU locking nor a reference count to be held. Retrieving the
// `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
// like that is exposed to Rust.
//
// SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
// Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
// escape the scope in which the current pointer was obtained, e.g. it cannot live past a
// `release_task()` call.
Some(unsafe { PidNamespace::from_ptr(active_ns) })
}
}
// SAFETY: The type invariants guarantee that `Task` is always refcounted.
unsafe impl crate::types::AlwaysRefCounted for Task {
fn inc_ref(&self) {
// SAFETY: The existence of a shared reference means that the refcount is nonzero.
unsafe { bindings::get_task_struct(self.as_ptr()) };
}
unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
// SAFETY: The safety requirements guarantee that the refcount is nonzero.
unsafe { bindings::put_task_struct(obj.cast().as_ptr()) }
}
}
rust: file: add `Kuid` wrapper Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define various operations on kuids such as equality and current_euid. It also lets us provide conversions from kuid into userspace values. Rust Binder needs these operations because it needs to compare kuids for equality, and it needs to tell userspace about the pid and uid of incoming transactions. To read kuids from a `struct task_struct`, you must currently use various #defines that perform the appropriate field access under an RCU read lock. Currently, we do not have a Rust wrapper for rcu_read_lock, which means that for this patch, there are two ways forward: 1. Inline the methods into Rust code, and use __rcu_read_lock directly rather than the rcu_read_lock wrapper. This gives up lockdep for these usages of RCU. 2. Wrap the various #defines in helpers and call the helpers from Rust. This patch uses the second option. One possible disadvantage of the second option is the possible introduction of speculation gadgets, but as discussed in [1], the risk appears to be acceptable. Of course, once a wrapper for rcu_read_lock is available, it is preferable to use that over either of the two above approaches. Link: https://lore.kernel.org/all/202312080947.674CD2DC7@keescook/ [1] Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240915-alice-file-v10-7-88484f7a3dcf@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-15 14:31:33 +00:00
impl Kuid {
/// Get the current euid.
#[inline]
pub fn current_euid() -> Kuid {
// SAFETY: Just an FFI call.
Self::from_raw(unsafe { bindings::current_euid() })
}
/// Create a `Kuid` given the raw C type.
#[inline]
pub fn from_raw(kuid: bindings::kuid_t) -> Self {
Self { kuid }
}
/// Turn this kuid into the raw C type.
#[inline]
pub fn into_raw(self) -> bindings::kuid_t {
self.kuid
}
/// Converts this kernel UID into a userspace UID.
///
/// Uses the namespace of the current task.
#[inline]
pub fn into_uid_in_current_ns(self) -> bindings::uid_t {
// SAFETY: Just an FFI call.
unsafe { bindings::from_kuid(bindings::current_user_ns(), self.kuid) }
}
}
impl PartialEq for Kuid {
#[inline]
fn eq(&self, other: &Kuid) -> bool {
// SAFETY: Just an FFI call.
unsafe { bindings::uid_eq(self.kuid, other.kuid) }
}
}
impl Eq for Kuid {}