Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Radostin Stoyanov	d46a537401	signal: restore the override_rlimit logic JIRA: https://issues.redhat.com/browse/RHEL-68020 CVE: CVE-2024-50271 commit 9e05e5c7ee8758141d2db7e8fea2cab34500c6ed Author: Roman Gushchin <roman.gushchin@linux.dev> Date: Mon Nov 4 19:54:19 2024 +0000 signal: restore the override_rlimit logic Prior to commit `d646969055` ("Reimplement RLIMIT_SIGPENDING on top of ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of signals. However now it's enforced unconditionally, even if override_rlimit is set. This behavior change caused production issues. For example, if the limit is reached and a process receives a SIGSEGV signal, sigqueue_alloc fails to allocate the necessary resources for the signal delivery, preventing the signal from being delivered with siginfo. This prevents the process from correctly identifying the fault address and handling the error. From the user-space perspective, applications are unaware that the limit has been reached and that the siginfo is effectively 'corrupted'. This can lead to unpredictable behavior and crashes, as we observed with java applications. Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip the comparison to max there if override_rlimit is set. This effectively restores the old behavior. Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev Fixes: `d646969055` ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Co-developed-by: Andrei Vagin <avagin@google.com> Signed-off-by: Andrei Vagin <avagin@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Alexey Gladkov <legion@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Radostin Stoyanov <radostin@redhat.com>	2024-12-20 15:33:02 +00:00
Rafael Aquini	e5cf0b4377	mm: suppress mm fault logging if fatal signal already pending JIRA: https://issues.redhat.com/browse/RHEL-27742 This patch is a backport of the following upstream commit: commit 5f0bc0b042fc77ff70e14c790abdec960cde4ec1 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue Jul 25 09:38:32 2023 -0700 mm: suppress mm fault logging if fatal signal already pending Commit eda0047296a1 ("mm: make the page fault mmap locking killable") intentionally made it much easier to trigger the "page fault fails because a fatal signal is pending" situation, by having the mmap locking fail early in that case. We have long aborted page faults in other fatal cases when the actual IO for a page is interrupted by SIGKILL - which is particularly useful for the traditional case of NFS hanging due to network issues, but local filesystems could cause it too if you happened to get the SIGKILL while waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()). So aborting the page fault wasn't a new condition - but it now triggers earlier, before we even get to 'handle_mm_fault()'. And as a result the error doesn't go through our 'fault_signal_pending()' logic, and doesn't get filtered away there. Normally you'd never even notice, because if a fatal signal is pending, the new SIGSEGV we send ends up being ignored anyway. But it turns out that there is one very noticeable exception: if you enable 'show_unhandled_signals', the aborted page fault will be logged in the kernel messages, and you'll get a scary line looking something like this in your logs: pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0) which is rather misleading. It's not really a segfault at all, it's just "the thread was killed before the page fault completed, so we aborted the page fault". Fix this by just making it clear that a pending fatal signal means that any new signal coming in after that is implicitly handled. This will avoid the misleading logging, since now the signal isn't 'unhandled' any more. Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/ Acked-by: Oleg Nesterov <oleg@redhat.com> Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-09-05 20:37:05 -04:00
Waiman Long	6d0328a7cf	Revert "Revert "Merge: cgroup: Backport upstream cgroup commits up to v6.8"" JIRA: https://issues.redhat.com/browse/RHEL-36683 Upstream Status: RHEL only This reverts commit `08637d76a2` which is a revert of "Merge: cgroup: Backport upstream cgroup commits up to v6.8" Signed-off-by: Waiman Long <longman@redhat.com>	2024-05-18 21:38:20 -04:00
Lucas Zampieri	08637d76a2	Revert "Merge: cgroup: Backport upstream cgroup commits up to v6.8" This reverts merge request !4128	2024-05-16 15:26:41 +00:00
Waiman Long	724656e7cf	freezer,sched: Rewrite core freezer logic JIRA: https://issues.redhat.com/browse/RHEL-34600 Conflicts: 1) A merge conflict in the kernel/signal.c hunk due to the presence of RHEL-only commit `975d318867` ("signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT."). 2) A merge conflict in the kernel/time/hrtimer.c hunk due to the presence of RHEL-only commit `5f76194136` ("time/hrtimer: Embed hrtimer mode into hrtimer_sleeper"). 3) The fs/cifs/inode.c hunk was applied to fs/smb/client/inode.c due to the presence of upstream commit 38c8a9a52082 ("smb: move client and server files to common directory fs/smb"). 4) Similarly, the fs/cifs/transport.c hunk was applied to fs/smb/client/transport.c manually due to the presence of a later upstream commit d527f51331ca ("cifs: Fix UAF in cifs_demultiplex_thread()"). Note that all the prerequisite patches in the same patch series (https://lore.kernel.org/lkml/20220822111816.760285417@infradead.org/) had already been merged into RHEL9. commit f5d39b020809146cc28e6e73369bf8065e0310aa Author: Peter Zijlstra <peterz@infradead.org> Date: Mon, 22 Aug 2022 13:18:22 +0200 freezer,sched: Rewrite core freezer logic Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically; the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freeze() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before bringing SMP back, before userspace, etc. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run. As said; replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states can be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org Signed-off-by: Waiman Long <longman@redhat.com>	2024-04-26 22:49:06 -04:00
Eder Zulian	269be86edd	signal: Add proper comment about the preempt-disable in ptrace_stop(). JIRA: https://issues.redhat.com/browse/RHEL-3988 Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git commit bf1069f8c099d4c10e2f884dc72a83a4653fc6e4 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Thu Aug 3 12:09:31 2023 +0200 signal: Add proper comment about the preempt-disable in ptrace_stop(). Commit `53da1d9456` ("fix ptrace slowness") added a preempt-disable section between read_unlock() and the following schedule() invocation without explaining why it is needed. Replace the comment with an explanation why this is needed. Clarify that it is not needed for correctness but for performance reasons. Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20230803100932.325870-2-bigeasy@linutronix.de Signed-off-by: Eder Zulian <ezulian@redhat.com>	2023-11-06 12:29:40 +01:00
Oleg Nesterov	213383d9db	undo Revert "signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT." Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 Upstream Status: RHEL-only Reintroduce the temporarily reverted rhel-only commit `975d318867` ("signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT."). Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:32 +02:00
Oleg Nesterov	87bd1b747c	signal handling: don't use BUG_ON() for debugging Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit a382f8fee42ca10c9bfce0d2352d4153f931f5dc Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 6 12:20:59 2022 -0700 signal handling: don't use BUG_ON() for debugging These are indeed "should not happen" situations, but it turns out recent changes made the 'task_is_stopped_or_traced()' case trigger (fix for that exists, is pending more testing), and the BUG_ON() makes it unnecessarily hard to actually debug for no good reason. It's been that way for a long time, but let's make it clear: BUG_ON() is not good for debugging, and should never be used in situations where you could just say "this shouldn't happen, but we can continue". Use WARN_ON_ONCE() instead to make sure it gets logged, and then just continue running. Instead of making the system basically unusable because you crashed the machine while potentially holding some very core locks (eg this function is commonly called while holding 'tasklist_lock' for writing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:32 +02:00
Oleg Nesterov	c888fdf0b1	sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 Omitted-fix: 3418357a32db ("ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()") That fix duplicates de2a34771f51 included in this series commit 31cae1eaae4fd65095ad6a3659db467bc3c2599e Author: Peter Zijlstra <peterz@infradead.org> Date: Tue May 3 15:57:47 2022 -0500 sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this state exists only in task->__state and nowhere else. There's two spots of bother with this: - PREEMPT_RT has task->saved_state which complicates matters, meaning task_is_{traced,stopped}() needs to check an additional variable. - An alternative freezer implementation that itself relies on a special TASK state would lose TASK_TRACED/TASK_STOPPED and will result in misbehaviour. As such, add additional state to task->jobctl to track this state outside of task->__state. NOTE: this doesn't actually fix anything yet, just adds extra state. --EWB * didn't add an unnecessary newline in signal.h * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up instead of in signal_wake_up_state. This prevents the clearing of TASK_STOPPED and TASK_TRACED from getting lost. * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:31 +02:00
Oleg Nesterov	b85b393abb	ptrace: Don't change __state Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit 2500ad1c7fa42ad734677853961a3a8bec0772c5 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Fri Apr 29 08:43:34 2022 -0500 ptrace: Don't change __state Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing. Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep). In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL when the wake up is for a fatal signal. Skip adding __TASK_TRACED when TASK_PTRACE_FROZEN is not set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up. Handle a ptrace_stop being called with a pending fatal signal. Previously it would have been handled by schedule simply failing to sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule will sleep with a fatal_signal_pending. The code in signal_wake_up guarantees that the code will be awakened by any fatal signal that comes after TASK_TRACED is set. Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop clears JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreeze_traced clears JOBCTL_PTRACE_FROZEN. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:31 +02:00
Oleg Nesterov	67192f1fc6	ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit 57b6de08b5f6586851c2261ef0cc16cd275615e7 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed May 4 13:39:58 2022 -0500 ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs Long ago and far away there was a BUG_ON at the start of ptrace_stop that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1]. The BUG_ON had never triggered but examination of the code showed that the BUG_ON could actually trigger. To complement removing the BUG_ON an attempt to better handle the race was added. The code detected the tracer had gone away and did not call do_notify_parent_cldstop. The code also attempted to prevent ptrace_report_syscall from sending spurious SIGTRAPs when the tracer went away. The code to detect when the tracer had gone away before sending a signal to tracer was a legitimate fix and continues to work to this date. The code to prevent sending spurious SIGTRAPs is a failure. At the time and until today the code only catches it when the tracer goes away after siglock is dropped and before read_lock is acquired. If the tracer goes away after read_lock is dropped a spurious SIGTRAP can still be sent to the tracee. The tracer going away after read_lock is dropped is the far likelier case as it is the bigger window. Given that the attempt to prevent the generation of a SIGTRAP was a failure and continues to be a failure remove the code that attempts to do that. This simplifies the code in ptrace_stop and makes ptrace_stop much easier to reason about. To successfully deal with the tracer going away, all of the tracer's instrumentation of the child would need to be removed, and reliably detecting when the tracer has set a signal to continue with would need to be implemented. [1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON") History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-9-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:31 +02:00
Oleg Nesterov	ac3c6b060a	signal: Use lockdep_assert_held instead of assert_spin_locked Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit cb3c19c93d656caa6fe63d6277aabd7e570f1d03 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Fri Apr 29 09:16:10 2022 -0500 signal: Use lockdep_assert_held instead of assert_spin_locked The distinction is that assert_spin_locked() checks if the lock is held by *anyone* whereas lockdep_assert_held() asserts the current context holds the lock. Also, the check goes away if you build without lockdep. Suggested-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-6-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:30 +02:00
Oleg Nesterov	6806642199	signal: Replace __group_send_sig_info with send_signal_locked Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit e71ba124078e391879e0bf111529fa2d630d106c Author: Eric W. Biederman <ebiederm@xmission.com> Date: Fri Apr 22 09:28:50 2022 -0500 signal: Replace __group_send_sig_info with send_signal_locked The function __group_send_sig_info is just a light wrapper around send_signal_locked with one parameter fixed to a constant value. As the wrapper adds no real value update the code to directly call the wrapped function. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:30 +02:00
Oleg Nesterov	a4b8434f3e	signal: Rename send_signal send_signal_locked Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit 157cc18122b4a1456d19048e151a164216c4a704 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Fri Apr 22 09:48:54 2022 -0500 signal: Rename send_signal send_signal_locked Rename send_signal and __send_signal to send_signal_locked and __send_signal_locked to make send_signal usable outside of signal.c. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-07-06 15:55:30 +02:00
Oleg Nesterov	634b3e94ba	ptrace: Return the signal to continue with from ptrace_stop Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit 6487d1dab837214ec2fd3f0ddd5f787e63be7c20 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Thu Jan 27 12:19:13 2022 -0600 ptrace: Return the signal to continue with from ptrace_stop The signal a task should continue with after a ptrace stop is inconsistently read, cleared, and sent. Solve this by reading and clearing the signal to be sent in ptrace_stop. In an ideal world everything except ptrace_signal would share a common implementation of continuing with the signal, so ptracers could count on the signal they ask to continue with actually being delivered. For now retain bug compatibility and just return with the signal number the ptracer requested the code continue with. Link: https://lkml.kernel.org/r/875yoe7qdp.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-06-23 14:09:58 +02:00
Oleg Nesterov	4916e577b7	ptrace: Move setting/clearing ptrace_message into ptrace_stop Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 commit 336d4b814bf078fa698488632c19beca47308896 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Thu Jan 27 12:15:32 2022 -0600 ptrace: Move setting/clearing ptrace_message into ptrace_stop Today ptrace_message is easy to overlook as it is not a core part of ptrace_stop. It has been overlooked so much that there are places that set ptrace_message and don't clear it, and places that never set it. So if you get an unlucky sequence of events the ptracer may be able to read a ptrace_message that does not apply to the current ptrace stop. Move setting of ptrace_message into ptrace_stop so that it always gets set before the stop, and always gets cleared after the stop. This prevents nonsense from being reported to userspace and makes ptrace_message more visible in the ptrace helper functions so that kernel developers can see it. Link: https://lkml.kernel.org/r/87bky67qfv.fsf_-_@email.froward.int.ebiederm.org Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-06-23 14:09:42 +02:00
Oleg Nesterov	4203eee018	Revert "signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT." Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2174325 Upstream Status: RHEL-only This reverts commit `975d318867`. Because it doesn't match upstream and thus conflicts with other necessary changes. This fix will be re-introduced at the end of this series. Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2023-06-23 14:07:10 +02:00
Juri Lelli	975d318867	signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT. Bugzilla: https://bugzilla.redhat.com/2171995 Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git Conflicts: Whitespace mismatch and missing series at merge commit 67850b7bdcd2 ("Merge tag 'ptrace_stop-cleanup-for-v5.19'"), which seems a nice-have, but not essential. commit 277af213394c063b8e2b8a712e11d41866d456d9 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Wed Jun 22 11:36:17 2022 +0200 signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT. Commit `53da1d9456` ("fix ptrace slowness") is just band aid around the problem. The invocation of do_notify_parent_cldstop() wakes the parent and makes it runnable. The scheduler then wants to replace this still running task with the parent. With the read_lock() acquired this is not possible because preemption is disabled and so this is deferred until read_unlock(). This scheduling point is undesired and is avoided by disabling preemption around the unlock operation enabled again before the schedule() invocation without a preemption point. This is only undesired because the parent sleeps a cycle in wait_task_inactive() until the traced task leaves the run-queue in schedule(). It is not a correctness issue, it is just band aid to avoid the visible delay which sums up over multiple invocations. The task can still be preempted if an interrupt occurs between preempt_enable_no_resched() and freezable_schedule() because on the IRQ-exit path of the interrupt scheduling _will_ happen. This is ignored since it does not happen very often. On PREEMPT_RT keeping preemption disabled during the invocation of cgroup_enter_frozen() becomes a problem because the function acquires css_set_lock which is a sleeping lock on PREEMPT_RT and must not be acquired with disabled preemption. Don't disable preemption on PREEMPT_RT. Remove the TODO regarding adding read_unlock_no_resched() as there is no need for it and will cause harm. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20220720154435.232749-2-bigeasy@linutronix.de Signed-off-by: Juri Lelli <juri.lelli@redhat.com>	2023-02-27 13:46:08 +01:00
Frantisek Hrbata	e235b3c09a	Merge: perf: Sync with upstream v5.19 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1361 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123231 Signed-off-by: Michael Petlan <mpetlan@redhat.com> Approved-by: Prarit Bhargava <prarit@redhat.com> Approved-by: Jerome Marchand <jmarchan@redhat.com> Approved-by: Artem Savkov <asavkov@redhat.com> Approved-by: Yauheni Kaliuta <ykaliuta@redhat.com> Approved-by: David Arcari <darcari@redhat.com> Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>	2022-11-07 01:58:21 -05:00
Chris von Recklinghausen	6fb7c30612	task_work: Call tracehook_notify_signal from get_signal on all architectures Bugzilla: https://bugzilla.redhat.com/2120352 commit 8ba62d37949e248c698c26e0d82d72fda5d33ebf Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Feb 9 09:51:14 2022 -0600 task_work: Call tracehook_notify_signal from get_signal on all architectures Always handle TIF_NOTIFY_SIGNAL in get_signal. With commit `35d0b389f3` ("task_work: unconditionally run task_work from get_signal()") always calling task_work_run all of the work of tracehook_notify_signal is already happening except clearing TIF_NOTIFY_SIGNAL. Factor clear_notify_signal out of tracehook_notify_signal and use it in get_signal so that get_signal only needs one call of task_work_run. To keep the semantics in sync update xfer_to_guest_mode_work (which does not call get_signal) to call tracehook_notify_signal if either _TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-8-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:47 -04:00
Chris von Recklinghausen	00a98ce2d4	task_work: Introduce task_work_pending Bugzilla: https://bugzilla.redhat.com/2120352 commit 7f62d40d9cb50fd146fe8ff071f98fa3c1855083 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Feb 9 08:52:41 2022 -0600 task_work: Introduce task_work_pending Wrap the test of task->task_works in a helper function to make it clear what is being tested. All of the other readers of task->task_work use READ_ONCE and this is even necessary on current as other processes can update task->task_work. So for consistency I have added READ_ONCE into task_work_pending. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:47 -04:00
Chris von Recklinghausen	f64a4f551f	ptrace: Remove tracehook_signal_handler Bugzilla: https://bugzilla.redhat.com/2120352 commit c145137dc990fd67b52fbc52faae5ba46f168cca Author: Eric W. Biederman <ebiederm@xmission.com> Date: Thu Jan 27 12:04:27 2022 -0600 ptrace: Remove tracehook_signal_handler The two line function tracehook_signal_handler is only called from signal_delivered. Expand it inline in signal_delivered and remove it. Just to make it easier to understand what is going on. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:47 -04:00
Chris von Recklinghausen	e149d74e12	signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE Bugzilla: https://bugzilla.redhat.com/2120352 commit 5c72263ef2fbe99596848f03758ae2dc593adf2c Author: Kees Cook <keescook@chromium.org> Date: Tue Feb 8 00:57:17 2022 -0800 signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE Fatal SIGSYS signals (i.e. seccomp RET_KILL_* syscall filter actions) were not being delivered to ptraced pid namespace init processes. Make sure the SIGNAL_UNKILLABLE doesn't get set for these cases. Reported-by: Robert Święcki <robert@swiecki.net> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/lkml/878rui8u4a.fsf@email.froward.int.ebiederm.org Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:44 -04:00
Chris von Recklinghausen	a9c55ab07f	signal: clean up kernel-doc comments Bugzilla: https://bugzilla.redhat.com/2120352 commit 6410349ea5e177f3e53c2006d2041eed47e986ae Author: Randy Dunlap <rdunlap@infradead.org> Date: Tue Dec 21 19:10:27 2021 -0800 signal: clean up kernel-doc comments Fix kernel-doc warnings in kernel/signal.c: kernel/signal.c:1830: warning: Function parameter or member 'force_coredump' not described in 'force_sig_seccomp' kernel/signal.c:2873: warning: missing initial short description on line: * signal_delivered - Also add a closing parenthesis to the comments in signal_delivered(). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Richard Weinberger <richard@nod.at> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marco Elver <elver@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20211222031027.29694-1-rdunlap@infradead.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:36 -04:00
Chris von Recklinghausen	e545ae66be	signal: Remove the helper signal_group_exit Bugzilla: https://bugzilla.redhat.com/2120352 commit 49697335e0b441b0553598c1b48ee9ebb053d2f1 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Thu Jun 24 02:14:30 2021 -0500 signal: Remove the helper signal_group_exit This helper is misleading. It tests for an ongoing exec as well as the process having received a fatal signal. Sometimes it is appropriate to treat an on-going exec differently than a process that is shutting down due to a fatal signal. In particular taking the fast path out of exit_signals instead of retargeting signals is not appropriate during exec, and not changing the exit code in do_group_exit during exec. Removing the helper makes it more obvious what is going on as both cases must be coded for explicitly. While removing the helper fix the two cases where I have observed using signal_group_exit resulted in the wrong result. In exit_signals only test for SIGNAL_GROUP_EXIT so that signals are retargeted during an exec. In do_group_exit use 0 as the exit code during an exec as de_thread does not set group_exit_code. As best as I can determine group_exit_code has been set to 0 most of the time during de_thread. During a thread group stop group_exit_code is set to the stop signal and when the thread group receives SIGCONT group_exit_code is reset to 0. Link: https://lkml.kernel.org/r/20211213225350.27481-8-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:36 -04:00
Chris von Recklinghausen	282c129641	signal: Remove SIGNAL_GROUP_COREDUMP Bugzilla: https://bugzilla.redhat.com/2120352 commit 2f824d4d197e02275562359a2ae5274177ce500c Author: Eric W. Biederman <ebiederm@xmission.com> Date: Sat Jan 8 09:48:31 2022 -0600 signal: Remove SIGNAL_GROUP_COREDUMP After the previous cleanups "signal->core_state" is set whenever SIGNAL_GROUP_COREDUMP is set and "signal->core_state" is tested whenver the code wants to know if a coredump is in progress. The remaining tests of SIGNAL_GROUP_COREDUMP also test to see if SIGNAL_GROUP_EXIT is set. Similarly the only place that sets SIGNAL_GROUP_COREDUMP also sets SIGNAL_GROUP_EXIT. Which makes SIGNAL_GROUP_COREDUMP unecessary and redundant. So stop setting SIGNAL_GROUP_COREDUMP, stop testing SIGNAL_GROUP_COREDUMP, and remove it's definition. With the setting of SIGNAL_GROUP_COREDUMP gone, coredump_finish no longer needs to clear SIGNAL_GROUP_COREDUMP out of signal->flags by setting SIGNAL_GROUP_EXIT. Link: https://lkml.kernel.org/r/20211213225350.27481-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:36 -04:00
Chris von Recklinghausen	d8085e9b32	signal: Make coredump handling explicit in complete_signal Bugzilla: https://bugzilla.redhat.com/2120352 commit 7ba03471ac4ad2432e5ccf67d9d4ab03c177578a Author: Eric W. Biederman <ebiederm@xmission.com> Date: Sat Jan 8 11:01:12 2022 -0600 signal: Make coredump handling explicit in complete_signal Ever since commit `6cd8f0acae` ("coredump: ensure that SIGKILL always kills the dumping thread") it has been possible for a SIGKILL received during a coredump to set SIGNAL_GROUP_EXIT and trigger a process shutdown (for a second time). Update the logic to explicitly allow coredumps so that coredumps can set SIGNAL_GROUP_EXIT and shutdown like an ordinary process. Link: https://lkml.kernel.org/r/87zgo6ytyf.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:36 -04:00
Chris von Recklinghausen	7e62823861	signal: Have prepare_signal detect coredumps using signal->core_state Bugzilla: https://bugzilla.redhat.com/2120352 commit a0287db0f1d6918919203ba31fd7cda59bf889e8 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Sat Jan 8 09:34:50 2022 -0600 signal: Have prepare_signal detect coredumps using signal->core_state In preparation for removing the flag SIGNAL_GROUP_COREDUMP, change prepare_signal to test signal->core_state instead of the flag SIGNAL_GROUP_COREDUMP. Both fields are protected by siglock and both live in signal_struct so there are no real tradeoffs here, just a change to which field is being tested. Link: https://lkml.kernel.org/r/20211213225350.27481-1-ebiederm@xmission.com Link: https://lkml.kernel.org/r/875yqu14co.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:36 -04:00
Chris von Recklinghausen	56644a77a7	signal: Replace force_fatal_sig with force_exit_sig when in doubt Conflicts: drop changes to arch/m68k/kernel/traps.c, arch/sparc/kernel/signal_32.c, arch/sparc/kernel/windows.c - unsupported arches Bugzilla: https://bugzilla.redhat.com/2120352 commit fcb116bc43c8c37c052530ead79872f8b2615711 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Thu Nov 18 14:23:21 2021 -0600 signal: Replace force_fatal_sig with force_exit_sig when in doubt Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals. This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept. In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis. 
Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:31 -04:00
Chris von Recklinghausen	73393e9677	signal: Don't always set SA_IMMUTABLE for forced signals Bugzilla: https://bugzilla.redhat.com/2120352 commit e349d945fac76bddc78ae1cb92a0145b427a87ce Author: Eric W. Biederman <ebiederm@xmission.com> Date: Thu Nov 18 11:11:13 2021 -0600 signal: Don't always set SA_IMMUTABLE for forced signals Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace know about the signal until the process has exited. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Link: https://lkml.kernel.org/r/877dd5qfw5.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:31 -04:00
Chris von Recklinghausen	beff6d154c	signal: Requeue ptrace signals Bugzilla: https://bugzilla.redhat.com/2120352 commit b171f667f3787946a8ba9644305339e93ae799c9 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Mon Nov 15 13:49:45 2021 -0600 signal: Requeue ptrace signals Kyle Huey <me@kylehuey.com> writes: > rr, a userspace record and replay debugger[0], uses the recorded register > state at PTRACE_EVENT_EXIT to find the point in time at which to cease > executing the program during replay. > > If a SIGKILL races with processing another signal in get_signal, it is > possible for the kernel to decline to notify the tracer of the original > signal. But if the original signal had a handler, the kernel proceeds > with setting up a signal handler frame as if the tracer had chosen to > deliver the signal unmodified to the tracee. When the kernel goes to > execute the signal handler that it has now modified the stack and registers > for, it will discover the pending SIGKILL, and terminate the tracee > without executing the handler. When PTRACE_EVENT_EXIT is delivered to > the tracer, however, the effects of handler setup will be visible to > the tracer. > > Because rr (the tracer) was never notified of the signal, it is not aware > that a signal handler frame was set up and expects the state of the program > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally > by allowing the program to execute from the last event. When that fails > to happen during replay, rr will assert and die. > > The following patches add an explicit check for a newly pending SIGKILL > after the ptracer has been notified and the siglock has been reacquired. > If this happens, we stop processing the current signal and proceed > immediately to handling the SIGKILL. This makes the state reported at > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the > work to set up a signal handler frame that will never be used. 
> > [0] https://rr-project.org/ The problem is that while the traced process makes it into ptrace_stop, the tracee is killed before the tracer manages to wait for the tracee and discover which signal was about to be delivered. More generally the problem is that while siglock was dropped a signal with process wide effect is short circuit delivered to the entire process killing it, but the process continues to try and deliver another signal. In general it is impossible to avoid all cases where work is performed after the process has been killed. In particular if the process is killed after get_signal returns the code will simply not know it has been killed until after delivering the signal frame to userspace. On the other hand when the code has already discovered the process has been killed and taken user space visible action that shows the kernel knows the process has been killed, it is just silly to then write the signal frame to the user space stack. Instead of being silly, detect that the process has been killed in ptrace_signal and requeue the signal so the code can pretend it was simply never dequeued for delivery. To test the process has been killed I use fatal_signal_pending rather than signal_group_exit to match the test in signal_pending_state which is used in schedule which is where ptrace_stop detects the process has been killed. Requeuing the signal so the code can pretend it was simply never dequeued improves the user space visible behavior that has been present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4"). Kyle Huey verified that this change in behavior makes rr happy. Reported-by: Kyle Huey <khuey@kylehuey.com> Reported-by: Marko Mäkelä <marko.makela@mariadb.com> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87tugcd5p2.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:31 -04:00
Chris von Recklinghausen	30ab0124bb	signal: Requeue signals in the appropriate queue Bugzilla: https://bugzilla.redhat.com/2120352 commit 5768d8906bc23d512b1a736c1e198aa833a6daa4 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Mon Nov 15 13:47:13 2021 -0600 signal: Requeue signals in the appropriate queue In the event that a tracer changes which signal needs to be delivered and that signal is currently blocked then the signal needs to be requeued for later delivery. With the advent of CLONE_THREAD the kernel has 2 signal queues per task. The per process queue and the per task queue. Update the code so that if the signal is removed from the per process queue it is requeued on the per process queue. This is necessary to make it appear the signal was never dequeued. The rr debugger reasonably believes that the state of the process from the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated by simply letting a process run. If a SIGKILL interrupts a ptrace_stop this is not true today. So return signals to their original queue in ptrace_signal so that signals that are not delivered appear like they were never dequeued. Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87zgq4d5r4.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:31 -04:00
Chris von Recklinghausen	b47643344f	signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed Bugzilla: https://bugzilla.redhat.com/2120352 commit 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Fri Oct 29 09:14:19 2021 -0500 signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed Andy pointed out that there are races between force_sig_info_to_task and sigaction[1] when force_sig_info_task. As Kees discovered[2] ptrace is also able to change these signals. In the case of seccomp killing a process with a signal it is a security violation to allow the signal to be caught or manipulated. Solve this problem by introducing a new flag SA_IMMUTABLE that prevents sigaction and ptrace from modifying these forced signals. This flag is carefully made kernel internal so that no new ABI is introduced. Longer term I think this can be solved by guaranteeing short circuit delivery of signals in this case. Unfortunately reliable and guaranteed short circuit delivery of these signals is still a ways off from being implemented, tested, and merged. So I have implemented a much simpler alternative for now. [1] https://lkml.kernel.org/r/b5d52d25-7bde-4030-a7b1-7c6f8ab90660@www.fastmail.com [2] https://lkml.kernel.org/r/202110281136.5CE65399A7@keescook Cc: stable@vger.kernel.org Fixes: 307d522f5eb8 ("signal/seccomp: Refactor seccomp signal and coredump generation") Tested-by: Andrea Righi <andrea.righi@canonical.com> Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:26 -04:00
Chris von Recklinghausen	03300292e0	signal: Implement force_fatal_sig Bugzilla: https://bugzilla.redhat.com/2120352 commit 26d5badbccddcc063dc5174a2baffd13a23322aa Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Oct 20 12:43:59 2021 -0500 signal: Implement force_fatal_sig Add a simple helper force_fatal_sig that causes a signal to be delivered to a process as if the signal handler was set to SIG_DFL. Reimplement force_sigsegv based upon this new helper. This fixes force_sigsegv so that when it forces the default signal handler to be used the code now forces the signal to be unblocked as well. Reusing the tested logic in force_sig_info_to_task that was built for force_sig_seccomp this makes the implementation trivial. This is interesting both because it makes force_sigsegv simpler and because there are a couple of buggy places in the kernel that call do_exit(SIGILL) or do_exit(SIGSYS) because there is no straight forward way today for those places to simply force the exit of a process with the chosen signal. Creating force_fatal_sig allows those places to be implemented with normal signal exits. Link: https://lkml.kernel.org/r/20211020174406.17889-13-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:26 -04:00
Chris von Recklinghausen	f7fb43f6b1	coredump: Don't perform any cleanups before dumping core Bugzilla: https://bugzilla.redhat.com/2120352 commit 92307383082daff5df884a25df9e283efb7ef261 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Sep 1 11:33:50 2021 -0500 coredump: Don't perform any cleanups before dumping core Rename coredump_exit_mm to coredump_task_exit and call it from do_exit before PTRACE_EVENT_EXIT, and before any cleanup work for a task happens. This ensures that an accurate copy of the process can be captured in the coredump as no cleanup for the process happens before the coredump completes. This also ensures that PTRACE_EVENT_EXIT will not be visited by any thread until the coredump is complete. Add a new flag PF_POSTCOREDUMP so that tasks that have passed through coredump_task_exit can be recognized and ignored in zap_process. Now that all of the coredumping happens before exit_mm remove code to test for a coredump in progress from mm_release. Replace "may_ptrace_stop()" with a simple test of "current->ptrace". The other tests in may_ptrace_stop all concern avoiding stopping during a coredump. These tests are no longer necessary as it is now guaranteed that fatal_signal_pending will be set if the code enters ptrace_stop during a coredump. The code in ptrace_stop is guaranteed not to stop if fatal_signal_pending returns true. Until this change "ptrace_event(PTRACE_EVENT_EXIT)" could call ptrace_stop without fatal_signal_pending being true, as signals are dequeued in get_signal before calling do_exit. This is no longer an issue as "ptrace_event(PTRACE_EVENT_EXIT)" is no longer reached until after the coredump completes. Link: https://lkml.kernel.org/r/874kaax26c.fsf@disp2133 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:25 -04:00
Chris von Recklinghausen	52178dcae5	ptrace: Remove the unnecessary arguments from arch_ptrace_stop Bugzilla: https://bugzilla.redhat.com/2120352 commit 4f627af8e6068892cafe031df6c14e8a0aaaa426 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Thu Sep 2 16:10:11 2021 -0500 ptrace: Remove the unnecessary arguments from arch_ptrace_stop Both arch_ptrace_stop_needed and arch_ptrace_stop are called with an exit_code and a siginfo structure. Neither argument is used by any of the implementations so just remove the unneeded arguments. The two architectures that implement arch_ptrace_stop are ia64 and sparc. Both architectures flush their register stacks before a ptrace_stop so that all of the register information can be accessed by debuggers. Since the question of whether a register stack needs to be flushed is independent of why ptrace is stopping, not needing arguments makes sense. Cc: David Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Link: https://lkml.kernel.org/r/87lf3mx290.fsf@disp2133 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:24 -04:00
Chris von Recklinghausen	c17e10396a	signal: Remove the bogus sigkill_pending in ptrace_stop Bugzilla: https://bugzilla.redhat.com/2120352 commit 7d613f9f72ec8f90ddefcae038fdae5adb8404b3 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Sep 1 13:21:34 2021 -0500 signal: Remove the bogus sigkill_pending in ptrace_stop The existence of sigkill_pending is a little silly as it is functionally a duplicate of fatal_signal_pending that is used in exactly one place. Checking for pending fatal signals and returning early in ptrace_stop is actively harmful. It causes the ptrace_stop called by ptrace_signal to return early before setting current->exit_code. Later, when ptrace_signal reads the signal number from current->exit_code, the value is undefined, making it unpredictable what will happen. Instead rely on the fact that schedule will not sleep if there is a pending signal that can awaken a task. Removing the explicit sigkill_pending test fixes ptrace_signal when ptrace_stop does not stop because current->exit_code is always set to signr. Cc: stable@vger.kernel.org Fixes: `3d749b9e67` ("ptrace: simplify ptrace_stop()->sigkill_pending() path") Fixes: `1a669c2f16` ("Add arch_ptrace_stop") Link: https://lkml.kernel.org/r/87pmsyx29t.fsf@disp2133 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:24 -04:00
Chris von Recklinghausen	ba903565a5	signal/seccomp: Refactor seccomp signal and coredump generation Bugzilla: https://bugzilla.redhat.com/2120352 commit 307d522f5eb86cd6ac8c905f5b0577dedac54ec5 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Jun 23 16:44:32 2021 -0500 signal/seccomp: Refactor seccomp signal and coredump generation Factor out force_sig_seccomp from the seccomp signal generation and place it in kernel/signal.c. The function force_sig_seccomp takes a parameter force_coredump to indicate that the sigaction field should be reset to SIG_DFL so that a coredump will be generated when the signal is delivered. force_sig_seccomp is then used to replace both seccomp_send_sigsys and seccomp_init_siginfo. force_sig_info_to_task gains an extra parameter to force using the default signal action. With this change seccomp is no longer a special case and there is exactly one place do_coredump is called from. Further, it is no longer necessary for __seccomp_filter to call do_group_exit. Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:23 -04:00
Michael Petlan	2e62db505d	signal: Deliver SIGTRAP on perf event asynchronously if blocked Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 78ed93d72ded679e3caf0758357209887bda885f Author: Marco Elver <elver@google.com> Date: Mon Apr 4 13:12:04 2022 +0200 description =========== With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case: <set up SIGTRAP on a perf event> ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | <and others>); sigprocmask(SIG_BLOCK, &s, ...); ... <perf event triggers> When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task. This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later. To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future). The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise). The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. 
When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ] Fixes: `97ba62b278` ("perf: Add support for SIGTRAP on perf events") Conflicts: ========== Ignoring hunks in arm32, m68k and sparc files, since we don't support these architectures. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:42 +02:00
Phil Auld	fb68d400e6	signal: In get_signal test for signal_group_exit every time through the loop Bugzilla: https://bugzilla.redhat.com/2120671 commit e7f7c99ba911f56bc338845c1cd72954ba591707 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Mon Nov 15 11:55:57 2021 -0600 signal: In get_signal test for signal_group_exit every time through the loop Recently while investigating a problem with rr and signals I noticed that siglock is dropped in ptrace_signal and get_signal does not jump to relock. Looking farther to see if the problem is anywhere else I see that do_signal_stop also returns if signal_group_exit is true. I believe that test can now never be true, but it is a bit hard to trace through and be certain. Testing signal_group_exit is not expensive, so move the test for signal_group_exit into the for loop inside of get_signal to ensure the test is never skipped improperly. This has been a potential problem since the test for signal_group_exit was added. Fixes: `35634ffa17` ("signal: Always notice exiting tasks") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/875yssekcd.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Phil Auld <pauld@redhat.com>	2022-09-01 09:16:55 -04:00
Herton R. Krzesinski	99b4ffe3da	Merge: Enable AMX(TMUL) for Sapphire Rapids MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/174 This is the first draft of the commits which are required to support AMX (aka TMUL) on SPR. I suspect there are some missing commits. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2004190 Tested: Intel to perform initial testing. v2: added missing upstream commits 21e96a2035db and 52d0b8b18776 Signed-off-by: David Arcari <darcari@redhat.com> RH-Acked-by: Rafael Aquini <aquini@redhat.com> RH-Acked-by: Steve Best <sbest@redhat.com> RH-Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Herton R. Krzesinski <herton@redhat.com>	2021-12-22 19:34:58 -03:00
Herton R. Krzesinski	d635b9c68b	Merge: mm: update generic MM code to upstream v5.15 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/201 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 Brew URL: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41434412 Testing: KT1-lite + regressions and performance (scheduler, network, and fs) benchmarks, as documented on the BZ. In order to provide support for several future feature requests (virtio-mem, filesystems, core-kernel and memory management) targeted for RHEL-9, the patchset is bringing the core-MM codebase up to upstream v5.15. This patchset is composed of upstream cherry picks that represent the difference between current RHEL-9 v5.14 code base and upstream v5.15 plus their relevant follow-up fixes. Omitted-fix: 15eb7c888e749 ("locking/rwsem: Add missing __init_rwsem() for PREEMPT_RT") already backported into RHEL9 via commit `de3eb21475` Omitted-fix: 6341eb6f39bb7 ("drm/i915/selftests: exercise shmem_writeback with THP") dependencies for this selftest follow up (and the follow-up itself) shall be dealt with via DRM update work done by the graphics team. Omitted-fix: f24b062607678 ("mm/damon: grammar s/works/work/") Omitted-fix: db7a347b26fe0 ("mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation") Omitted-fix: d78f3853f831e ("mm/damon/dbgfs: fix missed use of damon_dbgfs_lock") albeit DAMON initial integration is part of v5.15, we're explicitly introducing it disabled in this backport. DAMON follow-ups, and future enablement will be dealt with via a separated (already filed) BZ ticket. Omitted-fix: e66435936756d ("mm: fix mismerge of folio page flag manipulators") folio pages are a feature integrated into v5.16, and this merge-fix commit is non-relevant to this particular patchset. Signed-off-by: Rafael Aquini <aquini@redhat.com> RH-Acked-by: John W. 
Linville <linville@redhat.com> RH-Acked-by: Waiman Long <longman@redhat.com> RH-Acked-by: Prarit Bhargava <prarit@redhat.com> RH-Acked-by: David Arcari <darcari@redhat.com> RH-Acked-by: Aristeu Rozanski <arozansk@redhat.com> RH-Acked-by: Phil Auld <pauld@redhat.com> RH-Acked-by: David Hildenbrand <david@redhat.com> RH-Acked-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Herton R. Krzesinski <herton@redhat.com>	2021-12-15 11:00:36 -03:00
Herton R. Krzesinski	f4fa2705fb	Merge: hrtimer updates for RT prerequisites MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/145 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022896 Upstream Status: Linux Tested: Sanity tested with timer and scheduler tests hrtimer updates for RT prerequisites This is a series for hrtimer and related code that enable the RT patchset to merge more cleanly. Signed-off-by: Phil Auld <pauld@redhat.com> RH-Acked-by: Prarit Bhargava <prarit@redhat.com> RH-Acked-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Herton R. Krzesinski <herton@redhat.com>	2021-12-09 15:01:47 -03:00
David Arcari	dd347f557f	signal: Add an optional check for altstack size Bugzilla: http://bugzilla.redhat.com/2004190 commit 1bdda24c4af64cd2d65dec5192ab624c5fee7ca0 Author: Thomas Gleixner <tglx@linutronix.de> Date: Thu Oct 21 15:55:05 2021 -0700 signal: Add an optional check for altstack size New x86 FPU features will be very large, requiring ~10k of stack in signal handlers. These new features require a new approach called "dynamic features". The kernel currently tries to ensure that altstacks are reasonably sized. Right now, on x86, sys_sigaltstack() requires a size of >=2k. However, that 2k is a constant. Simply raising that 2k requirement to >10k for the new features would break existing apps which have a compiled-in size of 2k. Instead of universally enforcing a larger stack, prohibit a process from using dynamic features without properly-sized altstacks. This must be enforced in two places: * A dynamic feature can not be enabled without a large-enough altstack for each process thread. * Once a dynamic feature is enabled, any request to install a too-small altstack will be rejected. The dynamic feature enabling code must examine each thread in a process to ensure that the altstacks are large enough. Add a new lock (sigaltstack_lock()) to ensure that threads can not race and change their altstack after being examined. Add the infrastructure in form of a config option and provide empty stubs for architectures which do not need dynamic altstack size checks. This implementation will be fleshed out for x86 in a future patch called x86/arch_prctl: Add controls for dynamic XSTATE components [dhansen: commit message. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-2-chang.seok.bae@intel.com Signed-off-by: David Arcari <darcari@redhat.com>	2021-11-29 12:25:08 -05:00
Rafael Aquini	fb61f7e0c9	memcg: enable accounting for signals Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 This patch is a backport of the following upstream commit: commit 5f58c39819ff78ca5ddbba2b3cd8ff4779b19bb5 Author: Vasily Averin <vvs@virtuozzo.com> Date: Thu Sep 2 14:55:35 2021 -0700 memcg: enable accounting for signals When a user sends a signal to another process, it forces the kernel to allocate memory for 'struct sigqueue' objects. The number of signals is limited by RLIMIT_SIGPENDING resource limit, but even the default settings allow each user to consume up to several megabytes of memory. It makes sense to account for these allocations to restrict the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/e34e958c-e785-712e-a62a-2c7b66c646c7@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "J. 
Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roman Gushchin <guro@fb.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yutian Yang <nglaive@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2021-11-29 11:41:21 -05:00
Phil Auld	f1fe1713a3	posix-cpu-timers: Assert task sighand is locked while starting cputime counter Bugzilla: http://bugzilla.redhat.com/2022896 commit a5dec9f82ab2ae486119f0b0820ea16db3e522c3 Author: Frederic Weisbecker <frederic@kernel.org> Date: Mon Jul 26 14:55:08 2021 +0200 posix-cpu-timers: Assert task sighand is locked while starting cputime counter Starting the process wide cputime counter needs to be done in the same sighand locking sequence than actually arming the related timer otherwise this races against concurrent timers setting/expiring in the same threadgroup. Detecting that the cputime counter is started without holding the sighand lock is a first step toward debugging such situations. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-2-frederic@kernel.org Signed-off-by: Phil Auld <pauld@redhat.com>	2021-11-15 10:29:56 -05:00
Alexey Gladkov	1cbc87c091	ucounts: Fix signal ucount refcounting Bugzilla: https://bugzilla.redhat.com/2018142 Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git commit 15bc01effefe97757ef02ca09e9d1b927ab22725 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Sat Oct 16 15:59:49 2021 -0500 ucounts: Fix signal ucount refcounting In commit `fda31c5029` ("signal: avoid double atomic counter increments for user accounting") Linus made a clever optimization to how rlimits and the struct user_struct. Unfortunately that optimization does not work in the obvious way when moved to nested rlimits. The problem is that the last decrement of the per user namespace per user sigpending counter might also be the last decrement of the sigpending counter in the parent user namespace as well. Which means that simply freeing the leaf ucount in __free_sigqueue is not enough. Maintain the optimization and handle the tricky cases by introducing inc_rlimit_get_ucounts and dec_rlimit_put_ucounts. By moving the entire optimization into functions that perform all of the work it becomes possible to ensure that every level is handled properly. The new function inc_rlimit_get_ucounts returns 0 on failure to increment the ucount. This is different than inc_rlimit_ucounts which increments the ucounts and returns LONG_MAX if the ucount counter has exceeded it's maximum or it wrapped (to indicate the counter needs to decremented). 
I wish we had a single user to account all pending signals to across all of the threads of a process so this complexity was not necessary Cc: stable@vger.kernel.org Fixes: `d646969055` ("Reimplement RLIMIT_SIGPENDING on top of ucounts") v1: https://lkml.kernel.org/r/87mtnavszx.fsf_-_@disp2133 Link: https://lkml.kernel.org/r/87fssytizw.fsf_-_@disp2133 Reviewed-by: Alexey Gladkov <legion@kernel.org> Tested-by: Rune Kleveland <rune.kleveland@infomedia.dk> Tested-by: Yu Zhao <yuzhao@google.com> Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Alexey Gladkov <agladkov@redhat.com>	2021-11-05 13:50:32 +01:00
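The commit above describes charging a pending signal at every level of a nested hierarchy and returning 0 when the charge cannot be taken. The following is a simplified user-space model of that idea, not the kernel implementation: the type `level_counter` and the functions `charge_all_levels`, `uncharge_levels`, `uncharge_all_levels`, and `demo_nested_charge` are hypothetical names, and locking, atomics, and reference counting are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one level in a nested counter hierarchy. */
struct level_counter {
    long count;                    /* items currently charged at this level */
    long max;                      /* limit enforced at this level */
    struct level_counter *parent;  /* NULL at the root */
};

/* Undo one charge on each level from 'from' up to, but excluding, 'stop'. */
static void uncharge_levels(struct level_counter *from,
                            struct level_counter *stop)
{
    for (struct level_counter *it = from; it != stop; it = it->parent)
        it->count--;
}

/*
 * Charge one item at every level from 'leaf' up to the root.
 * Returns the new leaf count on success. Returns 0 if any level is already
 * at its limit; in that case every partial charge is rolled back first.
 */
static long charge_all_levels(struct level_counter *leaf)
{
    long leaf_count = 0;

    for (struct level_counter *it = leaf; it != NULL; it = it->parent) {
        if (it->count >= it->max) {
            uncharge_levels(leaf, it);  /* roll back levels already charged */
            return 0;
        }
        it->count++;
        if (it == leaf)
            leaf_count = it->count;
    }
    return leaf_count;
}

/* Release one previously successful charge on every level. */
static void uncharge_all_levels(struct level_counter *leaf)
{
    uncharge_levels(leaf, NULL);
}

/* Usage sketch: the parent's limit of 2 blocks the third charge. */
static void demo_nested_charge(void)
{
    struct level_counter root = { 0, 2, NULL };
    struct level_counter child = { 0, 5, &root };

    assert(charge_all_levels(&child) == 1);
    assert(charge_all_levels(&child) == 2);
    assert(charge_all_levels(&child) == 0);  /* root is full */
    assert(child.count == 2 && root.count == 2);
    uncharge_all_levels(&child);
    assert(child.count == 1 && root.count == 1);
}
```

Keeping the whole walk inside one function is what lets the failure path undo exactly the levels that were charged, which is the property the commit message attributes to handling every level in a single helper.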
Alexey Gladkov	f3791f4df5	Fix UCOUNT_RLIMIT_SIGPENDING counter leak We must properly handle an errors when we increase the rlimit counter and the ucounts reference counter. We have to this with RCU protection to prevent possible use-after-free that could occur due to concurrent put_cred_rcu(). The following reproducer triggers the problem: $ cat testcase.sh case "${STEP:-0}" in 0) ulimit -Si 1 ulimit -Hi 1 STEP=1 unshare -rU "$0" killall sleep ;; 1) for i in 1 2 3 4 5; do unshare -rU sleep 5 & done ;; esac with the KASAN report being along the lines of BUG: KASAN: use-after-free in put_ucounts+0x17/0xa0 Write of size 4 at addr ffff8880045f031c by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.13.0+ #19 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-alt4 04/01/2014 Call Trace: <IRQ> put_ucounts+0x17/0xa0 put_cred_rcu+0xd5/0x190 rcu_core+0x3bf/0xcb0 __do_softirq+0xe3/0x341 irq_exit_rcu+0xbe/0xe0 sysvec_apic_timer_interrupt+0x6a/0x90 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 default_idle_call+0x53/0x130 do_idle+0x311/0x3c0 cpu_startup_entry+0x14/0x20 secondary_startup_64_no_verify+0xc2/0xcb Allocated by task 127: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 alloc_ucounts+0x169/0x2b0 set_cred_ucounts+0xbb/0x170 ksys_unshare+0x24c/0x4e0 __x64_sys_unshare+0x16/0x20 do_syscall_64+0x37/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 0: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xeb/0x120 kfree+0xaa/0x460 put_cred_rcu+0xd5/0x190 rcu_core+0x3bf/0xcb0 __do_softirq+0xe3/0x341 The buggy address belongs to the object at ffff8880045f0300 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 28 bytes inside of 192-byte region [ffff8880045f0300, ffff8880045f03c0) The buggy address belongs to the page: page:000000008de0a388 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880045f0000 pfn:0x45f0 flags: 0x100000000000200(slab|node=0|zone=1) raw: 0100000000000200 ffffea00000f4640 0000000a0000000a ffff888001042a00 raw: ffff8880045f0000 000000008010000d 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880045f0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880045f0280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff8880045f0300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880045f0380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880045f0400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint Fixes: `d646969055` ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-08 11:43:24 -07:00
Linus Torvalds	71bd934101	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...	2021-07-02 12:08:10 -07:00
Al Viro	97c885d585	x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned Currently we handle SS_AUTODISARM as soon as we have stored the altstack settings into sigframe - that's the point when we have set the things up for eventual sigreturn to restore the old settings. And if we manage to set the sigframe up (we are not done with that yet), everything's fine. However, in case of failure we end up with sigframe-to-be abandoned and SIGSEGV force-delivered. And in that case we end up with inconsistent rules - late failures have altstack reset, early ones do not. It's trivial to get consistent behaviour - just handle SS_AUTODISARM once we have set the sigframe up and are committed to entering the handler, i.e. in signal_delivered(). Link: https://lore.kernel.org/lkml/20200404170604.GN23230@ZenIV.linux.org.uk/ Link: https://github.com/ClangBuiltLinux/linux/issues/876 Link: https://lkml.kernel.org/r/20210422230846.1756380-1-ndesaulniers@google.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-01 11:06:06 -07:00

749 Commits