JIRA: https://issues.redhat.com/browse/RHEL-20288
commit 68d124b0999919015e6d23008eafea106ec6bb40
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 8 May 2024 20:11:58 -0700
rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
If a CPU is running either a userspace application or a guest OS in
nohz_full mode, it is possible for a system call to occur just as an
RCU grace period is starting. If that CPU also has the scheduling-clock
tick enabled for any reason (such as a second runnable task), and if the
system was booted with rcutree.use_softirq=0, then RCU can add insult to
injury by awakening that CPU's rcuc kthread, resulting in yet another
task and yet more OS jitter due to switching to that task, running it,
and switching back.
In addition, in the common case where that system call is not of
excessively long duration, awakening the rcuc task is pointless.
This pointlessness is due to the fact that the CPU will enter an extended
quiescent state upon returning to the userspace application or guest OS.
In this case, the rcuc kthread cannot do anything that the main RCU
grace-period kthread cannot do on its behalf, at least if it is given
a few additional milliseconds (for example, given the time duration
specified by rcutree.jiffies_till_first_fqs, give or take scheduling
delays).
This commit therefore adds a rcutree.nohz_full_patience_delay kernel
boot parameter that specifies the grace period age (in milliseconds,
rounded to jiffies) before which RCU will refrain from awakening the
rcuc kthread. Preliminary experimentation suggests a value of 1000,
that is, one second. Increasing rcutree.nohz_full_patience_delay will
increase grace-period latency and in turn increase memory footprint,
so systems with constrained memory might choose a smaller value.
Systems with less-aggressive OS-jitter requirements might choose the
default value of zero, which keeps the traditional immediate-wakeup
behavior, thus avoiding increases in grace-period latency.
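For example (illustrative only; the nohz_full CPU list is hypothetical,
and rcutree.use_softirq=0 is shown only because it enables the rcuc
kthreads discussed above), such a system might boot with:
	nohz_full=2-7 rcutree.use_softirq=0 rcutree.nohz_full_patience_delay=1000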
[ paulmck: Apply Leonardo Bras feedback. ]
Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/
Reported-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit ae2b217ab542d0db0ca1a6de4f442201a1982f00
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Fri, 8 Mar 2024 11:15:01 -0800
rcu: Make hotplug operations track GP state, not flags
Currently, there are rcu_data structure fields named ->rcu_onl_gp_flags
and ->rcu_ofl_gp_flags that track the rcu_state.gp_flags field at the
time of the corresponding CPU's last online or offline operation,
respectively. However, this information is not particularly useful.
It would be better to instead track the grace period state kept
in rcu_state.gp_state. This would also be consistent with the
initialization in rcu_boot_init_percpu_data(), which is to RCU_GP_CLEANED
(an rcu_state.gp_state value), and also with the diagnostics in
rcu_implicit_dynticks_qs(), whose format is consistent with an integer,
not a bitmask.
This commit therefore makes this change and changes the names to
->rcu_onl_gp_state and ->rcu_ofl_gp_state, respectively.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit b67cffcbbf9dc759d95d330a5af5d1480af2b1f1
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Fri, 12 Jan 2024 16:46:20 +0100
rcu/exp: Handle parallel exp gp kworkers affinity
Affine the parallel expedited gp kworkers to their respective RCU node
in order to make them close to the cache they are playing with.
This reuses the boost kthreads machinery that probes into CPU hotplug
operations such that the kthreads become/stay affine to their respective
node as soon/long as they contain online CPUs. Otherwise, if the
CPU going down was the last one online on the leaf node, the related
kthread is affined to the housekeeping CPUs.
In the long run, this affinity VS CPU hotplug operation game should
probably be implemented at the generic kthread level.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
[boqun: s/* rcu_boost_task/*rcu_boost_task as reported by checkpatch]
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit 8e5e621566485a3e160c0d8bfba206cb1d6b980d
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Fri, 12 Jan 2024 16:46:19 +0100
rcu/exp: Make parallel exp gp kworker per rcu node
When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per-node
initialization is performed in parallel via workqueues (one work per
node).
However, in CONFIG_RCU_EXP_KTHREAD=y kernels, this per-node initialization
is performed by a single kworker serializing each node initialization (one
work for all nodes).
The second part is certainly less scalable and efficient beyond a single
leaf node.
To improve this, expand this single kworker into per-node kworkers. This
new layout is eventually intended to remove the workqueue-based
implementation since it will essentially now become duplicate code.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit 7836b270607676ed1c0c6a4a840a2ede9437a6a1
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Fri, 12 Jan 2024 16:46:17 +0100
rcu: s/boost_kthread_mutex/kthread_mutex
This mutex currently protects per-node boost kthread creation and
affinity setting across CPU hotplug operations.
Since the expedited kworkers will soon be split per node as well, they
will be subject to the same concurrency constraints against hotplug.
Therefore their creation and affinity tuning operations will be grouped
with those of boost kthreads and then rely on the same mutex.
To prepare for that, generalize its name.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34076
commit 9146eb25495ea8bfb5010192e61e3ed5805ce9ef
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Fri, 7 Apr 2023 16:05:38 -0700
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated
only on the instance corresponding to the current CPU, but can be read
more widely. Unmarked accesses are OK from the corresponding CPU, but
only if interrupts are disabled, given that interrupt handlers can and
do modify this field.
Unfortunately, although the load from rcu_preempt_deferred_qs() is always
carried out from the corresponding CPU, interrupts are not necessarily
disabled. This commit therefore upgrades this load to READ_ONCE.
Similarly, the diagnostic access from synchronize_rcu_expedited_wait()
might run with interrupts disabled and from some other CPU. This commit
therefore marks this load with data_race().
Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as
is because interrupts are disabled and this load is always from the
corresponding CPU. This commit adds a comment giving the rationale for
this access being safe.
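A rough sketch of the two markings described above (surrounding context
elided, so this is illustrative rather than the exact diff):
	/* rcu_preempt_deferred_qs(): same-CPU load, but IRQs may be enabled. */
	READ_ONCE(this_cpu_ptr(&rcu_data)->cpu_no_qs.b.exp)
	/* synchronize_rcu_expedited_wait(): diagnostic-only, possibly cross-CPU. */
	data_race(rdp->cpu_no_qs.b.exp)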
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34076
commit 6343402ac35dd534291a6c82924a4f09cf6cd1e5
Author: Pingfan Liu <kernelfans@gmail.com>
Date: Tue, 6 Sep 2022 11:36:42 -0700
rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()
Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked
concurrently, the following rcu_boost_kthread_setaffinity() race can
occur:
         CPU 1                                    CPU2
mask = rcu_rnp_online_cpus(rnp);
...
                                      mask = rcu_rnp_online_cpus(rnp);
                                      ...
                                      set_cpus_allowed_ptr(t, cm);
set_cpus_allowed_ptr(t, cm);
This results in CPU2's update being overwritten by that of CPU1, and
thus the possibility of ->boost_kthread_task continuing to run on a
to-be-offlined CPU.
This commit therefore eliminates this race by relying on the pre-existing
acquisition of ->boost_kthread_mutex to serialize the full process of
changing the affinity of ->boost_kthread_task.
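A simplified sketch of the serialized flow (cpumask computation elided):
	mutex_lock(&rnp->boost_kthread_mutex);
	mask = rcu_rnp_online_cpus(rnp);
	/* ...build cpumask cm from mask... */
	set_cpus_allowed_ptr(t, cm);
	mutex_unlock(&rnp->boost_kthread_mutex);
With the snapshot of the online mask and the affinity update inside a
single critical section, the two CPUs' updates can no longer interleave.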
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 528262f50274079740b53e29bcaaabf219aa7417
Author: Zqiang <qiang1.zhang@intel.com>
Date: Tue, 19 Jul 2022 12:39:00 +0800
rcu-tasks: Make RCU Tasks Trace check for userspace execution
Userspace execution is a valid quiescent state for RCU Tasks Trace,
but the scheduling-clock interrupt does not currently report such
quiescent states.
Of course, the scheduling-clock interrupt is not strictly speaking
userspace execution. However, the only way that this code is not
in a quiescent state is if something invoked rcu_read_lock_trace(),
and that would be reflected in the ->trc_reader_nesting field in
the task_struct structure. Furthermore, this field is checked by
rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in
turn invoked by rcu_note_voluntary_context_switch() in kernels building
at least one of the RCU Tasks flavors. It is therefore safe to invoke
rcu_tasks_trace_qs() from the rcu_sched_clock_irq().
But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU
Tasks, which lacks the read-side markers provided by RCU Tasks Trace.
This raises the possibility that an RCU Tasks grace period could start
after the interrupt from userspace execution, but before the call to
rcu_sched_clock_irq(). However, it turns out that this is safe because
the RCU Tasks grace period waits for an RCU grace period, which will
wait for the entire scheduling-clock interrupt handler, including any
RCU Tasks read-side critical section that this handler might contain.
This commit therefore updates the rcu_sched_clock_irq() function's
check for usermode execution and its call to rcu_tasks_classic_qs()
to instead check for both usermode execution and interrupt from idle,
and to instead call rcu_note_voluntary_context_switch(). This
consolidates code and provides faster RCU Tasks Trace
reporting of quiescent states in kernels that do scheduling-clock
interrupts for userspace execution.
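A hedged sketch of the consolidated check (the surrounding logic in
rcu_sched_clock_irq() is elided, so this is illustrative only):
	if (user || rcu_is_cpu_rrupt_from_idle())
		rcu_note_voluntary_context_switch(current);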
[ paulmck: Consolidate checks into rcu_sched_clock_irq(). ]
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 7634b1eaa0cd135d5eedadb04ad3c91b1ecf28a9
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 24 Aug 2022 14:46:56 -0700
rcu: Exclude outgoing CPU when it is the last to leave
The rcu_boost_kthread_setaffinity() function removes the outgoing CPU
from the set_cpus_allowed() mask for the corresponding leaf rcu_node
structure's rcub priority-boosting kthread. Except that if the outgoing
CPU will leave that structure without any online CPUs, the mask is set
to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine
unless the outgoing CPU happens to be a housekeeping CPU.
This commit therefore removes the outgoing CPU from the housekeeping mask.
This would of course be problematic if the outgoing CPU was the last
online housekeeping CPU, but in that case you are in a world of hurt
anyway. If someone comes up with a valid use case for a system needing
all the housekeeping CPUs to be offline, further adjustments can be made.
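Illustrative sketch of the adjusted fallback (the housekeeping type name
varies across kernel versions, so treat this as a sketch):
	if (cpumask_empty(cm)) {
		cpumask_copy(cm, housekeeping_cpumask(HK_FLAG_KTHREAD));
		if (outgoingcpu >= 0)
			cpumask_clear_cpu(outgoingcpu, cm);
	}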
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 621189a1fe93cb2b34d62c5cdb9e258bca044813
Author: Zqiang <qiang1.zhang@intel.com>
Date: Mon, 8 Aug 2022 10:26:26 +0800
rcu: Avoid triggering strict-GP irq-work when RCU is idle
Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger
irq-work from rcu_read_unlock(), and the resulting irq-work handler
invokes rcu_preempt_deferred_qs_handle(). The point of this triggering
is to force grace periods to end quickly in order to give tools like KASAN
a better chance of detecting RCU usage bugs such as leaking RCU-protected
pointers out of an RCU read-side critical section.
However, this irq-work triggering is unconditional. This works, but
there is no point in doing this irq-work unless the current grace period
is waiting on the running CPU or task, which is not the common case.
After all, in the common case there are many rcu_read_unlock() calls
per CPU per grace period.
This commit therefore triggers the irq-work only when the current grace
period is waiting on the running CPU or task.
This change was tested as follows on a four-CPU system:
echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
insmod rcutorture.ko
sleep 20
rmmod rcutorture.ko
echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
echo > /sys/kernel/debug/tracing/set_ftrace_filter
This procedure produces results in this per-CPU set of files:
/sys/kernel/debug/tracing/trace_stat/function*
Sample output from one of these files is as follows:
Function                           Hit      Time           Avg        s^2
--------                           ---      ----           ---        ---
rcu_preempt_deferred_qs_handle  838746   182650.3 us    0.217 us   0.004 us
The baseline sum of the "Hit" values (the number of calls to this
function) was 3,319,015. With this commit, that sum was 1,140,359,
for a 2.9x reduction. The worst-case variance across the CPUs was less
than 25%, so this large effect size is statistically significant.
The raw data is available in the Link: URL.
Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 089254fd386eb6800dd7d7863f12a04ada0c35fa
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 3 Aug 2022 08:48:12 -0700
rcu: Document reason for rcu_all_qs() call to preempt_disable()
Given that rcu_all_qs() is in non-preemptible kernels, why on earth should
it invoke preempt_disable()? This commit adds the reason, which is to
work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels.
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit bca4fa8cb0f4c096b515952f64e560fd784a0514
Author: Zqiang <qiang1.zhang@intel.com>
Date: Mon, 20 Jun 2022 14:42:24 +0800
rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels
In non-preemptible kernels, tasks never do context switches within
RCU read-side critical sections. Therefore, in such kernels, each
leaf rcu_node structure's ->blkd_tasks list will always be empty.
The comment on the non-preemptible version of rcu_preempt_deferred_qs()
confuses this point, so this commit therefore fixes it.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 6d60ea03ac2d3dcf6ddee6b45aa7213d8b0461c5
Author: Zqiang <qiang1.zhang@intel.com>
Date: Thu, 16 Jun 2022 21:53:47 +0800
rcu: Fix rcu_read_unlock_strict() strict QS reporting
Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y
report the quiescent state directly from the outermost rcu_read_unlock().
However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm
might still be set, in which case rcu_report_qs_rdp() will exit early,
thus failing to report quiescent state.
This commit therefore causes rcu_read_unlock_strict() to clear
CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking
rcu_report_qs_rdp().
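A sketch of the resulting sequence in rcu_read_unlock_strict()
(simplified from the actual function):
	rdp = this_cpu_ptr(&rcu_data);
	rdp->cpu_no_qs.b.norm = false;	/* Avoid rcu_report_qs_rdp()'s early exit. */
	rcu_report_qs_rdp(rdp);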
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 5103850654fdc651f0a7076ac753b958f018bb85
Author: Zqiang <qiang1.zhang@intel.com>
Date: Fri, 29 Apr 2022 20:42:22 +0800
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
Callbacks are invoked in RCU kthreads when callbacks are offloaded
(rcu_nocbs boot parameter) or when RCU's softirq handler has been
offloaded to rcuc kthreads (use_softirq==0). The current code allows
for the rcu_nocbs case but not the use_softirq case. This commit adds
support for the use_softirq case.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 70a82c3c55c8665d3996dcb9968adcf24d52bbc4
Author: Zqiang <qiang1.zhang@intel.com>
Date: Fri, 13 May 2022 08:42:55 +0800
rcu: Immediately boost preempted readers for strict grace periods
The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option is to
cause normal grace periods to complete quickly in order to better catch
errors resulting from improperly leaking pointers from RCU read-side
critical sections. However, kernels built with this option enabled still
wait for some hundreds of milliseconds before boosting RCU readers that
have been preempted within their current critical section. The value
of this delay is set by the CONFIG_RCU_BOOST_DELAY Kconfig option,
which defaults to 500 milliseconds.
This commit therefore causes kernels built with strict grace periods
to ignore CONFIG_RCU_BOOST_DELAY. This causes rcu_initiate_boost()
to start boosting immediately after all CPUs on a given leaf rcu_node
structure have passed through their quiescent states.
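Heavily simplified sketch of the adjusted condition in
rcu_initiate_boost() (the real condition carries additional terms):
	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
	    ULONG_CMP_GE(jiffies, rnp->boost_time))
		/* ...start boosting... */;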
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
Conflicts: 2 merge conflicts due to upstream merge conflict. Apply the
same resolutions as in upstream merge commit 34bc7b454dc3
("Merge branch 'ctxt.2022.07.05a' into HEAD").
commit 48f8070f5dd8e13148ae4647780a452d53c457a2
Author: Patrick Wang <patrick.wang.shcn@gmail.com>
Date: Tue, 26 Apr 2022 18:45:02 +0800
rcu: Avoid tracing a few functions executed in stop machine
Stop-machine recently started calling additional functions while waiting:
----------------------------------------------------------------
Former stop machine wait loop:
do {
cpu_relax(); => macro
...
} while (curstate != STOPMACHINE_EXIT);
-----------------------------------------------------------------
Current stop machine wait loop:
do {
stop_machine_yield(cpumask); => function (notraced)
...
touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
...
rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
} while (curstate != MULTI_STOP_EXIT);
------------------------------------------------------------------
These functions (and the functions that they call) must be marked
notrace to prevent them from being updated while they are executing.
The consequences of failing to mark these functions can be severe:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
rcu: 3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
Task dump for CPU 1:
task:migration/1 state:R running task stack: 0 pid: 19 ppid: 2 flags:0x00000000
Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
Call Trace:
Task dump for CPU 3:
task:migration/3 state:R running task stack: 0 pid: 29 ppid: 2 flags:0x00000000
Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
Call Trace:
rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu: Possible timer handling issue on cpu=2 timer-softirq=594
rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:I stack: 0 pid: 14 ppid: 2 flags:0x00000000
Call Trace:
schedule+0x56/0xc2
schedule_timeout+0x82/0x184
rcu_gp_fqs_loop+0x19a/0x318
rcu_gp_kthread+0x11a/0x140
kthread+0xee/0x118
ret_from_exception+0x0/0x14
rcu: Stack dump where RCU GP kthread last ran:
Task dump for CPU 2:
task:migration/2 state:R running task stack: 0 pid: 24 ppid: 2 flags:0x00000000
Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
Call Trace:
This commit therefore marks these functions notrace:
rcu_preempt_deferred_qs()
rcu_preempt_need_deferred_qs()
rcu_preempt_deferred_qs_irqrestore()
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 172114552701b85d5c3b1a089a73ee85d0d7786b
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Wed, 8 Jun 2022 16:40:33 +0200
rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking
Move the core RCU eqs/dynticks functions to context tracking so that
we can later merge all that code within context tracking.
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
commit 6a694411977a6d57ff76a896a745c2f717372dac
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Tue, 24 May 2022 20:33:17 -0700
rcu-tasks: Make rcu_note_context_switch() unconditionally call rcu_tasks_qs()
This commit makes rcu_note_context_switch() unconditionally invoke the
rcu_tasks_qs() function, as opposed to doing so only when RCU (as opposed
to RCU Tasks Trace) urgently needs a grace period to end.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491
commit f596e2ce1c0f250bb3ecc179f611be37e862635f
Author: Zqiang <qiang1.zhang@intel.com>
Date: Mon, 4 Apr 2022 07:59:32 +0800
rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs
When booting kernels built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y
and CONFIG_PREEMPT_RT=y, the rcu_read_unlock_special() function's
invocation of irq_work_queue_on() on an irq_work structure initialized
with init_irq_work() causes the rcu_preempt_deferred_qs_handler()
function to execute in SCHED_FIFO
irq_work kthreads. Because rcu_read_unlock_special() is invoked on each
rcu_read_unlock() in such kernels, the amount of work just keeps piling
up, resulting in a boot-time hang.
This commit therefore avoids this hang by using IRQ_WORK_INIT_HARD()
instead of init_irq_work(), but only in kernels built with both
CONFIG_PREEMPT_RT=y and CONFIG_RCU_STRICT_GRACE_PERIOD=y.
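Illustrative shape of the conditional initialization (the rdp->defer_qs_iw
field name is an assumption here; handler name is from the text above):
	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
	    IS_ENABLED(CONFIG_PREEMPT_RT))
		rdp->defer_qs_iw =
			IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
	else
		init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);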
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491
commit 88ca472f80604c070526eb58b977ea0a9c3c2e1f
Author: Zqiang <qiang1.zhang@intel.com>
Date: Thu, 24 Mar 2022 19:15:15 +0800
rcu: Check for successful spawn of ->boost_kthread_task
The spawning of the priority-boost kthreads can fail, improbable
though this might seem. This commit therefore refrains from attempting
to initiate RCU priority boosting when the ->boost_kthread_task pointer
is NULL.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491
commit 90d2efe7bdbde5371b6122174af0718843f805c6
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 16 Feb 2022 09:54:56 -0800
rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting
Suppose we have a kernel built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y
and CONFIG_PREEMPT=y. Suppose further that an RCU reader from which RCU
core needs a quiescent state ends in rcu_preempt_deferred_qs_irqrestore().
This function will then invoke rcu_report_qs_rdp() in order to immediately
report that quiescent state. Unfortunately, it will not have cleared
that reader's CPU's rcu_data structure's ->cpu_no_qs.b.norm field.
As a result, rcu_report_qs_rdp() will take an early exit because it
will believe that this CPU has not yet encountered a quiescent state,
and there will be no reporting of the current quiescent state.
This commit therefore causes rcu_preempt_deferred_qs_irqrestore() to
clear the ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp().
Kudos to Boqun Feng and Neeraj Upadhyay for helping with analysis of
this issue!
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491
commit 3352911fa9b47a90165e5c6fed440048c55146d1
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Wed, 16 Feb 2022 16:42:07 +0100
rcu: Initialize boost kthread only for boot node prior SMP initialization
The rcu_spawn_gp_kthread() function is called as an early initcall,
which means that SMP initialization hasn't happened yet and only the
boot CPU is online. Therefore, create only the boost kthread for the
leaf node of the boot CPU.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
commit c9515875850fefcc79492c5189fe8431e75ddec5
Author: Zqiang <qiang1.zhang@intel.com>
Date: Tue, 25 Jan 2022 10:47:44 +0800
rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings
When the rcutree.use_softirq kernel boot parameter is set to zero, all
RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads.
If these kthreads are being starved, quiescent states will not be
reported, which in turn means that the grace period will not end, which
can in turn trigger RCU CPU stall warnings. This commit therefore dumps
stack traces of stalled CPUs' rcuc kthreads, which can help identify
what is preventing those kthreads from running.
Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: A merge conflict in rcu_boost_kthread_setaffinity() of
kernel/rcu/tree_plugin.h due to the presence of a later
upstream commit 04d4e665a609 ("sched/isolation: Use single
feature type while referring to housekeeping cpumask").
commit 6a2c1d450a6a328027280a854019c55de989e14e
Author: Yury Norov <yury.norov@gmail.com>
Date: Sun, 23 Jan 2022 10:38:53 -0800
rcu: Replace cpumask_weight with cpumask_empty where appropriate
In some places, RCU code calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.
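For example (hypothetical call site):
	if (cpumask_weight(cm) == 0)	/* before: counts every bit */
	if (cpumask_empty(cm))		/* after: stops at the first set bit */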
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
commit 10c535787436d62ea28156a4b91365fd89b5a432
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Fri, 21 Jan 2022 12:40:08 -0800
rcu: Don't deboost before reporting expedited quiescent state
Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx
before reporting the expedited quiescent state. Under heavy real-time
load, this can result in this function being preempted before the
quiescent state is reported, which can in turn prevent the expedited grace
period from completing. Tim Murray reports that the resulting expedited
grace periods can take hundreds of milliseconds and even more than one
second, when they should normally complete in less than a millisecond.
This was fine given that there were no particular response-time
constraints for synchronize_rcu_expedited(), as it was designed
for throughput rather than latency. However, some users now need
sub-100-millisecond response-time constraints.
This patch therefore follows Neeraj's suggestion (seconded by Tim and
by Uladzislau Rezki) of simply reversing the two operations.
Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Joel Fernandes <joelaf@google.com>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Tim Murray <timmurray@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
commit 5ae0f1b58b28b53f4ab3708ef9337a2665e79664
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Fri, 10 Dec 2021 13:44:17 -0800
rcu: Create and use an rcu_rdp_cpu_online()
The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently
in RCU code in order to determine whether rdp->cpu is online from an
RCU perspective. This commit therefore creates an rcu_rdp_cpu_online()
function to replace it.
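Given the quoted pattern, a plausible shape for the helper (a sketch,
not necessarily the exact upstream body):
	static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
	{
		return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
	}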
[ paulmck: Apply kernel test robot unused-variable feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: A fuzz in rcu_boost_kthread_setaffinity() of
kernel/rcu/tree_plugin.h due to the presence of a later
upstream commit 04d4e665a609 ("sched/isolation: Use single
feature type while referring to housekeeping cpumask").
commit 218b957a6959a2fb5b3967fc824072bb89ac2611
Author: David Woodhouse <dwmw@amazon.co.uk>
Date: Wed, 8 Dec 2021 23:41:53 +0000
rcu: Add mutex for rcu boost kthread spawning and affinity setting
As we handle parallel CPU bringup, we will need to take care to avoid
spawning multiple boost threads, or race conditions when setting their
affinity. Spotted by Paul McKenney.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/671
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065222
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
Tested: Setup isolation and ran scheduler tests, checked that housekeeping
looked right (tasks offloaded from isolated cpus to HK ones etc).
Split the housekeeping flags into finer granularity in preparation
for allowing them to be configured dynamically. There should not be
much functional change.
Signed-off-by: Phil Auld <pauld@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Paolo Bonzini <bonzini@gnu.org>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Bugzilla: http://bugzilla.redhat.com/2065222
commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Mon Feb 7 16:59:06 2022 +0100
sched/isolation: Use single feature type while referring to housekeeping cpumask
Refer to housekeeping APIs using single feature types instead of flags.
This prevents passing multiple isolation features at once to
housekeeping interfaces, which will soon no longer be possible as each
isolation feature will have its own cpumask.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit c2cf0767e98eb4487444e5c7ebba491a866811ce
Author: Zqiang <qiang.zhang1211@gmail.com>
Date: Mon, 15 Nov 2021 13:15:46 +0800
rcu: Avoid running boost kthreads on isolated CPUs
When the boost kthreads are created on systems with nohz_full CPUs,
the cpus_allowed_ptr is set to housekeeping_cpumask(HK_FLAG_KTHREAD).
However, when rcu_boost_kthread_setaffinity() is called, the original
affinity will be changed and these kthreads can subsequently run on
nohz_full CPUs. This commit makes rcu_boost_kthread_setaffinity()
restrict these boost kthreads to housekeeping CPUs.
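One plausible placement for the restriction, applied after the per-CPU
mask is built (a sketch; HK_FLAG_KTHREAD as named above):
	cpumask_and(cm, cm, housekeeping_cpumask(HK_FLAG_KTHREAD));
	if (cpumask_empty(cm))
		cpumask_copy(cm, housekeeping_cpumask(HK_FLAG_KTHREAD));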
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit 17ea3718824912e773b0fd78579694b2e75ee597
Author: Zhouyi Zhou <zhouzhouyi@gmail.com>
Date: Sun, 24 Oct 2021 08:36:34 +0800
rcu: Improve tree_plugin.h comments and add code cleanups
This commit cleans up some comments and code in kernel/rcu/tree_plugin.h.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit 790da248978a0722d92d1471630c881704f7eb0d
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 29 Sep 2021 11:09:34 -0700
rcu: Make idle entry report expedited quiescent states
In non-preemptible kernels, an unfortunately timed expedited grace period
can result in the rcu_exp_handler() IPI handler setting the rcu_data
structure's cpu_no_qs.b.exp field just as the target CPU enters idle.
There are situations in which this field will not be checked until after
that CPU exits idle. The resulting grace-period latency does not qualify
as "expedited".
This commit therefore checks this field upon non-preemptible idle entry in
the rcu_preempt_deferred_qs() function. It also qualifies the rcu_core()
preempt_count() check with IS_ENABLED(CONFIG_PREEMPT_COUNT) to prevent
false-positive quiescent states from count-free kernels.
Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit 7663ad9a5dbcc27f3090e6bfd192c7e59222709f
Author: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 28 Sep 2021 10:40:21 +0200
rcu: Always inline rcu_dynticks_task*_{enter,exit}()
RCU managed to grow a few noinstr violations:
vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x0: call to rcu_dynticks_task_trace_enter() leaves .noinstr.text section
vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0xe: call to rcu_dynticks_task_trace_exit() leaves .noinstr.text section
Fix them by adding __always_inline to the relevant trivial functions.
Also replace the noinstr with __always_inline for the existing
rcu_dynticks_task_*() functions since noinstr would force noinline
them, even when empty, which seems silly.
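Illustrative diff-style sketch of the change for one of the affected
functions (names taken from the objtool warnings above):
	-static void noinstr rcu_dynticks_task_trace_enter(void)
	+static __always_inline void rcu_dynticks_task_trace_enter(void)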
Fixes: 7d0c9c50c5 ("rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit 2407a64f8045552203ee5cb9904ce75ce2fceef4
Author: Changbin Du <changbin.du@intel.com>
Date: Tue, 28 Sep 2021 08:21:28 +0800
rcu: in_irq() cleanup
This commit replaces the obsolete and ambiguous macro in_irq() with its
shiny new in_hardirq() equivalent.
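For example:
	-	if (in_irq())
	+	if (in_hardirq())
Both test for hardirq context; the new name simply says so explicitly.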
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit 6120b72e25e195b6fa15b0a674479a38166c392a
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Thu, 16 Sep 2021 14:10:48 +0200
rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu_no_qs.b.exp
Having two fields for the same purpose with subtle differences on
different RCU flavours is confusing, especially when both fields always
exist on both RCU flavours.
Fortunately, it is now safe for preemptible RCU to rely on the rcu_data
structure's ->cpu_no_qs.b.exp field, just like non-preemptible RCU.
This commit therefore removes the ad-hoc ->exp_deferred_qs field.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit 6e16b0f7bae3817ea67f4bef4f84298e880fbf66
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Thu, 16 Sep 2021 14:10:47 +0200
rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_report_exp_rdp()
On non-preemptible RCU, move clearing of the rcu_data structure's
->cpu_no_qs.b.exp field to the actual expedited quiescent state report
function, matching how preemptible RCU handles the ->exp_deferred_qs field.
This prepares for removing ->exp_deferred_qs in favor of ->cpu_no_qs.b.exp
for both preemptible and non-preemptible RCU.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
commit a4382659487f84c00b5fbb61df25a9ad59396789
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Thu, 16 Sep 2021 14:10:45 +0200
rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
Preemptible RCU does not use the rcu_data structure's ->cpu_no_qs.b.exp,
instead using a separate ->exp_deferred_qs field to record the need for
an expedited quiescent state.
In fact ->cpu_no_qs.b.exp should never be set in preemptible RCU because
preemptible RCU's expedited grace periods use other mechanisms to record
quiescent states.
This commit therefore removes the implicit rcu_qs() reference to
->cpu_no_qs.b.exp in favor of a direct reference to ->cpu_no_qs.b.norm.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1
commit e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1
Author: "Paul E. McKenney" <paulmck@kernel.org>
Date: Mon, 27 Sep 2021 14:18:51 -0700
rcu: Remove the RCU_FAST_NO_HZ Kconfig option
All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve
systems with RCU callbacks offloaded. In this situation, all that this
Kconfig option does is slow down idle entry/exit with an additional
always-taken early exit. If this is the only use case, then this
Kconfig option is nothing but an attractive nuisance that needs to go away.
This commit therefore removes the RCU_FAST_NO_HZ Kconfig option.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806
commit 521c89b3a4022269c75b35062358d1dae4ebfa79
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Mon, 19 Jul 2021 11:52:12 -0700
rcu: Print human-readable message for schedule() in RCU reader
The WARN_ON_ONCE() invocation within the CONFIG_PREEMPT=y version of
rcu_note_context_switch() triggers when there is a voluntary context
switch in an RCU read-side critical section, but there is quite a gap
between the output of that WARN_ON_ONCE() and this RCU-usage error.
This commit therefore converts the WARN_ON_ONCE() to a WARN_ONCE()
that explicitly describes the problem in its message.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806
commit fed31a4dd3adb5455df7c704de2abb639a1dc1c0
Author: Zhouyi Zhou <zhouzhouyi@gmail.com>
Date: Tue, 13 Jul 2021 08:56:45 +0800
rcu: Fix macro name CONFIG_TASKS_RCU_TRACE
This commit fixes several typos where CONFIG_TASKS_RCU_TRACE should
instead be CONFIG_TASKS_TRACE_RCU. Among other things, these typos
could cause CONFIG_TASKS_TRACE_RCU_READ_MB=y kernels to suffer from
memory-ordering bugs that could result in false-positive quiescent
states and too-short grace periods.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806
commit 5fcb3a5f04ee6422714adb02f5364042228bfc2e
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Thu, 20 May 2021 13:35:50 -0700
rcu: Mark accesses to ->rcu_read_lock_nesting
KCSAN flags accesses to ->rcu_read_lock_nesting as data races, but
in the past, the overhead of marked accesses was excessive. However,
that was long ago, and much has changed since then, both in terms of
hardware and of compilers. Here is data taken on an eight-core laptop
using Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz with a kernel built
using gcc version 9.3.0, with all data in nanoseconds.
Unmarked accesses (status quo), measured by three refscale runs:
Minimum reader duration: 3.286 2.851 3.395
Median reader duration: 3.698 3.531 3.4695
Maximum reader duration: 4.481 5.215 5.157
Marked accesses, also measured by three refscale runs:
Minimum reader duration: 3.501 3.677 3.580
Median reader duration: 4.053 3.723 3.895
Maximum reader duration: 7.307 4.999 5.511
This focused microbenchmark shows only sub-nanosecond differences which
are unlikely to be visible at the system level. This commit therefore
marks data-racing accesses to ->rcu_read_lock_nesting.
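For instance, the nesting increment can be written with marked accesses
(a sketch of one such site):
	WRITE_ONCE(current->rcu_read_lock_nesting,
		   READ_ONCE(current->rcu_read_lock_nesting) + 1);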
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806
commit dfcb27540213e8061ecffacd4bd8ed54a310a7b0
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Wed, 19 May 2021 02:09:28 +0200
rcu/nocb: Start moving nocb code to its own plugin file
The kernel/rcu/tree_plugin.h file contains not only the plugins for
preemptible RCU, but also many other features including rcu_nocbs
callback offloading. This offloading has become large and complex,
so it is time to put it in its own file.
This commit starts that process.
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[ paulmck: Rename to tree_nocb.h, add Frederic as author. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032
commit 830e6acc8a1cafe153a0d88f9b2455965b396131
Author: Peter Zijlstra <peterz@infradead.org>
Date: Sun, 15 Aug 2021 23:27:58 +0200
locking/rtmutex: Split out the inner parts of 'struct rtmutex'
RT builds substitutions for rwsem, mutex, spinlock and rwlock around
rtmutexes. Split the inner working out so each lock substitution can use
them with the appropriate lockdep annotations. This avoids having an extra
unused lockdep map in the wrapped rtmutex.
No functional change.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
Signed-off-by: Waiman Long <longman@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1998549
Upstream Status: linux-rcu commit fd07d7b373a8e7c8406a04b206bed89ec3cc2b52
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
Tested: Before this patch, the symbol rcu_read_unlock_strict is found
in 100s of kernel modules. After this patch, the symbol is no longer
found in any of the kernel modules.
commit fd07d7b373a8e7c8406a04b206bed89ec3cc2b52
Author: Waiman Long <longman@redhat.com>
Date: Thu, 26 Aug 2021 22:21:22 -0400
rcu: Avoid unneeded function call in rcu_read_unlock()
Since commit aa40c138cc ("rcu: Report QS for outermost PREEMPT=n
rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict()
is invoked by the inlined rcu_read_unlock() function. However,
rcu_read_unlock_strict() is an empty function in production kernels,
which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n.
There is a mention of rcu_read_unlock_strict() in the BPF verifier,
but this is in a deny-list, meaning that BPF does not care whether
rcu_read_unlock_strict() is ever called.
This commit therefore provides a slight performance improvement
by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from
rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the
pointless call to an empty function.
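A sketch of the hoisted check, assuming the non-preemptible
__rcu_read_unlock() wrapper as the call site:
	static inline void __rcu_read_unlock(void)
	{
		preempt_enable();
		if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
			rcu_read_unlock_strict();
	}
Because IS_ENABLED() folds to a compile-time constant, production
kernels emit no call at all.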
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Pull RCU updates from Paul McKenney:
- Bitmap parsing support for "all" as an alias for all bits
- Documentation updates
- Miscellaneous fixes, including some that overlap into mm and lockdep
- kvfree_rcu() updates
- mem_dump_obj() updates, with acks from one of the slab-allocator
maintainers
- RCU NOCB CPU updates, including limited deoffloading
- SRCU updates
- Tasks-RCU updates
- Torture-test updates
* 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
rcu: Add missing __releases() annotation
rcu: Remove obsolete rcu_read_unlock() deadlock commentary
rcu: Improve comments describing RCU read-side critical sections
rcu: Create an unrcu_pointer() to remove __rcu from a pointer
srcu: Early test SRCU polling start
rcu: Fix various typos in comments
rcu/nocb: Unify timers
rcu/nocb: Prepare for fine-grained deferred wakeup
rcu/nocb: Only cancel nocb timer if not polling
rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
rcu/nocb: Allow de-offloading rdp leader
rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
rcu: Don't penalize priority boosting when there is nothing to boost
rcu: Point to documentation of ordering guarantees
rcu: Make rcu_gp_cleanup() be noinline for tracing
rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
...
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org