Commit Graph

298 Commits

Author SHA1 Message Date
Waiman Long 6b484a545e rcu: Support direct wake-up of synchronize_rcu() users
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 462df2f543ae360e79fcaa1b498d2a1a0c2a5b63
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Fri, 8 Mar 2024 18:34:07 +0100

    rcu: Support direct wake-up of synchronize_rcu() users

    This patch introduces a small enhancement that allows a direct
    wake-up of synchronize_rcu() callers. The wake-up occurs after a
    grace period completes and is therefore performed by the GP kthread.

    The number of directly awakened clients is limited by a hard-coded
    threshold. The remaining callers, if any, are deferred to a main
    worker.

    Link: https://lore.kernel.org/lkml/Zd0ZtNu+Rt0qXkfS@lothringen/

    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:38 -04:00
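For illustration, the sketch below shows the shape of the mechanism described above: a GP-completion path that wakes a bounded number of synchronize_rcu() waiters directly and defers the rest to a workqueue worker. This is a minimal sketch, not the upstream code; the names sr_wait_node, sr_waiters, sr_cleanup_work, and SR_MAX_USERS_WAKE_FROM_GP are assumptions for illustration only.

    /* Illustrative sketch only, not the upstream implementation. */
    #include <linux/completion.h>
    #include <linux/llist.h>
    #include <linux/workqueue.h>

    #define SR_MAX_USERS_WAKE_FROM_GP 5   /* assumed hard-coded threshold */

    struct sr_wait_node {
            struct llist_node node;
            struct completion done;
    };

    static LLIST_HEAD(sr_waiters);              /* synchronize_rcu() callers */
    static struct work_struct sr_cleanup_work;  /* assumed to be INIT_WORK()ed elsewhere */

    /* Called by the GP kthread once a grace period has completed. */
    static void sr_normal_gp_cleanup(void)
    {
            struct llist_node *node;
            int done = 0;

            /* Wake up to the threshold number of waiters directly. */
            while (done < SR_MAX_USERS_WAKE_FROM_GP &&
                   (node = llist_del_first(&sr_waiters))) {
                    complete(&llist_entry(node, struct sr_wait_node, node)->done);
                    done++;
            }

            /* Defer any remaining waiters to the main worker. */
            if (!llist_empty(&sr_waiters))
                    queue_work(system_highpri_wq, &sr_cleanup_work);
    }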
Waiman Long 3097ec69ae rcu: Add data structures for synchronize_rcu()
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit dfd458a95d78ce31855fe06bbfde4f4fe60c40db
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Fri, 8 Mar 2024 18:34:04 +0100

    rcu: Add data structures for synchronize_rcu()

    The synchronize_rcu() call is going to be reworked, thus
    this patch adds dedicated fields into the rcu_state structure.

    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:36 -04:00
Waiman Long 7804cac54b rcu: Make hotplug operations track GP state, not flags
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit ae2b217ab542d0db0ca1a6de4f442201a1982f00
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 8 Mar 2024 11:15:01 -0800

    rcu: Make hotplug operations track GP state, not flags

    Currently, there are rcu_data structure fields named ->rcu_onl_gp_flags
    and ->rcu_ofl_gp_flags that track the rcu_state.gp_flags field at the
    time of the corresponding CPU's last online or offline operation,
    respectively.  However, this information is not particularly useful.
    It would be better to instead track the grace period state kept
    in rcu_state.gp_state.  This would also be consistent with the
    initialization in rcu_boot_init_percpu_data(), which is to RCU_GP_CLEANED
    (an rcu_state.gp_state value), and also with the diagnostics in
    rcu_implicit_dynticks_qs(), whose format is consistent with an integer,
    not a bitmask.

    This commit therefore makes this change and changes the names to
    ->rcu_onl_gp_state and ->rcu_ofl_gp_state, respectively.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:32 -04:00
Waiman Long c0a9325f29 rcu/exp: Remove rcu_par_gp_wq
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 23da2ad64dbe9f3fab10af90484fe41e144337b1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri, 12 Jan 2024 16:46:21 +0100

    rcu/exp: Remove rcu_par_gp_wq

    TREE04 running on short iterations can produce writer stalls of the
    following kind:

     ??? Writer stall state RTWS_EXP_SYNC(4) g3968 f0x0 ->state 0x2 cpu 0
     task:rcu_torture_wri state:D stack:14568 pid:83    ppid:2      flags:0x00004000
     Call Trace:
      <TASK>
      __schedule+0x2de/0x850
      ? trace_event_raw_event_rcu_exp_funnel_lock+0x6d/0xb0
      schedule+0x4f/0x90
      synchronize_rcu_expedited+0x430/0x670
      ? __pfx_autoremove_wake_function+0x10/0x10
      ? __pfx_synchronize_rcu_expedited+0x10/0x10
      do_rtws_sync.constprop.0+0xde/0x230
      rcu_torture_writer+0x4b4/0xcd0
      ? __pfx_rcu_torture_writer+0x10/0x10
      kthread+0xc7/0xf0
      ? __pfx_kthread+0x10/0x10
      ret_from_fork+0x2f/0x50
      ? __pfx_kthread+0x10/0x10
      ret_from_fork_asm+0x1b/0x30
      </TASK>

    Waiting for an expedited grace period and polling for an expedited
    grace period are both operations that internally rely on the same
    workqueue performing the necessary asynchronous work.

    However, a dependency chain is involved between those two operations,
    as depicted below:

           ====== CPU 0 =======                          ====== CPU 1 =======

                                                         synchronize_rcu_expedited()
                                                             exp_funnel_lock()
                                                                 mutex_lock(&rcu_state.exp_mutex);
        start_poll_synchronize_rcu_expedited
            queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
                                                             synchronize_rcu_expedited_queue_work()
                                                                 queue_work(rcu_gp_wq, &rew->rew_work);
                                                             wait_event() // A, wait for &rew->rew_work completion
                                                             mutex_unlock() // B
        //======> switch to kworker

        sync_rcu_do_polled_gp() {
            synchronize_rcu_expedited()
                exp_funnel_lock()
                    mutex_lock(&rcu_state.exp_mutex); // C, wait B
                    ....
        } // D

    Since workqueues are usually implemented on top of several kworkers
    handling the queue concurrently, the above situation wouldn't deadlock
    most of the time because A then doesn't depend on D. But under memory
    stress, a single kworker may end up handling all the work items alone,
    in serialized fashion. In that case the above layout becomes a problem
    because A then waits for D, closing a circular dependency:

            A -> D -> C -> B -> A

    This, however, only happens when CONFIG_RCU_EXP_KTHREAD=n. Indeed,
    synchronize_rcu_expedited() is otherwise implemented on top of a kthread
    worker while polling still relies on the rcu_gp_wq workqueue, breaking
    the above circular dependency chain.

    Fix this by making the expedited grace period always rely on kthread
    workers. The workqueue-based implementation is essentially a duplicate
    anyway now that the per-node initialization is performed by per-node
    kthread workers.

    Meanwhile the CONFIG_RCU_EXP_KTHREAD switch is still kept around to
    manage the scheduler policy of these kthread workers.

    Reported-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
    Reported-by: Thomas Gleixner <tglx@linutronix.de>
    Suggested-by: Joel Fernandes <joel@joelfernandes.org>
    Suggested-by: Paul E. McKenney <paulmck@kernel.org>
    Suggested-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:15 -04:00
Waiman Long d5ad8ad294 rcu/exp: Make parallel exp gp kworker per rcu node
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 8e5e621566485a3e160c0d8bfba206cb1d6b980d
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri, 12 Jan 2024 16:46:19 +0100

    rcu/exp: Make parallel exp gp kworker per rcu node

    When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period's per-node
    initialization is performed in parallel via workqueues (one work item
    per node).

    However, with CONFIG_RCU_EXP_KTHREAD=y, this per-node initialization is
    performed by a single kworker that serializes the node initializations
    (one work item for all nodes).

    The latter is certainly less scalable and less efficient beyond a single
    leaf node.

    To improve this, expand the single kworker into per-node kworkers. This
    new layout is eventually intended to replace the workqueue-based
    implementation, which will then essentially be duplicate code.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:13 -04:00
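A minimal sketch of the per-node layout described above, assuming a kthread_worker pointer hanging off each leaf rcu_node; the exp_kworker field and function names are assumptions for illustration, not the upstream identifiers.

    /* Illustrative sketch only, not the upstream code. */
    #include <linux/err.h>
    #include <linux/kthread.h>

    /* Spawn one kthread_worker per leaf rcu_node instead of sharing one. */
    static void sketch_spawn_exp_par_worker(struct rcu_node *rnp)
    {
            struct kthread_worker *kworker;

            kworker = kthread_create_worker(0, "rcu_exp_par_gp_kworker/%d",
                                            rnp->grplo);
            if (IS_ERR_OR_NULL(kworker))
                    return;
            WRITE_ONCE(rnp->exp_kworker, kworker);   /* assumed field */
    }

    /* Queue that node's expedited-GP initialization to its own worker. */
    static void sketch_queue_exp_par_init(struct rcu_node *rnp,
                                          struct kthread_work *work)
    {
            kthread_queue_work(READ_ONCE(rnp->exp_kworker), work);
    }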
Waiman Long 119acfe64c rcu: s/boost_kthread_mutex/kthread_mutex
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 7836b270607676ed1c0c6a4a840a2ede9437a6a1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri, 12 Jan 2024 16:46:17 +0100

    rcu: s/boost_kthread_mutex/kthread_mutex

    This mutex currently protects per-node boost kthread creation and
    affinity setting across CPU hotplug operations.

    Since the expedited kworkers will soon be split per node as well, they
    will be subject to the same concurrency constraints against hotplug.

    Therefore their creation and affinity tuning will be grouped with those
    of the boost kthreads and rely on the same mutex.

    To prepare for that, generalize its name.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:12 -04:00
Waiman Long be64d66573 rcu/nocb: Re-arrange call_rcu() NOCB specific code
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit afd4e6964745ed98b74cacdcce21d73280a0a253
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 9 Jan 2024 23:24:01 +0100

    rcu/nocb: Re-arrange call_rcu() NOCB specific code

    Currently the call_rcu() function interleaves NOCB and !NOCB enqueue
    code in a complicated way such that:

    * The bypass enqueue code may or may not have enqueued and may or may
      not have locked the ->nocb_lock. Everything that follows is in a
      Schrödinger locking state for the unwary reviewer's eyes.

    * The was_alldone local variable is always set but is used only in
      NOCB-related code.

    * The NOCB wake-up is only distantly related to the locking hopefully
      performed by the bypass enqueue code that did not enqueue on the
      bypass list.

    Unconfuse the whole thing by gathering the NOCB and !NOCB specific
    enqueue code into their own functions.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:09 -04:00
Waiman Long b91a2b524c rcu/tree: Defer setting of jiffies during stall reset
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit b96e7a5fa0ba9cda32888e04f8f4bac42d49a7f8
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Tue, 5 Sep 2023 00:02:11 +0000

    rcu/tree: Defer setting of jiffies during stall reset

    There are instances where rcu_cpu_stall_reset() is called when jiffies
    has not had a chance to update for a long time. Before jiffies is
    updated, the CPU stall detector can go off, triggering false positives
    where a just-started grace period appears to be ages old. In the past,
    stall detection was disabled in rcu_cpu_stall_reset(), but this was
    changed [1]. This results in false positives in the KGDB use case [2].

    Fix this by deferring the update of jiffies to the third run of the FQS
    loop. This is more robust, as, even if rcu_cpu_stall_reset() is called
    just before jiffies is read, we would end up pushing out the jiffies
    read by 3 more FQS loops. Meanwhile the CPU stall detection will be
    delayed and we will not get any false positives.

    [1] https://lore.kernel.org/all/20210521155624.174524-2-senozhatsky@chromium.org/
    [2] https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/

    Tested with rcutorture.cpu_stall option as well to verify stall behavior
    with/without patch.

    Tested-by: Huacai Chen <chenhuacai@loongson.cn>
    Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
    Closes: https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
    Suggested-by: Paul  McKenney <paulmck@kernel.org>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Fixes: a80be428fbc1 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:18 -04:00
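A condensed sketch of the deferral described above, under the assumption that rcu_state carries a small countdown (here called nr_fqs_jiffies_stall) that the FQS loop consumes before it refreshes the stall timeout; the field name and helper placement are illustrative assumptions.

    /* Illustrative sketch only, not the upstream implementation. */

    /* Called e.g. by KGDB paths when jiffies may be long out of date. */
    static void sketch_cpu_stall_reset(void)
    {
            /* Re-arm the deferral: skip the next few FQS passes. */
            WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3);  /* assumed field */
            WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX);
    }

    /* In the force-quiescent-state (FQS) loop: */
    static void sketch_fqs_check_stall(void)
    {
            int left = READ_ONCE(rcu_state.nr_fqs_jiffies_stall);

            if (left > 0) {
                    WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, left - 1);
                    if (left == 1)
                            /* Third pass: jiffies has had time to advance. */
                            WRITE_ONCE(rcu_state.jiffies_stall,
                                       jiffies + rcu_jiffies_till_stall_check());
            }
    }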
Waiman Long 576c46a4e4 rcu: Add RCU stall diagnosis information
JIRA: https://issues.redhat.com/browse/RHEL-5228
Conflicts: Upstream merge conflicts in kernel/rcu/{rcu.h,update.c}
	   and Documentation/admin-guide/kernel-parameters.txt with
	   commit 92987fe8bdd1 ("rcu: Allow expedited RCU CPU stall
	   warnings to dump task stacks").  Resolved according
	   to upstream merge commit bba8d3d17dc2 ("Merge branch
	   'stall.2023.01.09a' into HEAD").

commit be42f00b73a0f50710d16eb7cb4efda0cce062dd
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat, 19 Nov 2022 17:25:06 +0800

    rcu: Add RCU stall diagnosis information

    Because RCU CPU stall warnings are driven from the scheduling-clock
    interrupt handler, a workload consisting of a very large number of
    short-duration hardware interrupts can result in misleading stall-warning
    messages.  On systems supporting only a single level of interrupts,
    that is, where interrupt handlers cannot be interrupted, this can
    produce misleading diagnostics.  The stack traces will show the
    innocent-bystander interrupted task, not the interrupts that are
    at the very least exacerbating the stall.

    This situation can be improved by displaying the number of interrupts
    and the CPU time that they have consumed.  Diagnosing other types
    of stalls can be eased by also providing the count of softirqs and
    the CPU time that they consumed as well as the number of context
    switches and the task-level CPU time consumed.

    Consider the following output given this change:

    rcu: INFO: rcu_preempt self-detected stall on CPU
    rcu:     0-....: (1250 ticks this GP) <omitted>
    rcu:          hardirqs   softirqs   csw/system
    rcu:  number:      624         45            0
    rcu: cputime:       69          1         2425   ==> 2500(ms)

    This output shows that the number of hard and soft interrupts is small,
    there are no context switches, and almost all of the CPU time is
    consumed at the system level. This indicates that the current task is
    looping with preemption disabled.

    The impact on system performance is negligible because the snapshot is
    recorded only once per continuous RCU stall.

    This added debugging information is suppressed by default and can be
    enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
    by booting with rcupdate.rcu_cpu_stall_cputime=1.

    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:38 -04:00
Waiman Long 3f47536644 rcu: Make call_rcu() lazy to save power
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 3cb278e73be58bfb780ecd55129296d2f74c1fb7
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sun, 16 Oct 2022 16:22:54 +0000

    rcu: Make call_rcu() lazy to save power

    Implement timer-based RCU callback batching (also known as lazy
    callbacks). With this we save about 5-10% of the power consumed due
    to RCU requests that happen when the system is lightly loaded or idle.

    By default, all async callbacks (queued via call_rcu) are marked
    lazy. An alternate API call_rcu_hurry() is provided for the few users,
    for example synchronize_rcu(), that need the old behavior.

    The batch is flushed whenever a certain amount of time has passed, or
    the batch on a particular CPU grows too big. A future patch will also
    flush it under memory pressure.

    To handle several corner cases automagically (such as rcu_barrier() and
    hotplug), we re-use bypass lists which were originally introduced to
    address lock contention, to handle lazy CBs as well. The bypass list
    length has the lazy CB length included in it. A separate lazy CB length
    counter is also introduced to keep track of the number of lazy CBs.

    [ paulmck: Fix formatting of inline call_rcu_lazy() definition. ]
    [ paulmck: Apply Zqiang feedback. ]
    [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

    Suggested-by: Paul McKenney <paulmck@kernel.org>
    Acked-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:15 -04:00
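call_rcu() and call_rcu_hurry() are the public APIs mentioned above. The fragment below is a small usage sketch; the foo structure and its release helpers are hypothetical and stand in for any RCU-protected object.

    /* Usage sketch: hypothetical data structure freed after a grace period. */
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
            int value;
            struct rcu_head rh;
    };

    static void foo_free_rcu(struct rcu_head *rh)
    {
            kfree(container_of(rh, struct foo, rh));
    }

    static void foo_release(struct foo *fp)
    {
            /* Default: may be batched (lazy) to save power on an idle system. */
            call_rcu(&fp->rh, foo_free_rcu);
    }

    static void foo_release_urgent(struct foo *fp)
    {
            /* Caller needs the callback soon (e.g. it will wait on it),
             * so bypass the lazy batching. */
            call_rcu_hurry(&fp->rh, foo_free_rcu);
    }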
Waiman Long 703b79599b rcu: Fix missing nocb gp wake on rcu_barrier()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit b8f7aca3f0e0e6223094ba2662bac90353674b04
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Sun, 16 Oct 2022 16:22:53 +0000

    rcu: Fix missing nocb gp wake on rcu_barrier()

    In preparation for RCU lazy changes, wake up the RCU nocb gp thread if
    needed after an entrain.  This change prevents the RCU barrier callback
    from waiting in the queue for several seconds before the lazy callbacks
    in front of it are serviced.

    Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:10 -04:00
Waiman Long e580bb0d98 rcu: Add polled expedited grace-period primitives
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit d96c52fe4907c68adc5e61a0bef7aec0933223d5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 15 Apr 2022 10:55:42 -0700

    rcu: Add polled expedited grace-period primitives

    This commit adds expedited grace-period functionality to RCU's polled
    grace-period API, adding start_poll_synchronize_rcu_expedited() and
    cond_synchronize_rcu_expedited(), which are similar to the existing
    start_poll_synchronize_rcu() and cond_synchronize_rcu() functions,
    respectively.

    Note that although start_poll_synchronize_rcu_expedited() can be invoked
    very early, the resulting expedited grace periods are not guaranteed
    to start until after workqueues are fully initialized.  On the other
    hand, both synchronize_rcu() and synchronize_rcu_expedited() can also
    be invoked very early, and the resulting grace periods will be taken
    into account as they occur.

    [ paulmck: Apply feedback from Neeraj Upadhyay. ]

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:52 -04:00
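A short usage sketch of the two primitives named above, alongside the pre-existing poll_state_synchronize_rcu(); the begin/finish helpers and the single shared cookie are illustrative only (a real user would typically store the cookie per object).

    /* Usage sketch for the polled expedited grace-period API. */
    #include <linux/rcupdate.h>

    static unsigned long snap;   /* illustrative: one global cookie */

    static void begin_update(void)
    {
            /* Snapshot the GP state and kick off an expedited grace period. */
            snap = start_poll_synchronize_rcu_expedited();
    }

    static void finish_update(void)
    {
            /* Fast path: the grace period may already have elapsed. */
            if (poll_state_synchronize_rcu(snap))
                    return;

            /* Otherwise wait, using an expedited grace period if needed. */
            cond_synchronize_rcu_expedited(snap);
    }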
Waiman Long ce330fc3bc rcu: Make polled grace-period API account for expedited grace periods
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit dd04140531b5d38b77ad9ff7b18117654be5bf5c
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 14 Apr 2022 06:56:35 -0700

    rcu: Make polled grace-period API account for expedited grace periods

    Currently, this code could splat:

            oldstate = get_state_synchronize_rcu();
            synchronize_rcu_expedited();
            WARN_ON_ONCE(!poll_state_synchronize_rcu(oldstate));

    This situation is counter-intuitive and user-unfriendly.  After all, there
    really was a perfectly valid full grace period right after the call to
    get_state_synchronize_rcu(), so why shouldn't poll_state_synchronize_rcu()
    know about it?

    This commit therefore makes the polled grace-period API aware of expedited
    grace periods in addition to the normal grace periods that it is already
    aware of.  With this change, the above code is guaranteed not to splat.

    Please note that the above code can still splat due to counter wrap on the
    one hand and situations involving partially overlapping normal/expedited
    grace periods on the other.  On 64-bit systems, the second is of course
    much more likely than the first.  It is possible to modify this approach
    to prevent overlapping grace periods from causing splats, but only at
    the expense of greatly increasing the probability of counter wrap, as
    in within milliseconds on 32-bit systems and within minutes on 64-bit
    systems.

    This commit is in preparation for polled expedited grace periods.

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:51 -04:00
Waiman Long 7df8a78b55 rcu: Switch polled grace-period APIs to ->gp_seq_polled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit bf95b2bc3e42f11f4d7a5e8a98376c2b4a2aa82f
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 13 Apr 2022 17:46:15 -0700

    rcu: Switch polled grace-period APIs to ->gp_seq_polled

    This commit switches the existing polled grace-period APIs to use a
    new ->gp_seq_polled counter in the rcu_state structure.  An additional
    ->gp_seq_polled_snap counter in that same structure allows the normal
    grace period kthread to interact properly with the !SMP !PREEMPT fastpath
    through synchronize_rcu().  The first of the two to note the end of a
    given grace period will make knowledge of this transition available to
    the polled API.

    This commit is in preparation for polled expedited grace periods.

    [ paulmck: Fix use of rcu_state.gp_seq_polled to start normal grace period. ]

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:51 -04:00
Waiman Long 3cd6c37180 rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 5103850654fdc651f0a7076ac753b958f018bb85
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Fri, 29 Apr 2022 20:42:22 +0800

    rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()

    Callbacks are invoked in RCU kthreads when callbacks are offloaded
    (rcu_nocbs boot parameter) or when RCU's softirq handler has been
    offloaded to rcuc kthreads (use_softirq==0).  The current code allows
    for the rcu_nocbs case but not the use_softirq case.  This commit adds
    support for the use_softirq case.

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:22 -04:00
Waiman Long 8c6af96a89 rcu/nocb: Add/del rdp to iterate from rcuog itself
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 1598f4a4762be0ea6a1bcd229c2c9ff1ebb212bb
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Apr 2022 14:23:18 +0200

    rcu/nocb: Add/del rdp to iterate from rcuog itself

    NOCB rdps are part of a group whose list is iterated by the
    corresponding rdp leader.

    This list is RCU traversed because an rdp can be either added or
    deleted concurrently. Upon addition, a new iteration to the list after
    a synchronization point (a pair of LOCK/UNLOCK ->nocb_gp_lock) is forced
    to make sure:

    1) we didn't miss a new element added in the middle of an iteration
    2) we didn't ignore a whole subset of the list due to an element being
       quickly deleted and then re-added.
    3) we guard against other possible surprises...

    Although this layout is expected to be safe, it doesn't help anybody
    sleep well.

    Instead, simplify the nocb state toggling by moving the list
    modification from the nocb (de-)offloading workqueue to the rcuog
    kthreads.

    Whenever the rdp leader is expected to (re-)set the SEGCBLIST_KTHREAD_GP
    flag of a target rdp, the latter is queued so that the leader handles
    the flag flip along with adding the target rdp to, or deleting it from,
    the list to iterate. This way the list modification and iteration happen
    from the same kthread, and those operations cannot race with each other.

    As a bonus, the flags for each rdp don't need to be checked locklessly
    before each iteration, which is one less opportunity to produce
    nightmares.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:21 -04:00
Waiman Long 5b925bf582 rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 172114552701b85d5c3b1a089a73ee85d0d7786b
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:33 +0200

    rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking

    Move the core RCU eqs/dynticks functions to context tracking so that
    we can later merge all that code within context tracking.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long e0440c243a rcu/context_tracking: Move dynticks_nmi_nesting to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 95e04f48ec0a634e2f221081f5fa1a904755f326
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:31 +0200

    rcu/context_tracking: Move dynticks_nmi_nesting to context tracking

    The RCU eqs tracking is going to be performed by the context tracking
    subsystem. The related nesting counters thus need to be moved to the
    context tracking structure.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long c1013cee1d rcu/context_tracking: Move dynticks_nesting to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 904e600e60f46f92eb4bcfb95788b1fedf7e8237
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:30 +0200

    rcu/context_tracking: Move dynticks_nesting to context tracking

    The RCU eqs tracking is going to be performed by the context tracking
    subsystem. The related nesting counters thus need to be moved to the
    context tracking structure.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long 8640b64310 rcu/context_tracking: Move dynticks counter to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 62e2412df4b90ae6706ce1f1a9649b789b2e44ef
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:29 +0200

    rcu/context_tracking: Move dynticks counter to context tracking

    In order to prepare for merging RCU dynticks counter into the context
    tracking state, move the rcu_data's dynticks field to the context
    tracking structure. It will later be mixed within the context tracking
    state itself.

    [ paulmck: Move enum ctx_state into global scope. ]

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long d45fbffb5b rcu: Move expedited grace period (GP) work to RT kthread_worker
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491
Conflicts:
 1) A merge conflict in kernel/rcu/rcu.h due to upstream merge conflict
    with commit 99d6a2acb895 ("rcutorture: Suppress debugging grace
    period delays during flooding"). Manually merge according to upstream
    merge commit ce13389053a3.
 2) A fuzz in kernel/rcu/tree.c due to upstream merge conflict with
    commit 87c5adf06bfb ("rcu/nocb: Initialize nocb kthreads only
    for boot CPU prior SMP initialization") and commit 3352911fa9b4
    ("rcu: Initialize boost kthread only for boot node prior SMP
    initialization"). See upstream merge commit ce13389053a3.

commit 9621fbee44df940e2e1b94b0676460a538dffefa
Author: Kalesh Singh <kaleshsingh@google.com>
Date:   Fri, 8 Apr 2022 17:35:27 -0700

    rcu: Move expedited grace period (GP) work to RT kthread_worker

    Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
    latency because its workqueues run at SCHED_OTHER, and thus can be
    delayed by normal processes.  This commit avoids these delays by moving
    the expedited GP work items to a real-time-priority kthread_worker.

    This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
    default on PREEMPT_RT=y kernels which disable expedited grace periods
    after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

    The results were evaluated on arm64 Android devices (6 GB RAM) running
    a 5.10 kernel, capturing trace data in critical user-level code.

    The table below shows the resulting order-of-magnitude improvements
    in synchronize_rcu_expedited() latency:

    ------------------------------------------------------------------------
    |                          |   workqueues  |  kthread_worker |  Diff   |
    ------------------------------------------------------------------------
    | Count                    |          725  |            688  |         |
    ------------------------------------------------------------------------
    | Min Duration       (ns)  |          326  |            447  |  37.12% |
    ------------------------------------------------------------------------
    | Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
    ------------------------------------------------------------------------
    | Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
    ------------------------------------------------------------------------
    | Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
    ------------------------------------------------------------------------
    | Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
    ------------------------------------------------------------------------
    | Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
    ------------------------------------------------------------------------
    | Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
    ------------------------------------------------------------------------

    The table below shows the range of maximums/minimums for
    synchronize_rcu_expedited() latency from all experiments:

    ------------------------------------------------------------------------
    |                          |   workqueues  |  kthread_worker |  Diff   |
    ------------------------------------------------------------------------
    | Total No. of Experiments |           25  |             23  |         |
    ------------------------------------------------------------------------
    | Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
    ------------------------------------------------------------------------
    | Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
    ------------------------------------------------------------------------
    | Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
    ------------------------------------------------------------------------
    | Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
    ------------------------------------------------------------------------
    | Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
    ------------------------------------------------------------------------
    | Range of Minimums  (ns)  |       88,297  |         27,141  |         |
    ------------------------------------------------------------------------

    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Reported-by: Tim Murray <timmurray@google.com>
    Reported-by: Wei Wang <wvw@google.com>
    Tested-by: Kyle Lin <kylelin@google.com>
    Tested-by: Chunwei Lu <chunweilu@google.com>
    Tested-by: Lulu Wang <luluw@google.com>
    Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:38:28 -04:00
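A minimal sketch of the mechanism the commit describes: running the work on a real-time-priority kthread_worker instead of a SCHED_OTHER workqueue. The worker name, variable names, and setup function are assumptions for illustration.

    /* Illustrative sketch only, not the upstream code. */
    #include <linux/err.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>

    static struct kthread_worker *exp_gp_kworker;   /* assumed name */

    static int __init sketch_start_exp_worker(void)
    {
            exp_gp_kworker = kthread_create_worker(0, "rcu_exp_gp_kworker");
            if (IS_ERR_OR_NULL(exp_gp_kworker))
                    return -ENOMEM;

            /* Give the backing kthread SCHED_FIFO so that ordinary
             * SCHED_OTHER tasks cannot delay the expedited GP work. */
            sched_set_fifo(exp_gp_kworker->task);
            return 0;
    }

    static void sketch_queue_exp_work(struct kthread_work *work)
    {
            kthread_queue_work(exp_gp_kworker, work);
    }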
Waiman Long f12dfd4e5c rcu: Check for jiffies going backwards
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit c708b08c65a0dfae127b9ee33b0fb73535a5e066
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 23 Feb 2022 17:29:37 -0800

    rcu: Check for jiffies going backwards

    A report of a 12-jiffy normal RCU CPU stall warning raises interesting
    questions about the nature of time on the offending system.  This commit
    instruments rcu_sched_clock_irq(), which is RCU's hook into the
    scheduling-clock interrupt, checking for the jiffies counter going
    backwards.

    Reported-by: Saravanan D <sarvanand@fb.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:11 -04:00
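The check itself is conceptually tiny. The sketch below only compares jiffies against a snapshot taken on the previous invocation, which captures the "jiffies going backwards" idea; the snapshot variable and hook name are assumptions, and the upstream instrumentation in rcu_sched_clock_irq() may compare against other time sources as well.

    /* Illustrative sketch only: warn once if jiffies ever moves backwards
     * between consecutive scheduling-clock interrupts. */
    #include <linux/bug.h>
    #include <linux/jiffies.h>

    static unsigned long last_jiffies_snap;   /* assumed bookkeeping */

    static void sketch_sched_clock_hook(void)
    {
            unsigned long j = jiffies;

            WARN_ON_ONCE(time_before(j, last_jiffies_snap));
            last_jiffies_snap = j;
    }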
Waiman Long c54a776b65 rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 87c5adf06bfbf14c9d13e59d5d174ff5f2aafc0e
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:08 +0100

    rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization

    The rcu_spawn_gp_kthread() function is called as an early initcall, which
    means that SMP initialization hasn't happened yet and only the boot CPU is
    online. Therefore, create only the NOCB kthreads related to the boot CPU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long b19ed13b34 rcu: Initialize boost kthread only for boot node prior SMP initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 3352911fa9b47a90165e5c6fed440048c55146d1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:07 +0100

    rcu: Initialize boost kthread only for boot node prior SMP initialization

    The rcu_spawn_gp_kthread() function is called as an early initcall,
    which means that SMP initialization hasn't happened yet and only the
    boot CPU is online.  Therefore, create only the boost kthread for the
    leaf node of the boot CPU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long 28c195fff4 rcu/nocb: Move rcu_nocb_is_setup to rcu_state
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 8d2aaa9b7c290e766a41f29c71ec72192851d538
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Mon, 14 Feb 2022 14:23:39 +0100

    rcu/nocb: Move rcu_nocb_is_setup to rcu_state

    This commit moves the RCU nocb initialization witness within rcu_state
    to consolidate RCU's global state.

    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:00 -04:00
Waiman Long a9408fae13 rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c9515875850fefcc79492c5189fe8431e75ddec5
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Tue, 25 Jan 2022 10:47:44 +0800

    rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings

    When the rcutree.use_softirq kernel boot parameter is set to zero, all
    RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads.
    If these kthreads are being starved, quiescent states will not be
    reported, which in turn means that the grace period will not end, which
    can in turn trigger RCU CPU stall warnings.  This commit therefore dumps
    stack traces of stalled CPUs' rcuc kthreads, which can help identify
    what is preventing those kthreads from running.

    Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:30:04 -04:00
Waiman Long 4c01b1af26 rcu: Make rcu_barrier() no longer block CPU-hotplug operations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 80b3fd474c91b3ecfd845b4a0bfb58706b877ba5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 14 Dec 2021 13:35:17 -0800

    rcu: Make rcu_barrier() no longer block CPU-hotplug operations

    This commit removes the cpus_read_lock() and cpus_read_unlock() calls
    from rcu_barrier(), thus allowing CPUs to come and go during the course
    of rcu_barrier() execution.  Posting of the ->barrier_head callbacks does
    synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
    are held for short time periods on both sides.  Thus, full CPU-hotplug
    operations could both start and finish during the execution of a given
    rcu_barrier() invocation.

    Additional synchronization is provided by a global ->barrier_lock.
    Since the ->barrier_lock is only used during rcu_barrier() execution and
    during onlining/offlining a CPU, the contention for this lock should
    be low.  It might be tempting to make use of a per-CPU lock just on
    general principles, but straightforward attempts to do this have the
    problems shown below.

    Initial state: 3 CPUs present, CPU0 and CPU1 do not have
    any callbacks, and CPU2 has callbacks.

    1. CPU0 calls rcu_barrier().

    2. CPU1 starts offlining for CPU2. CPU1 calls
       rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
       from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock.
       It does not entrain ->barrier_head for CPU2, as rcu_barrier()
       on CPU0 hasn't started the barrier sequence (by calling
       rcu_seq_start(&rcu_state.barrier_sequence)) yet.

    3. CPU0 starts new barrier sequence. It iterates over
       CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock
       and finds 0 segcblist length. It updates ->barrier_seq_snap
       for CPU0 and CPU1 and continues loop iteration to CPU2.

        for_each_possible_cpu(cpu) {
            raw_spin_lock_irqsave(&rdp->barrier_lock, flags);
            if (!rcu_segcblist_n_cbs(&rdp->cblist)) {
                WRITE_ONCE(rdp->barrier_seq_snap, gseq);
                raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags);
                rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
                continue;
            }

    4. rcutree_migrate_callbacks() completes execution on CPU1.
       Segcblist len for CPU2 becomes 0.

    5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&rdp->cblist)
       for CPU2 and completes the loop iteration after setting
       ->barrier_seq_snap.

    6. As there isn't any ->barrier_head callback entrained, at
       this point rcu_barrier() on CPU0 returns.

    7. The callbacks, which migrated from CPU2 to CPU1, execute.

    Straightforward per-CPU locking is also subject to the following race
    condition noted by Boqun Feng:

    1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
       rcu_seq_start() and init_completion(), but does not yet initialize
       rcu_state.barrier_cpu_count.

    2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
       which in turn calls rcu_barrier_entrain() holding CPU2's
       rdp->barrier_lock.  It then entrains ->barrier_head for CPU2
       and atomically increments rcu_state.barrier_cpu_count, which is
       unfortunately not yet initialized to the value 2.

    3. The just-entrained RCU callback is invoked.  It atomically
       decrements rcu_state.barrier_cpu_count and sees that it is
       now zero.  This callback therefore invokes complete().

    4. CPU0 continues executing rcu_barrier(), but is not blocked
       by its call to wait_for_completion().  This results in rcu_barrier()
       returning before all pre-existing callbacks have been invoked,
       which is a bug.

    Therefore, synchronization is provided by rcu_state.barrier_lock,
    which is also held across the initialization sequence, especially the
    rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
    to the value 2.  In addition, this lock is held when entraining the
    rcu_barrier() callback, when deciding whether or not a CPU has callbacks
    that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for
    incoming CPUs, and when migrating callbacks from a CPU that is going
    offline.

    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:57 -04:00
Waiman Long 6d38f5233d rcu: Rework rcu_barrier() and callback-migration logic
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit a16578dd5e3a44b53ca0699ac2971679dab97484
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 14 Dec 2021 13:15:18 -0800

    rcu: Rework rcu_barrier() and callback-migration logic

    This commit reworks rcu_barrier() and callback-migration logic to
    permit allowing rcu_barrier() to run concurrently with CPU-hotplug
    operations.  The key trick is for callback migration to check to see if
    an rcu_barrier() is in flight, and, if so, enqueue the ->barrier_head
    callback on its behalf.

    This commit adds synchronization with RCU's CPU-hotplug notifiers.  Taken
    together, this will permit a later commit to remove the cpus_read_lock()
    and cpus_read_unlock() calls from rcu_barrier().

    [ paulmck: Updated per kbuild test robot feedback. ]
    [ paulmck: Updated per reviews session with Neeraj, Frederic, Uladzislau, and Boqun. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:56 -04:00
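The key trick mentioned above can be condensed to a few lines. A sketch of the migration-time check, assuming the global ->barrier_lock and the rcu_barrier_entrain() helper referenced elsewhere in this series; the surrounding function is illustrative only.

    /* Illustrative sketch only: while migrating callbacks away from an
     * outgoing CPU, entrain a ->barrier_head callback on its behalf if an
     * rcu_barrier() is currently in flight. */
    static void sketch_migrate_callbacks(struct rcu_data *rdp)
    {
            raw_spin_lock(&rcu_state.barrier_lock);
            /* Nonzero sequence state means a barrier is in flight, so post
             * ->barrier_head now so the barrier also waits for the callbacks
             * about to be migrated. */
            if (rcu_seq_state(rcu_state.barrier_sequence))
                    rcu_barrier_entrain(rdp);
            raw_spin_unlock(&rcu_state.barrier_lock);

            /* ... actual callback migration happens here ... */
    }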
Waiman Long 5bef7666bb rcu: Remove unused rcu_state.boost
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: Fuzz in rcu_spawn_one_boost_kthread() due to upstream commit
	   conflict as shown in merge commit d5578190bed3.

commit eae9f147a4b02e132187a2d88a403b9ccc28212a
Author: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Date:   Mon, 13 Dec 2021 12:32:09 +0530

    rcu: Remove unused rcu_state.boost

    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:55 -04:00
Waiman Long 8c2518377e rcu/nocb: Handle concurrent nocb kthreads creation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 02e3024175274ed4bf7912e7a1281b300cec76b5
Author: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Date:   Sat, 11 Dec 2021 22:31:39 +0530

    rcu/nocb: Handle concurrent nocb kthreads creation

    When multiple CPUs in the same nocb gp/cb group concurrently
    come online, they might try to concurrently create the same
    rcuog kthread. Fix this by using the nocb gp CPU's spawn mutex to
    provide mutual exclusion for the rcuog kthread creation code.

    [ paulmck: Whitespace fixes per kernel test robot feedback. ]

    Acked-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:54 -04:00
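A sketch of the mutual exclusion described above, assuming a per-group mutex embedded in the rcuog leader CPU's rcu_data; the field names, kthread function, and helper are assumptions for illustration.

    /* Illustrative sketch only: serialize rcuog kthread creation for CPUs
     * sharing the same nocb gp/cb group. */
    #include <linux/err.h>
    #include <linux/kthread.h>
    #include <linux/mutex.h>

    static int sketch_nocb_gp_kthread_fn(void *arg);   /* assumed kthread body */

    static void sketch_spawn_rcuog(struct rcu_data *rdp_gp, int cpu)
    {
            struct task_struct *t;

            mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);    /* assumed field */
            if (!rdp_gp->nocb_gp_kthread) {                /* not spawned yet? */
                    t = kthread_run(sketch_nocb_gp_kthread_fn, rdp_gp,
                                    "rcuog/%d", cpu);
                    if (!IS_ERR(t))
                            WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
            }
            mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
    }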
Waiman Long ba1bfcb746 rcu: Add mutex for rcu boost kthread spawning and affinity setting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: A fuzz in rcu_boost_kthread_setaffinity() of
	   kernel/rcu/tree_plugin.h due to the presence of a later
	   upstream commit 04d4e665a609 ("sched/isolation: Use single
	   feature type while referring to housekeeping cpumask").

commit 218b957a6959a2fb5b3967fc824072bb89ac2611
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Wed, 8 Dec 2021 23:41:53 +0000

    rcu: Add mutex for rcu boost kthread spawning and affinity setting

    As we handle parallel CPU bringup, we will need to take care to avoid
    spawning multiple boost threads, or race conditions when setting their
    affinity. Spotted by Paul McKenney.

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:17 -04:00
Waiman Long 5824fc0262 rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 82980b1622d97017053c6792382469d7dc26a486
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Tue, 16 Feb 2021 15:04:34 +0000

    rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion

    If we allow architectures to bring APs online in parallel, then we end
    up requiring rcu_cpu_starting() to be reentrant. But currently, the
    manipulation of rnp->ofl_seq is not thread-safe.

    However, rnp->ofl_seq is also fairly much pointless anyway since both
    rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
    fairly much the whole time that rnp->ofl_seq is set to an odd number
    to indicate that an operation is in progress.

    So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

    This has a couple of minor complexities: lockdep will complain when we
    take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
    an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
    avoid that false positive complaint. Since we're killing rnp->ofl_seq
    of course that 'excuse' has to be changed too, so make it check for
    arch_spin_is_locked(rcu_state.ofl_lock).

    There's no arch_spin_lock_irqsave() so we have to manually save and
    restore local interrupts around the locking.

    At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
    wait but *exclude* any CPU online/offline activity, which was fairly
    much true already by virtue of it holding rcu_state.ofl_lock.

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:19:35 -04:00
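Since there is no arch_spin_lock_irqsave(), the locking pattern the commit describes looks roughly like the sketch below; rcu_state.ofl_lock is the arch_spinlock_t mentioned above, and the surrounding function body is illustrative only.

    /* Illustrative sketch only: take an arch_spinlock_t with interrupts
     * disabled, saving and restoring the local IRQ state by hand because
     * there is no arch_spin_lock_irqsave() helper. */
    static void sketch_cpu_starting(void)
    {
            unsigned long flags;

            local_irq_save(flags);
            arch_spin_lock(&rcu_state.ofl_lock);
            /* ... record online state, update ->qsmaskinitnext, etc. ...
             * Lockdep's "excuse" can then be arch_spin_is_locked(). */
            arch_spin_unlock(&rcu_state.ofl_lock);
            local_irq_restore(flags);
    }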
Patrick Talbert ea38048f36 Merge: rcu: Backport upstream RCU related commits up to v5.17
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/602

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

This patch series backports upstream RCU and various torture-test commits up
to the v5.17 kernel. Aside from patch 10, which has a merge conflict due to an
upstream merge conflict, all other patches applied cleanly without any issue.

Signed-off-by: Waiman Long <longman@redhat.com>
~~~
Waiman Long (112):
  torture: Apply CONFIG_KCSAN_STRICT to kvm.sh --kcsan argument
  torture: Make torture.sh print the number of files to be compressed
  rcu-nocb: Fix a couple of tree_nocb code-style nits
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable rnhqp
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable ruqp
  doc: Add another stall-warning root cause in stallwarn.rst
  rcu: Fix undefined Kconfig macros
  rcu: Comment rcu_gp_init() code waiting for CPU-hotplug operations
  rcu-tasks: Simplify trc_read_check_handler() atomic operations
  rcu-tasks: Add trc_inspect_reader() checks for exiting critical
    section
  rcu-tasks: Remove second argument of rcu_read_unlock_trace_special()
  rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
  rcu: Simplify rcu_report_dead() call to rcu_report_exp_rdp()
  rcu: Make rcutree_dying_cpu() use its "cpu" parameter
  rcu-tasks: Wait for trc_read_check_handler() IPIs
  rcutorture: Suppressing read-exit testing is not an error
  rcu-tasks: Fix s/instruction/instructions/ typo in comment
  rcutorture: Warn on individual rcu_torture_init() error conditions
  locktorture: Warn on individual lock_torture_init() error conditions
  rcuscale: Warn on individual rcu_scale_init() error conditions
  rcutorture: Don't cpuhp_remove_state() if cpuhp_setup_state() failed
  rcu: Make rcu_normal_after_boot writable again
  rcu: Make rcu update module parameters world-readable
  rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
  rcu-tasks: Fix s/rcu_add_holdout/trc_add_holdout/ typo in comment
  rcu-tasks: Correct firstreport usage in check_all_holdout_tasks_trace
  rcu-tasks: Correct comparisons for CPU numbers in
    show_stalled_task_trace
  rcu-tasks: Clarify read side section info for rcu_tasks_rude GP
    primitives
  rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
  rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
  rcu-tasks: Fix read-side primitives comment for call_rcu_tasks_trace
  rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader
  rcu: Replace ________p1 and _________p1 with __UNIQUE_ID(rcu)
  rcu-tasks: Update comments to cond_resched_tasks_rcu_qs()
  rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
  rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()
  rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu
    no_qs.b.exp
  rcu-tasks: Don't remove tasks with pending IPIs from holdout list
  torture: Catch kvm.sh help text up with actual options
  rcutorture: Sanitize RCUTORTURE_RDR_MASK
  rcutorture: More thoroughly test nested readers
  srcu: Prevent redundant __srcu_read_unlock() wakeup
  rcutorture: Suppress pi-lock-across read-unlock testing for Tiny SRCU
  doc: Remove obsolete kernel-per-CPU-kthreads RCU_FAST_NO_HZ advice
  rcu: in_irq() cleanup
  rcu: Always inline rcu_dynticks_task*_{enter,exit}()
  rcu: Mark sync_sched_exp_online_cleanup() ->cpu_no_qs.b.exp load
  rcu: Prevent expedited GP from enabling tick on offline CPU
  rcu: Make idle entry report expedited quiescent states
  rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent
    deoffloading
  rcu/nocb: Prepare state machine for a new step
  rcu/nocb: Invoke rcu_core() at the start of deoffloading
  rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
  rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
  rcu/nocb: Check a stable offloaded state to manipulate
    qlen_last_fqs_check
  rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
  rcu/nocb: Limit number of softirq callbacks only on softirq
  rcu: Fix callbacks processing time limit retaining cond_resched()
  rcu: Apply callbacks processing time limit only on softirq
  rcu/nocb: Don't invoke local rcu core on callback overload from nocb
    kthread
  rcu: Improve tree_plugin.h comments and add code cleanups
  refscale: Simplify the errexit checkpoint
  refscale: Prevent buffer to pr_alert() being too long
  refscale: Always log the error message
  doc: Add refcount analogy to What is RCU
  refscale: Add missing '\n' to flush message
  scftorture: Add missing '\n' to flush message
  scftorture: Remove unused SCFTORTOUT
  scftorture: Account for weight_resched when checking for all zeroes
  rcuscale: Always log error message
  doc: RCU: Avoid 'Symbol' font-family in SVG figures
  scftorture: Always log error message
  locktorture,rcutorture,torture: Always log error message
  rcu-tasks: Create per-CPU callback lists
  rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue
    selection
  rcu-tasks: Convert grace-period counter to grace-period sequence
    number
  rcu_tasks: Convert bespoke callback list to rcu_segcblist structure
  rcu-tasks: Use spin_lock_rcu_node() and friends
  rcu-tasks: Inspect stalled task's trc state in locked state
  rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure
  rcu-tasks: Abstract checking of callback lists
  rcu-tasks: Abstract invocations of callbacks
  rcutorture: Avoid soft lockup during cpu stall
  torture: Make kvm-find-errors.sh report link-time undefined symbols
  rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs()
    invocations
  rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues
  rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing
  rcutorture: Test RCU-tasks multiqueue callback queueing
  rcu: Avoid running boost kthreads on isolated CPUs
  rcu: Avoid alloc_pages() when recording stack
  rcutorture: Add CONFIG_PREEMPT_DYNAMIC=n to tiny scenarios
  torture: Retry download once before giving up
  rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention
  rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
  rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
  rcu/nocb: Optimize kthreads and rdp initialization
  rcu/nocb: Create kthreads on all CPUs if "rcu_nocbs=" or "nohz_full="
    are passed
  rcu/nocb: Allow empty "rcu_nocbs" kernel parameter
  rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and
    rcu_spawn_one_nocb_kthread()
  rcutorture: Enable multiple concurrent callback-flood kthreads
  rcutorture: Cause TREE02 and TREE10 scenarios to do more callback
    flooding
  rcutorture: Add ability to limit callback-flood intensity
  rcutorture: Combine n_max_cbs from all kthreads in a callback flood
  rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()
  rcu-tasks: Use more callback queues if contention encountered
  rcutorture: Test RCU Tasks lock-contention detection
  rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing
  rcu-tasks: Use fewer callbacks queues if callback flood ends
  rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
  torture: Fix incorrectly redirected "exit" in kvm-remote.sh
  torture: Properly redirect kvm-remote.sh "echo" commands
  rcu-tasks: Fix computation of CPU-to-list shift counts

 .../Expedited-Grace-Periods/Funnel0.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel1.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel2.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel3.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel4.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel5.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel6.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel7.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel8.svg       |   4 +-
 .../Tree-RCU-Memory-Ordering.rst              |  69 +--
 .../Requirements/GPpartitionReaders1.svg      |  36 +-
 .../Requirements/ReadersPartitionGP1.svg      |  62 +-
 Documentation/RCU/stallwarn.rst               |  10 +
 Documentation/RCU/whatisRCU.rst               |  90 ++-
 .../admin-guide/kernel-parameters.txt         |  66 +-
 .../admin-guide/kernel-per-CPU-kthreads.rst   |   2 +-
 arch/sh/configs/sdk7786_defconfig             |   1 -
 arch/xtensa/configs/nommu_kc705_defconfig     |   1 -
 include/linux/rcu_segcblist.h                 |  51 +-
 include/linux/rcupdate.h                      |  50 +-
 include/linux/rcupdate_trace.h                |   5 +-
 include/linux/rcutiny.h                       |   2 +-
 include/linux/srcu.h                          |   3 +-
 include/linux/torture.h                       |  17 +-
 kernel/locking/locktorture.c                  |  18 +-
 kernel/rcu/Kconfig                            |   2 +-
 kernel/rcu/rcu_segcblist.c                    |  10 +-
 kernel/rcu/rcu_segcblist.h                    |  12 +-
 kernel/rcu/rcuscale.c                         |  24 +-
 kernel/rcu/rcutorture.c                       | 320 +++++++---
 kernel/rcu/refscale.c                         |  50 +-
 kernel/rcu/srcutiny.c                         |   2 +-
 kernel/rcu/tasks.h                            | 583 ++++++++++++++----
 kernel/rcu/tree.c                             | 119 ++--
 kernel/rcu/tree.h                             |  24 +-
 kernel/rcu/tree_exp.h                         |  15 +-
 kernel/rcu/tree_nocb.h                        | 162 +++--
 kernel/rcu/tree_plugin.h                      |  61 +-
 kernel/rcu/update.c                           |   8 +-
 kernel/scftorture.c                           |  20 +-
 kernel/torture.c                              |   4 +-
 .../rcutorture/bin/kvm-find-errors.sh         |   4 +-
 .../rcutorture/bin/kvm-recheck-rcu.sh         |   2 +-
 .../selftests/rcutorture/bin/kvm-remote.sh    |  23 +-
 tools/testing/selftests/rcutorture/bin/kvm.sh |  11 +-
 .../selftests/rcutorture/bin/parse-build.sh   |   3 +-
 .../selftests/rcutorture/bin/torture.sh       |   9 +-
 .../selftests/rcutorture/configs/rcu/SRCU-T   |   1 +
 .../selftests/rcutorture/configs/rcu/SRCU-U   |   1 +
 .../rcutorture/configs/rcu/TASKS01.boot       |   1 +
 .../selftests/rcutorture/configs/rcu/TINY01   |   1 +
 .../selftests/rcutorture/configs/rcu/TINY02   |   1 +
 .../rcutorture/configs/rcu/TRACE01.boot       |   1 +
 .../rcutorture/configs/rcu/TRACE02.boot       |   1 +
 .../rcutorture/configs/rcu/TREE02.boot        |   1 +
 .../rcutorture/configs/rcu/TREE10.boot        |   1 +
 .../rcutorture/configs/rcuscale/TINY          |   1 +
 57 files changed, 1360 insertions(+), 637 deletions(-)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE02.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE10.boot

Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-04-19 12:23:21 +02:00
Waiman Long 026f852e1e rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 2ebc45c44c4f3cc4c757430b2409ece4f976892e
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 23 Nov 2021 01:37:03 +0100

    rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded

    The nocb_gp_wait() function iterates over all CPUs in its group,
    including even those CPUs that have been de-offloaded.  This is of
    course suboptimal, especially if none of the CPUs within the group are
    currently offloaded.  This will become even more of a problem once a
    nocb kthread is created for all possible CPUs.

    Therefore use a standard double linked list to link all the offloaded
    rcu_data structures and safely add or delete these structure as we
    offload or de-offload them, respectively.
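
    A minimal sketch of that bookkeeping (the field and helper names here are
    illustrative, not the exact upstream diff):

        /* Each offloaded rdp sits on its rcuog group leader's list. */
        static void nocb_link_rdp(struct rcu_data *rdp)
        {
                list_add_tail(&rdp->nocb_entry_rdp,
                              &rdp->nocb_gp_rdp->nocb_head_rdp);
        }

        static void nocb_unlink_rdp(struct rcu_data *rdp)
        {
                list_del(&rdp->nocb_entry_rdp);
        }

        /* nocb_gp_wait() then walks only the currently offloaded rdps: */
        list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
                /* ... advance callbacks and request grace periods ... */
        }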

    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Tested-by: Juri Lelli <juri.lelli@redhat.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:20 -04:00
Waiman Long 400d40f7b0 rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 118e0d4a1bc85d4ecea0427e440a72d21ffbfa6a
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Mon, 11 Oct 2021 16:51:30 +0200

    rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading

    rcu_nocb_lock_irqsave() can be preempted between the call to
    rcu_segcblist_is_offloaded() and the actual locking. This matters now
    that rcu_core() is preemptible on PREEMPT_RT and the (de-)offloading
    process can interrupt the softirq or the rcuc kthread.

    As a result we may locklessly call into code that requires nocb locking.
    In practice this is a problem while we accelerate callbacks on rcu_core().

    Simply disabling interrupts before (instead of after) checking the NOCB
    offload state fixes the issue.
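
    In rough outline, the fix reorders the two steps of the locking helper
    (a sketch of the idea, not the exact rcu_nocb_lock_irqsave() macro):

        /* Racy: (de-)offloading can preempt us between check and lock. */
        if (rcu_segcblist_is_offloaded(&rdp->cblist))
                raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
        else
                local_irq_save(flags);

        /* Fixed: interrupts off first, then check the now-stable state. */
        local_irq_save(flags);
        if (rcu_segcblist_is_offloaded(&rdp->cblist))
                raw_spin_lock(&rdp->nocb_lock);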

    Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:59 -04:00
Waiman Long c9b4dd21b8 rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu no_qs.b.exp
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 6120b72e25e195b6fa15b0a674479a38166c392a
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu, 16 Sep 2021 14:10:48 +0200

    rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu no_qs.b.exp

    Having two fields for the same purpose with subtle differences on
    different RCU flavours is confusing, especially when both fields always
    exist on both RCU flavours.

    Fortunately, it is now safe for preemptible RCU to rely on the rcu_data
    structure's ->cpu_no_qs.b.exp field, just like non-preemptible RCU.
    This commit therefore removes the ad-hoc ->exp_deferred_qs field.
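
    For reference, ->cpu_no_qs is the small per-CPU union of quiescent-state
    flags along the lines of the definition in kernel/rcu/tree.h (shown here
    as a sketch; exact comments differ):

        union rcu_noqs {
                struct {
                        u8 norm;  /* Normal GP still needs a QS from this CPU. */
                        u8 exp;   /* Expedited GP still needs a QS from this CPU. */
                } b;              /* Individual flags. */
                u16 s;            /* Both flags at once, for quick tests. */
        };

        /* With ->exp_deferred_qs gone, both flavours simply use: */
        rdp->cpu_no_qs.b.exp = true;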

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:53 -04:00
Desnes A. Nunes do Rosario 9814a162d4 rcu: Remove the RCU_FAST_NO_HZ Kconfig option
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1

commit e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1
Author: "Paul E. McKenney" <paulmck@kernel.org>
Date: Mon, 27 Sep 2021 14:18:51 -0700

  All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve
  systems with RCU callbacks offloaded.  In this situation, all that this
  Kconfig option does is slow down idle entry/exit with an additional
  always-taken early exit.  If this is the only use case, then this
  Kconfig option is nothing but an attractive nuisance that needs to go away.

  This commit therefore removes the RCU_FAST_NO_HZ Kconfig option.

  Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
2022-03-24 14:39:57 -04:00
Paul E. McKenney 641faf1b90 Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c', 'fixes.2021.05.13a', 'kvfree_rcu.2021.05.10c', 'mmdumpobj.2021.05.10c', 'nocb.2021.05.12a', 'srcu.2021.05.12a', 'tasks.2021.05.18a' and 'torture.2021.05.10c' into HEAD
bitmaprange.2021.05.10c: Allow "all" for bitmap ranges.
doc.2021.05.10c: Documentation updates.
fixes.2021.05.13a: Miscellaneous fixes.
kvfree_rcu.2021.05.10c: kvfree_rcu() updates.
mmdumpobj.2021.05.10c: mem_dump_obj() updates.
nocb.2021.05.12a: RCU NOCB CPU updates, including limited deoffloading.
srcu.2021.05.12a: SRCU updates.
tasks.2021.05.18a: Tasks-RCU updates.
torture.2021.05.10c: Torture-test updates.
2021-05-18 10:56:19 -07:00
Ingo Molnar a616aec9aa rcu: Fix various typos in comments
Fix ~12 single-word typos in RCU code comments.

[ paulmck: Apply feedback from Randy Dunlap. ]
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12 12:11:05 -07:00
Frederic Weisbecker e75bcd48e2 rcu/nocb: Unify timers
Now that ->nocb_timer and ->nocb_bypass_timer have become quite similar,
this commit merges them together.  A new RCU_NOCB_WAKE_BYPASS wake level
is introduced.  As a result, timers perform all kinds of deferred wake
ups but other deferred wakeup callsites only handle non-bypass wakeups
in order not to wake up rcuo too early.

The timer also unconditionally executes a full barrier so as to order
timer_pending() against callback enqueue, although the RCU_NOCB_WAKE_FORCE
path that makes use of this barrier is debatable: it should also
test against the rdp leader instead of the current rdp.

This unconditional full barrier shouldn't bring visible overhead since
these timers almost never fire.
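
The resulting deferred-wakeup levels then look roughly like this (values as
of this series, shown for illustration):

   /* Deferred rcuog wakeup levels, in increasing order of urgency. */
   #define RCU_NOCB_WAKE_NOT    0  /* Nothing deferred. */
   #define RCU_NOCB_WAKE_BYPASS 1  /* Only the (merged) timer may act on this. */
   #define RCU_NOCB_WAKE        2  /* Any deferred-wakeup callsite may act. */
   #define RCU_NOCB_WAKE_FORCE  3  /* As above, but wake rcuog unconditionally. */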

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12 12:10:23 -07:00
Frederic Weisbecker 870905169d rcu/nocb: Prepare for fine-grained deferred wakeup
Tuning the deferred wakeup level must be done from a safe wakeup
point. Currently those sites are:

* ->nocb_timer
* user/idle/guest entry
* CPU down
* softirq/rcuc

All of these sites perform the wake up for both RCU_NOCB_WAKE and
RCU_NOCB_WAKE_FORCE.

In order to merge ->nocb_timer and ->nocb_bypass_timer together, we plan
to add a new RCU_NOCB_WAKE_BYPASS that really should be deferred until
a timer fires so that we don't wake up the NOCB-gp kthread too early.

To prepare for that, this commit specifies the per-callsite wakeup
level/limit.
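
Concretely, "per-callsite level/limit" means something like the following
(a sketch; the upstream helper signatures may differ slightly):

   /* Each callsite states the lowest deferred level it will service. */
   static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level)
   {
           return READ_ONCE(rdp->nocb_defer_wakeup) >= level;
   }

   /* ->nocb_timer handler: may service anything, later including the
    * planned RCU_NOCB_WAKE_BYPASS level. */
   if (rcu_nocb_need_deferred_wakeup(rdp, RCU_NOCB_WAKE_BYPASS))
           do_nocb_deferred_wakeup(rdp);

   /* user/idle/guest entry, CPU down, softirq/rcuc: RCU_NOCB_WAKE and up. */
   if (rcu_nocb_need_deferred_wakeup(rdp, RCU_NOCB_WAKE))
           do_nocb_deferred_wakeup(rdp);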

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12 12:10:23 -07:00
Paul E. McKenney 3ef5a1c382 rcu: Make RCU priority boosting work on single-CPU rcu_node structures
When any CPU comes online, it checks to see if an RCU-boost kthread has
already been created for that CPU's leaf rcu_node structure, and if
not, it creates one.  Unfortunately, it also verifies that this leaf
rcu_node structure actually has at least one online CPU, and if not,
it declines to create the kthread.  Although this behavior makes sense
during early boot, especially on systems that claim far more CPUs than
they actually have, it makes no sense for the first CPU to come online
for a given rcu_node structure.  There is no point in checking because
we know there is a CPU on its way in.

The problem is that timing differences can cause this incoming CPU to not
yet be reflected in the various bit masks even at rcutree_online_cpu()
time, and there is no chance at rcutree_prepare_cpu() time.  Plus it
would be better to create the RCU-boost kthread at rcutree_prepare_cpu()
to handle the case where the CPU is involved in an RCU priority inversion
very shortly after it comes online.

This commit therefore moves the checking to rcu_prepare_kthreads(), which
is called only at early boot, when the check is appropriate.  In addition,
it makes rcutree_prepare_cpu() invoke rcu_spawn_one_boost_kthread(), which
no longer does any checking for online CPUs.

With this change, RCU priority boosting tests now pass for short rcutorture
runs, even with single-CPU leaf rcu_node structures.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Scott Wood <swood@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:54 -07:00
Paul E. McKenney 396eba65f6 rcu: Add quiescent states and boost states to show_rcu_gp_kthreads() output
This commit adds each rcu_node structure's ->qsmask and "bBEG" output
indicating whether: (1) There is a boost kthread, (2) A reader needs
to be (or is in the process of being) boosted, (3) A reader is blocking
an expedited grace period, and (4) A reader is blocking a normal grace
period.  This helps diagnose RCU priority boosting failures.
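
The "bBEG" output uses the usual kernel flag-character idiom: each position
prints its letter when its condition holds and '.' otherwise. An illustrative
sketch (not the exact upstream format string):

   pr_info("rcu_node %d:%d ->qsmask %#lx %c%c%c%c\n",
           rnp->grplo, rnp->grphi, READ_ONCE(rnp->qsmask),
           ".b"[!!rnp->boost_kthread_task],     /* boost kthread exists  */
           ".B"[!!READ_ONCE(rnp->boost_tasks)], /* reader needs boosting */
           ".E"[!!READ_ONCE(rnp->exp_tasks)],   /* blocking expedited GP */
           ".G"[!!READ_ONCE(rnp->gp_tasks)]);   /* blocking normal GP    */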

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:54 -07:00
Frederic Weisbecker d76e0926d8 rcu/nocb: Use the rcuog CPU's ->nocb_timer
Currently each CPU has its own ->nocb_timer queued when the nocb_gp
wakeup must be deferred.  This approach has many drawbacks, compared to
a solution based on a single timer per NOCB group:

* There are a lot of timers to maintain.

* The per-rdp ->nocb_lock must be held to queue and cancel the timer
  and this lock can already be heavily contended.

* One timer firing doesn't cancel the other timers in the same group:
  - These other timers can thus cause spurious wakeups
  - Each rdp that queued a timer must lock both ->nocb_lock and then
    ->nocb_gp_lock upon exit from the kernel to idle/user/guest mode.

* We can't cancel all of them if we detect an unflushed bypass in
  nocb_gp_wait(). In fact currently we only ever cancel the ->nocb_timer
  of the leader group.

* The leader group's nocb_timer is cancelled without locking ->nocb_lock
  in nocb_gp_wait().  This currently appears to be safe but is an
  accident waiting to happen.

* Since the timer acquires ->nocb_lock, it requires extra care in the
  NOCB (de-)offloading process, requiring that it be either enabled or
  disabled and then flushed.

This commit therefore uses the rcuog kthread's CPU's ->nocb_timer instead.
It is protected by nocb_gp_lock, which is _way_ less contended and
remains so even after this change.  As a matter of fact, the nocb_timer
almost never fires and the deferred wakeup is mostly carried out upon
idle/user/guest entry.  Now the early check performed at this point in
do_nocb_deferred_wakeup() is done on rdp_gp->nocb_defer_wakeup, which
is of course racy.  However, this raciness is harmless because we only
need the guarantee that the timer is queued if we were the last one to
queue it.  Any other situation (another CPU has queued it and we either
see it or not) is fine.

This solves all the issues listed above.
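
A sketch of the resulting deferred-wakeup path, with the single timer owned
by the group leader's (rcuog CPU's) rdp and protected by ->nocb_gp_lock
(illustrative, not the exact upstream code):

   static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype)
   {
           struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;  /* rcuog kthread's rdp */
           unsigned long flags;

           raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
           if (rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
                   mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
           if (rdp_gp->nocb_defer_wakeup < waketype)
                   WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
           raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
   }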

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:02:44 -07:00
Linus Torvalds 657bd90c93 Scheduler updates for v5.12:
[ NOTE: unfortunately this tree had to be freshly rebased today,
         it's a same-content tree of 82891be90f3c (-next published)
         merged with v5.11.
 
         The main reason for the rebase was an authorship misattribution
         problem with a new commit, which we noticed in the last minute,
         and which we didn't want to be merged upstream. The offending
         commit was deep in the tree, and dependent commits had to be
         rebased as well. ]
 
 - Core scheduler updates:
 
   - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
     preempt=none/voluntary/full boot options (default: full),
     to allow distros to build a PREEMPT kernel but fall back to
     close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling
     behavior via a boot time selection.
 
     There's also the /debug/sched_debug switch to do this runtime.
 
     This feature is implemented via runtime patching (a new variant of static calls).
 
     The scope of the runtime patching can be best reviewed by looking
     at the sched_dynamic_update() function in kernel/sched/core.c.
 
     ( Note that the dynamic none/voluntary mode isn't 100% identical,
       for example preempt-RCU is available in all cases, plus the
       preempt count is maintained in all models, which has runtime
       overhead even with the code patching. )
 
     The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority
     of distributions, are supposed to be unaffected.
 
   - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
     was found via rcutorture triggering a hang. The bug is that
     rcu_idle_enter() may wake up a NOCB kthread, but this happens after
     the last generic need_resched() check. Some cpuidle drivers fix it
     by chance but many others don't.
 
     In true 2020 fashion the original bug fix has grown into a 5-patch
     scheduler/RCU fix series plus another 16 RCU patches to address
     the underlying issue of missed preemption events. These are the
     initial fixes that should fix current incarnations of the bug.
 
   - Clean up rbtree usage in the scheduler, by providing & using the following
     consistent set of rbtree APIs:
 
      partial-order; less() based:
        - rb_add(): add a new entry to the rbtree
        - rb_add_cached(): like rb_add(), but for a rb_root_cached
 
      total-order; cmp() based:
        - rb_find(): find an entry in an rbtree
        - rb_find_add(): find an entry, and add if not found
 
        - rb_find_first(): find the first (leftmost) matching entry
        - rb_next_match(): continue from rb_find_first()
        - rb_for_each(): iterate a sub-tree using the previous two
 
   - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass.
     This is a 4-commit series where each commit improves one aspect of the idle
     sibling scan logic.
 
   - Improve the cpufreq cooling driver by getting the effective CPU utilization
     metrics from the scheduler
 
   - Improve the fair scheduler's active load-balancing logic by reducing the number
     of active LB attempts & lengthen the load-balancing interval. This improves
     stress-ng mmapfork performance.
 
   - Fix CFS's estimated utilization (util_est) calculation bug that can result in
     too high utilization values
 
 - Misc updates & fixes:
 
    - Fix the HRTICK reprogramming & optimization feature
    - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
    - Reduce dl_add_task_root_domain() overhead
    - Fix uprobes refcount bug
    - Process pending softirqs in flush_smp_call_function_from_idle()
    - Clean up task priority related defines, remove *USER_*PRIO and
      USER_PRIO()
    - Simplify the sched_init_numa() deduplication sort
    - Documentation updates
    - Fix EAS bug in update_misfit_status(), which degraded the quality
      of energy-balancing
    - Smaller cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtHBsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1itgg/+NGed12pgPjYBzesdou60Lvx7LZLGjfOt
 M1F1EnmQGn/hEH2fCY6ZoqIZQTVltm7GIcBNabzYTzlaHZsdtyuDUJBZyj19vTlk
 zekcj7WVt+qvfjChaNwEJhQ9nnOM/eohMgEOHMAAJd9zlnQvve7NOLQ56UDM+kn/
 9taFJ5ZPvb4avP6C5p3KivvKex6Bjof/Tl0m3utpNyPpI/qK3FyGxwdgCxU0yepT
 ABWQX5ZQCufFvo1bgnBPfqyzab4MqhoM3bNKBsLQfuAlssG1xRv4KQOev4dRwrt9
 pXJikV5C9yez5d2lGe5p0ltH5IZS/l9x2yI/ZQj3OUDTFyV1ic6WfFAqJgDzVF8E
 i/vvA4NPQiI241Bkps+ErcCw4aVOgiY6TWli74cHjLUIX0+As6aHrFWXGSxUmiHB
 WR+B8KmdfzRTTlhOxMA+cvlpZcKCfxWkJJmXzr/lDZzIuKPqM3QCE2wD9sixkfVo
 JNICT0IvZghWOdbMEfZba8Psh/e2LVI9RzdpEiuYJz1ZrVlt1hO0M6jBxY0hMz9n
 k54z81xODw0a8P2FHMtpmB1vhAeqCmvwA6DO8z0Oxs0DFi+KM2bLf2efHsCKafI+
 Bm5v9YFaOk/55R76hJVh+aYLlyFgFkKd+P/niJTPDnxOk3SqJuXvTrql1HeGHkNr
 kYgQa23dsZk=
 =pyaG
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler updates:

   - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
     preempt=none/voluntary/full boot options (default: full), to allow
     distros to build a PREEMPT kernel but fall back to close to
     PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
     a boot time selection.

     There's also the /debug/sched_debug switch to do this runtime.

     This feature is implemented via runtime patching (a new variant of
     static calls).

     The scope of the runtime patching can be best reviewed by looking
     at the sched_dynamic_update() function in kernel/sched/core.c.

     ( Note that the dynamic none/voluntary mode isn't 100% identical,
       for example preempt-RCU is available in all cases, plus the
       preempt count is maintained in all models, which has runtime
       overhead even with the code patching. )

     The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
     majority of distributions, are supposed to be unaffected.

   - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
     was found via rcutorture triggering a hang. The bug is that
     rcu_idle_enter() may wake up a NOCB kthread, but this happens after
     the last generic need_resched() check. Some cpuidle drivers fix it
     by chance but many others don't.

     In true 2020 fashion the original bug fix has grown into a 5-patch
     scheduler/RCU fix series plus another 16 RCU patches to address the
     underlying issue of missed preemption events. These are the initial
     fixes that should fix current incarnations of the bug.

   - Clean up rbtree usage in the scheduler, by providing & using the
     following consistent set of rbtree APIs:

       partial-order; less() based:
         - rb_add(): add a new entry to the rbtree
         - rb_add_cached(): like rb_add(), but for a rb_root_cached

       total-order; cmp() based:
         - rb_find(): find an entry in an rbtree
         - rb_find_add(): find an entry, and add if not found

         - rb_find_first(): find the first (leftmost) matching entry
         - rb_next_match(): continue from rb_find_first()
         - rb_for_each(): iterate a sub-tree using the previous two (usage sketch below)

   - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
     single pass. This is a 4-commit series where each commit improves
     one aspect of the idle sibling scan logic.

   - Improve the cpufreq cooling driver by getting the effective CPU
     utilization metrics from the scheduler

   - Improve the fair scheduler's active load-balancing logic by
     reducing the number of active LB attempts & lengthen the
     load-balancing interval. This improves stress-ng mmapfork
     performance.

   - Fix CFS's estimated utilization (util_est) calculation bug that can
     result in too high utilization values

  Misc updates & fixes:

   - Fix the HRTICK reprogramming & optimization feature

   - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code

   - Reduce dl_add_task_root_domain() overhead

   - Fix uprobes refcount bug

   - Process pending softirqs in flush_smp_call_function_from_idle()

   - Clean up task priority related defines, remove *USER_*PRIO and
     USER_PRIO()

   - Simplify the sched_init_numa() deduplication sort

   - Documentation updates

   - Fix EAS bug in update_misfit_status(), which degraded the quality
     of energy-balancing

   - Smaller cleanups"
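
As a usage sketch of the new rbtree helpers listed above (assuming the
less()/cmp() callback conventions of <linux/rbtree.h>; the struct, callbacks,
and variables here are illustrative):

   struct item {
           u64 key;
           struct rb_node node;
   };

   static bool item_less(struct rb_node *a, const struct rb_node *b)
   {
           return rb_entry(a, struct item, node)->key <
                  rb_entry(b, struct item, node)->key;
   }

   static int item_cmp(const void *key, const struct rb_node *n)
   {
           u64 k = *(const u64 *)key, nk = rb_entry(n, struct item, node)->key;

           return k < nk ? -1 : k > nk ? 1 : 0;
   }

   /* Insert (partial order, less() based): */
   rb_add(&it->node, &root, item_less);

   /* Lookup (total order, cmp() based): */
   struct rb_node *n = rb_find(&key, &root, item_cmp);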

* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  sched,x86: Allow !PREEMPT_DYNAMIC
  entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
  entry: Explicitly flush pending rcuog wakeup before last rescheduling point
  rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
  rcu/nocb: Perform deferred wake up before last idle's need_resched() check
  rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
  sched/features: Distinguish between NORMAL and DEADLINE hrtick
  sched/features: Fix hrtick reprogramming
  sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
  uprobes: (Re)add missing get_uprobe() in __find_uprobe()
  smp: Process pending softirqs in flush_smp_call_function_from_idle()
  sched: Harden PREEMPT_DYNAMIC
  static_call: Allow module use without exposing static_call_key
  sched: Add /debug/sched_preempt
  preempt/dynamic: Support dynamic preempt with preempt= boot option
  preempt/dynamic: Provide irqentry_exit_cond_resched() static call
  preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
  preempt/dynamic: Provide cond_resched() and might_resched() static calls
  preempt: Introduce CONFIG_PREEMPT_DYNAMIC
  static_call: Provide DEFINE_STATIC_CALL_RET0()
  ...
2021-02-21 12:35:04 -08:00
Frederic Weisbecker f8bb5cae96 rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Unfortunately the call to rcu_user_enter() is already past the last
rescheduling opportunity before we resume to userspace or to guest mode.
We may escape there with the woken task ignored.

The ultimate resort to fix every callsites is to trigger a self-IPI
(nohz_full depends on arch to implement arch_irq_work_raise()) that will
trigger a reschedule on IRQ tail or guest exit.

Eventually every site that wants a saner treatment will need to carefully
place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit
need_resched() check upon resume.
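
A sketch of the self-IPI idea using the generic irq_work machinery
(illustrative; the exact hook points and helper names in the upstream patch
may differ):

   static void late_wakeup_func(struct irq_work *work)
   {
           /* Intentionally empty: the IRQ tail re-evaluates need_resched(). */
   }

   static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
           IRQ_WORK_INIT(late_wakeup_func);

   /* Called on the resume path, already past the last resched check. */
   static void rcu_irq_work_resched(void)
   {
           struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

           if (do_nocb_deferred_wakeup(rdp) && need_resched())
                   irq_work_queue(this_cpu_ptr(&late_wakeup_work));
   }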

Fixes: 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 69cdea873c rcu/nocb: Shutdown nocb timer on de-offloading
This commit ensures that the nocb timer is shut down before reaching the
final de-offloaded state.  The key goal is to prevent the timer handler
from manipulating the callbacks without the protection of the nocb locks.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker d97b078182 rcu/nocb: De-offloading CB kthread
To de-offload callback processing back onto a CPU, it is necessary to
clear SEGCBLIST_OFFLOAD and notify the nocb CB kthread, which will then
clear its own bit flag and go to sleep to stop handling callbacks.  This
commit makes that change.  It will also be necessary to notify the nocb
GP kthread in this same way, which is the subject of a follow-on commit.
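
Schematically, the handshake with the CB kthread looks like this (flag and
field names follow the rcu_segcblist/rcu_data conventions; treat the code as
an illustrative sketch rather than the exact patch):

   /* De-offload side: clear the offloaded state and poke the CB kthread. */
   rcu_nocb_lock_irqsave(rdp, flags);
   rcu_segcblist_offload(&rdp->cblist, false);   /* drops SEGCBLIST_OFFLOADED */
   rcu_nocb_unlock_irqrestore(rdp, flags);
   swake_up_one(&rdp->nocb_cb_wq);               /* wake the rcuo CB kthread */

   /* CB kthread side, in its wait loop: notice the change, ack it, sleep. */
   if (!rcu_segcblist_test_flags(&rdp->cblist, SEGCBLIST_OFFLOADED)) {
           rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
           /* ... then park on rdp->nocb_cb_wq until offloading resumes ... */
   }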

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[ paulmck: Add export per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:19 -08:00
Paul E. McKenney 4d60b475f8 rcu: Prevent lockdep-RCU splats on lock acquisition/release
The rcu_cpu_starting() and rcu_report_dead() functions transition the
current CPU between online and offline state from an RCU perspective.
Unfortunately, this means that the rcu_cpu_starting() function's lock
acquisition and the rcu_report_dead() function's lock releases happen
while the CPU is offline from an RCU perspective, which can result
in lockdep-RCU splats about using RCU from an offline CPU.  And this
situation can also result in too-short grace periods, especially in
guest OSes that are subject to vCPU preemption.

This commit therefore uses sequence-count-like synchronization to forgive
use of RCU while RCU thinks a CPU is offline across the full extent of
the rcu_cpu_starting() and rcu_report_dead() function's lock acquisitions
and releases.

One approach would have been to use the actual sequence-count primitives
provided by the Linux kernel.  Unfortunately, the resulting code looks
completely broken and wrong, and is likely to result in patches that
break RCU in an attempt to address this appearance of broken wrongness.
Plus there is no net savings in lines of code, given the additional
explicit memory barriers required.

Therefore, this sequence count is instead implemented by a new ->ofl_seq
field in the rcu_node structure.  If this counter's value is an odd
number, RCU forgives RCU read-side critical sections on other CPUs covered
by the same rcu_node structure, even if those CPUs are offline from
an RCU perspective.  In addition, if a given leaf rcu_node structure's
->ofl_seq counter value is an odd number, rcu_gp_init() delays starting
the grace period until that counter value changes.

[ paulmck: Apply Peter Zijlstra feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:17 -08:00
Neeraj Upadhyay ed73860cec rcu: Fix single-CPU check in rcu_blocking_is_gp()
Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses
num_online_cpus() to determine whether there is only one CPU online.  When
there is only a single CPU online, the simple fact that synchronize_rcu()
could be legally called implies that a full grace period has elapsed.
Therefore, in the single-CPU case, synchronize_rcu() simply returns
immediately.  Unfortunately, num_online_cpus() is unreliable while a
CPU-hotplug operation is transitioning to or from single-CPU operation
because:

1.	num_online_cpus() uses atomic_read(&__num_online_cpus) to
	locklessly sample the number of online CPUs.  The hotplug locks
	are not held, which means that an incoming CPU can concurrently
	update this count.  This in turn means that an RCU read-side
	critical section on the incoming CPU might observe updates
	prior to the grace period, but also that this critical section
	might extend beyond the end of the optimized synchronize_rcu().
	This breaks RCU's fundamental guarantee.

2.	In addition, num_online_cpus() does no ordering, thus providing
	another way that RCU's fundamental guarantee can be broken by
	the current code.

3.	The most probable failure mode happens on outgoing CPUs.
	The outgoing CPU updates the count of online CPUs in the
	CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in
	and of itself due to preemption being disabled at the call
	to num_online_cpus().  Unfortunately, after that stop-machine
	handler returns, the CPU takes one last trip through the
	scheduler (which has RCU readers) and, after the resulting
	context switch, one final dive into the idle loop.  During this
	time, RCU needs to keep track of two CPUs, but num_online_cpus()
	will say that there is only one, which in turn means that the
	surviving CPU will incorrectly ignore the outgoing CPU's RCU
	read-side critical sections.

This problem is illustrated by the following litmus test in which P0()
corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU.
The herd7 tool confirms that the "exists" clause can be satisfied,
thus demonstrating that this breakage can happen according to the Linux
kernel memory model.

   {
     int x = 0;
     atomic_t numonline = ATOMIC_INIT(1);
   }

   P0(int *x, atomic_t *numonline)
   {
     int r0;
     WRITE_ONCE(*x, 1);
     r0 = atomic_read(numonline);
     if (r0 == 1) {
       smp_mb();
     } else {
       synchronize_rcu();
     }
     WRITE_ONCE(*x, 2);
   }

   P1(int *x, atomic_t *numonline)
   {
     int r0; int r1;

     atomic_inc(numonline);
     smp_mb();
     rcu_read_lock();
     r0 = READ_ONCE(*x);
     smp_rmb();
     r1 = READ_ONCE(*x);
     rcu_read_unlock();
   }

   locations [x;numonline;]

   exists (1:r0=0 /\ 1:r1=2)

It is important to note that these problems arise only when the system
is transitioning to or from single-CPU operation.

One solution would be to hold the CPU-hotplug locks while sampling
num_online_cpus(), which was in fact the intent of the (redundant)
preempt_disable() and preempt_enable() surrounding this call to
num_online_cpus().  Actually blocking CPU hotplug would not only result
in excessive overhead, but would also unnecessarily impede CPU-hotplug
operations.

This commit therefore follows long-standing RCU tradition by maintaining
a separate RCU-specific set of CPU-hotplug books.

This separate set of books is implemented by a new ->n_online_cpus field
in the rcu_state structure that maintains RCU's count of the online CPUs.
This count is incremented early in the CPU-online process, so that
the critical transition away from single-CPU operation will occur when
there is only a single CPU.  Similarly for the critical transition to
single-CPU operation, the counter is decremented late in the CPU-offline
process, again while there is only a single CPU.  Because there is only
ever a single CPU when the ->n_online_cpus field undergoes the critical
1->2 and 2->1 transitions, full memory ordering and mutual exclusion are
provided implicitly and, better yet, for free.

In the case where the CPU is coming online, nothing will happen until
the current CPU helps it come online.  Therefore, the new CPU will see
all accesses prior to the optimized grace period, which means that RCU
does not need to further delay this new CPU.  In the case where the CPU
is going offline, the outgoing CPU is totally out of the picture before
the optimized grace period starts, which means that this outgoing CPU
cannot see any of the accesses following that grace period.  Again,
RCU needs no further interaction with the outgoing CPU.

This does mean that synchronize_rcu() will unnecessarily do a few grace
periods the hard way just before the second CPU comes online and just
after the second-to-last CPU goes offline, but it is not worth optimizing
this uncommon case.
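
With that field in place, the single-CPU fast path can consult RCU's own
count instead of num_online_cpus() (a sketch of the idea):

   static int rcu_blocking_is_gp(void)
   {
           if (IS_ENABLED(CONFIG_PREEMPTION))
                   return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
           might_sleep();  /* Also checks for RCU read-side critical section. */
           /*
            * The 1->2 transition happens before the incoming CPU runs
            * anything, and the 2->1 transition happens after the outgoing
            * CPU's last RCU reader, so no hotplug locking is needed.
            */
           return READ_ONCE(rcu_state.n_online_cpus) <= 1;
   }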

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:16 -08:00