Commit Graph

1008 Commits

Waiman Long ce330fc3bc rcu: Make polled grace-period API account for expedited grace periods
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit dd04140531b5d38b77ad9ff7b18117654be5bf5c
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 14 Apr 2022 06:56:35 -0700

    rcu: Make polled grace-period API account for expedited grace periods

    Currently, this code could splat:

            oldstate = get_state_synchronize_rcu();
            synchronize_rcu_expedited();
            WARN_ON_ONCE(!poll_state_synchronize_rcu(oldstate));

    This situation is counter-intuitive and user-unfriendly.  After all, there
    really was a perfectly valid full grace period right after the call to
    get_state_synchronize_rcu(), so why shouldn't poll_state_synchronize_rcu()
    know about it?

    This commit therefore makes the polled grace-period API aware of expedited
    grace periods in addition to the normal grace periods that it is already
    aware of.  With this change, the above code is guaranteed not to splat.

    Please note that the above code can still splat due to counter wrap on the
    one hand and situations involving partially overlapping normal/expedited
    grace periods on the other.  On 64-bit systems, the second is of course
    much more likely than the first.  It is possible to modify this approach
    to prevent overlapping grace periods from causing splats, but only at
    the expense of greatly increasing the probability of counter wrap, as
    in within milliseconds on 32-bit systems and within minutes on 64-bit
    systems.

    This commit is in preparation for polled expedited grace periods.

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:51 -04:00
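
A minimal usage sketch of the guarantee described above; get_state_synchronize_rcu(), synchronize_rcu_expedited(), poll_state_synchronize_rcu(), and synchronize_rcu() are the existing public RCU interfaces, and the surrounding function is illustrative only:

    #include <linux/rcupdate.h>

    /* Illustrative caller: not part of the patch. */
    static void polled_gp_example(void)
    {
            unsigned long oldstate;

            oldstate = get_state_synchronize_rcu();   /* snapshot current GP state */
            synchronize_rcu_expedited();              /* a full (expedited) grace period elapses */

            /* With this commit, the expedited grace period above is accounted
             * for, so no further waiting is needed here. */
            if (!poll_state_synchronize_rcu(oldstate))
                    synchronize_rcu();                /* only if no GP elapsed since the snapshot */
    }
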
Waiman Long 7df8a78b55 rcu: Switch polled grace-period APIs to ->gp_seq_polled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit bf95b2bc3e42f11f4d7a5e8a98376c2b4a2aa82f
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 13 Apr 2022 17:46:15 -0700

    rcu: Switch polled grace-period APIs to ->gp_seq_polled

    This commit switches the existing polled grace-period APIs to use a
    new ->gp_seq_polled counter in the rcu_state structure.  An additional
    ->gp_seq_polled_snap counter in that same structure allows the normal
    grace period kthread to interact properly with the !SMP !PREEMPT fastpath
    through synchronize_rcu().  The first of the two to note the end of a
    given grace period will make knowledge of this transition available to
    the polled API.

    This commit is in preparation for polled expedited grace periods.

    [ paulmck: Fix use of rcu_state.gp_seq_polled to start normal grace period. ]

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:51 -04:00
Waiman Long 9b02706bd3 rcu/nocb: Add option to opt rcuo kthreads out of RT priority
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 8f489b4da5278fc6e5fc8f0029ae7fb51c060215
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Wed, 11 May 2022 10:57:03 +0200

    rcu/nocb: Add option to opt rcuo kthreads out of RT priority

    This commit introduces a RCU_NOCB_CPU_CB_BOOST Kconfig option that
    prevents rcuo kthreads from running at real-time priority, even in
    kernels built with RCU_BOOST.  This capability is important to devices
    needing low-latency (as in a few milliseconds) response from expedited
    RCU grace periods, but which are not running a classic real-time workload.
    On such devices, permitting the rcuo kthreads to run at real-time priority
    results in unacceptable latencies imposed on the application tasks,
    which run as SCHED_OTHER.

    See for example the following trace output:

    <snip>
    <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
    <snip>

    If that rcuop kthread were permitted to run at real-time SCHED_FIFO
    priority, it would monopolize its CPU for hundreds of milliseconds
    while invoking those 34619 RCU callback functions, which would cause an
    unacceptably long latency spike for many application stacks on Android
    platforms.

    However, some existing real-time workloads require that callback
    invocation run at SCHED_FIFO priority, for example, those running on
    systems with heavy SCHED_OTHER background loads.  (It is the real-time
    system's administrator's responsibility to make sure that important
    real-time tasks run at a higher priority than do RCU's kthreads.)

    Therefore, this new RCU_NOCB_CPU_CB_BOOST Kconfig option defaults to
    "y" on kernels built with PREEMPT_RT and defaults to "n" otherwise.
    The effect is to preserve current behavior for real-time systems, but for
    other systems to allow expedited RCU grace periods to run with real-time
    priority while continuing to invoke RCU callbacks as SCHED_OTHER.

    As you would expect, this RCU_NOCB_CPU_CB_BOOST Kconfig option has no
    effect except on CPUs with offloaded RCU callbacks.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:23 -04:00
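
A hedged sketch of the gating this option implies when an offloaded-callback kthread is spawned; sched_setscheduler_nocheck() and IS_ENABLED() are existing kernel interfaces, while the helper name and its placement are illustrative and kthread_prio is assumed to be RCU's existing rcutree.kthread_prio module parameter:

    #include <uapi/linux/sched/types.h>

    /* Illustrative helper: boost an rcuo kthread to SCHED_FIFO only when the
     * new Kconfig option requests it. */
    static void maybe_boost_rcuo_kthread(struct task_struct *t)
    {
            struct sched_param sp = { .sched_priority = kthread_prio };

            if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST) && kthread_prio)
                    sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
    }
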
Waiman Long 3cd6c37180 rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 5103850654fdc651f0a7076ac753b958f018bb85
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Fri, 29 Apr 2022 20:42:22 +0800

    rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()

    Callbacks are invoked in RCU kthreads when callbacks are offloaded
    (rcu_nocbs boot parameter) or when RCU's softirq handler has been
    offloaded to rcuc kthreads (use_softirq==0).  The current code allows
    for the rcu_nocbs case but not the use_softirq case.  This commit adds
    support for the use_softirq case.

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:22 -04:00
Waiman Long 0cdca8e4a1 rcu/tree: Add comment to describe GP-done condition in fqs loop
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit a03ae49c4785c1bc7b940e38bbdf2e63d79d1470
Author: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Date:   Thu, 9 Jun 2022 12:43:40 +0530

    rcu/tree: Add comment to describe GP-done condition in fqs loop

    Add a comment to explain why !rcu_preempt_blocked_readers_cgp() condition
    is required on root rnp node, for GP completion check in rcu_gp_fqs_loop().

    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:21 -04:00
Waiman Long cbaed607bb rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 9bdb5b3a8d8ad1c92db309219859fe1c87c95351
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 8 Jun 2022 09:34:10 -0700

    rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()

    This commit saves a line of code by initializing the rcu_gp_fqs()
    function's first_gp_fqs local variable in its declaration.

    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:21 -04:00
Waiman Long b9bc6190d1 rcu/kvfree: Remove useless monitor_todo flag
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 82d26c36cc68e781400eb4e541f943008208f2d6
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Thu, 2 Jun 2022 10:06:43 +0200

    rcu/kvfree: Remove useless monitor_todo flag

    The monitor_todo flag is not needed because the delayed_work structure
    already tracks whether work is pending.  Just rely on that via the
    schedule_delayed_work() helper, which refuses to queue already-pending work.

    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:21 -04:00
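
A hedged sketch of the pattern this change relies on; schedule_delayed_work() is the existing workqueue helper, and the monitor-work naming is simplified relative to the kvfree_rcu code:

    #include <linux/workqueue.h>

    /* schedule_delayed_work() returns false when the work item is already
     * pending, so no separate monitor_todo flag is needed to avoid
     * double-queueing. */
    static void kick_monitor(struct delayed_work *monitor_work)
    {
            schedule_delayed_work(monitor_work, HZ);
    }
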
Waiman Long 4ebe041e32 rcu: Cleanup RCU urgency state for offline CPU
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit e2bb1288a381e9239aaf606ae8c1e20ea71c20bd
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 26 May 2022 09:55:12 +0800

    rcu: Cleanup RCU urgency state for offline CPU

    When a CPU is slow to provide a quiescent state for a given grace
    period, RCU takes steps to encourage that CPU to get with the
    quiescent-state program in a more timely fashion.  These steps
    include these flags in the rcu_data structure:

    1.      ->rcu_urgent_qs, which causes the scheduling-clock interrupt to
            request an otherwise pointless context switch from the scheduler.

    2.      ->rcu_need_heavy_qs, which causes both cond_resched() and RCU's
            context-switch hook to do an immediate momentary quiescent state.

    3.      ->rcu_forced_tick, which causes the scheduler-clock tick to
            be enabled even on nohz_full CPUs with only one runnable task.

    These flags are of course cleared once the corresponding CPU has passed
    through a quiescent state.  Unless that quiescent state is the CPU
    going offline, which means that when the CPU comes back online, it will
    needlessly consume additional CPU time and incur additional latency,
    which constitutes a minor but very real performance bug.

    This commit therefore adds the call to rcu_disable_urgency_upon_qs()
    that clears these flags to the CPU-hotplug offlining code path.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:20 -04:00
Waiman Long a371ed853d rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 52c1d81ee2911ef592048582c6d07975b7399726
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 5 May 2022 23:52:36 +0800

    rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()

    Currently, the rcu_node structure's ->cbovlmask field is set in call_rcu()
    when a given CPU is suffering from callback overload.  But if that CPU
    goes offline, the outgoing CPU's callbacks are migrated to the running
    CPU, which is likely to overload the running CPU.  However, that CPU's
    bit in its leaf rcu_node structure's ->cbovlmask field remains zero.

    Initially, this is OK because the outgoing CPU's bit remains set.
    However, that bit will be cleared at the next end of a grace period,
    at which time it is quite possible that the running CPU will still
    be overloaded.  If the running CPU invokes call_rcu(), then overload
    will be checked for and the bit will be set.  Except that there is no
    guarantee that the running CPU will invoke call_rcu(), in which case the
    next grace period will fail to take the running CPU's overload condition
    into account.  Plus, because the bit is not set, the end of the grace
    period won't check for overload on this CPU.

    This commit therefore adds a call to check_cb_ovld_locked() in
    rcutree_migrate_callbacks() to set the running CPU's ->cbovlmask bit
    appropriately.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:19 -04:00
Waiman Long 199fb66385 rcu: Decrease FQS scan wait time in case of callback overloading
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit fb77dccfc701b6ebcc232574c828bc69146cf90a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 12 Apr 2022 15:08:14 -0700

    rcu: Decrease FQS scan wait time in case of callback overloading

    The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
    callback overloading and does an immediate initial scan for idle CPUs
    if so.  However, subsequent rescans will be carried out at as leisurely a
    rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
    module parameter.  It might be tempting to just continue immediately
    rescanning, but this turns the RCU grace-period kthread into a CPU hog.
    It might also be tempting to reduce the time between rescans to a single
    jiffy, but this can be problematic on larger systems.

    This commit therefore divides the normal time between rescans by three,
    rounding up.  Thus a small system running at HZ=1000 that is suffering
    from callback overload will wait only one jiffy instead of the normal
    three between rescans.

    [ paulmck: Apply Neeraj Upadhyay feedback. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:19 -04:00
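
A hedged arithmetic sketch of the adjustment described above; the function and parameter names are illustrative, only the divide-by-three-rounding-up rule comes from the commit message, and DIV_ROUND_UP() is the existing kernel macro:

    /* Compute the wait before the next FQS rescan. */
    static unsigned long fqs_rescan_wait(unsigned long normal_wait, bool cb_overloaded)
    {
            /* e.g. the usual 3 jiffies at HZ=1000 become 1 jiffy under overload. */
            return cb_overloaded ? DIV_ROUND_UP(normal_wait, 3) : normal_wait;
    }
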
Waiman Long 3436a57e93 context_tracking: Convert state to atomic_t
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 171476775d32a40bfebf83250136c19b2e842672
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:35 +0200

    context_tracking: Convert state to atomic_t

    Context tracking's state and dynticks counter are going to be merged
    in a single field so that both updates can happen atomically and at the
    same time. Prepare for that with converting the state into an atomic_t.

    [ paulmck: Apply kernel test robot feedback. ]

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long 5b925bf582 rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 172114552701b85d5c3b1a089a73ee85d0d7786b
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:33 +0200

    rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking

    Move the core RCU eqs/dynticks functions to context tracking so that
    we can later merge all that code within context tracking.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long 166bdb926e rcu/context-tracking: Move deferred nocb resched to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 564506495ca96a6e66d077d3d5b9f02d4b9b0f45
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:32 +0200

    rcu/context-tracking: Move deferred nocb resched to context tracking

    To prepare for migrating the RCU eqs accounting code to context tracking,
    split the last-resort deferred nocb resched from rcu_user_enter() and
    move it into a separate call from context tracking.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long e0440c243a rcu/context_tracking: Move dynticks_nmi_nesting to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 95e04f48ec0a634e2f221081f5fa1a904755f326
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:31 +0200

    rcu/context_tracking: Move dynticks_nmi_nesting to context tracking

    The RCU eqs tracking is going to be performed by the context tracking
    subsystem. The related nesting counters thus need to be moved to the
    context tracking structure.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long c1013cee1d rcu/context_tracking: Move dynticks_nesting to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 904e600e60f46f92eb4bcfb95788b1fedf7e8237
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:30 +0200

    rcu/context_tracking: Move dynticks_nesting to context tracking

    The RCU eqs tracking is going to be performed by the context tracking
    subsystem. The related nesting counters thus need to be moved to the
    context tracking structure.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long 8640b64310 rcu/context_tracking: Move dynticks counter to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 62e2412df4b90ae6706ce1f1a9649b789b2e44ef
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:29 +0200

    rcu/context_tracking: Move dynticks counter to context tracking

    In order to prepare for merging RCU dynticks counter into the context
    tracking state, move the rcu_data's dynticks field to the context
    tracking structure. It will later be mixed within the context tracking
    state itself.

    [ paulmck: Move enum ctx_state into global scope. ]

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long 887bd73cb2 rcu/context-tracking: Remove rcu_irq_enter/exit()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 3864caafe7c66f01b188ffccb6a4215f3bf56292
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:28 +0200

    rcu/context-tracking: Remove rcu_irq_enter/exit()

    Now rcu_irq_enter/exit() is an unnecessary middle call between
    ct_irq_enter/exit() and nmi_irq_enter/exit(). Take this opportunity
    to remove the former functions and move the comments above them to the
    new entrypoints.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long 034dc8d70a context_tracking: Take idle eqs entrypoints over RCU
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit e67198cc05b8ecbb7b8e2d8ef9fb5c8d26821873
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:25 +0200

    context_tracking: Take idle eqs entrypoints over RCU

    The RCU dynticks counter is going to be merged into the context tracking
    subsystem. Start with moving the idle extended quiescent states
    entrypoints to context tracking. For now those are dumb redirections to
    existing RCU calls.

    [ paulmck: Apply kernel test robot feedback. ]

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:16 -04:00
Waiman Long 37eb1b0bb2 rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit ed4ae5eff4b38797607cbdd80da394149110fb37
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 17 May 2022 21:00:04 -0700

    rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()

    This commit applies the "noinstr" tag to the rcu_idle_enter() and
    rcu_idle_exit() functions, which are invoked from portions of the idle
    loop that cannot be instrumented.  These tags require reworking the
    rcu_eqs_enter() and rcu_eqs_exit() functions that these two functions
    invoke in order to cause them to use normal assertions rather than
    lockdep.  In addition, within rcu_idle_exit(), the raw versions of
    local_irq_save() and local_irq_restore() are used, again to avoid issues
    with lockdep in uninstrumented code.

    This patch is based in part on an earlier patch by Jiri Olsa, discussions
    with Peter Zijlstra and Frederic Weisbecker, earlier changes by Thomas
    Gleixner, and off-list discussions with Yonghong Song.

    Link: https://lore.kernel.org/lkml/20220515203653.4039075-1-jolsa@kernel.org/
    Reported-by: Jiri Olsa <jolsa@kernel.org>
    Reported-by: Alexei Starovoitov <ast@kernel.org>
    Reported-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Yonghong Song <yhs@fb.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:11 -04:00
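
A hedged sketch of roughly how rcu_idle_exit() ends up looking after this change (the noinstr tag plus the raw interrupt-flag helpers); rcu_eqs_exit() is RCU-internal and the exact upstream body may differ:

    void noinstr rcu_idle_exit(void)
    {
            unsigned long flags;

            /* The raw_ variants avoid lockdep and tracing in this
             * uninstrumentable portion of the idle loop. */
            raw_local_irq_save(flags);
            rcu_eqs_exit(false);            /* leave the idle extended quiescent state */
            raw_local_irq_restore(flags);
    }
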
Waiman Long bbdc7c0871 rcu: Provide a get_completed_synchronize_rcu() function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 414c12385d4741e35d88670c6cc2f40a77809734
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 13 Apr 2022 15:17:25 -0700

    rcu: Provide a get_completed_synchronize_rcu() function

    It is currently up to the caller to handle stale return values from
    get_state_synchronize_rcu().  If poll_state_synchronize_rcu() returned
    true once, a grace period has elapsed, regardless of the fact that counter
    wrap might cause some future poll_state_synchronize_rcu() invocation to
    return false.  For example, the caller might store a separate flag that
    indicates whether some previous call to poll_state_synchronize_rcu()
    determined that the relevant grace period had already ended.

    This approach works, but it requires extra storage and is easy to get
    wrong.  This commit therefore introduces a get_completed_synchronize_rcu()
    that returns a cookie that causes poll_state_synchronize_rcu() to always
    return true.  This already-completed cookie can be stored in place of the
    cookie that previously caused poll_state_synchronize_rcu() to return true.
    It can also be used to flag a given structure as not having been exposed
    to readers, and thus not requiring a grace period to elapse.

    This commit is in preparation for polled expedited grace periods.

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:06 -04:00
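
A hedged usage sketch of the new helper; get_completed_synchronize_rcu(), get_state_synchronize_rcu(), poll_state_synchronize_rcu(), and synchronize_rcu() are real RCU interfaces, while the structure and field names are illustrative:

    struct my_obj {
            unsigned long gp_cookie;        /* illustrative field */
            /* ... payload ... */
    };

    static void my_obj_init(struct my_obj *p)
    {
            /* Never exposed to readers: poll_state_synchronize_rcu() on this
             * cookie always returns true, so no grace period is needed. */
            p->gp_cookie = get_completed_synchronize_rcu();
    }

    static void my_obj_unpublish(struct my_obj *p)
    {
            /* Object was visible to readers: record the current GP state. */
            p->gp_cookie = get_state_synchronize_rcu();
    }

    static void my_obj_reuse(struct my_obj *p)
    {
            if (!poll_state_synchronize_rcu(p->gp_cookie))
                    synchronize_rcu();      /* wait only if still needed */
    }
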
Waiman Long 67aa89a8ff rcu: Make normal polling GP be more precise about sequence numbers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 2403e8044f222e7c816fb2416661f5f469662973
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 21 Mar 2022 18:41:46 -0700

    rcu: Make normal polling GP be more precise about sequence numbers

    Currently, poll_state_synchronize_rcu() uses rcu_seq_done() to check
    whether the specified grace period has completed.  However, rcu_seq_done()
    does a simple comparison that reserves half of the sequence-number space
    for uncompleted grace periods.  This has the unfortunate side-effect
    of not handling sequence-number wrap gracefully.  Of course, one can
    argue that if someone has already waited for half of the full range of
    grace periods, they can wait for the other half, but why wait at all in
    this case?

    This commit therefore creates a rcu_seq_done_exact() that counts as
    uncompleted only the two grace periods during which the sequence number
    might have been handed out, while still being uncompleted.  This way,
    if sequence-number wrap happens to hit that range, at most two additional
    grace periods need be waited for.

    This commit is in preparation for polled expedited grace periods.

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:05 -04:00
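
A hedged sketch of an exact done-check in the spirit of the description above; READ_ONCE(), ULONG_CMP_GE(), and ULONG_CMP_LT() are existing kernel helpers, and RCU_SEQ_STATE_MASK stands in for RCU's internal sequence-number state mask:

    /* Count as uncompleted only the two grace periods during which the cookie
     * might have been handed out; anything else, including values that appear
     * ahead of the current counter due to wrap, counts as completed. */
    static bool seq_done_exact(unsigned long *sp, unsigned long s)
    {
            unsigned long cur_s = READ_ONCE(*sp);

            return ULONG_CMP_GE(cur_s, s) ||
                   ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
    }
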
Chris von Recklinghausen 8dced2b153 mm: shrinkers: provide shrinkers with names
Bugzilla: https://bugzilla.redhat.com/2160210

commit e33c267ab70de4249d22d7eab1cc7d68a889bac2
Author: Roman Gushchin <roman.gushchin@linux.dev>
Date:   Tue May 31 20:22:24 2022 -0700

    mm: shrinkers: provide shrinkers with names

    Currently shrinkers are anonymous objects.  For debugging purposes they
    can be identified by count/scan function names, but it's not always
    useful: e.g.  for superblock's shrinkers it's nice to have at least an
    idea of to which superblock the shrinker belongs.

    This commit adds names to shrinkers.  register_shrinker() and
    prealloc_shrinker() functions are extended to take a format and arguments
    to construct a name.

    In some cases it's not possible to determine a good name at the time when
    a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
    provided.

    The expected format is:
        <subsystem>-<shrinker_type>[:<instance>]-<id>
    For some shrinkers an instance can be encoded as a (MAJOR:MINOR) pair.

    After this change the shrinker debugfs directory looks like:
      $ cd /sys/kernel/debug/shrinker/
      $ ls
        dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
        mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
        mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
        rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
        sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
        sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
        sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
        sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
        sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
        sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
        sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
        sb-dax-11           sb-proc-45       sb-tmpfs-35
        sb-debugfs-7        sb-proc-46       sb-tmpfs-40

    [roman.gushchin@linux.dev: fix build warnings]
      Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
      Reported-by: kernel test robot <lkp@intel.com>
    Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
    Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Kent Overstreet <kent.overstreet@gmail.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:17 -04:00
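
A hedged usage sketch of the extended registration interface; the driver name, callbacks, and instance string are illustrative, while register_shrinker() taking a printf-style name is the interface this commit introduces:

    #include <linux/shrinker.h>

    /* Illustrative count/scan callbacks. */
    static unsigned long my_count(struct shrinker *s, struct shrink_control *sc)
    {
            return 0;               /* report the number of freeable objects here */
    }

    static unsigned long my_scan(struct shrinker *s, struct shrink_control *sc)
    {
            return SHRINK_STOP;     /* free objects and return how many were freed */
    }

    static struct shrinker my_shrinker = {
            .count_objects = my_count,
            .scan_objects  = my_scan,
            .seeks         = DEFAULT_SEEKS,
    };

    static int __init my_init(void)
    {
            /* Name follows the <subsystem>-<shrinker_type>[:<instance>]
             * convention; the numeric -<id> suffix seen in debugfs is
             * appended automatically. */
            return register_shrinker(&my_shrinker, "mydrv-cache:%s", "inst0");
    }
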
Waiman Long d45fbffb5b rcu: Move expedited grace period (GP) work to RT kthread_worker
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491
Conflicts:
 1) A merge conflict in kernel/rcu/rcu.h due to upstream merge conflict
    with commit 99d6a2acb895 ("rcutorture: Suppress debugging grace
    period delays during flooding"). Manually merge according to upstream
    merge commit ce13389053a3.
 2) A fuzz in kernel/rcu/tree.c due to upstream merge conflict with
    commit 87c5adf06bfb ("rcu/nocb: Initialize nocb kthreads only
    for boot CPU prior SMP initialization") and commit 3352911fa9b4
    ("rcu: Initialize boost kthread only for boot node prior SMP
    initialization"). See upstream merge commit ce13389053a3.

commit 9621fbee44df940e2e1b94b0676460a538dffefa
Author: Kalesh Singh <kaleshsingh@google.com>
Date:   Fri, 8 Apr 2022 17:35:27 -0700

    rcu: Move expedited grace period (GP) work to RT kthread_worker

    Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
    latency because its workqueues run at SCHED_OTHER, and thus can be
    delayed by normal processes.  This commit avoids these delays by moving
    the expedited GP work items to a real-time-priority kthread_worker.

    This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
    default on PREEMPT_RT=y kernels which disable expedited grace periods
    after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

    The results were evaluated on arm64 Android devices (6GB ram) running
    5.10 kernel, and capturing trace data in critical user-level code.

    The table below shows the resulting order-of-magnitude improvements
    in synchronize_rcu_expedited() latency:

    ------------------------------------------------------------------------
    |                          |   workqueues  |  kthread_worker |  Diff   |
    ------------------------------------------------------------------------
    | Count                    |          725  |            688  |         |
    ------------------------------------------------------------------------
    | Min Duration       (ns)  |          326  |            447  |  37.12% |
    ------------------------------------------------------------------------
    | Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
    ------------------------------------------------------------------------
    | Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
    ------------------------------------------------------------------------
    | Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
    ------------------------------------------------------------------------
    | Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
    ------------------------------------------------------------------------
    | Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
    ------------------------------------------------------------------------
    | Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
    ------------------------------------------------------------------------

    The below table show the range of maximums/minimums for
    synchronize_rcu_expedited() latency from all experiments:

    ------------------------------------------------------------------------
    |                          |   workqueues  |  kthread_worker |  Diff   |
    ------------------------------------------------------------------------
    | Total No. of Experiments |           25  |             23  |         |
    ------------------------------------------------------------------------
    | Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
    ------------------------------------------------------------------------
    | Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
    ------------------------------------------------------------------------
    | Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
    ------------------------------------------------------------------------
    | Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
    ------------------------------------------------------------------------
    | Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
    ------------------------------------------------------------------------
    | Range of Minimums  (ns)  |       88,297  |         27,141  |         |
    ------------------------------------------------------------------------

    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Reported-by: Tim Murray <timmurray@google.com>
    Reported-by: Wei Wang <wvw@google.com>
    Tested-by: Kyle Lin <kylelin@google.com>
    Tested-by: Chunwei Lu <chunweilu@google.com>
    Tested-by: Lulu Wang <luluw@google.com>
    Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:38:28 -04:00
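
A hedged sketch of the general technique (moving work from a SCHED_OTHER workqueue to a real-time kthread_worker); kthread_create_worker(), sched_setscheduler_nocheck(), kthread_init_work(), and kthread_queue_work() are existing kernel interfaces, while the worker name and priority value below are illustrative:

    #include <linux/kthread.h>
    #include <uapi/linux/sched/types.h>

    static struct kthread_worker *exp_gp_worker;

    /* Create a dedicated worker and raise it to SCHED_FIFO so expedited-GP
     * work items cannot be delayed behind SCHED_OTHER tasks. */
    static int __init start_exp_gp_worker(void)
    {
            struct sched_param param = { .sched_priority = 2 };

            exp_gp_worker = kthread_create_worker(0, "rcu_exp_gp_worker");
            if (IS_ERR(exp_gp_worker))
                    return PTR_ERR(exp_gp_worker);

            sched_setscheduler_nocheck(exp_gp_worker->task, SCHED_FIFO, &param);
            return 0;
    }

    /* Work items are then queued with kthread_init_work()/kthread_queue_work()
     * instead of the workqueue INIT_WORK()/queue_work() pair. */
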
Waiman Long da53e146fe rcu: Fix preemption mode check on synchronize_rcu[_expedited]()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 70ae7b0ce03347fab35d6d8df81e1165d7ea8045
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Mon, 14 Mar 2022 14:37:38 +0100

    rcu: Fix preemption mode check on synchronize_rcu[_expedited]()

    An early check on synchronize_rcu[_expedited]() tries to determine if
    the current CPU is in UP mode on an SMP no-preempt kernel, in which case
    there is no need to start a grace period since the current assumed
    quiescent state is all we need.

    However, this check doesn't take into account the boot-selected
    preemption mode under CONFIG_PREEMPT_DYNAMIC=y, missing a possible
    early return if the running flavour is "none" or "voluntary".

    Use the shiny new preempt mode accessors to fix this.  However,
    avoid invoking them during early boot because doing so triggers a
    WARN_ON_ONCE().

    [ paulmck: Update for mainlined API. ]

    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:13 -04:00
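
A hedged and simplified sketch of consulting the boot-selected preemption flavour; preempt_model_none() and preempt_model_voluntary() are the accessors referred to above, while the helper itself and the early-boot guard are illustrative:

    static bool blocking_is_gp(void)
    {
            /* Too early in boot: the accessors would splat, so don't ask. */
            if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
                    return true;

            /* Under "none" or "voluntary" preemption, readers cannot be
             * preempted, so a single online CPU is already quiescent. */
            if (preempt_model_none() || preempt_model_voluntary())
                    return num_online_cpus() <= 1;

            return false;
    }
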
Waiman Long 2ceaa01398 rcu: Add comments to final rcu_gp_cleanup() "if" statement
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 75182a4eaaf8b697f66d68ad039f021f461dd2a4
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 2 Mar 2022 11:01:37 -0800

    rcu: Add comments to final rcu_gp_cleanup() "if" statement

    The final "if" statement in rcu_gp_cleanup() has proven to be rather
    confusing, straightforward though it might have seemed when initially
    written.  This commit therefore adds comments to its "then" and "else"
    clauses to at least provide a more elevated form of confusion.

    Reported-by: Boqun Feng <boqun.feng@gmail.com>
    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Reported-by: Uladzislau Rezki <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:12 -04:00
Waiman Long f12dfd4e5c rcu: Check for jiffies going backwards
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit c708b08c65a0dfae127b9ee33b0fb73535a5e066
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 23 Feb 2022 17:29:37 -0800

    rcu: Check for jiffies going backwards

    A report of a 12-jiffy normal RCU CPU stall warning raises interesting
    questions about the nature of time on the offending system.  This commit
    instruments rcu_sched_clock_irq(), which is RCU's hook into the
    scheduling-clock interrupt, checking for the jiffies counter going
    backwards.

    Reported-by: Saravanan D <sarvanand@fb.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:11 -04:00
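
A hedged sketch of the kind of instrumentation described above (detecting a jiffies counter that moves backwards between scheduling-clock interrupts); time_before(), WARN_ON_ONCE(), and the per-CPU helpers are existing kernel interfaces, while the variable and function names are illustrative:

    #include <linux/jiffies.h>
    #include <linux/percpu.h>

    static DEFINE_PER_CPU(unsigned long, last_jiffies_seen);

    /* Called from the scheduling-clock interrupt path (illustrative). */
    static void check_jiffies_monotonic(void)
    {
            unsigned long j = jiffies;
            unsigned long last = __this_cpu_read(last_jiffies_seen);

            /* jiffies should never appear to run backwards on one CPU. */
            WARN_ON_ONCE(last && time_before(j, last));
            __this_cpu_write(last_jiffies_seen, j);
    }
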
Waiman Long b0678da638 rcutorture: Suppress debugging grace period delays during flooding
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 99d6a2acb8955f12489bfba04f2db22bc0b57726
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 4 Feb 2022 12:45:18 -0800

    rcutorture: Suppress debugging grace period delays during flooding

    Tree RCU supports grace-period delays using the rcutree.gp_cleanup_delay,
    rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot
    parameters.  These delays are strictly for debugging purposes, and have
    proven quite effective at exposing bugs involving race with CPU-hotplug
    operations.  However, these delays can result in false positives when
    used in conjunction with callback flooding, for example, those generated
    by the rcutorture.fwd_progress kernel boot parameter.

    This commit therefore suppresses grace-period delays while callback
    flooding is in progress.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:06 -04:00
Waiman Long bc54b27cee rcu-tasks: Make Tasks RCU account for userspace execution
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 5d90070816534882b9158f14154b7e2cdef1194a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 4 Mar 2022 10:41:44 -0800

    rcu-tasks: Make Tasks RCU account for userspace execution

    The main Tasks RCU quiescent state is voluntary context switch.  However,
    userspace execution is also a valid quiescent state, and is a valuable one
    for userspace applications that spin repeatedly executing light-weight
    non-sleeping system calls.  Currently, such an application can delay a
    Tasks RCU grace period for many tens of seconds.

    This commit therefore enlists the aid of the scheduler-clock interrupt to
    provide a Tasks RCU quiescent state when it interrupted a task executing
    in userspace.

    [ paulmck: Apply feedback from kernel test robot. ]

    Cc: Martin KaFai Lau <kafai@fb.com>
    Cc: Neil Spring <ntspring@fb.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:03 -04:00
Waiman Long c54a776b65 rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 87c5adf06bfbf14c9d13e59d5d174ff5f2aafc0e
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:08 +0100

    rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization

    The rcu_spawn_gp_kthread() function is called as an early initcall, which
    means that SMP initialization hasn't happened yet and only the boot CPU is
    online. Therefore, create only the NOCB kthreads related to the boot CPU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long b19ed13b34 rcu: Initialize boost kthread only for boot node prior SMP initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 3352911fa9b47a90165e5c6fed440048c55146d1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:07 +0100

    rcu: Initialize boost kthread only for boot node prior SMP initialization

    The rcu_spawn_gp_kthread() function is called as an early initcall,
    which means that SMP initialization hasn't happened yet and only the
    boot CPU is online.  Therefore, create only the boost kthread for the
    leaf node of the boot CPU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long 5779af3081 rcu: Assume rcu_init() is called before smp
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 2eed973adc6e749439730e53e6220b122398d319
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:06 +0100

    rcu: Assume rcu_init() is called before smp

    The rcu_init() function is called way before SMP is initialized and
    therefore only the boot CPU should be online at this stage.

    Simplify the boot per-cpu initialization accordingly.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long a9408fae13 rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c9515875850fefcc79492c5189fe8431e75ddec5
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Tue, 25 Jan 2022 10:47:44 +0800

    rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings

    When the rcutree.use_softirq kernel boot parameter is set to zero, all
    RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads.
    If these kthreads are being starved, quiescent states will not be
    reported, which in turn means that the grace period will not end, which
    can in turn trigger RCU CPU stall warnings.  This commit therefore dumps
    stack traces of stalled CPUs' rcuc kthreads, which can help identify
    what is preventing those kthreads from running.

    Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:30:04 -04:00
Waiman Long 22f9156241 rcu: Elevate priority of offloaded callback threads
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c8b16a65267e35ecc5621dbc81cbe7e5b0992fce
Author: Alison Chaiken <achaiken@aurora.tech>
Date:   Tue, 11 Jan 2022 15:32:52 -0800

    rcu: Elevate priority of offloaded callback threads

    When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line
    parameter signals initialization code to boost the priority of rcuc
    callbacks to the designated value.  With the additional
    CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs
    command-line parameter, the callbacks on the listed cores are
    offloaded to new rcuop kthreads that are not pinned to the cores whose
    post-grace-period work is performed.  While the rcuop kthreads perform
    the same function as the rcuc kthreads they offload, the kthread_prio
    parameter only boosts the priority of the rcuc kthreads.  Fix this
    inconsistency by elevating rcuop kthreads to the same priority as the rcuc
    kthreads.

    Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:09 -04:00
Waiman Long 3dc8452aa5 rcu: Move kthread_prio bounds-check to a separate function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c8db27dd0ea8071d2ea29a1a401c4ccc611ec6c1
Author: Alison Chaiken <achaiken@aurora.tech>
Date:   Tue, 11 Jan 2022 15:32:50 -0800

    rcu: Move kthread_prio bounds-check to a separate function

    Move the bounds-check of the kthread_prio cmdline parameter to a new
    function in order to facilitate a different callsite.

    Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:08 -04:00
Waiman Long f3300badb5 rcu: Create per-cpu rcuc kthreads only when rcutree.use_softirq=0
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 4b4399b2450de38916718ba9947e6cdb69c99c55
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Wed, 29 Dec 2021 00:05:10 +0800

    rcu: Create per-cpu rcuc kthreads only when rcutree.use_softirq=0

    The per-CPU "rcuc" kthreads are used only by kernels booted with
    rcutree.use_softirq=0, but they are nevertheless unconditionally created
    by kernels built with CONFIG_RCU_BOOST=y.  This results in "rcuc"
    kthreads being created that are never actually used.  This commit
    therefore refrains from creating these kthreads unless the kernel
    is actually booted with rcutree.use_softirq=0.

    Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:03 -04:00
Waiman Long 7e57b41b6e kasan: Record work creation stack trace with interrupts enabled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit d818cc76e2b4d5f6cebf8c7ce1160d652d7e572b
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Sun, 26 Dec 2021 08:52:04 +0800

    kasan: Record work creation stack trace with interrupts enabled

    Recording the work creation stack trace for KASAN reports in
    call_rcu() is expensive, due to unwinding the stack, but also
    due to acquiring depot_lock inside stackdepot (which may be contended).
    Because kasan_record_aux_stack_noalloc() is currently invoked with
    interrupts disabled even though it does not require that, this may
    unnecessarily extend the time with interrupts disabled.

    Therefore, move calling kasan_record_aux_stack() before the section
    with interrupts disabled.

    Acked-by: Marco Elver <elver@google.com>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:03 -04:00
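
A hedged sketch of a call_rcu()-like path after the reordering; kasan_record_aux_stack_noalloc(), local_irq_save(), and local_irq_restore() are existing kernel interfaces, the enqueue step is elided, and the function itself is illustrative:

    static void example_call_rcu_path(struct rcu_head *head, rcu_callback_t func)
    {
            unsigned long flags;

            head->func = func;

            /* Record the work-creation stack for KASAN *before* disabling
             * interrupts: unwinding and stack-depot insertion can be slow. */
            kasan_record_aux_stack_noalloc(head);

            local_irq_save(flags);
            /* ... enqueue the callback on this CPU's list ... */
            local_irq_restore(flags);
    }
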
Waiman Long 07e0b8909d rcu: Inline __call_rcu() into call_rcu()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 1fe09ebe7a9c9907f516779fbe4954165dd01529
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat, 18 Dec 2021 09:30:33 -0800

    rcu: Inline __call_rcu() into call_rcu()

    Because __call_rcu() is invoked only by call_rcu(), this commit inlines
    the former into the latter.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:02 -04:00
Waiman Long 4c01b1af26 rcu: Make rcu_barrier() no longer block CPU-hotplug operations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 80b3fd474c91b3ecfd845b4a0bfb58706b877ba5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 14 Dec 2021 13:35:17 -0800

    rcu: Make rcu_barrier() no longer block CPU-hotplug operations

    This commit removes the cpus_read_lock() and cpus_read_unlock() calls
    from rcu_barrier(), thus allowing CPUs to come and go during the course
    of rcu_barrier() execution.  Posting of the ->barrier_head callbacks does
    synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
    are held for short time periods on both sides.  Thus, full CPU-hotplug
    operations could both start and finish during the execution of a given
    rcu_barrier() invocation.

    Additional synchronization is provided by a global ->barrier_lock.
    Since the ->barrier_lock is only used during rcu_barrier() execution and
    during onlining/offlining a CPU, the contention for this lock should
    be low.  It might be tempting to make use of a per-CPU lock just on
    general principles, but straightforward attempts to do this have the
    problems shown below.

    Initial state: 3 CPUs present; CPU0 and CPU1 do not have
    any callbacks, and CPU2 has callbacks.

    1. CPU0 calls rcu_barrier().

    2. CPU1 starts offlining for CPU2. CPU1 calls
       rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
       from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock.
       It does not entrain ->barrier_head for CPU2, as rcu_barrier()
       on CPU0 hasn't started the barrier sequence (by calling
       rcu_seq_start(&rcu_state.barrier_sequence)) yet.

    3. CPU0 starts new barrier sequence. It iterates over
       CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock
       and finds 0 segcblist length. It updates ->barrier_seq_snap
       for CPU0 and CPU1 and continues loop iteration to CPU2.

        for_each_possible_cpu(cpu) {
            raw_spin_lock_irqsave(&rdp->barrier_lock, flags);
            if (!rcu_segcblist_n_cbs(&rdp->cblist)) {
                WRITE_ONCE(rdp->barrier_seq_snap, gseq);
                raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags);
                rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
                continue;
            }

    4. rcutree_migrate_callbacks() completes execution on CPU1.
       Segcblist len for CPU2 becomes 0.

    5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&rdp->cblist)
       for CPU2 and completes the loop iteration after setting
       ->barrier_seq_snap.

    6. As no ->barrier_head callback has been entrained at this
       point, rcu_barrier() on CPU0 returns.

    7. The callbacks, which migrated from CPU2 to CPU1, execute.

    Straightforward per-CPU locking is also subject to the following race
    condition noted by Boqun Feng:

    1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
       rcu_seq_start() and init_completion(), but does not yet initialize
       rcu_state.barrier_cpu_count.

    2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
       which in turn calls rcu_barrier_entrain() holding CPU2's
       rdp->barrier_lock.  It then entrains ->barrier_head for CPU2
       and atomically increments rcu_state.barrier_cpu_count, which is
       unfortunately not yet initialized to the value 2.

    3. The just-entrained RCU callback is invoked.  It atomically
       decrements rcu_state.barrier_cpu_count and sees that it is
       now zero.  This callback therefore invokes complete().

    4. CPU0 continues executing rcu_barrier(), but is not blocked
       by its call to wait_for_completion().  This results in rcu_barrier()
       returning before all pre-existing callbacks have been invoked,
       which is a bug.

    Therefore, synchronization is provided by rcu_state.barrier_lock,
    which is also held across the initialization sequence, especially the
    rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
    to the value 2.  In addition, this lock is held when entraining the
    rcu_barrier() callback, when deciding whether or not a CPU has callbacks
    that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for
    incoming CPUs, and when migrating callbacks from a CPU that is going
    offline.

    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:57 -04:00
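
A hedged sketch of the synchronization described in the commit above: the barrier-sequence start and the count initialization are grouped under rcu_state.barrier_lock, so a concurrent rcu_barrier_entrain() from the CPU-hotplug path cannot observe a started sequence with an uninitialized count. The fragment is illustrative and assumes the field and helper names quoted in the message, not necessarily the exact upstream code.

    raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
    rcu_seq_start(&rcu_state.barrier_sequence);
    gseq = rcu_state.barrier_sequence;
    init_completion(&rcu_state.barrier_completion);
    atomic_set(&rcu_state.barrier_cpu_count, 2);
    raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
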
Waiman Long 6d38f5233d rcu: Rework rcu_barrier() and callback-migration logic
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit a16578dd5e3a44b53ca0699ac2971679dab97484
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 14 Dec 2021 13:15:18 -0800

    rcu: Rework rcu_barrier() and callback-migration logic

    This commit reworks rcu_barrier() and callback-migration logic to
    permit allowing rcu_barrier() to run concurrently with CPU-hotplug
    operations.  The key trick is for callback migration to check to see if
    an rcu_barrier() is in flight, and, if so, enqueue the ->barrier_head
    callback on its behalf.

    This commit adds synchronization with RCU's CPU-hotplug notifiers.  Taken
    together, this will permit a later commit to remove the cpus_read_lock()
    and cpus_read_unlock() calls from rcu_barrier().

    [ paulmck: Updated per kbuild test robot feedback. ]
    [ paulmck: Updated per reviews session with Neeraj, Frederic, Uladzislau, and Boqun. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:56 -04:00
Waiman Long e65be485f7 rcu: Refactor rcu_barrier() empty-list handling
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 0cabb47af3cfaeb6007ba3868379bbd4daee64cc
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 10 Dec 2021 16:25:20 -0800

    rcu: Refactor rcu_barrier() empty-list handling

    This commit saves a few lines by checking first for an empty callback
    list.  If the callback list is empty, then that CPU is taken care of,
    regardless of its online or nocb state.  Also simplify tracing accordingly
    and fold a few lines together.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:54 -04:00
Waiman Long 9f48f77ccc rcu: Create and use an rcu_rdp_cpu_online()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 5ae0f1b58b28b53f4ab3708ef9337a2665e79664
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 10 Dec 2021 13:44:17 -0800

    rcu: Create and use an rcu_rdp_cpu_online()

    The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently
    in RCU code in order to determine whether rdp->cpu is online from an
    RCU perspective.  This commit therefore creates an rcu_rdp_cpu_online()
    function to replace it.

    [ paulmck: Apply kernel test robot unused-variable feedback. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:49 -04:00
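
A minimal sketch of the helper named in the commit above, derived from the pattern quoted in the message; the rdp->mynode dereference is an assumption for illustration.

    static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
    {
            /* Replaces the open-coded "rdp->grpmask & rcu_rnp_online_cpus(rnp)". */
            return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
    }
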
Waiman Long ba1bfcb746 rcu: Add mutex for rcu boost kthread spawning and affinity setting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: A fuzz in rcu_boost_kthread_setaffinity() of
	   kernel/rcu/tree_plugin.h due to the presence of a later
	   upstream commit 04d4e665a609 ("sched/isolation: Use single
	   feature type while referring to housekeeping cpumask").

commit 218b957a6959a2fb5b3967fc824072bb89ac2611
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Wed, 8 Dec 2021 23:41:53 +0000

    rcu: Add mutex for rcu boost kthread spawning and affinity setting

    As we handle parallel CPU bringup, we will need to take care to avoid
    spawning multiple boost threads, or race conditions when setting their
    affinity. Spotted by Paul McKenney.

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:17 -04:00
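
A hedged sketch of the serialization described above; the boost_kthread_mutex field name and the surrounding structure are assumptions for illustration only.

    mutex_lock(&rnp->boost_kthread_mutex);
    if (rnp->boost_kthread_task) {
            /* Another CPU already spawned the rcub kthread for this node. */
            mutex_unlock(&rnp->boost_kthread_mutex);
            return;
    }
    /* ... spawn the boost kthread and set its affinity here ... */
    mutex_unlock(&rnp->boost_kthread_mutex);
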
Waiman Long 5824fc0262 rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 82980b1622d97017053c6792382469d7dc26a486
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Tue, 16 Feb 2021 15:04:34 +0000

    rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion

    If we allow architectures to bring APs online in parallel, then we end
    up requiring rcu_cpu_starting() to be reentrant. But currently, the
    manipulation of rnp->ofl_seq is not thread-safe.

    However, rnp->ofl_seq is also fairly much pointless anyway since both
    rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
    fairly much the whole time that rnp->ofl_seq is set to an odd number
    to indicate that an operation is in progress.

    So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

    This has a couple of minor complexities: lockdep will complain when we
    take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
    an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
    avoid that false positive complaint. Since we're killing rnp->ofl_seq
    of course that 'excuse' has to be changed too, so make it check for
    arch_spin_is_locked(rcu_state.ofl_lock).

    There's no arch_spin_lock_irqsave() so we have to manually save and
    restore local interrupts around the locking.

    At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
    wait but *exclude* any CPU online/offline activity, which was fairly
    much true already by virtue of it holding rcu_state.ofl_lock.

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:19:35 -04:00
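
A hedged sketch of the locking pattern the commit above describes: since there is no arch_spin_lock_irqsave(), interrupts are saved and restored by hand around the arch_spinlock_t acquisition. The fragment is illustrative, not the exact upstream code.

    unsigned long flags;

    local_irq_save(flags);
    arch_spin_lock(&rcu_state.ofl_lock);
    /* ... CPU online/offline bookkeeping excluded from rcu_gp_init() ... */
    arch_spin_unlock(&rcu_state.ofl_lock);
    local_irq_restore(flags);
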
Patrick Talbert ea38048f36 Merge: rcu: Backport upstream RCU related commits up to v5.17
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/602

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/602

This patch series backports upstream RCU and various torture-test commits up
to the v5.17 kernel. Apart from patch 10, which has a merge conflict due to an
upstream merge conflict, all other patches apply cleanly without any issue.

Signed-off-by: Waiman Long <longman@redhat.com>
~~~
Waiman Long (112):
  torture: Apply CONFIG_KCSAN_STRICT to kvm.sh --kcsan argument
  torture: Make torture.sh print the number of files to be compressed
  rcu-nocb: Fix a couple of tree_nocb code-style nits
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable rnhqp
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable ruqp
  doc: Add another stall-warning root cause in stallwarn.rst
  rcu: Fix undefined Kconfig macros
  rcu: Comment rcu_gp_init() code waiting for CPU-hotplug operations
  rcu-tasks: Simplify trc_read_check_handler() atomic operations
  rcu-tasks: Add trc_inspect_reader() checks for exiting critical
    section
  rcu-tasks: Remove second argument of rcu_read_unlock_trace_special()
  rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
  rcu: Simplify rcu_report_dead() call to rcu_report_exp_rdp()
  rcu: Make rcutree_dying_cpu() use its "cpu" parameter
  rcu-tasks: Wait for trc_read_check_handler() IPIs
  rcutorture: Suppressing read-exit testing is not an error
  rcu-tasks: Fix s/instruction/instructions/ typo in comment
  rcutorture: Warn on individual rcu_torture_init() error conditions
  locktorture: Warn on individual lock_torture_init() error conditions
  rcuscale: Warn on individual rcu_scale_init() error conditions
  rcutorture: Don't cpuhp_remove_state() if cpuhp_setup_state() failed
  rcu: Make rcu_normal_after_boot writable again
  rcu: Make rcu update module parameters world-readable
  rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
  rcu-tasks: Fix s/rcu_add_holdout/trc_add_holdout/ typo in comment
  rcu-tasks: Correct firstreport usage in check_all_holdout_tasks_trace
  rcu-tasks: Correct comparisons for CPU numbers in
    show_stalled_task_trace
  rcu-tasks: Clarify read side section info for rcu_tasks_rude GP
    primitives
  rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
  rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
  rcu-tasks: Fix read-side primitives comment for call_rcu_tasks_trace
  rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader
  rcu: Replace ________p1 and _________p1 with __UNIQUE_ID(rcu)
  rcu-tasks: Update comments to cond_resched_tasks_rcu_qs()
  rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
  rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()
  rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu
    no_qs.b.exp
  rcu-tasks: Don't remove tasks with pending IPIs from holdout list
  torture: Catch kvm.sh help text up with actual options
  rcutorture: Sanitize RCUTORTURE_RDR_MASK
  rcutorture: More thoroughly test nested readers
  srcu: Prevent redundant __srcu_read_unlock() wakeup
  rcutorture: Suppress pi-lock-across read-unlock testing for Tiny SRCU
  doc: Remove obsolete kernel-per-CPU-kthreads RCU_FAST_NO_HZ advice
  rcu: in_irq() cleanup
  rcu: Always inline rcu_dynticks_task*_{enter,exit}()
  rcu: Mark sync_sched_exp_online_cleanup() ->cpu_no_qs.b.exp load
  rcu: Prevent expedited GP from enabling tick on offline CPU
  rcu: Make idle entry report expedited quiescent states
  rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent
    deoffloading
  rcu/nocb: Prepare state machine for a new step
  rcu/nocb: Invoke rcu_core() at the start of deoffloading
  rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
  rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
  rcu/nocb: Check a stable offloaded state to manipulate
    qlen_last_fqs_check
  rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
  rcu/nocb: Limit number of softirq callbacks only on softirq
  rcu: Fix callbacks processing time limit retaining cond_resched()
  rcu: Apply callbacks processing time limit only on softirq
  rcu/nocb: Don't invoke local rcu core on callback overload from nocb
    kthread
  rcu: Improve tree_plugin.h comments and add code cleanups
  refscale: Simplify the errexit checkpoint
  refscale: Prevent buffer to pr_alert() being too long
  refscale: Always log the error message
  doc: Add refcount analogy to What is RCU
  refscale: Add missing '\n' to flush message
  scftorture: Add missing '\n' to flush message
  scftorture: Remove unused SCFTORTOUT
  scftorture: Account for weight_resched when checking for all zeroes
  rcuscale: Always log error message
  doc: RCU: Avoid 'Symbol' font-family in SVG figures
  scftorture: Always log error message
  locktorture,rcutorture,torture: Always log error message
  rcu-tasks: Create per-CPU callback lists
  rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue
    selection
  rcu-tasks: Convert grace-period counter to grace-period sequence
    number
  rcu_tasks: Convert bespoke callback list to rcu_segcblist structure
  rcu-tasks: Use spin_lock_rcu_node() and friends
  rcu-tasks: Inspect stalled task's trc state in locked state
  rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure
  rcu-tasks: Abstract checking of callback lists
  rcu-tasks: Abstract invocations of callbacks
  rcutorture: Avoid soft lockup during cpu stall
  torture: Make kvm-find-errors.sh report link-time undefined symbols
  rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs()
    invocations
  rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues
  rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing
  rcutorture: Test RCU-tasks multiqueue callback queueing
  rcu: Avoid running boost kthreads on isolated CPUs
  rcu: Avoid alloc_pages() when recording stack
  rcutorture: Add CONFIG_PREEMPT_DYNAMIC=n to tiny scenarios
  torture: Retry download once before giving up
  rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention
  rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
  rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
  rcu/nocb: Optimize kthreads and rdp initialization
  rcu/nocb: Create kthreads on all CPUs if "rcu_nocbs=" or "nohz_full="
    are passed
  rcu/nocb: Allow empty "rcu_nocbs" kernel parameter
  rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and
    rcu_spawn_one_nocb_kthread()
  rcutorture: Enable multiple concurrent callback-flood kthreads
  rcutorture: Cause TREE02 and TREE10 scenarios to do more callback
    flooding
  rcutorture: Add ability to limit callback-flood intensity
  rcutorture: Combine n_max_cbs from all kthreads in a callback flood
  rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()
  rcu-tasks: Use more callback queues if contention encountered
  rcutorture: Test RCU Tasks lock-contention detection
  rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing
  rcu-tasks: Use fewer callbacks queues if callback flood ends
  rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
  torture: Fix incorrectly redirected "exit" in kvm-remote.sh
  torture: Properly redirect kvm-remote.sh "echo" commands
  rcu-tasks: Fix computation of CPU-to-list shift counts

 .../Expedited-Grace-Periods/Funnel0.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel1.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel2.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel3.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel4.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel5.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel6.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel7.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel8.svg       |   4 +-
 .../Tree-RCU-Memory-Ordering.rst              |  69 +--
 .../Requirements/GPpartitionReaders1.svg      |  36 +-
 .../Requirements/ReadersPartitionGP1.svg      |  62 +-
 Documentation/RCU/stallwarn.rst               |  10 +
 Documentation/RCU/whatisRCU.rst               |  90 ++-
 .../admin-guide/kernel-parameters.txt         |  66 +-
 .../admin-guide/kernel-per-CPU-kthreads.rst   |   2 +-
 arch/sh/configs/sdk7786_defconfig             |   1 -
 arch/xtensa/configs/nommu_kc705_defconfig     |   1 -
 include/linux/rcu_segcblist.h                 |  51 +-
 include/linux/rcupdate.h                      |  50 +-
 include/linux/rcupdate_trace.h                |   5 +-
 include/linux/rcutiny.h                       |   2 +-
 include/linux/srcu.h                          |   3 +-
 include/linux/torture.h                       |  17 +-
 kernel/locking/locktorture.c                  |  18 +-
 kernel/rcu/Kconfig                            |   2 +-
 kernel/rcu/rcu_segcblist.c                    |  10 +-
 kernel/rcu/rcu_segcblist.h                    |  12 +-
 kernel/rcu/rcuscale.c                         |  24 +-
 kernel/rcu/rcutorture.c                       | 320 +++++++---
 kernel/rcu/refscale.c                         |  50 +-
 kernel/rcu/srcutiny.c                         |   2 +-
 kernel/rcu/tasks.h                            | 583 ++++++++++++++----
 kernel/rcu/tree.c                             | 119 ++--
 kernel/rcu/tree.h                             |  24 +-
 kernel/rcu/tree_exp.h                         |  15 +-
 kernel/rcu/tree_nocb.h                        | 162 +++--
 kernel/rcu/tree_plugin.h                      |  61 +-
 kernel/rcu/update.c                           |   8 +-
 kernel/scftorture.c                           |  20 +-
 kernel/torture.c                              |   4 +-
 .../rcutorture/bin/kvm-find-errors.sh         |   4 +-
 .../rcutorture/bin/kvm-recheck-rcu.sh         |   2 +-
 .../selftests/rcutorture/bin/kvm-remote.sh    |  23 +-
 tools/testing/selftests/rcutorture/bin/kvm.sh |  11 +-
 .../selftests/rcutorture/bin/parse-build.sh   |   3 +-
 .../selftests/rcutorture/bin/torture.sh       |   9 +-
 .../selftests/rcutorture/configs/rcu/SRCU-T   |   1 +
 .../selftests/rcutorture/configs/rcu/SRCU-U   |   1 +
 .../rcutorture/configs/rcu/TASKS01.boot       |   1 +
 .../selftests/rcutorture/configs/rcu/TINY01   |   1 +
 .../selftests/rcutorture/configs/rcu/TINY02   |   1 +
 .../rcutorture/configs/rcu/TRACE01.boot       |   1 +
 .../rcutorture/configs/rcu/TRACE02.boot       |   1 +
 .../rcutorture/configs/rcu/TREE02.boot        |   1 +
 .../rcutorture/configs/rcu/TREE10.boot        |   1 +
 .../rcutorture/configs/rcuscale/TINY          |   1 +
 57 files changed, 1360 insertions(+), 637 deletions(-)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE02.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE10.boot

Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-04-19 12:23:21 +02:00
Waiman Long bcf6cd7df4 rcu: Avoid alloc_pages() when recording stack
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 300c0c5e721834f484b03fa3062602dd8ff48413
Author: Jun Miao <jun.miao@intel.com>
Date:   Tue, 16 Nov 2021 07:23:02 +0800

    rcu: Avoid alloc_pages() when recording stack

    The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT,
    which in turn can then call alloc_pages(GFP_NOWAIT, ...).  In general, however,
    it is not even possible to use either GFP_ATOMIC or GFP_NOWAIT in certain
    non-preemptive contexts, including under raw_spin_locks and on RT kernels
    (see gfp.h and ab00db216c).  Fix this by using kasan_record_aux_stack_noalloc(),
    which instructs stackdepot not to expand the stack storage via alloc_pages()
    in case it runs out.

    Jianwei Hu reported:
    BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
    in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3
    INFO: lockdep is turned off.
    irq event stamp: 0
      hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
      softirqs last  enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
      softirqs last disabled at (0): [<0000000000000000>] 0x0
      CPU: 6 PID: 15319 Comm: python3 Tainted: G        W  O 5.15-rc7-preempt-rt #1
      Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018
      Call Trace:
        show_stack+0x52/0x58
        dump_stack+0xa1/0xd6
        ___might_sleep.cold+0x11c/0x12d
        rt_spin_lock+0x3f/0xc0
        rmqueue+0x100/0x1460
        rmqueue+0x100/0x1460
        mark_usage+0x1a0/0x1a0
        ftrace_graph_ret_addr+0x2a/0xb0
        rmqueue_pcplist.constprop.0+0x6a0/0x6a0
         __kasan_check_read+0x11/0x20
         __zone_watermark_ok+0x114/0x270
         get_page_from_freelist+0x148/0x630
         is_module_text_address+0x32/0xa0
         __alloc_pages_nodemask+0x2f6/0x790
         __alloc_pages_slowpath.constprop.0+0x12d0/0x12d0
         create_prof_cpu_mask+0x30/0x30
         alloc_pages_current+0xb1/0x150
         stack_depot_save+0x39f/0x490
         kasan_save_stack+0x42/0x50
         kasan_save_stack+0x23/0x50
         kasan_record_aux_stack+0xa9/0xc0
         __call_rcu+0xff/0x9c0
         call_rcu+0xe/0x10
         put_object+0x53/0x70
         __delete_object+0x7b/0x90
         kmemleak_free+0x46/0x70
         slab_free_freelist_hook+0xb4/0x160
         kfree+0xe5/0x420
         kfree_const+0x17/0x30
         kobject_cleanup+0xaa/0x230
         kobject_put+0x76/0x90
         netdev_queue_update_kobjects+0x17d/0x1f0
         ... ...
         ksys_write+0xd9/0x180
         __x64_sys_write+0x42/0x50
         do_syscall_64+0x38/0x50
         entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642
    Fixes: 84109ab585 ("rcu: Record kvfree_call_rcu() call stack for KASAN")
    Fixes: 26e760c9a7 ("rcu: kasan: record and print call_rcu() call stack")
    Reported-by: Jianwei Hu <jianwei.hu@windriver.com>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Acked-by: Marco Elver <elver@google.com>
    Tested-by: Juri Lelli <juri.lelli@redhat.com>
    Signed-off-by: Jun Miao <jun.miao@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:18 -04:00
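
A hedged sketch of the substitution named in the commit above, at the call_rcu() callsite that records the auxiliary stack; the context line is illustrative.

    -       kasan_record_aux_stack(head);
    +       kasan_record_aux_stack_noalloc(head);   /* never calls alloc_pages() */
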
Waiman Long dee4fbd239 rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 0598a4d4429c0a952ac0e99e5280354cf4ccc01c
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:16 +0200

    rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread

    rcu_core() tries to ensure that its self-invocation in case of callback
    overload only happens in softirq/rcuc mode. Indeed it doesn't make sense
    to trigger the local RCU core from the nocb_cb kthread since it can execute
    on a CPU different from the target rdp. Also, in case of overload, the
    nocb_cb kthread simply starts a new loop of callback processing.

    However the "offloaded" check that aims at preventing misplaced
    rcu_core() invocations is wrong. First of all that state is volatile
    and second: softirq/rcuc can execute while the target rdp is offloaded.
    As a result rcu_core() can be invoked on the wrong CPU while in the
    process of (de-)offloading.

    Fix that by moving the rcu_core() self-invocation into rcu_core() itself,
    irrespective of the rdp offloaded state.

    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:04 -04:00
Waiman Long 27dd5723e4 rcu: Apply callbacks processing time limit only on softirq
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit a554ba288845fd3f6f12311fd76a51694233458a
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:15 +0200

    rcu: Apply callbacks processing time limit only on softirq

    Time limit only makes sense when callbacks are serviced in softirq mode
    because:

    _ In case we need to get back to the scheduler,
      cond_resched_tasks_rcu_qs() is called after each callback.

    _ In case some other softirq vector needs the CPU, the call to
      local_bh_enable() before cond_resched_tasks_rcu_qs() takes care about
      them via a call to do_softirq().

    Therefore, make sure the time limit only applies to softirq mode.

    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:03 -04:00
Waiman Long fb8f304925 rcu: Fix callbacks processing time limit retaining cond_resched()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 3e61e95e2d095e308616cba4ffb640f95a480e01
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:14 +0200

    rcu: Fix callbacks processing time limit retaining cond_resched()

    The callbacks processing time limit makes sure we are not exceeding a
    given amount of time executing the queue.

    However its "continue" clause bypasses the cond_resched() call on
    rcuc and NOCB kthreads, delaying it until we reach the limit, which can
    be very long...

    Make sure the scheduler has a higher priority than the time limit.

    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:03 -04:00
Waiman Long 4c81879303 rcu/nocb: Limit number of softirq callbacks only on softirq
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 78ad37a2c50dfdb9a60e42bb9ee1da86d1fe770c
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:13 +0200

    rcu/nocb: Limit number of softirq callbacks only on softirq

    The current condition to limit the number of callbacks executed in a
    row checks the offloaded state of the rdp. Not only is it volatile,
    but it is also misleading: rcu_core() may well be executing
    callbacks concurrently with the NOCB kthreads, and the offloaded state
    would then hold in both cases. As a result the limit would
    spuriously stop applying on softirq while in the middle of the
    (de-)offloading process.

    Fix and clarify the condition with those constraints in mind:

    _ If callbacks are processed either by rcuc or NOCB kthread, the call
      to cond_resched_tasks_rcu_qs() is enough to take care of the overload.

    _ If instead callbacks are processed by softirqs:
      * If need_resched(), exit the callbacks processing
      * Otherwise if CPU is idle we can continue
      * Otherwise exit because a softirq shouldn't interrupt a task for too
        long nor deprive other pending softirq vectors of the CPU.

    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:03 -04:00
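
A hedged sketch of the exit condition enumerated in the commit above, as it might look inside the callback-invocation loop; variable names such as count and bl are assumptions for illustration.

    if (in_serving_softirq()) {
            /* Softirq: stop once the batch limit is reached if we need to
             * reschedule or are interrupting a non-idle task. */
            if (count >= bl && (need_resched() || !is_idle_task(current)))
                    break;
    } else {
            /* rcuc/NOCB kthreads: cond_resched_tasks_rcu_qs() handles overload. */
            local_bh_enable();
            cond_resched_tasks_rcu_qs();
            local_bh_disable();
    }
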
Waiman Long 548b78a98f rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 7b65dfa32dca1be0400d43a3d5bb80ed6e04958e
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:12 +0200

    rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()

    Instead of hardcoding IRQ save and nocb lock, use the consolidated
    API (and fix a comment as per Valentin Schneider's suggestion).

    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:02 -04:00
Waiman Long a91e96b424 rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 344e219d7d2b28117daaae5fe8da2e054b53d5a2
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:11 +0200

    rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check

    It's not entirely obvious why rdp->qlen_last_fqs_check is updated before
    processing the queue only on offloaded rdp. There can be different
    effect to that, either in favour of triggering the force quiescent state
    path or not. For example:

    1) If the number of callbacks has decreased since the last
       rdp->qlen_last_fqs_check update (because we recently called
       rcu_do_batch() and we executed below qhimark callbacks) and the number
       of processed callbacks on a subsequent do_batch() arranges for
       exceeding qhimark on non-offloaded but not on offloaded setup, then we
       may spare a later run to the force quiescent state
       slow path on __call_rcu_nocb_wake(), as compared to the non-offloaded
       counterpart scenario.

       Here is such an offloaded scenario instance:

        qhimark = 1000
        rdp->qlen_last_fqs_check = 3000
        rcu_segcblist_n_cbs(rdp) = 2000

        rcu_do_batch() {
            if (offloaded)
                rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
            // run 1000 callbacks
            rcu_segcblist_n_cbs(rdp) = 1000
            // Not updating rdp->qlen_last_fqs_check
            if (count < rdp->qlen_last_fqs_check - qhimark)
                rdp->qlen_last_fqs_check = count;
        }

        call_rcu() * 1001 {
            __call_rcu_nocb_wake() {
                // not taking the fqs slowpath:
                // rcu_segcblist_n_cbs(rdp) == 2001
                // rdp->qlen_last_fqs_check == 2000
                // qhimark == 1000
                if (len > rdp->qlen_last_fqs_check + qhimark)
                    ...
        }

        In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
        would be 1000 and the fqs slowpath would have executed.

    2) If the number of callbacks has increased since the last
       rdp->qlen_last_fqs_check update (because we recently queued below
       qhimark callbacks) and the number of callbacks executed in rcu_do_batch()
       doesn't exceed qhimark for either offloaded or non-offloaded setup,
       then it's possible that the offloaded scenario later runs the force
       quiescent state slow path on __call_rcu_nocb_wake() while the
       non-offloaded doesn't.

        qhimark = 1000
        rdp->qlen_last_fqs_check = 3000
        rcu_segcblist_n_cbs(rdp) = 2000

        rcu_do_batch() {
            if (offloaded)
                rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
            // run 100 callbacks
            // concurrent queued 100
            rcu_segcblist_n_cbs(rdp) = 2000
            // Not updating rdp->qlen_last_fqs_check
            if (count < rdp->qlen_last_fqs_check - qhimark)
                rdp->qlen_last_fqs_check = count;
        }

        call_rcu() * 1001 {
            __call_rcu_nocb_wake() {
                // Taking the fqs slowpath:
                // rcu_segcblist_n_cbs(rdp) == 3001
                // rdp->qlen_last_fqs_check == 2000
                // qhimark == 1000
                if (len > rdp->qlen_last_fqs_check + qhimark)
                    ...
        }

        In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
        would be 3000 and the fqs slowpath would have executed.

    The reason for updating rdp->qlen_last_fqs_check when invoking callbacks
    for offloaded CPUs is that there is usually no point in waking up either
    the rcuog or rcuoc kthreads while in this state.  After all, both threads
    are prohibited from indefinite sleeps.

    The exception is when some huge number of callbacks are enqueued while
    rcu_do_batch() is in the midst of invoking, in which case interrupting
    the rcuog kthread's timed sleep might get more callbacks set up for the
    next grace period.

    Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Original-patch-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:02 -04:00
Waiman Long a927dc4641 rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit b3bb02fe5a2b538ae53eda1fe591dd6c81a91ad4
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:10 +0200

    rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe

    When callbacks are offloaded, the NOCB kthreads handle the callbacks
    progression on behalf of rcu_core().

    However during the (de-)offloading process, the kthread may not be
    entirely up to the task. As a result some callbacks' grace-period
    sequence numbers may remain stale for a while because rcu_core() won't
    take care of them either.

    Fix this by forcing callback acceleration from rcu_core() as long
    as the offloading process isn't complete.

    Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:01 -04:00
Waiman Long 835ea67712 rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 24ee940d89277602147ce1b8b4fd87b01b9a6660
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue, 19 Oct 2021 02:08:09 +0200

    rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe

    While reporting a quiescent state for a given CPU, rcu_core() takes
    advantage of the freshly loaded grace period sequence number and the
    locked rnp to accelerate the callbacks whose sequence numbers have been
    assigned a stale value.

    This action is only necessary when the rdp isn't offloaded, otherwise
    the NOCB kthreads already take care of the callbacks progression.

    However the check for the offloaded state is volatile because it is
    performed outside the IRQs disabled section. It's possible for the
    offloading process to preempt rcu_core() at that point on PREEMPT_RT.

    This is dangerous because rcu_core() may end up accelerating callbacks
    concurrently with NOCB kthreads without appropriate locking.

    Fix this by moving the offloaded check inside the rnp locking section.

    Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:01 -04:00
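
A hedged sketch of moving the offloaded check inside the rnp locking section, as described above; the fragment is illustrative and the helper names simply follow the usual tree.c conventions.

    raw_spin_lock_rcu_node(rnp);
    /* Read the offloaded state only while rnp->lock is held, so a
     * preempting (de-)offloading process cannot change it under us. */
    if (!rcu_rdp_is_offloaded(rdp))
            needwake = rcu_accelerate_cbs(rnp, rdp);
    raw_spin_unlock_rcu_node(rnp);
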
Waiman Long 2f862dbd8a rcu/nocb: Invoke rcu_core() at the start of deoffloading
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit fbb94cbd70d41c7511460896dfc7f9ea5da704b3
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:08 +0200

    rcu/nocb: Invoke rcu_core() at the start of deoffloading

    On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
    some work, such as callbacks acceleration and invocation, may be left
    unattended due to the volatile checks on the offloaded state.

    In the worst case this work is postponed until the next rcu_pending()
    check that can take a jiffy to reach, which can be a problem in case
    of callbacks flooding.

    Solve that by invoking rcu_core() early in the de-offloading process.
    This way, any work dismissed by an ongoing rcu_core() call fooled by
    a preempting de-offloading process will be caught up by a subsequent
    call to rcu_core(), this time fully aware of the de-offloading state.

    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:00 -04:00
Waiman Long 9f38e99279 rcu/nocb: Prepare state machine for a new step
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 213d56bf33bdda835bac04046f09256a75c5ca8e
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:07 +0200

    rcu/nocb: Prepare state machine for a new step

    Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the
    segcblist flags because it is an exclusive state that doesn't mix up
    with the other flags. Remove it in favour of:

    _ A flag specifying that rcu_core() needs to perform callbacks execution
      and acceleration

    and

    _ A flag specifying we want the nocb lock to be held in any needed
      circumstances

    This clarifies the code and is more flexible: it makes it possible to have
    a state where rcu_core() runs with locking while offloading hasn't started yet.
    This is a necessary step to prepare for triggering rcu_core() at the
    very beginning of the de-offloading process so that rcu_core() won't
    dismiss work while being preempted by the de-offloading process, at
    least not without a pending subsequent rcu_core() that will quickly
    catch up.

    Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:00 -04:00
Waiman Long aa5e9f7836 rcu: Make idle entry report expedited quiescent states
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 790da248978a0722d92d1471630c881704f7eb0d
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 29 Sep 2021 11:09:34 -0700

    rcu: Make idle entry report expedited quiescent states

    In non-preemptible kernels, an unfortunately timed expedited grace period
    can result in the rcu_exp_handler() IPI handler setting the rcu_data
    structure's cpu_no_qs.b.exp field just as the target CPU enters idle.
    There are situations in which this field will not be checked until after
    that CPU exits idle.  The resulting grace-period latency does not qualify
    as "expedited".

    This commit therefore checks this field upon non-preemptible idle entry in
    the rcu_preempt_deferred_qs() function.  It also qualifies the rcu_core()
    preempt_count() check with IS_ENABLED(CONFIG_PREEMPT_COUNT) to prevent
    false-positive quiescent states from count-free kernels.

    Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:59 -04:00
Waiman Long 5dac0f1d20 rcu: in_irq() cleanup
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 2407a64f8045552203ee5cb9904ce75ce2fceef4
Author: Changbin Du <changbin.du@intel.com>
Date:   Tue, 28 Sep 2021 08:21:28 +0800

    rcu: in_irq() cleanup

    This commit replaces the obsolete and ambiguous macro in_irq() with its
    shiny new in_hardirq() equivalent.

    Signed-off-by: Changbin Du <changbin.du@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:57 -04:00
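
The change above is a mechanical substitution; a representative hunk might look like this (the context is illustrative).

    -       if (in_irq())
    +       if (in_hardirq())
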
Waiman Long e96de89670 rcu: Make rcutree_dying_cpu() use its "cpu" parameter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 4aa846f97c0c0d9740d120f9ac3e2fba1522ac0c
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 29 Jul 2021 20:30:32 -0700

    rcu: Make rcutree_dying_cpu() use its "cpu" parameter

    The CPU-hotplug functions take a "cpu" parameter, but rcutree_dying_cpu()
    ignores it in favor of this_cpu_ptr().  This works at the moment, but
    it would be better to be consistent.  This might also work better given
    some possible future changes.  This commit therefore uses per_cpu_ptr()
    to avoid ignoring the rcutree_dying_cpu() function's argument.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:43 -04:00
Waiman Long 73817d3ae2 rcu: Simplify rcu_report_dead() call to rcu_report_exp_rdp()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 768f5d50e6ad88363291f96a2e230442b8d633bc
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 29 Jul 2021 15:35:21 -0700

    rcu: Simplify rcu_report_dead() call to rcu_report_exp_rdp()

    Currently, rcu_report_dead() disables preemption across its call to
    rcu_report_exp_rdp(), but this is pointless because interrupts are
    already disabled by the caller.  In addition, rcu_report_dead() computes
    the address of the outgoing CPU's rcu_data structure, which is also
    pointless because this address is already present in local variable rdp.
    This commit therefore drops the preemption disabling and passes rdp
    to rcu_report_exp_rdp().

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:42 -04:00
Waiman Long 86f4f43dc8 rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 2caebefb00f03b5ba13d44aa6cc3723759b43822
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 28 Jul 2021 12:38:42 -0700

    rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()

    The purpose of rcu_dynticks_eqs_online() is to adjust the ->dynticks
    counter of an incoming CPU when required.  It is currently invoked
    from rcutree_prepare_cpu(), which runs before the incoming CPU is
    running, and thus on some other CPU.  This makes the per-CPU accesses in
    rcu_dynticks_eqs_online() iffy at best, and it all "works" only because
    the running CPU cannot possibly be in dyntick-idle mode, which means
    that rcu_dynticks_eqs_online() never has any effect.

    It is currently OK for rcu_dynticks_eqs_online() to have no effect, but
    only because the CPU-offline process just happens to leave ->dynticks in
    the correct state.  After all, if ->dynticks were in the wrong state on a
    just-onlined CPU, rcutorture would complain bitterly the next time that
    CPU went idle, at least in kernels built with CONFIG_RCU_EQS_DEBUG=y,
    for example, those built by rcutorture scenario TREE04.  One could
    argue that this means that rcu_dynticks_eqs_online() is unnecessary,
    however, removing it would make the CPU-online process vulnerable to
    slight changes in the CPU-offline process.

    One could also ask why it is safe to move the rcu_dynticks_eqs_online()
    call so late in the CPU-online process.  Indeed, there was a time when it
    would not have been safe, which does much to explain its current location.
    However, the marking of a CPU as online from an RCU perspective has long
    since moved from rcutree_prepare_cpu() to rcu_cpu_starting(), and all
    that is required is that ->dynticks be set correctly by the time that
    the CPU is marked as online from an RCU perspective.  After all, the RCU
    grace-period kthread does not check to see if offline CPUs are also idle.
    (In case you were curious, this is one reason why there is quiescent-state
    reporting as part of the offlining process.)

    This commit therefore moves the call to rcu_dynticks_eqs_online() from
    rcutree_prepare_cpu() to rcu_cpu_starting(), this latter being guaranteed
    to be running on the incoming CPU.  The call to this function must of
    course be placed before this rcu_cpu_starting() announces this CPU's
    presence to RCU.

    Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:42 -04:00
Waiman Long 5afb954908 rcu: Comment rcu_gp_init() code waiting for CPU-hotplug operations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit ebc88ad491362e6a4fae5bfb1c23c06c876f70be
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 26 Jul 2021 11:57:39 -0700

    rcu: Comment rcu_gp_init() code waiting for CPU-hotplug operations

    Near the beginning of rcu_gp_init() is a per-rcu_node loop that waits
    for CPU-hotplug operations that might have started before the new
    grace period did.  This commit adds a comment explaining that this
    wait does not exclude CPU-hotplug operations.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:40 -04:00
Waiman Long 83435bba4d rcu: Eliminate rcu_implicit_dynticks_qs() local variable ruqp
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 9424b867a759febc2b67b6777bfa27f0f830d437
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 22 Jul 2021 16:47:42 -0700

    rcu: Eliminate rcu_implicit_dynticks_qs() local variable ruqp

    The rcu_implicit_dynticks_qs() function's local variable ruqp references
    the ->rcu_urgent_qs field in the rcu_data structure referenced by the
    function parameter rdp, with a rather odd method for computing the
    pointer to this field.  This commit therefore simplifies things and
    saves a couple of lines of code by replacing each instance of ruqp with
    &rdp->rcu_urgent_qs.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:38 -04:00
Waiman Long 359a49ac3f rcu: Eliminate rcu_implicit_dynticks_qs() local variable rnhqp
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 88ee23ef1c129e40309f4612f80dd74be4590c03
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 22 Jul 2021 15:49:05 -0700

    rcu: Eliminate rcu_implicit_dynticks_qs() local variable rnhqp

    The rcu_implicit_dynticks_qs() function's local variable rnhqp references
    the ->rcu_need_heavy_qs field in the rcu_data structure referenced by
    the function parameter rdp, with a rather odd method for computing
    the pointer to this field.  This commit therefore simplifies things
    and saves a few lines of code by replacing each instance of rnhqp with
    &rdp->rcu_need_heavy_qs.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:38 -04:00
Desnes A. Nunes do Rosario 1a7d94bf06 tick/rcu: Remove obsolete rcu_needs_cpu() parameters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=2984539959dbaf4e65e19bf90c2419304a81a985

commit 2984539959dbaf4e65e19bf90c2419304a81a985
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Tue, 8 Feb 2022 17:16:33 +0100

  With the removal of CONFIG_RCU_FAST_NO_HZ, the parameters in
  rcu_needs_cpu() are not necessary anymore. Simply remove them.

  Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Cc: Paul E. McKenney <paulmck@kernel.org>
  Cc: Paul Menzel <pmenzel@molgen.mpg.de>

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
2022-03-24 14:39:58 -04:00
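
A hedged sketch of the signature change described above; the old parameter list shown follows the tick-side calling convention and is an assumption for illustration.

    -int rcu_needs_cpu(u64 basemono, u64 *nextevt);
    +int rcu_needs_cpu(void);
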
Desnes A. Nunes do Rosario 3a7d6d5b49 rcu: Move rcu_needs_cpu() to tree.c
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=bc849e9192c75833a85f2e9376a265ab31f8eec7

commit bc849e9192c75833a85f2e9376a265ab31f8eec7
Author: "Paul E. McKenney" <paulmck@kernel.org>
Date: Mon, 27 Sep 2021 14:30:20 -0700

  Now that RCU_FAST_NO_HZ is no more, there is but one implementation of
  the rcu_needs_cpu() function.  This commit therefore moves this function
  from kernel/rcu/tree_plugin.h to kernel/rcu/tree.c.

  Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
2022-03-24 14:39:57 -04:00
Desnes A. Nunes do Rosario 9814a162d4 rcu: Remove the RCU_FAST_NO_HZ Kconfig option
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1

commit e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1
Author: "Paul E. McKenney" <paulmck@kernel.org>
Date: Mon, 27 Sep 2021 14:18:51 -0700

  All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve
  systems with RCU callbacks offloaded.  In this situation, all that this
  Kconfig option does is slow down idle entry/exit with an additional
  always-taken early exit.  If this is the only use case, then this
  Kconfig option is nothing but an attractive nuisance that needs to go away.

  This commit therefore removes the RCU_FAST_NO_HZ Kconfig option.

  Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
2022-03-24 14:39:57 -04:00
Daniel Vacek c03b1b8cc6 rcu: Tighten rcu_advance_cbs_nowake() checks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2026991
Tested: The WARNING and subsequent RCU stall reproduced on my test VM in
        matter of seconds. With this patch the race window is closed and
        the system remains stable.

Upstream Status: rcu/next https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/kernel/rcu/tree.c?h=rcu/next&id=21e034adb9df3581fda926a29b3a11bda38ba93b
                 related discussion https://lore.kernel.org/all/20211118225923.GX641268@paulmck-ThinkPad-P17-Gen-1/

commit 21e034adb9df3581fda926a29b3a11bda38ba93b
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri Sep 17 15:04:48 2021 -0700

    rcu: Tighten rcu_advance_cbs_nowake() checks

    Currently, rcu_advance_cbs_nowake() checks that a grace period is in
    progress, however, that grace period could end just after the check.
    This commit rechecks that a grace period is still in progress after
    acquiring the lock.
    The grace period cannot end while the current CPU's rcu_node structure's
    ->lock is held, thus avoiding false positives from the WARN_ON_ONCE().

    As Daniel Vacek noted, it is not necessary for the rcu_node structure
    to have a CPU that has not yet passed through its quiescent state.

    Tested-By: Guillaume Morin <guillaume@morinfr.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

(cherry picked from commit 21e034adb9df3581fda926a29b3a11bda38ba93b)
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
2022-02-10 21:06:35 -05:00
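
A hedged sketch of the recheck described above, performed after the rcu_node lock is acquired; this is illustrative, and the exact predicate in the upstream patch may differ.

    if (!rcu_gp_in_progress() || !raw_spin_trylock_rcu_node(rnp))
            return;
    /* Recheck under the lock: the grace period observed above may have
     * ended, but it cannot end while rnp->lock is held. */
    if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
            WARN_ON_ONCE(rcu_advance_cbs(rnp, rdp));
    raw_spin_unlock_rcu_node(rnp);
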
Waiman Long ab9278d4c3 rcu: Fix rcu_dynticks_curr_cpu_in_eqs() vs noinstr
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit 74aece72f95f399dd29363669dc32a1344c8fab4
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Tue, 28 Sep 2021 10:40:22 +0200

    rcu: Fix rcu_dynticks_curr_cpu_in_eqs() vs noinstr

      vmlinux.o: warning: objtool: rcu_nmi_enter()+0x36: call to __kasan_check_read() leaves .noinstr.text section

    noinstr code cannot contain atomic_*() functions because they are
    explicitly annotated; use arch_atomic_*() instead.

    Fixes: 2be57f732889 ("rcu: Weaken ->dynticks accesses and updates")
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:29 -05:00
Waiman Long a0031691c0 rcu: Replace deprecated CPU-hotplug functions
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit d3dd95a8853f1d588e38e9d9d7c8cc2da412cc36
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Tue, 3 Aug 2021 16:16:14 +0200

    rcu: Replace deprecated CPU-hotplug functions

    The functions get_online_cpus() and put_online_cpus() have been
    deprecated during the CPU hotplug rework. They map directly to
    cpus_read_lock() and cpus_read_unlock().

    Replace deprecated CPU-hotplug functions with the official version.
    The behavior remains unchanged.
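
    A minimal sketch of the mechanical conversion (the surrounding function
    is hypothetical): both pairs pin CPU-hotplug state identically, so
    callers change only the function names.

        static void walk_online_cpus(void)
        {
                int cpu;

                cpus_read_lock();                /* was get_online_cpus() */
                for_each_online_cpu(cpu)
                        pr_info("cpu %d is online\n", cpu);
                cpus_read_unlock();              /* was put_online_cpus() */
        }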

    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: rcu@vger.kernel.org
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:23 -05:00
Waiman Long e7ff93c34d rcu: Mark accesses to rcu_state.n_force_qs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit 2431774f04d1050292054c763070021bade7b151
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 20 Jul 2021 06:16:27 -0700

    rcu: Mark accesses to rcu_state.n_force_qs

    This commit marks accesses to rcu_state.n_force_qs.  These data
    races are hard to make happen, but syzkaller was equal to the task.

    Reported-by: syzbot+e08a83a1940ec3846cd5@syzkaller.appspotmail.com
    Acked-by: Marco Elver <elver@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:17 -05:00
Waiman Long 2c89e7e141 rcu: Use per_cpu_ptr to get the pointer of per_cpu variable
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit 8211e922de2854130e3633f52cd4fc2d7817ceb0
Author: Liu Song <liu.song11@zte.com.cn>
Date:   Wed, 30 Jun 2021 22:08:02 +0800

    rcu: Use per_cpu_ptr to get the pointer of per_cpu variable

    There are a few remaining locations in kernel/rcu that still use
    "&per_cpu()".  This commit replaces them with "per_cpu_ptr(&)", and does
    not introduce any functional change.
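
    A small sketch of the substitution (the per-CPU variable is
    hypothetical): both expressions yield a pointer to the given CPU's
    instance, but per_cpu_ptr() states that intent directly.

        static DEFINE_PER_CPU(int, demo_count);

        static int read_demo_count(int cpu)
        {
                int *oldform = &per_cpu(demo_count, cpu);     /* replaced  */
                int *newform = per_cpu_ptr(&demo_count, cpu); /* preferred */

                WARN_ON_ONCE(oldform != newform);
                return *newform;
        }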

    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Liu Song <liu.song11@zte.com.cn>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:07 -05:00
Waiman Long 9299560365 rcu: Remove useless "ret" update in rcu_gp_fqs_loop()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit eb880949ef41c98a203c4a033e06e05854d902ef
Author: Liu Song <liu.song11@zte.com.cn>
Date:   Tue, 29 Jun 2021 21:55:51 +0800

    rcu: Remove useless "ret" update in rcu_gp_fqs_loop()

    Within rcu_gp_fqs_loop(), the "ret" local variable is set to the
    return value from swait_event_idle_timeout_exclusive(), but "ret" is
    unconditionally overwritten later in the code.  This commit therefore
    removes this useless assignment.

    Signed-off-by: Liu Song <liu.song11@zte.com.cn>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:06 -05:00
Waiman Long 93bb801194 rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit f74126dcbcbffe0d9fc3cb9bbf171b124a6791e5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 7 Jun 2021 21:57:02 -0700

    rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack

    The kbuild test project found an oversized stack frame in rcu_gp_kthread()
    for some kernel configurations.  This oversizing was due to a very large
    amount of inlining, which is unnecessary due to the fact that this code
    executes infrequently.  This commit therefore marks rcu_gp_init() and
    rcu_gp_fqs_loop noinline_for_stack to conserve stack space.
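
    For reference, a hedged sketch of the annotation at a definition site
    (the function name is hypothetical):

        /* Keep this rarely-executed path out of its caller's stack frame. */
        static noinline_for_stack void slow_path_setup(void)
        {
                /* ...infrequently executed initialization work... */
        }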

    Reported-by: kernel test robot <lkp@intel.com>
    Tested-by: Rong Chen <rong.a.chen@intel.com>
    [ paulmck: noinline_for_stack per Nathan Chancellor. ]
    Reviewed-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:01 -05:00
Waiman Long a83389f084 rcu: Weaken ->dynticks accesses and updates
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit 2be57f732889277b07ccddd205ef0616c8c1941f
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 19 May 2021 17:25:42 -0700

    rcu: Weaken ->dynticks accesses and updates

    Accesses to the rcu_data structure's ->dynticks field have always been
    fully ordered because it was not possible to prove that weaker ordering
    was safe.  However, with the removal of the rcu_eqs_special_set() function
    and the advent of the Linux-kernel memory model, it is now easy to show
    that two of the four original full memory barriers can be weakened to
    acquire and release operations.  The remaining pair must remain full
    memory barriers.  This change makes the memory ordering requirements
    more evident, and it might well also speed up the to-idle and from-idle
    fastpaths on some architectures.

    The following litmus test, adapted from one supplied off-list by Frederic
    Weisbecker, models the RCU grace-period kthread detecting an idle CPU
    that is concurrently transitioning to non-idle:

            C dynticks-from-idle

            {
                    DYNTICKS=0; (* Initially idle. *)
            }

            P0(int *X, int *DYNTICKS)
            {
                    int dynticks;
                    int x;

                    // Idle.
                    dynticks = READ_ONCE(*DYNTICKS);
                    smp_store_release(DYNTICKS, dynticks + 1);
                    smp_mb();
                    // Now non-idle
                    x = READ_ONCE(*X);
            }

            P1(int *X, int *DYNTICKS)
            {
                    int dynticks;

                    WRITE_ONCE(*X, 1);
                    smp_mb();
                    dynticks = smp_load_acquire(DYNTICKS);
            }

            exists (1:dynticks=0 /\ 0:x=1)

    Running "herd7 -conf linux-kernel.cfg dynticks-from-idle.litmus" verifies
    this transition, namely, showing that if the RCU grace-period kthread (P1)
    sees another CPU as idle (P0), then any memory access prior to the start
    of the grace period (P1's write to X) will be seen by any RCU read-side
    critical section following the to-non-idle transition (P0's read from X).
    This is a straightforward use of full memory barriers to force ordering
    in a store-buffering (SB) litmus test.

    The following litmus test, also adapted from the one supplied off-list
    by Frederic Weisbecker, models the RCU grace-period kthread detecting
    a non-idle CPU that is concurrently transitioning to idle:

            C dynticks-into-idle

            {
                    DYNTICKS=1; (* Initially non-idle. *)
            }

            P0(int *X, int *DYNTICKS)
            {
                    int dynticks;

                    // Non-idle.
                    WRITE_ONCE(*X, 1);
                    dynticks = READ_ONCE(*DYNTICKS);
                    smp_store_release(DYNTICKS, dynticks + 1);
                    smp_mb();
                    // Now idle.
            }

            P1(int *X, int *DYNTICKS)
            {
                    int x;
                    int dynticks;

                    smp_mb();
                    dynticks = smp_load_acquire(DYNTICKS);
                    x = READ_ONCE(*X);
            }

            exists (1:dynticks=2 /\ 1:x=0)

    Running "herd7 -conf linux-kernel.cfg dynticks-into-idle.litmus" verifies
    this transition, namely, showing that if the RCU grace-period kthread
    (P1) sees another CPU as newly idle (P0), then any pre-idle memory access
    (P0's write to X) will be seen by any code following the grace period
    (P1's read from X).  This is a simple release-acquire pair forcing
    ordering in a message-passing (MP) litmus test.

    Of course, if the grace-period kthread detects the CPU as non-idle,
    it will refrain from reporting a quiescent state on behalf of that CPU,
    so there are no ordering requirements from the grace-period kthread in
    that case.  However, other subsystems call rcu_is_idle_cpu() to check
    for CPUs being non-idle from an RCU perspective.  That case is also
    verified by the above litmus tests with the proviso that the sense of
    the low-order bit of the DYNTICKS counter be inverted.

    Unfortunately, on x86 smp_mb() is as expensive as a cache-local atomic
    increment.  This commit therefore weakens only the read from ->dynticks.
    However, the updates are abstracted into a rcu_dynticks_inc() function
    to ease any future changes that might be needed.
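
    A hedged sketch of the kind of helper described above (not necessarily
    the exact upstream body): funneling every ->dynticks update through one
    fully ordered increment site keeps future changes localized.

        static noinstr unsigned long rcu_dynticks_inc(int incby)
        {
                return arch_atomic_add_return(incby,
                                this_cpu_ptr(&rcu_data.dynticks));
        }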

    [ paulmck: Apply Linus Torvalds feedback. ]

    Link: https://lore.kernel.org/lkml/20210721202127.2129660-4-paulmck@kernel.org/
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:22:53 -05:00
Waiman Long 76b5d8b925 rcu: Remove special bit at the bottom of the ->dynticks counter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit a86baa69c2b7b85bab41692fa3ec188a5aae1d27
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Tue, 18 May 2021 19:17:16 -0700

    rcu: Remove special bit at the bottom of the ->dynticks counter

    Commit b8c17e6664 ("rcu: Maintain special bits at bottom of ->dynticks
    counter") reserved a bit at the bottom of the ->dynticks counter to defer
    flushing of TLBs, but this facility never has been used.  This commit
    therefore removes this capability along with the rcu_eqs_special_set()
    function used to trigger it.

    Link: https://lore.kernel.org/linux-doc/CALCETrWNPOOdTrFabTDd=H7+wc6xJ9rJceg6OL1S0rTV5pfSsA@mail.gmail.com/
    Suggested-by: Andy Lutomirski <luto@kernel.org>
    Signed-off-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
    [ paulmck: Forward-port to v5.13-rc1. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:22:53 -05:00
Waiman Long bc5b8d61a3 rcu/nocb: Remove NOCB deferred wakeup from rcutree_dead_cpu()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit cba712beebf32b27fea71241aa3cdd2ab0fc31a3
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 19 May 2021 02:09:29 +0200

    rcu/nocb: Remove NOCB deferred wakeup from rcutree_dead_cpu()

    At CPU offline time, we must handle any pending wakeup for the nocb_gp
    kthread linked to the outgoing CPU.

    Now we are making sure of that twice:

    1) From rcu_report_dead() when the outgoing CPU makes the very last
       local cleanups by itself before switching offline.

    2) From rcutree_dead_cpu(). Here the offlining CPU has gone and is truly
       now offline. Another CPU takes care of the post-mortem cleanup and
       checks whether the offline CPU had a pending wakeup.

    Both ways are fine but we have to choose one or the other because we
    don't need to repeat that action. Simply benefit from cache locality
    and keep only the first solution.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:22:52 -05:00
Waiman Long 91e2081a69 rcu/nocb: Start moving nocb code to its own plugin file
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit dfcb27540213e8061ecffacd4bd8ed54a310a7b0
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 19 May 2021 02:09:28 +0200

    rcu/nocb: Start moving nocb code to its own plugin file

    The kernel/rcu/tree_plugin.h file contains not only the plugins for
    preemptible RCU, but also many other features including rcu_nocbs
    callback offloading.  This offloading has become large and complex,
    so it is time to put it in its own file.

    This commit starts that process.

    Suggested-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    [ paulmck: Rename to tree_nocb.h, add Frederic as author. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:22:52 -05:00
Linus Torvalds 28e92f9903 Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Bitmap parsing support for "all" as an alias for all bits

 - Documentation updates

 - Miscellaneous fixes, including some that overlap into mm and lockdep

 - kvfree_rcu() updates

 - mem_dump_obj() updates, with acks from one of the slab-allocator
   maintainers

 - RCU NOCB CPU updates, including limited deoffloading

 - SRCU updates

 - Tasks-RCU updates

 - Torture-test updates

* 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
  tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
  rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
  rcu: Add missing __releases() annotation
  rcu: Remove obsolete rcu_read_unlock() deadlock commentary
  rcu: Improve comments describing RCU read-side critical sections
  rcu: Create an unrcu_pointer() to remove __rcu from a pointer
  srcu: Early test SRCU polling start
  rcu: Fix various typos in comments
  rcu/nocb: Unify timers
  rcu/nocb: Prepare for fine-grained deferred wakeup
  rcu/nocb: Only cancel nocb timer if not polling
  rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
  rcu/nocb: Allow de-offloading rdp leader
  rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
  rcu: Don't penalize priority boosting when there is nothing to boost
  rcu: Point to documentation of ordering guarantees
  rcu: Make rcu_gp_cleanup() be noinline for tracing
  rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
  rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
  ...
2021-07-04 12:58:33 -07:00
Andy Shevchenko f39650de68 kernel.h: split out panic and oops helpers
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.
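
A hedged sketch of a caller after the split (the notifier itself is
hypothetical): code that only needs the panic notifier chain can include
the dedicated header instead of pulling it in via kernel.h.

    #include <linux/notifier.h>
    #include <linux/panic_notifier.h>

    static int demo_panic_cb(struct notifier_block *nb, unsigned long event,
                             void *ptr)
    {
            return NOTIFY_DONE;     /* just observe the panic */
    }

    static struct notifier_block demo_panic_nb = {
            .notifier_call = demo_panic_cb,
    };

    /* Registered (for example, from an init function) with:
     *   atomic_notifier_chain_register(&panic_notifier_list, &demo_panic_nb);
     */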

[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
  Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:04 -07:00
Paul E. McKenney 641faf1b90 Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c', 'fixes.2021.05.13a', 'kvfree_rcu.2021.05.10c', 'mmdumpobj.2021.05.10c', 'nocb.2021.05.12a', 'srcu.2021.05.12a', 'tasks.2021.05.18a' and 'torture.2021.05.10c' into HEAD
bitmaprange.2021.05.10c: Allow "all" for bitmap ranges.
doc.2021.05.10c: Documentation updates.
fixes.2021.05.13a: Miscellaneous fixes.
kvfree_rcu.2021.05.10c: kvfree_rcu() updates.
mmdumpobj.2021.05.10c: mem_dump_obj() updates.
nocb.2021.05.12a: RCU NOCB CPU updates, including limited deoffloading.
srcu.2021.05.12a: SRCU updates.
tasks.2021.05.18a: Tasks-RCU updates.
torture.2021.05.10c: Torture-test updates.
2021-05-18 10:56:19 -07:00
Paul E. McKenney cf868c2af2 rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
Heavy networking load can cause a CPU to execute continuously and
indefinitely within ksoftirqd, in which case there will be no voluntary
task switches and thus no RCU-tasks quiescent states.  This commit
therefore causes the existing rcu_softirq_qs() to provide an RCU-tasks
quiescent state.

This of course means that __do_softirq() and its callers cannot be
invoked from within a tracing trampoline.

Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
2021-05-18 10:54:51 -07:00
Paul E. McKenney 1893afd634 rcu: Improve comments describing RCU read-side critical sections
There are a number of places that call out the fact that preempt-disable
regions of code now act as RCU read-side critical sections, where
preempt-disable regions of code include irq-disable regions of code,
bh-disable regions of code, hardirq handlers, and NMI handlers.  However,
someone relying solely on (for example) the call_rcu() header comment
might well have no idea that preempt-disable regions of code have RCU
semantics.

This commit therefore updates the header comments for
call_rcu(), synchronize_rcu(), rcu_dereference_bh_check(), and
rcu_dereference_sched_check() to call out these new(ish) forms of RCU
readers.
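
A hedged sketch of the reader form these comments now call out (the
structure and pointer are hypothetical): a preempt-disable region is a
legitimate read-side critical section, and synchronize_rcu() waits for it
just as it waits for rcu_read_lock() readers.

    struct demo { int val; };
    static struct demo __rcu *demo_ptr;

    static int read_demo_val(void)
    {
            struct demo *p;
            int val;

            preempt_disable();              /* acts as an RCU reader */
            p = rcu_dereference_sched(demo_ptr);
            val = p ? p->val : -1;
            preempt_enable();
            return val;
    }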

Reported-by: Michel Lespinasse <michel@lespinasse.org>
[ paulmck: Apply Matthew Wilcox and Michel Lespinasse feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-13 09:13:23 -07:00
Ingo Molnar a616aec9aa rcu: Fix various typos in comments
Fix ~12 single-word typos in RCU code comments.

[ paulmck: Apply feedback from Randy Dunlap. ]
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12 12:11:05 -07:00
Frederic Weisbecker 870905169d rcu/nocb: Prepare for fine-grained deferred wakeup
Tuning the deferred wakeup level must be done from a safe wakeup
point. Currently those sites are:

* ->nocb_timer
* user/idle/guest entry
* CPU down
* softirq/rcuc

All of these sites perform the wake up for both RCU_NOCB_WAKE and
RCU_NOCB_WAKE_FORCE.

In order to merge ->nocb_timer and ->nocb_bypass_timer together, we plan
to add a new RCU_NOCB_WAKE_BYPASS that really should be deferred until
a timer fires so that we don't wake up the NOCB-gp kthread too early.

To prepare for that, this commit specifies the per-callsite wakeup
level/limit.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12 12:10:23 -07:00
Paul E. McKenney 3d3a0d1b50 rcu: Point to documentation of ordering guarantees
Add comments to synchronize_rcu() and friends that point to
Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:54 -07:00
Paul E. McKenney 2f20de99a6 rcu: Make rcu_gp_cleanup() be noinline for tracing
Although there are trace events for RCU grace periods, these are only
enabled in CONFIG_RCU_TRACE=y kernels.  This commit therefore marks
rcu_gp_cleanup() noinline in order to provide a function that can be
traced that is invoked near the end of each grace period.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:54 -07:00
Paul E. McKenney 3ef5a1c382 rcu: Make RCU priority boosting work on single-CPU rcu_node structures
When any CPU comes online, it checks to see if an RCU-boost kthread has
already been created for that CPU's leaf rcu_node structure, and if
not, it creates one.  Unfortunately, it also verifies that this leaf
rcu_node structure actually has at least one online CPU, and if not,
it declines to create the kthread.  Although this behavior makes sense
during early boot, especially on systems that claim far more CPUs than
they actually have, it makes no sense for the first CPU to come online
for a given rcu_node structure.  There is no point in checking because
we know there is a CPU on its way in.

The problem is that timing differences can cause this incoming CPU to not
yet be reflected in the various bit masks even at rcutree_online_cpu()
time, and there is no chance at rcutree_prepare_cpu() time.  Plus it
would be better to create the RCU-boost kthread at rcutree_prepare_cpu()
to handle the case where the CPU is involved in an RCU priority inversion
very shortly after it comes online.

This commit therefore moves the checking to rcu_prepare_kthreads(), which
is called only at early boot, when the check is appropriate.  In addition,
it makes rcutree_prepare_cpu() invoke rcu_spawn_one_boost_kthread(), which
no longer does any checking for online CPUs.

With this change, RCU priority boosting tests now pass for short rcutorture
runs, even with single-CPU leaf rcu_node structures.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Scott Wood <swood@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:54 -07:00
Paul E. McKenney 8e4b1d2bc1 rcu: Invoke rcu_spawn_core_kthreads() from rcu_spawn_gp_kthread()
Currently, rcu_spawn_core_kthreads() is invoked via an early_initcall(),
which works, except that rcu_spawn_gp_kthread() is also invoked via an
early_initcall() and rcu_spawn_core_kthreads() relies on adjustments to
kthread_prio that are carried out by rcu_spawn_gp_kthread().  There is
no guarantee of ordering among early_initcall() handlers, and thus no
guarantee that kthread_prio will be properly checked and range-limited
at the time that rcu_spawn_core_kthreads() needs it.

In most cases, this bug is harmless.  After all, the only reason that
rcu_spawn_gp_kthread() adjusts the value of kthread_prio is if the user
specified a nonsensical value for this boot parameter, which experience
indicates is rare.

Nevertheless, a bug is a bug.  This commit therefore causes the
rcu_spawn_core_kthreads() function to be invoked directly from
rcu_spawn_gp_kthread() after any needed adjustments to kthread_prio have
been carried out.

Fixes: 48d07c04b4 ("rcu: Enable elimination of Tree-RCU softirq processing")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:54 -07:00
Zhouyi Zhou 277ffe1b70 rcu: Improve tree.c comments and add code cleanups
This commit cleans up some comments and code in kernel/rcu/tree.c.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:53 -07:00
Paul E. McKenney ce7c169dee rcu: Remove the unused rcu_irq_exit_preempt() function
Commit 9ee01e0f69 ("x86/entry: Clean up idtentry_enter/exit()
leftovers") left the rcu_irq_exit_preempt() in place in order to avoid
conflicts with the -rcu tree.  Now that this change has long since hit
mainline, this commit removes the no-longer-used rcu_irq_exit_preempt()
function.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:53 -07:00
Frederic Weisbecker b5befe842e srcu: Fix broken node geometry after early ssp init
An srcu_struct structure that is initialized before rcu_init_geometry()
will have its srcu_node hierarchy based on CONFIG_NR_CPUS.  Once
rcu_init_geometry() is called, this hierarchy is compressed as needed
for the actual maximum number of CPUs for this system.

Later on, that srcu_struct structure is confused, sometimes referring
to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead
to the new num_possible_cpus() hierarchy.  For example, each of its
->mynode fields continues to reference the original leaf rcu_node
structures, some of which might no longer exist.  On the other hand,
srcu_for_each_node_breadth_first() traverses to the new node hierarchy.

There are at least two bad possible outcomes to this:

1) a) A callback enqueued early on an srcu_data structure (call it
      *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in
      srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf
      (say 3 levels).

   b) The grace period ends after rcu_init_geometry() shrinks the
      nodes level to a single one.  srcu_gp_end() walks through the new
      srcu_node hierarchy without ever reaching the old leaves so the
      callback is never executed.

   This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32
   and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests
   verification never completes and the boot hangs:

	[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
	[ 5413.147564]       Not tainted 5.12.0-rc4+ #28
	[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	[ 5413.159753] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
	[ 5413.168099] Call Trace:
	[ 5413.170555]  __schedule+0x36c/0x930
	[ 5413.174057]  ? wait_for_completion+0x88/0x110
	[ 5413.178423]  schedule+0x46/0xf0
	[ 5413.181575]  schedule_timeout+0x284/0x380
	[ 5413.185591]  ? wait_for_completion+0x88/0x110
	[ 5413.189957]  ? mark_held_locks+0x61/0x80
	[ 5413.193882]  ? mark_held_locks+0x61/0x80
	[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
	[ 5413.202173]  ? wait_for_completion+0x88/0x110
	[ 5413.206535]  wait_for_completion+0xb4/0x110
	[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
	[ 5413.215610]  srcu_barrier+0x187/0x200
	[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
	[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
	[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
	[ 5413.232700]  do_one_initcall+0x63/0x310
	[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
	[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
	[ 5413.244912]  kernel_init_freeable+0x253/0x28f
	[ 5413.249273]  ? rest_init+0x250/0x250
	[ 5413.252846]  kernel_init+0xa/0x110
	[ 5413.256257]  ret_from_fork+0x22/0x30

2) An srcu_struct structure that is initialized before rcu_init_geometry()
   and used afterward will always have stale rdp->mynode references,
   resulting in callbacks to be missed in srcu_gp_end(), just like in
   the previous scenario.

This commit therefore causes init_srcu_struct_nodes to initialize the
geometry, if needed.  This ensures that the srcu_node hierarchy is
properly built and distributed from the get-go.

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:03:35 -07:00
Frederic Weisbecker 8e9c01c717 srcu: Initialize SRCU after timers
Once srcu_init() is called, the SRCU core will make use of delayed
workqueues, which rely on timers.  However init_timers() is called
several steps after rcu_init().  This means that a call_srcu() after
rcu_init() but before init_timers() would find itself within a dangerously
uninitialized timer core.

This commit therefore creates a separate call to srcu_init() after
init_timers() completes, which ensures that we stay in early SRCU mode
until timers are safe(r).

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:03:35 -07:00
Uladzislau Rezki (Sony) a78d4a2a10 kvfree_rcu: Refactor kfree_rcu_monitor()
Currently we have three functions which depend on each other.
Two of them are quite tiny and the last one where the most
work is done. All of them are related to queuing RCU batches
to reclaim objects after a GP.

1. kfree_rcu_monitor(). It consist of few lines. It acquires a spin-lock
   and calls kfree_rcu_drain_unlock().

2. kfree_rcu_drain_unlock(). It also consists of few lines of code. It
   calls queue_kfree_rcu_work() to queue the batch.  If this fails,
   it rearms the monitor work to try again later.

3. queue_kfree_rcu_work(). This provides the bulk of the functionality,
   attempting to start a new batch to free objects after a GP.

Since there are no external users of functions [2] and [3], both
can be eliminated by moving all logic directly into [1], which both
shrinks and simplifies the code.

Also replace comments that start with "/*" with "//" comments to make the
format uniform across the file.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony) d8628f35ba kvfree_rcu: Fix comments according to current code
The kvfree_rcu() function now defers allocations in the common
case due to the fact that there is no lockless access to the
memory-allocator caches/pools.  In addition, in CONFIG_PREEMPT_NONE=y
and in CONFIG_PREEMPT_VOLUNTARY=y kernels, there is no reliable way to
determine if spinlocks are held.  As a result, allocation is deferred in
the common case, and the two-argument form of kvfree_rcu() thus uses the
"channel 3" queue through all the rcu_head structures.  This channel
is referred to as the emergency case in comments, and these
comments are now obsolete.

This commit therefore updates these comments to reflect the new
common-case nature of such emergencies.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony) 7fe1da33f6 kvfree_rcu: Use kfree_rcu_monitor() instead of open-coded variant
Replace an open-coded version of the kfree_rcu_monitor() function body
with a call to that function.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony) dd28c9f057 kvfree_rcu: Update "monitor_todo" once a batch is started
Before attempting to start a new batch the "monitor_todo" variable is
set to "false" and set back to "true" when a previous RCU batch is still
in progress.  This is at best confusing.

Thus change this variable to "false" only when a new batch has been
successfully queued, otherwise, just leave it be.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony) d434c00fa3 kvfree_rcu: Add a bulk-list check when a scheduler is run
The rcu_scheduler_active flag is set to RCU_SCHEDULER_RUNNING once the
scheduler is up and running.  That signal is used in order to check
and queue a "monitor work" to reclaim freed objects (if there are any)
during early boot.  This flag is used by kvfree_rcu() to determine when
work can safely be queued, at which point memory passed to earlier
invocations of kvfree_rcu() can be processed.

However, only "krcp->head" is checked for objects that need to be
released, and there are now two more, namely, "krcp->bkvhead[0]" and
"krcp->bkvhead[1]".  Therefore, check these two additional channels.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony) ac7625ebd5 kvfree_rcu: Use [READ/WRITE]_ONCE() macros to access to nr_bkv_objs
nr_bkv_objs is a count of the objects in the kvfree_rcu page cache.
Accessing it requires holding the ->lock.  Switch to READ_ONCE() and
WRITE_ONCE() macros to provide lockless access to this counter.
This lockless access is used for the shrinker.
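
A hedged sketch of the access pattern (locking simplified): updates still
happen under ->lock, while the shrinker's read is marked so that it may
safely run without the lock.

    /* Updater, with krcp->lock held: */
    WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1);

    /* Shrinker-side count, no lock held: */
    count += READ_ONCE(krcp->nr_bkv_objs);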

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:00:48 -07:00
Zhang Qiang d0bfa8b3c4 kvfree_rcu: Release a page cache under memory pressure
Add a drain_page_cache() function to drain a per-cpu page cache.
The reason behind it is that a system can run into a low-memory
condition; in that case a page shrinker can ask its users to free
their caches in order to make extra memory available for other
needs in the system.

When a system hits such a condition, the page cache is drained for
all CPUs in the system. By default the page-cache refill work is
delayed by a 5-second interval until the memory pressure disappears;
if needed, this interval can be changed via the
rcu_delay_page_cache_fill_msec module parameter.

Co-developed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:00:48 -07:00
Paul E. McKenney ab6ad3dbdd Merge branches 'bitmaprange.2021.03.08a', 'fixes.2021.03.15a', 'kvfree_rcu.2021.03.08a', 'mmdumpobj.2021.03.08a', 'nocb.2021.03.15a', 'poll.2021.03.24a', 'rt.2021.03.08a', 'tasks.2021.03.08a', 'torture.2021.03.08a' and 'torturescript.2021.03.22a' into HEAD
bitmaprange.2021.03.08a:  Allow 3-N for bitmap ranges.
fixes.2021.03.15a:  Miscellaneous fixes.
kvfree_rcu.2021.03.08a:  kvfree_rcu() updates.
mmdumpobj.2021.03.08a:  mem_dump_obj() updates.
nocb.2021.03.15a:  RCU NOCB CPU updates, including limited deoffloading.
poll.2021.03.24a:  Polling grace-period interfaces for RCU.
rt.2021.03.08a:  Realtime-related RCU changes.
tasks.2021.03.08a:  Tasks-RCU updates.
torture.2021.03.08a:  Torture-test updates.
torturescript.2021.03.22a:  Torture-test scripting updates.
2021-03-24 17:20:18 -07:00
Paul E. McKenney 7abb18bd75 rcu: Provide polling interfaces for Tree RCU grace periods
There is a need for a non-blocking polling interface for RCU grace
periods, so this commit supplies start_poll_synchronize_rcu() and
poll_state_synchronize_rcu() for this purpose.  Note that the existing
get_state_synchronize_rcu() may be used if future grace periods are
inevitable (perhaps due to a later call_rcu() invocation).  The new
start_poll_synchronize_rcu() is to be used if future grace periods
might not otherwise happen.  Finally, poll_state_synchronize_rcu()
provides a lockless check for a grace period having elapsed since
the corresponding call to either of the get_state_synchronize_rcu()
or start_poll_synchronize_rcu().

As with get_state_synchronize_rcu(), the return value from either
get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
to a later call to either poll_state_synchronize_rcu() or the existing
(might_sleep) cond_synchronize_rcu().
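
A hedged caller-side usage sketch of the polling interfaces described
above: obtain a cookie, poll it later without blocking, and fall back to
the blocking cond_synchronize_rcu() if sleeping becomes acceptable.

    unsigned long cookie;

    cookie = start_poll_synchronize_rcu();  /* also starts a GP if needed */
    /* ...time passes, pre-existing readers drain... */
    if (poll_state_synchronize_rcu(cookie)) {
            /* A full grace period has elapsed; reclaim immediately. */
    } else {
            cond_synchronize_rcu(cookie);   /* may block until it has */
    }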

[ paulmck: Remove redundant smp_mb() per Frederic Weisbecker feedback. ]
[ Update poll_state_synchronize_rcu() docbook per Frederic Weisbecker feedback. ]
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-22 08:23:48 -07:00
Frederic Weisbecker ec711bc12c rcu/nocb: Only (re-)initialize segcblist when needed on CPU up
At the start of a CPU-hotplug operation, the incoming CPU's callback
list can be in a number of states:

1.	Disabled and empty.  This is the case when the boot CPU has
	not invoked call_rcu(), when a non-boot CPU first comes online,
	and when a non-offloaded CPU comes back online.  In this case,
	it is both necessary and permissible to initialize ->cblist.
	Because either the CPU is currently running with interrupts
	disabled (boot CPU) or is not yet running at all (other CPUs),
	it is not necessary to acquire ->nocb_lock.

	In this case, initialization is required.

2.	Disabled and non-empty.  This cannot occur, because early boot
	call_rcu() invocations enable the callback list before enqueuing
	their callback.

3.	Enabled, whether empty or not.	In this case, the callback
	list has already been initialized.  This case occurs when the
	boot CPU has executed an early boot call_rcu() and also when
	an offloaded CPU comes back online.  In both cases, there is
	no need to initialize the callback list: In the boot-CPU case,
	the CPU has not (yet) gone offline, and in the offloaded case,
	the rcuo kthreads are taking care of business.

	Because it is not necessary to initialize the callback list,
	it is also not necessary to acquire ->nocb_lock.

Therefore, checking if the segcblist is enabled suffices.  This commit
therefore initializes the callback list at rcutree_prepare_cpu() time
only if that list is disabled.
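
A hedged sketch of the resulting CPU-prepare logic (locking omitted):
initialization happens only for case 1 above; the already-enabled lists
of case 3 are left alone, and case 2 cannot occur.

    /* In rcutree_prepare_cpu(), sketch only: */
    if (!rcu_segcblist_is_enabled(&rdp->cblist))
            rcu_segcblist_init(&rdp->cblist);   /* disabled and empty */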

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:20:22 -08:00
Frederic Weisbecker 64305db285 rcu/nocb: Forbid NOCB toggling on offline CPUs
It makes no sense to de-offload an offline CPU because that CPU will never
invoke any remaining callbacks.  It also makes little sense to offload an
offline CPU because any pending RCU callbacks were migrated when that CPU
went offline.  Yes, it is in theory possible to use a number of tricks
to permit offloading and deoffloading offline CPUs in certain cases, but
in practice it is far better to have the simple and deterministic rule
"Toggling the offload state of an offline CPU is forbidden".

For but one example, consider that an offloaded offline CPU might have
millions of callbacks queued.  Best to just say "no".

This commit therefore forbids toggling of the offloaded state of
offline CPUs.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:20:21 -08:00
Frederic Weisbecker 3820b513a2 rcu/nocb: Detect unsafe checks for offloaded rdp
Provide CONFIG_PROVE_RCU sanity checks to ensure we are always reading
the offloaded state of an rdp in a safe and stable way and prevent its
value from changing under us. We must hold either the barrier mutex,
the cpu-hotplug lock (read or write), or the nocb lock.
Local non-preemptible reads are also safe. NOCB kthreads and timers have
their own means of synchronization against the offloaded state updaters.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:20:20 -08:00
Uladzislau Rezki (Sony) ee6ddf5847 kvfree_rcu: Use same set of GFP flags as does single-argument
Running an rcuscale stress-suite can lead to "Out of memory" of a
system. This can happen under high memory pressure with a small amount
of physical memory.

For example, a KVM test configuration with 64 CPUs and 512 megabytes
can result in OOM when running rcuscale with below parameters:

../kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig CONFIG_NR_CPUS=64 \
--bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 \
  rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot" --trust-make

<snip>
[   12.054448] kworker/1:1H invoked oom-killer: gfp_mask=0x2cc0(GFP_KERNEL|__GFP_NOWARN), order=0, oom_score_adj=0
[   12.055303] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510
[   12.055416] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
[   12.056485] Workqueue: events_highpri fill_page_cache_func
[   12.056485] Call Trace:
[   12.056485]  dump_stack+0x57/0x6a
[   12.056485]  dump_header+0x4c/0x30a
[   12.056485]  ? del_timer_sync+0x20/0x30
[   12.056485]  out_of_memory.cold.47+0xa/0x7e
[   12.056485]  __alloc_pages_slowpath.constprop.123+0x82f/0xc00
[   12.056485]  __alloc_pages_nodemask+0x289/0x2c0
[   12.056485]  __get_free_pages+0x8/0x30
[   12.056485]  fill_page_cache_func+0x39/0xb0
[   12.056485]  process_one_work+0x1ed/0x3b0
[   12.056485]  ? process_one_work+0x3b0/0x3b0
[   12.060485]  worker_thread+0x28/0x3c0
[   12.060485]  ? process_one_work+0x3b0/0x3b0
[   12.060485]  kthread+0x138/0x160
[   12.060485]  ? kthread_park+0x80/0x80
[   12.060485]  ret_from_fork+0x22/0x30
[   12.062156] Mem-Info:
[   12.062350] active_anon:0 inactive_anon:0 isolated_anon:0
[   12.062350]  active_file:0 inactive_file:0 isolated_file:0
[   12.062350]  unevictable:0 dirty:0 writeback:0
[   12.062350]  slab_reclaimable:2797 slab_unreclaimable:80920
[   12.062350]  mapped:1 shmem:2 pagetables:8 bounce:0
[   12.062350]  free:10488 free_pcp:1227 free_cma:0
...
[   12.101610] Out of memory and no killable processes...
[   12.102042] Kernel panic - not syncing: System is deadlocked on memory
[   12.102583] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510
[   12.102600] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
<snip>

Because kvfree_rcu() has a fallback path, memory allocation failure is
not the end of the world.  Furthermore, the added overhead of aggressive
GFP settings must be balanced against the overhead of the fallback path,
which is a cache miss for double-argument kvfree_rcu() and a call to
synchronize_rcu() for single-argument kvfree_rcu().  The current choice
of GFP_KERNEL|__GFP_NOWARN can result in longer latencies than a call
to synchronize_rcu(), so less-tenacious GFP flags would be helpful.

Here is the tradeoff that must be balanced:
    a) Minimize use of the fallback path,
    b) Avoid pushing the system into OOM,
    c) Bound allocation latency to that of synchronize_rcu(), and
    d) Leave the emergency reserves to use cases lacking fallbacks.

This commit therefore changes GFP flags from GFP_KERNEL|__GFP_NOWARN to
GFP_KERNEL|__GFP_NORETRY|__GFP_NOMEMALLOC|__GFP_NOWARN.  This combination
leaves the emergency reserves alone and can initiate reclaim, but will
not invoke the OOM killer.
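
A hedged sketch of an allocation site using the flag combination named
above (the fallback detail is simplified):

    struct page *page;

    page = alloc_page(GFP_KERNEL | __GFP_NORETRY |
                      __GFP_NOMEMALLOC | __GFP_NOWARN);
    if (!page)
            return false;   /* caller takes the kvfree_rcu() fallback path */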

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:18:07 -08:00
Uladzislau Rezki (Sony) 3e7ce7a187 kvfree_rcu: Replace __GFP_RETRY_MAYFAIL by __GFP_NORETRY
__GFP_RETRY_MAYFAIL can spend quite a bit of time reclaiming, and this
can be wasted effort given that there is a fallback code path in case
memory allocation fails.

__GFP_NORETRY does perform some light-weight reclaim, but it will fail
under OOM conditions, allowing the fallback to be taken as an alternative
to hard-OOMing the system.

There is a four-way tradeoff that must be balanced:
    1) Minimize use of the fallback path;
    2) Avoid full-up OOM;
    3) Do a lightweight allocation request;
    4) Avoid dipping into the emergency reserves.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:18:07 -08:00
Paul E. McKenney 7ffc9ec8ea kvfree_rcu: Make krc_this_cpu_unlock() use raw_spin_unlock_irqrestore()
The krc_this_cpu_unlock() function does a raw_spin_unlock() immediately
followed by a local_irq_restore().  This commit saves a line of code by
merging them into a raw_spin_unlock_irqrestore().  This transformation
also reduces scheduling latency because raw_spin_unlock_irqrestore()
responds immediately to a reschedule request.  In contrast,
local_irq_restore() does a scheduling-oblivious enabling of interrupts.
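
A before/after sketch of the transformation (lock and flags as in the
surrounding code):

    /* Before: two steps, with a reschedule-oblivious IRQ restore. */
    raw_spin_unlock(&krcp->lock);
    local_irq_restore(flags);

    /* After: one call that also honors a pending reschedule request. */
    raw_spin_unlock_irqrestore(&krcp->lock, flags);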

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:18:07 -08:00
Paul E. McKenney b01b405092 kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()
This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
carried out by the single-argument variant of kvfree_rcu(), thus keeping
this can-sleep code path from dipping into the emergency reserves.

Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:18:07 -08:00
Uladzislau Rezki (Sony) 148e3731d1 kvfree_rcu: Directly allocate page for single-argument case
Single-argument kvfree_rcu() must be invoked from sleepable contexts,
so we can directly allocate pages.  Furthermore, the fallback in case
of page-allocation failure is the high-latency synchronize_rcu(), so it
makes sense to do these page allocations from the fastpath, and even to
permit limited sleeping within the allocator.

This commit therefore allocates if needed on the fastpath using
GFP_KERNEL|__GFP_RETRY_MAYFAIL.  This also has the beneficial effect
of leaving kvfree_rcu()'s per-CPU caches to the double-argument variant
of kvfree_rcu(), given that the double-argument variant cannot directly
invoke the allocator.

[ paulmck: Add add_ptr_to_bulk_krc_lock header comment per Michal Hocko. ]
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:18:07 -08:00
Zhouyi Zhou 6494ccb932 rcu: Remove spurious instrumentation_end() in rcu_nmi_enter()
In rcu_nmi_enter(), there is an erroneous instrumentation_end() in the
second branch of the "if" statement.  Oddly enough, "objtool check -f
vmlinux.o" fails to complain because it is unable to correctly cover
all cases.  Instead, objtool visits the third branch first, which marks
following trace_rcu_dyntick() as visited.  This commit therefore removes
the spurious instrumentation_end().

Fixes: 04b25a495b ("rcu: Mark rcu_nmi_enter() call to rcu_cleanup_after_idle() noinstr")
Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:17:35 -08:00
Neeraj Upadhyay 47fcbc8dd6 rcu: Fix CPU-offline trace in rcutree_dying_cpu
The condition in the trace_rcu_grace_period() in rcutree_dying_cpu() is
backwards, so that it uses the string "cpuofl" when the offline CPU is
blocking the current grace period and "cpuofl-bgp" otherwise.  Given that
the "-bgp" stands for "blocking grace period", this is at best misleading.
This commit therefore switches these strings in order to correctly trace
whether the outgoing cpu blocks the current grace period.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:17:35 -08:00
Frederic Weisbecker d3ad5bbc4d rcu: Remove superfluous rdp fetch
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:17:35 -08:00
Linus Torvalds 657bd90c93 Scheduler updates for v5.12:
[ NOTE: unfortunately this tree had to be freshly rebased today,
         it's a same-content tree of 82891be90f3c (-next published)
         merged with v5.11.
 
         The main reason for the rebase was an authorship misattribution
         problem with a new commit, which we noticed in the last minute,
         and which we didn't want to be merged upstream. The offending
         commit was deep in the tree, and dependent commits had to be
         rebased as well. ]
 
Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler updates:

   - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
     preempt=none/voluntary/full boot options (default: full), to allow
     distros to build a PREEMPT kernel but fall back to close to
     PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
     a boot time selection.

     There's also the /debug/sched_debug switch to do this runtime.

     This feature is implemented via runtime patching (a new variant of
     static calls).

     The scope of the runtime patching can be best reviewed by looking
     at the sched_dynamic_update() function in kernel/sched/core.c.

     ( Note that the dynamic none/voluntary mode isn't 100% identical,
       for example preempt-RCU is available in all cases, plus the
       preempt count is maintained in all models, which has runtime
       overhead even with the code patching. )

     The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
     majority of distributions, are supposed to be unaffected.

   - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
     was found via rcutorture triggering a hang. The bug is that
     rcu_idle_enter() may wake up a NOCB kthread, but this happens after
     the last generic need_resched() check. Some cpuidle drivers fix it
     by chance but many others don't.

     In true 2020 fashion the original bug fix has grown into a 5-patch
     scheduler/RCU fix series plus another 16 RCU patches to address the
     underlying issue of missed preemption events. These are the initial
     fixes that should fix current incarnations of the bug.

   - Clean up rbtree usage in the scheduler, by providing & using the
     following consistent set of rbtree APIs:

       partial-order; less() based:
         - rb_add(): add a new entry to the rbtree
         - rb_add_cached(): like rb_add(), but for a rb_root_cached

       total-order; cmp() based:
         - rb_find(): find an entry in an rbtree
         - rb_find_add(): find an entry, and add if not found

         - rb_find_first(): find the first (leftmost) matching entry
         - rb_next_match(): continue from rb_find_first()
         - rb_for_each(): iterate a sub-tree using the previous two

   - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
     single pass. This is a 4-commit series where each commit improves
     one aspect of the idle sibling scan logic.

   - Improve the cpufreq cooling driver by getting the effective CPU
     utilization metrics from the scheduler

   - Improve the fair scheduler's active load-balancing logic by
     reducing the number of active LB attempts & lengthen the
     load-balancing interval. This improves stress-ng mmapfork
     performance.

   - Fix CFS's estimated utilization (util_est) calculation bug that can
     result in too high utilization values

  Misc updates & fixes:

   - Fix the HRTICK reprogramming & optimization feature

   - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code

   - Reduce dl_add_task_root_domain() overhead

   - Fix uprobes refcount bug

   - Process pending softirqs in flush_smp_call_function_from_idle()

   - Clean up task priority related defines, remove *USER_*PRIO and
     USER_PRIO()

   - Simplify the sched_init_numa() deduplication sort

   - Documentation updates

   - Fix EAS bug in update_misfit_status(), which degraded the quality
     of energy-balancing

   - Smaller cleanups"

* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  sched,x86: Allow !PREEMPT_DYNAMIC
  entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
  entry: Explicitly flush pending rcuog wakeup before last rescheduling point
  rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
  rcu/nocb: Perform deferred wake up before last idle's need_resched() check
  rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
  sched/features: Distinguish between NORMAL and DEADLINE hrtick
  sched/features: Fix hrtick reprogramming
  sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
  uprobes: (Re)add missing get_uprobe() in __find_uprobe()
  smp: Process pending softirqs in flush_smp_call_function_from_idle()
  sched: Harden PREEMPT_DYNAMIC
  static_call: Allow module use without exposing static_call_key
  sched: Add /debug/sched_preempt
  preempt/dynamic: Support dynamic preempt with preempt= boot option
  preempt/dynamic: Provide irqentry_exit_cond_resched() static call
  preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
  preempt/dynamic: Provide cond_resched() and might_resched() static calls
  preempt: Introduce CONFIG_PREEMPT_DYNAMIC
  static_call: Provide DEFINE_STATIC_CALL_RET0()
  ...
2021-02-21 12:35:04 -08:00
Frederic Weisbecker 4ae7dc97f7 entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point upon resuming to guest mode. This
way we can avoid doing it from rcu_user_enter() with the last-resort
self-IPI hack that enforces rescheduling.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 47b8ff194c entry: Explicitly flush pending rcuog wakeup before last rescheduling point
Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point on resuming to user mode. This
way we can avoid doing it from rcu_user_enter() with the last-resort
self-IPI hack that enforces rescheduling.
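
A minimal sketch of the resulting ordering (trimmed from
exit_to_user_mode_prepare() in kernel/entry/common.c):

    static void exit_to_user_mode_prepare(struct pt_regs *regs)
    {
            unsigned long ti_work;

            lockdep_assert_irqs_disabled();

            /* Flush pending rcuog wakeup before the last need_resched() check. */
            rcu_nocb_flush_deferred_wakeup();

            ti_work = READ_ONCE(current_thread_info()->flags);
            if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
                    ti_work = exit_to_user_mode_loop(regs, ti_work);
    }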

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-5-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker f8bb5cae96 rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Unfortunately, the call to rcu_user_enter() is already past the last
rescheduling opportunity before we resume to userspace or to guest mode,
so we may return with the resulting wakeup request left unhandled.

The ultimate resort to fix every callsite is to trigger a self-IPI
(nohz_full depends on the arch implementing arch_irq_work_raise()) that will
trigger a reschedule on the IRQ tail or at guest exit.

Eventually every site that wants a saner treatment will need to carefully
place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit
need_resched() check upon resume.
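
A hypothetical sketch of the last-resort path (the irq_work field and
handler names below are illustrative, not the actual ones):

    /* Queue a self-IPI; its IRQ tail will run the usual resched check. */
    init_irq_work(&rdp->nocb_late_wakeup_iw, nocb_late_wakeup_handler);
    irq_work_queue(&rdp->nocb_late_wakeup_iw);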

Fixes: 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 43789ef3f7 rcu/nocb: Perform deferred wake up before last idle's need_resched() check
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a local wake up happening while running the idle task is handled
in one of the need_resched() checks carefully placed within the idle
loop that can break to the scheduler.

Unfortunately the call to rcu_idle_enter() is already beyond the last
generic need_resched() check and we may halt the CPU with a resched
request unhandled, leaving the task hanging.

Fix this by splitting the rcuog wakeup handling out of rcu_idle_enter()
and placing it before the last generic need_resched() check in the idle
loop. It is then assumed that no call to call_rcu() will be performed
after that in the idle loop until the CPU is put in low power mode.
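
The resulting ordering, roughly (a simplified sketch of do_idle() in
kernel/sched/idle.c):

    while (!need_resched()) {
            rcu_nocb_flush_deferred_wakeup();  /* may set need_resched() */
            local_irq_disable();
            cpuidle_idle_call();  /* re-checks need_resched() before halting */
    }
    schedule_idle();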

Fixes: 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 54b7429eff rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
Deferred wakeup of rcuog kthreads upon RCU idle mode entry is going to
be handled differently depending on whether it is initiated by idle, user
or guest entry. Prepare by pulling that control up to rcu_eqs_enter()'s
callers.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-2-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Paul E. McKenney 0d2460ba61 Merge branches 'doc.2021.01.06a', 'fixes.2021.01.04b', 'kfree_rcu.2021.01.04a', 'mmdumpobj.2021.01.22a', 'nocb.2021.01.06a', 'rt.2021.01.04a', 'stall.2021.01.06a', 'torture.2021.01.12a' and 'tortureall.2021.01.06a' into HEAD
doc.2021.01.06a: Documentation updates.
fixes.2021.01.04b: Miscellaneous fixes.
kfree_rcu.2021.01.04a: kfree_rcu() updates.
mmdumpobj.2021.01.22a: Dump allocation point for memory blocks.
nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths.
rt.2021.01.04a: Real-time updates.
stall.2021.01.06a: RCU CPU stall warning updates.
torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API.
tortureall.2021.01.06a: Torture-test script updates.
2021-01-22 15:26:44 -08:00
Paul E. McKenney b4b7914a6a rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
The debug-object double-free checks in __call_rcu() print out the
RCU callback function, which is usually sufficient to track down the
double free.  However, all uses of things like queue_rcu_work() will
have the same RCU callback function (rcu_work_rcufn() in this case),
so a diagnostic message for a double queue_rcu_work() needs more than
just the callback function.

This commit therefore calls mem_dump_obj() to dump out any additional
available information on the double-freed callback.
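
A condensed sketch of the resulting diagnostic path in __call_rcu():

    if (debug_rcu_head_queue(head)) {
            /* Probable double call_rcu(): dump what is known about the object. */
            mem_dump_obj(head);
            WRITE_ONCE(head->func, rcu_leak_callback);  /* leak rather than reuse */
            return;
    }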

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-22 15:24:16 -08:00
Neeraj Upadhyay 683954e55c rcu: Check and report missed fqs timer wakeup on RCU stall
For a new grace period request, the RCU GP kthread transitions through
the following states:

a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]

The RCU_GP_WAIT_GPS state is where the GP kthread waits for a request
for a new GP.  Once it receives a request (for example, when a new RCU
callback is queued), the GP kthread transitions to RCU_GP_DONE_GPS.

b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]

Grace period initialization starts in rcu_gp_init(), which records the
start of the new GP in rcu_state.gp_seq and transitions to RCU_GP_ONOFF.

c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]

The purpose of the RCU_GP_ONOFF state is to apply the online/offline
information that was buffered for any CPUs that recently came online or
went offline.  This state is maintained in per-leaf rcu_node bitmasks,
with the buffered state in ->qsmaskinitnext and the state for the upcoming
GP in ->qsmaskinit.  At the end of this RCU_GP_ONOFF state, each bit in
->qsmaskinit will correspond to a CPU that must pass through a quiescent
state before the upcoming grace period is allowed to complete.

However, a leaf rcu_node structure with an all-zeroes ->qsmaskinit
cannot necessarily be ignored.  In preemptible RCU, there might well be
tasks still in RCU read-side critical sections that were first preempted
while running on one of the CPUs managed by this structure.  Such tasks
will be queued on this structure's ->blkd_tasks list.  Only after this
list fully drains can this leaf rcu_node structure be ignored, and even
then only if none of its CPUs have come back online in the meantime.
Once that happens, the ->qsmaskinit masks further up the tree will be
updated to exclude this leaf rcu_node structure.

Once the ->qsmaskinitnext and ->qsmaskinit fields have been updated
as needed, the GP kthread transitions to RCU_GP_INIT.

d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]

The purpose of the RCU_GP_INIT state is to copy each ->qsmaskinit to
the ->qsmask field within each rcu_node structure.  This copying is done
breadth-first from the root to the leaves.  Why not just copy directly
from ->qsmaskinitnext to ->qsmask?  Because the ->qsmaskinitnext masks
can change in the meantime as additional CPUs come online or go offline.
Such changes would result in inconsistencies in the ->qsmask fields up and
down the tree, which could in turn result in too-short grace periods or
grace-period hangs.  These issues are avoided by snapshotting the leaf
rcu_node structures' ->qsmaskinitnext fields into their ->qsmaskinit
counterparts, generating a consistent set of ->qsmaskinit fields
throughout the tree, and only then copying these consistent ->qsmaskinit
fields to their ->qsmask counterparts.

Once this initialization step is complete, the GP kthread transitions
to RCU_GP_WAIT_FQS, where it waits to do a force-quiescent-state scan
on the one hand or for the end of the grace period on the other.

e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]

The RCU_GP_WAIT_FQS state waits for one of three things:  (1) An
explicit request to do a force-quiescent-state scan, (2) The end of
the grace period, or (3) A short interval of time, after which it
will do a force-quiescent-state (FQS) scan.  The explicit request can
come from rcutorture or from any CPU that has too many RCU callbacks
queued (see the qhimark kernel parameter and the RCU_GP_FLAG_OVLD
flag).  The aforementioned "short interval of time" is specified by the
jiffies_till_first_fqs boot parameter for a given grace period's first
FQS scan and by the jiffies_till_next_fqs for later FQS scans.

Either way, once the wait is over, the GP kthread transitions to
RCU_GP_DOING_FQS.

f. [RCU_GP_DOING_FQS] -> [RCU_GP_CLEANUP]

The RCU_GP_DOING_FQS state performs an FQS scan.  Each such scan carries
out two functions for any CPU whose bit is still set in its leaf rcu_node
structure's ->qsmask field, that is, for any CPU that has not yet reported
a quiescent state for the current grace period:

  i.  Report quiescent states on behalf of CPUs that have been observed
      to be idle (from an RCU perspective) since the beginning of the
      grace period.

  ii. If the current grace period is too old, take various actions to
      encourage holdout CPUs to pass through quiescent states, including
      enlisting the aid of any calls to cond_resched() and might_sleep(),
      and even including IPIing the holdout CPUs.

These checks are skipped for any leaf rcu_node structure with an all-zero
->qsmask field; however, such structures are subject to RCU priority
boosting if there are tasks on a given structure blocking the current
grace period.  The end of the grace period is detected when the root
rcu_node structure's ->qsmask is zero and when there are no longer any
preempted tasks blocking the current grace period.  (No, this last check
is not redundant.  To see this, consider an rcu_node tree having exactly
one structure that serves as both root and leaf.)

Once the end of the grace period is detected, the GP kthread transitions
to RCU_GP_CLEANUP.

g. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED]

The RCU_GP_CLEANUP state marks the end of the grace period by updating the
rcu_state structure's ->gp_seq field and also all rcu_node structures'
->gp_seq fields.  As before, the rcu_node tree is traversed in breadth-first
order.  Once this update is complete, the GP kthread transitions
to the RCU_GP_CLEANED state.

h. [RCU_GP_CLEANED] -> [RCU_GP_INIT]

Once in the RCU_GP_CLEANED state, the GP kthread immediately transitions
into the RCU_GP_INIT state.

i. The role of timers.

If there is at least one idle CPU, and if timers are not firing, the
transition from RCU_GP_DOING_FQS to RCU_GP_CLEANUP will never happen.
Timers can fail to fire for a number of reasons, including issues in
timer configuration, issues in the timer framework, and failure to handle
softirqs (for example, when there is a storm of interrupts).  Whatever the
reason, if the timers fail to fire, the GP kthread will never be awakened,
resulting in RCU CPU stall warnings and eventually in OOM.

However, an RCU CPU stall warning has a large number of potential causes,
as documented in Documentation/RCU/stallwarn.rst.  This commit therefore
adds analysis to the RCU CPU stall-warning code to emit an additional
message if the cause of the stall is likely to be timer failure.
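
A sketch of the added analysis (the one-second slack below is illustrative):

    struct task_struct *gpk = rcu_state.gp_kthread;

    if (READ_ONCE(rcu_state.gp_state) == RCU_GP_WAIT_FQS &&
        time_after(jiffies, READ_ONCE(rcu_state.jiffies_force_qs) + HZ) &&
        gpk && !READ_ONCE(gpk->on_rq))
            pr_err("rcu: %s kthread timer wakeup didn't happen for %ld jiffies!\n",
                   rcu_state.name, jiffies - READ_ONCE(rcu_state.jiffies_force_qs));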

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:54:11 -08:00
Paul E. McKenney 147c6852d3 rcu: Do any deferred nocb wakeups at CPU offline time
Because the need to wake a nocb GP kthread ("rcuog") is sometimes
detected when wakeups cannot be done, these wakeups can be deferred.
The wakeups are then carried out by calls to do_nocb_deferred_wakeup()
at various safe points in the code, including RCU's idle hooks.  However,
when a CPU goes offline, it invokes arch_cpu_idle_dead() without invoking
any of RCU's idle hooks.

This commit therefore adds a call to do_nocb_deferred_wakeup() in
rcu_report_dead() in order to handle any deferred wakeups that have been
requested by the outgoing CPU.
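
A sketch of the added hook (simplified from rcu_report_dead()):

    void rcu_report_dead(unsigned int cpu)
    {
            struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

            /* Drain wakeups deferred by the outgoing CPU before it vanishes. */
            do_nocb_deferred_wakeup(rdp);

            /* ... existing offline processing continues here ... */
    }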

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:50:24 -08:00
Frederic Weisbecker 634954c2db rcu/nocb: Locally accelerate callbacks as long as offloading isn't complete
The local callback processing checks whether any callbacks need acceleration.
This commit carries out this checking under nocb lock protection in
the middle of toggle operations, during which time rcu_core() executes
concurrently with GP/CB kthreads.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker 32aa2f4170 rcu/nocb: Process batch locally as long as offloading isn't complete
This commit makes sure to process the callbacks locally (via either
RCU_SOFTIRQ or the rcuc kthread) whenever the segcblist isn't entirely
offloaded.  This ensures that callbacks are invoked one way or another
while a CPU is in the middle of a toggle operation.
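
A sketch of the gating added to rcu_core() (simplified):

    const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);

    if (do_batch && rcu_segcblist_ready_cbs(&rdp->cblist))
            rcu_do_batch(rdp);  /* via RCU_SOFTIRQ or the rcuc kthread */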

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker e3abe959fb rcu/nocb: Only cond_resched() from actual offloaded batch processing
During a toggle operations, rcu_do_batch() may be invoked concurrently
by softirqs and offloaded processing for a given CPU's callbacks.
This commit therefore makes sure cond_resched() is invoked only from
the offloaded context.
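
A sketch of the resulting check in rcu_do_batch()'s invocation loop:

    /* Only the offloaded (rcuo kthread) context may reschedule here. */
    if (rcu_segcblist_is_offloaded(&rdp->cblist))
            cond_resched_tasks_rcu_qs();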

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker 126d9d4952 rcu/nocb: Always init segcblist on CPU up
How the rdp->cblist enabled state is treated at CPU-hotplug time depends
on whether or not that ->cblist is offloaded.

1) Not offloaded: The ->cblist is disabled when the CPU goes down. All
   its callbacks are migrated and none can be enqueued until after some
   later CPU-hotplug operation brings the CPU back up.

2) Offloaded: The ->cblist is not disabled on CPU down because the CB/GP
   kthreads must finish invoking the remaining callbacks. There is thus
   no need to re-enable it on CPU up.

Since the ->cblist offloaded state is set in stone at boot, it cannot
change between CPU down and CPU up. So 1) and 2) are symmetrical.

However, given runtime toggling of the offloaded state, there are two
additional asymmetrical scenarios:

3) The ->cblist is not offloaded when the CPU goes down. The ->cblist
   is later toggled to offloaded and then the CPU comes back up.

4) The ->cblist is offloaded when the CPU goes down. The ->cblist is
   later toggled to no longer be offloaded and then the CPU comes back up.

Scenario 4) is currently handled correctly. The ->cblist remains enabled
on CPU down and gets re-initialized on CPU up. The toggling operation
will wait until ->cblist is empty, so ->cblist will remain empty until
CPU-up time.

Scenario 3) would run into trouble, though, as the rdp's ->cblist is disabled
on CPU down and not re-initialized/re-enabled on CPU up.  Except that
in this case, ->cblist is guaranteed to be empty because all its
callbacks were migrated away at CPU-down time.  And the CPU-up code
already initializes and enables any empty ->cblist structures in order
to handle the possibility of early-boot invocations of call_rcu(), even
in the case where no such invocations occurred.  So all that need be done
is to adjust the locking.
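
A sketch of the CPU-up adjustment described above (using the existing
nocb locking helpers):

    rcu_nocb_lock(rdp);                        /* CB/GP kthreads may still be active. */
    if (rcu_segcblist_empty(&rdp->cblist))     /* Callbacks migrated at CPU-down time? */
            rcu_segcblist_init(&rdp->cblist);  /* Re-initialize and re-enable. */
    rcu_nocb_unlock(rdp);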

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:19 -08:00
Frederic Weisbecker 8d346d438f rcu/nocb: Provide basic callback offloading state machine bits
Offloading and de-offloading RCU callback processing must be done
carefully.  There must never be a time at which callback processing is
disabled because the task driving the offloading or de-offloading might be
preempted or otherwise stalled at that point in time, which would result
in OOM due to callbacks piling up indefinitely.  This implies that there
will be times during which a given CPU's callbacks might be concurrently
invoked by both that CPU's RCU_SOFTIRQ handler (or, equivalently, that
CPU's rcuc kthread) and by that CPU's rcuo kthread.

This situation could fatally confuse both rcu_barrier() and the
CPU-hotplug offlining process, so these must be excluded during any
concurrent-callback-invocation period.  In addition, during times of
concurrent callback invocation, changes to ->cblist must be protected
both as needed for RCU_SOFTIRQ and as needed for the rcuo kthread.

This commit therefore defines and documents the states for a state
machine that coordinates offloading and de-offloading.
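
An illustrative subset of the resulting state bits (exact names and values
are in kernel/rcu/rcu_segcblist.h):

    #define SEGCBLIST_SOFTIRQ_ONLY  BIT(1)  /* Softirq/rcuc alone processes callbacks. */
    #define SEGCBLIST_KTHREAD_CB    BIT(2)  /* The rcuoc (CB) kthread is in charge. */
    #define SEGCBLIST_KTHREAD_GP    BIT(3)  /* The rcuog (GP) kthread is in charge. */
    #define SEGCBLIST_OFFLOADED     BIT(4)  /* Offloading requested or complete. */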

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:19 -08:00
Joel Fernandes (Google) b4e6039e8a rcu/segcblist: Add debug checks for segment lengths
This commit adds debug checks near the end of rcu_do_batch() that emit
warnings if an empty rcu_segcblist structure has non-zero segment counts,
or, conversely, if a non-empty structure has all-zero segment counts.
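
A simplified sketch of the idea (the actual checks compare the per-segment
counts added earlier in this series):

    WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) &&
                 rcu_segcblist_n_cbs(&rdp->cblist));   /* empty, yet counts nonzero */
    WARN_ON_ONCE(!rcu_segcblist_empty(&rdp->cblist) &&
                 !rcu_segcblist_n_cbs(&rdp->cblist));  /* non-empty, yet counts zero */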

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Fix queue/segment-length checks. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:19 -08:00
Joel Fernandes (Google) 3afe7fa535 rcu/trace: Add tracing for how segcb list changes
This commit adds tracing to track how the segcb list changes before/after
acceleration, during queuing and during dequeuing.

This tracing helped discover an optimization that avoided needless GP
requests when no callbacks were accelerated. The tracing overhead is
minimal as each segment's length is now stored in the respective segment.

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:19 -08:00
Joel Fernandes (Google) 68804cf1c9 rcu/tree: segcblist: Remove redundant smp_mb()s
The full memory barriers in rcu_segcblist_enqueue() and in rcu_do_batch()
are not needed because rcu_segcblist_add_len(), and thus also
rcu_segcblist_inc_len(), already includes a memory barrier *before*
and *after* the length of the list is updated.

This commit therefore removes these redundant smp_mb() invocations.
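
For reference, a sketch of the ordering that rcu_segcblist_add_len()
already provides (the real code also handles the non-atomic ->len case):

    static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
    {
            smp_mb();  /* Up to the callback is visible before the length. */
            atomic_long_add(v, &rsclp->len);
            smp_mb();  /* The length is visible before later manipulations. */
    }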

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:19 -08:00
Paul E. McKenney a649d25dcc rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees
This commit adds a number of lockdep_assert_irqs_disabled() calls
to rcu_sched_clock_irq() and a number of the functions that it calls.
The point of this is to help track down a situation where lockdep appears
to be insisting that interrupts are enabled within these functions, which
should only ever be invoked from the scheduling-clock interrupt handler.

Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04 15:54:49 -08:00
Scott Wood 8b9a0ecc7e rcu: Unconditionally use rcuc threads on PREEMPT_RT
PREEMPT_RT systems have long used the rcutree.use_softirq kernel
boot parameter to avoid use of RCU_SOFTIRQ handlers, which can disrupt
real-time applications by invoking callbacks during return from interrupts
that arrived while executing time-critical code.  This kernel boot
parameter instead runs RCU core processing in an 'rcuc' kthread, thus
allowing the scheduler to do its job of avoiding disrupting time-critical
code.

This commit therefore disables the rcutree.use_softirq kernel boot
parameter on PREEMPT_RT systems, thus forcing such systems to do RCU
core processing in 'rcuc' kthreads.  This approach has long been in
use by users of the -rt patchset, and there have been no complaints.
There is therefore no way for the system administrator to override this
choice, at least without modifying and rebuilding the kernel.
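
A rough sketch of the change:

    static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);
    #ifndef CONFIG_PREEMPT_RT
    module_param(use_softirq, bool, 0444);  /* No boot-time override on PREEMPT_RT. */
    #endif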

Signed-off-by: Scott Wood <swood@redhat.com>
[bigeasy: Reword commit message]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ paulmck: Update kernel-parameters.txt accordingly. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04 13:43:51 -08:00
Zqiang 84109ab585 rcu: Record kvfree_call_rcu() call stack for KASAN
This commit adds a call to kasan_record_aux_stack() in kvfree_call_rcu()
in order to record the call stack of the code that caused the object
to be freed.  Please note that this function does not update the
allocated/freed state, which is important because RCU readers might
still be referencing this object.
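
A sketch of the added call, where ptr is the object being handed to
kvfree_rcu() (exact placement in kvfree_call_rcu() may differ):

    /* Record the freer's call stack; allocated/freed state is left untouched. */
    kasan_record_aux_stack(ptr);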

Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04 13:42:04 -08:00
Joel Fernandes (Google) 6bc3358280 rcu/tree: Make rcu_do_batch count how many callbacks were executed
The rcu_do_batch() function extracts the ready-to-invoke callbacks
from the rcu_segcblist located in the ->cblist field of the current
CPU's rcu_data structure.  These callbacks are first moved to a local
(unsegmented) rcu_cblist.  The rcu_do_batch() function then uses this
rcu_cblist's ->len field to count how many CBs it has invoked, but it
does so by counting that field down from zero.  Finally, this function
negates the value in this ->len field (resulting in a positive number)
and subtracts the result from the ->len field of the current CPU's
->cblist field.

Except that it is sometimes necessary for rcu_do_batch() to stop invoking
callbacks mid-stream, despite there being more ready to invoke, for
example, if a high-priority task wakes up.  In this case the remaining
not-yet-invoked callbacks are requeued back onto the CPU's ->cblist,
but remain in the ready-to-invoke segment of that list.  As above, the
negative of the local rcu_cblist's ->len field is still subtracted from
the ->len field of the current CPU's ->cblist field.

The design of counting down from 0 is confusing and error-prone, plus
use of a positive count will make it easier to provide a uniform and
consistent API to deal with the per-segment counts that are added
later in this series.  For example, rcu_segcblist_extract_done_cbs()
can unconditionally populate the resulting unsegmented list's ->len
field during extraction.

This commit therefore explicitly counts how many callbacks were executed
in rcu_do_batch() itself, counting up from zero, and then uses that
to update the per-CPU segcb list's ->len field, without relying on the
downcounting of rcl->len from zero.
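
A condensed sketch of the upward counting in rcu_do_batch():

    count = 0;
    for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
            rhp->func(rhp);                       /* Invoke the ready callback. */
            count++;
            if (count >= bl && need_resched())
                    break;                        /* Remaining CBs get requeued. */
    }
    rcu_segcblist_add_len(&rdp->cblist, -count);  /* Subtract only what actually ran. */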

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04 13:22:12 -08:00
Linus Torvalds adb35e8dc9 Scheduler updates:
- migrate_disable/enable() support which originates from the RT tree and
    is now a prerequisite for the new preemptible kmap_local() API which aims
    to replace kmap_atomic().
 
  - A fair amount of topology and NUMA related improvements
 
  - Improvements for the frequency invariant calculations
 
  - Enhanced robustness for the global CPU priority tracking and decision
    making
 
  - The usual small fixes and enhancements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XwK4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoX28D/9cVrvziSQGfBfuQWnUiw8iOIq1QBa2
 Me+Tvenhfrlt7xU6rbP9ciFu7eTN+fS06m5uQPGI+t22WuJmHzbmw1bJVXfkvYfI
 /QoU+Hg7DkDAn1p7ZKXh0dRkV0nI9ixxSHl0E+Zf1ATBxCUMV2SO85flg6z/4qJq
 3VWUye0dmR7/bhtkIjv5rwce9v2JB2g1AbgYXYTW9lHVoUdGoMSdiZAF4tGyHLnx
 sJ6DMqQ+k+dmPyYO0z5MTzjW/fXit4n9w2e3z9TvRH/uBu58WSW1RBmQYX6aHBAg
 dhT9F4lvTs6lJY23x5RSFWDOv6xAvKF5a0xfb8UZcyH5EoLYrPRvm42a0BbjdeRa
 u0z7LbwIlKA+RFdZzFZWz8UvvO0ljyMjmiuqZnZ5dY9Cd80LSBuxrWeQYG0qg6lR
 Y2povhhCepEG+q8AXIe2YjHKWKKC1s/l/VY3CNnCzcd21JPQjQ4Z5eWGmHif5IED
 CntaeFFhZadR3w02tkX35zFmY3w4soKKrbI4EKWrQwd+cIEQlOSY7dEPI/b5BbYj
 MWAb3P4EG9N77AWTNmbhK4nN0brEYb+rBbCA+5dtNBVhHTxAC7OTWElJOC2O66FI
 e06dREjvwYtOkRUkUguWwErbIai2gJ2MH0VILV3hHoh64oRk7jjM8PZYnjQkdptQ
 Gsq0rJW5iiu/OQ==
 =Oz1V
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - migrate_disable/enable() support which originates from the RT tree
   and is now a prerequisite for the new preemptible kmap_local() API
   which aims to replace kmap_atomic().

 - A fair amount of topology and NUMA related improvements

 - Improvements for the frequency invariant calculations

 - Enhanced robustness for the global CPU priority tracking and decision
   making

 - The usual small fixes and enhancements all over the place

* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
  sched/fair: Trivial correction of the newidle_balance() comment
  sched/fair: Clear SMT siblings after determining the core is not idle
  sched: Fix kernel-doc markup
  x86: Print ratio freq_max/freq_base used in frequency invariance calculations
  x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
  x86, sched: Calculate frequency invariance for AMD systems
  irq_work: Optimize irq_work_single()
  smp: Cleanup smp_call_function*()
  irq_work: Cleanup
  sched: Limit the amount of NUMA imbalance that can exist at fork time
  sched/numa: Allow a floating imbalance between NUMA nodes
  sched: Avoid unnecessary calculation of load imbalance at clone time
  sched/numa: Rename nr_running and break out the magic number
  sched: Make migrate_disable/enable() independent of RT
  sched/topology: Condition EAS enablement on FIE support
  arm64: Rebuild sched domains on invariance status changes
  sched/topology,schedutil: Wrap sched domains rebuild
  sched/uclamp: Allow to reset a task uclamp constraint value
  sched/core: Fix typos in comments
  Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
  ...
2020-12-14 18:29:11 -08:00
Linus Torvalds 8c1dccc803 RCU, LKMM and KCSAN updates collected by Paul McKenney:
RCU:
 
     - Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs.
 
     - Lockdep-RCU updates reducing the need for __maybe_unused.
 
     - Tasks-RCU updates.
 
     - Miscellaneous fixes.
 
     - Documentation updates.
 
     - Torture-test updates.
 
   KCSAN:
 
     - updates for selftests, avoiding setting watchpoints on NULL pointers
 
     - fix to watchpoint encoding
 
   LKMM:
 
     - updates for documentation along with some updates to example-code
       litmus tests
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/Xon4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobXUD/92LJTI/TMgK6Z6EEQBiJZO/2mNKjK8
 FEKc6AqTNMlZNsWCfQ5UgqtHpn+MkBZsX1x4u22gehE1qaCB8gnQ5wXgbXon8tQm
 exxVk6vvQZjseeqCMqrsUYQlD7dNgHnf1qAmWXJvji4sA/1Opo6n2M74tqfE2ueV
 S5hpQwSuK/6Zu2Hrr62HD8+Fx0in6ZuKRZxHGp1392l++DGbniJM3dzntRXB+JbZ
 w3PDHFCQuGzTytyeKuQV48ot9IK+2YzmjIp/+4tHL6mvU38xeSu6gcYtqKPcfYWw
 D6HXvDa965h5IrFdSA2JWSzjJ+VYgZVElk2HyXDNIae0fM/8GidgoIDQipT1WAur
 sxW/Ke4U6Jm5MMqXqV8iMNduktkGD1/h6G/iB1Yis29xFdthorNpbHVAP+8cKXgf
 1cR6RorOuBYv6XpyzygHtE7qfLY5ST352pJ4+UqNzboujOcuEnGaygttt0F/F8sA
 ZH8NT5dyUfbGeqepdZWkbj116Hjeg3fyV3CZeyBhDeqpjf1Nn3nbJ1xRksPLfa3i
 IKvN7HSzEg+vKnsJNnQeFlAmQ/W3n2bedzRqfaCg77pNhKI6jPuavY5f2YGFUj0y
 yx0UzOYoI1Cln0keBMmynbyUKgJ7zstLkrt/JenjhtD3B+0df5BmYjkL+nqkP6ax
 +XTCu7Xg+B061g==
 =N/iO
 -----END PGP SIGNATURE-----

Merge tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Thomas Gleixner:
 "RCU, LKMM and KCSAN updates collected by Paul McKenney.

  RCU:
   - Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs

   - Lockdep-RCU updates reducing the need for __maybe_unused

   - Tasks-RCU updates

   - Miscellaneous fixes

   - Documentation updates

   - Torture-test updates

  KCSAN:
   - updates for selftests, avoiding setting watchpoints on NULL pointers

   - fix to watchpoint encoding

  LKMM:
   - updates for documentation along with some updates to example-code
     litmus tests"

* tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  srcu: Take early exit on memory-allocation failure
  rcu/tree: Defer kvfree_rcu() allocation to a clean context
  rcu: Do not report strict GPs for outgoing CPUs
  rcu: Fix a typo in rcu_blocking_is_gp() header comment
  rcu: Prevent lockdep-RCU splats on lock acquisition/release
  rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
  rcu,ftrace: Fix ftrace recursion
  rcu/tree: Make struct kernel_param_ops definitions const
  rcu/tree: Add a warning if CPU being onlined did not report QS already
  rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
  rcu: Fix single-CPU check in rcu_blocking_is_gp()
  rcu: Implement rcu_segcblist_is_offloaded() config dependent
  list.h: Update comment to explicitly note circular lists
  rcu: Panic after fixed number of stalls
  x86/smpboot:  Move rcu_cpu_starting() earlier
  rcu: Allow rcu_irq_enter_check_tick() from NMI
  tools/memory-model: Label MP tests' producers and consumers
  tools/memory-model: Use "buf" and "flag" for message-passing tests
  tools/memory-model: Add types to litmus tests
  tools/memory-model: Add a glossary of LKMM terms
  ...
2020-12-14 17:21:16 -08:00
Ingo Molnar a787bdaff8 Merge branch 'linus' into sched/core, to resolve semantic conflict
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-27 11:10:50 +01:00
Peter Zijlstra 7a9f50a058 irq_work: Cleanup
Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2020-11-24 16:47:48 +01:00
Paul E. McKenney 7fc91fc845 Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.19b', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
cpuinfo.2020.11.06a: Speedups for /proc/cpuinfo.
doc.2020.11.06a: Documentation updates.
fixes.2020.11.19b: Miscellaneous fixes.
lockdep.2020.11.02a: Lockdep-RCU updates to avoid "unused variable".
tasks.2020.11.06a: Tasks-RCU updates.
torture.2020.11.06a': Torture-test updates.
2020-11-19 19:37:47 -08:00
Uladzislau Rezki (Sony) 56292e8609 rcu/tree: Defer kvfree_rcu() allocation to a clean context
The current memory-allocation interface causes the following difficulties
for kvfree_rcu():

a) If built with CONFIG_PROVE_RAW_LOCK_NESTING, the lockdep will
   complain about violation of the nesting rules, as in "BUG: Invalid
   wait context".  This Kconfig option checks for proper raw_spinlock
   vs. spinlock nesting, in particular, it is not legal to acquire a
   spinlock_t while holding a raw_spinlock_t.

   This is a problem because kfree_rcu() uses raw_spinlock_t whereas the
   "page allocator" internally deals with spinlock_t to access to its
   zones. The code also can be broken from higher level of view:
   <snip>
       raw_spin_lock(&some_lock);
       kfree_rcu(some_pointer, some_field_offset);
   <snip>

b) If built with CONFIG_PREEMPT_RT, spinlock_t is converted into a
   sleeping lock.  This means that invoking the page allocator from atomic
   contexts results in "BUG: scheduling while atomic".

c) Please note that call_rcu() is already invoked from raw atomic context,
   so it is only reasonable to expect that kfree_rcu() and kvfree_rcu()
   will also be called from atomic raw context.

This commit therefore defers page allocation to a clean context using the
combination of an hrtimer and a workqueue.  The hrtimer stage is required
in order to avoid deadlocks with the scheduler.  This deferred allocation
is required only when kvfree_rcu()'s per-CPU page cache is empty.
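
A sketch of the deferral path (struct and field names assumed per this
patch):

    static enum hrtimer_restart schedule_page_work_fn(struct hrtimer *t)
    {
            struct kfree_rcu_cpu *krcp = container_of(t, struct kfree_rcu_cpu, hrtimer);

            /* Now clear of raw locks and atomic context: refill the page cache. */
            queue_work(system_highpri_wq, &krcp->page_cache_work);
            return HRTIMER_NORESTART;
    }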

Link: https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/
Fixes: 3042f83f19 ("rcu: Support reclaim for head-less object")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:17 -08:00
Zhouyi Zhou 354c3f0e22 rcu: Fix a typo in rcu_blocking_is_gp() header comment
This commit fixes a typo in the rcu_blocking_is_gp() function's header
comment.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:17 -08:00
Paul E. McKenney 4d60b475f8 rcu: Prevent lockdep-RCU splats on lock acquisition/release
The rcu_cpu_starting() and rcu_report_dead() functions transition the
current CPU between online and offline state from an RCU perspective.
Unfortunately, this means that the rcu_cpu_starting() function's lock
acquisition and the rcu_report_dead() function's lock releases happen
while the CPU is offline from an RCU perspective, which can result
in lockdep-RCU splats about using RCU from an offline CPU.  And this
situation can also result in too-short grace periods, especially in
guest OSes that are subject to vCPU preemption.

This commit therefore uses sequence-count-like synchronization to forgive
use of RCU while RCU thinks a CPU is offline across the full extent of
the rcu_cpu_starting() and rcu_report_dead() function's lock acquisitions
and releases.

One approach would have been to use the actual sequence-count primitives
provided by the Linux kernel.  Unfortunately, the resulting code looks
completely broken and wrong, and is likely to result in patches that
break RCU in an attempt to address this appearance of broken wrongness.
Plus there is no net savings in lines of code, given the additional
explicit memory barriers required.

Therefore, this sequence count is instead implemented by a new ->ofl_seq
field in the rcu_node structure.  If this counter's value is an odd
number, RCU forgives RCU read-side critical sections on other CPUs covered
by the same rcu_node structure, even if those CPUs are offline from
an RCU perspective.  In addition, if a given leaf rcu_node structure's
->ofl_seq counter value is an odd number, rcu_gp_init() delays starting
the grace period until that counter value changes.
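
A sketch of the odd/even protocol around the offline-side critical region:

    WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);  /* Now odd: forgive RCU usage here. */
    smp_mb();                                    /* Order against grace-period init. */

    /* ... rcu_cpu_starting()/rcu_report_dead() lock acquisitions and releases ... */

    smp_mb();
    WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);  /* Even again: normal rules resume. */
    WARN_ON_ONCE(rnp->ofl_seq & 0x1);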

[ paulmck: Apply Peter Zijlstra feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:17 -08:00
Joel Fernandes (Google) bd56e0a4a2 rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
Testing showed that rcu_pending() can return 1 when offloaded callbacks
are ready to execute.  This invokes RCU core processing, for example,
by raising RCU_SOFTIRQ, eventually resulting in a call to rcu_core().
However, rcu_core() explicitly avoids in any way manipulating offloaded
callbacks, which are instead handled by the rcuog and rcuoc kthreads,
which work independently of rcu_core().

One exception to this independence is that rcu_core() invokes
do_nocb_deferred_wakeup(), however, rcu_pending() also checks
rcu_nocb_need_deferred_wakeup() in order to correctly handle this case,
invoking rcu_core() when needed.

This commit therefore avoids needlessly invoking RCU core processing
by checking rcu_segcblist_ready_cbs() only on non-offloaded CPUs.
This reduces overhead, for example, by reducing softirq activity.

This change passed 30 minute tests of TREE01 through TREE09 each.

On TREE08, there is at most 150us from the time that rcu_pending() chose
not to invoke RCU core processing to the time when the ready callbacks
were invoked by the rcuoc kthread.  This provides further evidence that
there is no need to invoke rcu_core() for offloaded callbacks that are
ready to invoke.
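
A sketch of the adjusted check in rcu_pending():

    /* Does this CPU have callbacks ready that it must invoke itself? */
    if (!rcu_segcblist_is_offloaded(&rdp->cblist) &&
        rcu_segcblist_ready_cbs(&rdp->cblist))
            return 1;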

Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:17 -08:00
Peter Zijlstra d2098b4440 rcu,ftrace: Fix ftrace recursion
Kim reported that perf-ftrace made his box unhappy. It turns out that
commit:

  ff5c4f5cad ("rcu/tree: Mark the idle relevant functions noinstr")

removed one too many notrace qualifiers, probably due to there not being
a helpful comment.

This commit therefore reinstates the notrace and adds a comment to avoid
losing it again.

[ paulmck: Apply Steven Rostedt's feedback on the comment. ]
Fixes: ff5c4f5cad ("rcu/tree: Mark the idle relevant functions noinstr")
Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:17 -08:00
Joe Perches 7c47ee5aa0 rcu/tree: Make struct kernel_param_ops definitions const
These should be const, so make it so.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:17 -08:00
Joel Fernandes (Google) 9f866dac94 rcu/tree: Add a warning if CPU being onlined did not report QS already
Currently, rcu_cpu_starting() checks to see if the RCU core expects a
quiescent state from the incoming CPU.  However, the current interaction
between RCU quiescent-state reporting and CPU-hotplug operations should
mean that the incoming CPU never needs to report a quiescent state.
First, the outgoing CPU reports a quiescent state if needed.  Second,
the race where the CPU is leaving just as RCU is initializing a new
grace period is handled by an explicit check for this condition.  Third,
the CPU's leaf rcu_node structure's ->lock serializes these checks.

This means that if rcu_cpu_starting() ever feels the need to report
a quiescent state, then there is a bug somewhere in the CPU hotplug
code or the RCU grace-period handling code.  This commit therefore
adds a WARN_ON_ONCE() to bring that bug to everyone's attention.

Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:16 -08:00
Neeraj Upadhyay ed73860cec rcu: Fix single-CPU check in rcu_blocking_is_gp()
Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses
num_online_cpus() to determine whether there is only one CPU online.  When
there is only a single CPU online, the simple fact that synchronize_rcu()
could be legally called implies that a full grace period has elapsed.
Therefore, in the single-CPU case, synchronize_rcu() simply returns
immediately.  Unfortunately, num_online_cpus() is unreliable while a
CPU-hotplug operation is transitioning to or from single-CPU operation
because:

1.	num_online_cpus() uses atomic_read(&__num_online_cpus) to
	locklessly sample the number of online CPUs.  The hotplug locks
	are not held, which means that an incoming CPU can concurrently
	update this count.  This in turn means that an RCU read-side
	critical section on the incoming CPU might observe updates
	prior to the grace period, but also that this critical section
	might extend beyond the end of the optimized synchronize_rcu().
	This breaks RCU's fundamental guarantee.

2.	In addition, num_online_cpus() does no ordering, thus providing
	another way that RCU's fundamental guarantee can be broken by
	the current code.

3.	The most probable failure mode happens on outgoing CPUs.
	The outgoing CPU updates the count of online CPUs in the
	CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in
	and of itself due to preemption being disabled at the call
	to num_online_cpus().  Unfortunately, after that stop-machine
	handler returns, the CPU takes one last trip through the
	scheduler (which has RCU readers) and, after the resulting
	context switch, one final dive into the idle loop.  During this
	time, RCU needs to keep track of two CPUs, but num_online_cpus()
	will say that there is only one, which in turn means that the
	surviving CPU will incorrectly ignore the outgoing CPU's RCU
	read-side critical sections.

This problem is illustrated by the following litmus test in which P0()
corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU.
The herd7 tool confirms that the "exists" clause can be satisfied,
thus demonstrating that this breakage can happen according to the Linux
kernel memory model.

   {
     int x = 0;
     atomic_t numonline = ATOMIC_INIT(1);
   }

   P0(int *x, atomic_t *numonline)
   {
     int r0;
     WRITE_ONCE(*x, 1);
     r0 = atomic_read(numonline);
     if (r0 == 1) {
       smp_mb();
     } else {
       synchronize_rcu();
     }
     WRITE_ONCE(*x, 2);
   }

   P1(int *x, atomic_t *numonline)
   {
     int r0; int r1;

     atomic_inc(numonline);
     smp_mb();
     rcu_read_lock();
     r0 = READ_ONCE(*x);
     smp_rmb();
     r1 = READ_ONCE(*x);
     rcu_read_unlock();
   }

   locations [x;numonline;]

   exists (1:r0=0 /\ 1:r1=2)

It is important to note that these problems arise only when the system
is transitioning to or from single-CPU operation.

One solution would be to hold the CPU-hotplug locks while sampling
num_online_cpus(), which was in fact the intent of the (redundant)
preempt_disable() and preempt_enable() surrounding this call to
num_online_cpus().  Actually blocking CPU hotplug would not only result
in excessive overhead, but would also unnecessarily impede CPU-hotplug
operations.

This commit therefore follows long-standing RCU tradition by maintaining
a separate RCU-specific set of CPU-hotplug books.

This separate set of books is implemented by a new ->n_online_cpus field
in the rcu_state structure that maintains RCU's count of the online CPUs.
This count is incremented early in the CPU-online process, so that
the critical transition away from single-CPU operation will occur when
there is only a single CPU.  Similarly for the critical transition to
single-CPU operation, the counter is decremented late in the CPU-offline
process, again while there is only a single CPU.  Because there is only
ever a single CPU when the ->n_online_cpus field undergoes the critical
1->2 and 2->1 transitions, full memory ordering and mutual exclusion is
provided implicitly and, better yet, for free.

In the case where the CPU is coming online, nothing will happen until
the current CPU helps it come online.  Therefore, the new CPU will see
all accesses prior to the optimized grace period, which means that RCU
does not need to further delay this new CPU.  In the case where the CPU
is going offline, the outgoing CPU is totally out of the picture before
the optimized grace period starts, which means that this outgoing CPU
cannot see any of the accesses following that grace period.  Again,
RCU needs no further interaction with the outgoing CPU.

This does mean that synchronize_rcu() will unnecessarily do a few grace
periods the hard way just before the second CPU comes online and just
after the second-to-last CPU goes offline, but it is not worth optimizing
this uncommon case.
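
A simplified sketch of the resulting single-CPU check:

    static int rcu_blocking_is_gp(void)
    {
            if (IS_ENABLED(CONFIG_PREEMPTION))
                    return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
            might_sleep();  /* Catch calls from RCU read-side critical sections. */
            return READ_ONCE(rcu_state.n_online_cpus) <= 1;  /* RCU's own books. */
    }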

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:16 -08:00
Frederic Weisbecker e3771c850d rcu: Implement rcu_segcblist_is_offloaded() config dependent
This commit simplifies the use of the rcu_segcblist_is_offloaded() API so
that its callers no longer need to check the RCU_NOCB_CPU Kconfig option.
Note that rcu_segcblist_is_offloaded() is defined in the header file,
which means that the generated code should be just as efficient as before.
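
A sketch of the header-level helper after this change (field name assumed
per the pre-state-machine ->cblist layout):

    static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
    {
            return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;
    }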

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:37:16 -08:00
Peter Zijlstra 6dbce04d84 rcu: Allow rcu_irq_enter_check_tick() from NMI
Eugenio managed to tickle #PF from NMI context which resulted in
hitting a WARN in RCU through irqentry_enter() ->
__rcu_irq_enter_check_tick().

However, this situation is perfectly sane and does not warrant a
WARN. The #PF will (necessarily) be atomic and not require messing
with the tick state, so an early return is correct.  This commit
therefore removes the WARN.
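
A sketch of the fix inside __rcu_irq_enter_check_tick():

    if (in_nmi())
            return;  /* NMI-induced #PF is atomic; no tick manipulation needed. */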

Fixes: aaf2bc50df ("rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()")
Reported-by: "Eugenio Pérez" <eupm90@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19 19:34:17 -08:00
Linus Torvalds 88b31f07f3 arm64 fixes for -rc4
- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
 
 - Fix RCU splat when failing to online a CPU due to a feature mismatch
 
 - Fix a recently introduced sparse warning in kexec()
 
 - Fix handling of CPU erratum 1418040 for late CPUs
 
 - Ensure hot-added memory falls within linear-mapped region
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+ubogQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNPD7B/9i5ao44AEJwjz0a68S/jD7kUD7i3xVkCNN
 Y8i/i9mx44IAcf8pmyQh3ngaFlJuF2C6oC/SQFiDbmVeGeZXLXvXV7uGAqXG5Xjm
 O2Svgr1ry176JWpsB7MNnZwzAatQffdkDjbjQCcUnUIKYcLvge8H2fICljujGcfQ
 094vNmT9VerTWRbWDti3Ck/ug+sanVHuzk5BWdKx3jamjeTqo+sBZK/wgBr6UoYQ
 mT3BFX42kLIGg+AzwXRDPlzkJymjYgQDbSwGsvny8qKdOEJbAUwWXYZ5sTs9J/gU
 E9PT3VJI7BYtTd1uPEWkD645U3arfx3Pf2JcJlbkEp86qx4CUF9s
 =T6k4
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:

 - Spectre/Meltdown safelisting for some Qualcomm KRYO cores

 - Fix RCU splat when failing to online a CPU due to a feature mismatch

 - Fix a recently introduced sparse warning in kexec()

 - Fix handling of CPU erratum 1418040 for late CPUs

 - Ensure hot-added memory falls within linear-mapped region

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
  arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
  arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
  arm64: Add MIDR value for KRYO2XX gold/silver CPU cores
  arm64/mm: Validate hotplug range before creating linear mapping
  arm64: smp: Tell RCU about CPUs that fail to come online
  arm64: psci: Avoid printing in cpu_psci_cpu_die()
  arm64: kexec_file: Fix sparse warning
  arm64: errata: Fix handling of 1418040 with late CPU onlining
2020-11-13 09:23:10 -08:00