Commit Graph

535 Commits

Author SHA1 Message Date
Rado Vrbovsky 16bf54f108 Merge: Fix RCUC latency issue
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5165

JIRA: https://issues.redhat.com/browse/RHEL-20288

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Signed-off-by: Leonardo Bras <leobras@redhat.com>

Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: Marcelo Tosatti <mtosatti@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-25 16:26:53 +00:00
Leonardo Bras 483ecb54c6 rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
JIRA: https://issues.redhat.com/browse/RHEL-20288

commit 68d124b0999919015e6d23008eafea106ec6bb40
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   2024-05-08 20:11:58 -0700

    rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter

    If a CPU is running either a userspace application or a guest OS in
    nohz_full mode, it is possible for a system call to occur just as an
    RCU grace period is starting.  If that CPU also has the scheduling-clock
    tick enabled for any reason (such as a second runnable task), and if the
    system was booted with rcutree.use_softirq=0, then RCU can add insult to
    injury by awakening that CPU's rcuc kthread, resulting in yet another
    task and yet more OS jitter due to switching to that task, running it,
    and switching back.

    In addition, in the common case where that system call is not of
    excessively long duration, awakening the rcuc task is pointless.
    This pointlessness is due to the fact that the CPU will enter an extended
    quiescent state upon returning to the userspace application or guest OS.
    In this case, the rcuc kthread cannot do anything that the main RCU
    grace-period kthread cannot do on its behalf, at least if it is given
    a few additional milliseconds (for example, given the time duration
    specified by rcutree.jiffies_till_first_fqs, give or take scheduling
    delays).

    This commit therefore adds a rcutree.nohz_full_patience_delay kernel
    boot parameter that specifies the grace period age (in milliseconds,
    rounded to jiffies) before which RCU will refrain from awakening the
    rcuc kthread.  Preliminary experimentation suggests a value of 1000,
    that is, one second.  Increasing rcutree.nohz_full_patience_delay will
    increase grace-period latency and in turn increase memory footprint,
    so systems with constrained memory might choose a smaller value.
    Systems with less-aggressive OS-jitter requirements might choose the
    default value of zero, which keeps the traditional immediate-wakeup
    behavior, thus avoiding increases in grace-period latency.

    [ paulmck: Apply Leonardo Bras feedback.  ]

    Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/

    Reported-by: Leonardo Bras <leobras@redhat.com>
    Suggested-by: Leonardo Bras <leobras@redhat.com>
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Leonardo Bras <leobras@redhat.com>

Signed-off-by: Leonardo Bras <leobras@redhat.com>
2024-10-08 18:52:03 -03:00
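The patience-delay behavior described in the entry above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: the HZ value, the helper names, and the round-up arithmetic are illustrative, not the upstream implementation.

```c
/* Hypothetical model of rcutree.nohz_full_patience_delay: the boot
 * parameter is given in milliseconds, rounded to jiffies, and RCU
 * refrains from waking the rcuc kthread until the grace period is at
 * least that old.  HZ and both helper names are assumptions. */
#include <assert.h>
#include <stdbool.h>

#define HZ 1000 /* assumed tick rate for this sketch */

/* Round the millisecond boot parameter up to whole jiffies. */
static inline unsigned long patience_ms_to_jiffies(unsigned long delay_ms)
{
    return (delay_ms * HZ + 999) / 1000;
}

/* Wake rcuc only once the current grace period has aged past the
 * configured patience delay. */
static inline bool should_wake_rcuc(unsigned long gp_age_jiffies,
                                    unsigned long patience_jiffies)
{
    return gp_age_jiffies >= patience_jiffies;
}
```

With the default delay of zero, `should_wake_rcuc()` is always true, preserving the traditional immediate-wakeup behavior the commit message describes.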
Waiman Long 7804cac54b rcu: Make hotplug operations track GP state, not flags
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit ae2b217ab542d0db0ca1a6de4f442201a1982f00
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 8 Mar 2024 11:15:01 -0800

    rcu: Make hotplug operations track GP state, not flags

    Currently, there are rcu_data structure fields named ->rcu_onl_gp_seq
    and ->rcu_ofl_gp_seq that track the rcu_state.gp_flags field at the
    time of the corresponding CPU's last online or offline operation,
    respectively.  However, this information is not particularly useful.
    It would be better to instead track the grace period state kept
    in rcu_state.gp_state.  This would also be consistent with the
    initialization in rcu_boot_init_percpu_data(), which is to RCU_GP_CLEANED
    (an rcu_state.gp_state value), and also with the diagnostics in
    rcu_implicit_dynticks_qs(), whose format is consistent with an integer,
    not a bitmask.

    This commit therefore makes this change and changes the names to
    ->rcu_onl_gp_flags and ->rcu_ofl_gp_flags, respectively.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:32 -04:00
Waiman Long 2f23c68f4a rcu/exp: Handle parallel exp gp kworkers affinity
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit b67cffcbbf9dc759d95d330a5af5d1480af2b1f1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri, 12 Jan 2024 16:46:20 +0100

    rcu/exp: Handle parallel exp gp kworkers affinity

    Affine the parallel expedited gp kworkers to their respective RCU node
    in order to make them close to the cache they are playing with.

    This reuses the boost kthreads machinery that probes into CPU hotplug
    operations such that the kthreads become/stay affine to their respective
    node as long as it contains online CPUs. Otherwise, if the CPU going
    down was the last one online on the leaf node, the related kthread is
    affined to the housekeeping CPUs.

    In the long run, this affinity vs. CPU hotplug operation game should
    probably be implemented at the generic kthread level.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    [boqun: s/* rcu_boost_task/*rcu_boost_task as reported by checkpatch]
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:14 -04:00
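The fallback rule in the entry above (keep the kworker on its node while the node has online CPUs, otherwise move it to the housekeeping CPUs) reduces to a mask selection. A minimal sketch using a plain bitmask in place of the kernel's cpumask API; the helper name is hypothetical:

```c
/* Hypothetical mask selection mirroring the affinity rule described
 * above: prefer the rcu_node's online CPUs, fall back to the
 * housekeeping CPUs when the node has none left. */
#include <assert.h>
#include <stdint.h>

static inline uint64_t exp_kworker_affinity(uint64_t node_online_cpus,
                                            uint64_t housekeeping_cpus)
{
    return node_online_cpus ? node_online_cpus : housekeeping_cpus;
}
```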
Waiman Long d5ad8ad294 rcu/exp: Make parallel exp gp kworker per rcu node
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 8e5e621566485a3e160c0d8bfba206cb1d6b980d
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri, 12 Jan 2024 16:46:19 +0100

    rcu/exp: Make parallel exp gp kworker per rcu node

    When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node
    initialization is performed in parallel via workqueues (one work per
    node).

    However in CONFIG_RCU_EXP_KTHREAD=y, this per node initialization is
    performed by a single kworker serializing each node initialization (one
    work for all nodes).

    The latter layout is certainly less scalable and efficient beyond a
    single leaf node.

    To improve this, expand this single kworker into per-node kworkers. This
    new layout is eventually intended to remove the workqueues based
    implementation since it will essentially now become duplicate code.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:13 -04:00
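The layout change above (one kworker per leaf node initializing in parallel, rather than one kworker serializing every node) can be sketched in userspace with one thread per node. Structure and function names are hypothetical, and pthreads stand in for kworkers:

```c
/* Userspace sketch of the per-node split: each leaf node gets its own
 * worker thread for expedited grace-period initialization, instead of
 * one worker walking all nodes in sequence.  All names are stand-ins. */
#include <assert.h>
#include <pthread.h>

#define NR_LEAF_NODES 4

struct exp_node { int id; int initialized; };

static void *exp_node_init(void *arg)
{
    struct exp_node *rnp = arg;
    rnp->initialized = 1; /* stand-in for per-node expedited GP setup */
    return NULL;
}

/* Dispatch one worker per node, then wait for all of them. */
static void exp_gp_init_parallel(struct exp_node nodes[], int n)
{
    pthread_t tids[NR_LEAF_NODES];
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, exp_node_init, &nodes[i]);
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
}
```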
Waiman Long 119acfe64c rcu: s/boost_kthread_mutex/kthread_mutex
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 7836b270607676ed1c0c6a4a840a2ede9437a6a1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri, 12 Jan 2024 16:46:17 +0100

    rcu: s/boost_kthread_mutex/kthread_mutex

    This mutex is currently protecting per node boost kthreads creation and
    affinity setting across CPU hotplug operations.

    Since the expedited kworkers will soon be split per node as well, they
    will be subject to the same concurrency constraints against hotplug.

    Therefore their creation and affinity tuning operations will be grouped
    with those of boost kthreads and then rely on the same mutex.

    To prepare for that, generalize its name.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:12 -04:00
Waiman Long dafc57ac84 rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit 9146eb25495ea8bfb5010192e61e3ed5805ce9ef
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 7 Apr 2023 16:05:38 -0700

    rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp

    The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated
    only on the instance corresponding to the current CPU, but can be read
    more widely.  Unmarked accesses are OK from the corresponding CPU, but
    only if interrupts are disabled, given that interrupt handlers can and
    do modify this field.

    Unfortunately, although the load from rcu_preempt_deferred_qs() is always
    carried out from the corresponding CPU, interrupts are not necessarily
    disabled.  This commit therefore upgrades this load to READ_ONCE.

    Similarly, the diagnostic access from synchronize_rcu_expedited_wait()
    might run with interrupts disabled and from some other CPU.  This commit
    therefore marks this load with data_race().

    Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as
    is because interrupts are disabled and this load is always from the
    corresponding CPU.  This commit adds a comment giving the rationale for
    this access being safe.

    This data race was reported by KCSAN.  Not appropriate for backporting
    due to failure being unlikely.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:12 -04:00
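The access marking discussed above can be illustrated with a userspace approximation of `READ_ONCE()`/`WRITE_ONCE()` (a volatile cast, which is essentially how the kernel implements them); the structure below is a simplified stand-in for the relevant rcu_data fields, not the real layout:

```c
/* Sketch of what the marked access buys: READ_ONCE() forces a single,
 * untorn load that the compiler may not refetch or fuse, which matters
 * when interrupt handlers can modify the field concurrently. */
#include <assert.h>
#include <stdbool.h>

#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Simplified stand-in for rcu_data's ->cpu_no_qs.b fields. */
struct cpu_no_qs { bool norm; bool exp; };

/* Load that may race with interrupt handlers: must be marked. */
static inline bool exp_qs_pending(const struct cpu_no_qs *cnq)
{
    return READ_ONCE(cnq->exp);
}
```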
Waiman Long 909a68d328 rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit 6343402ac35dd534291a6c82924a4f09cf6cd1e5
Author: Pingfan Liu <kernelfans@gmail.com>
Date:   Tue, 6 Sep 2022 11:36:42 -0700

    rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()

    Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked
    concurrently, the following rcu_boost_kthread_setaffinity() race can
    occur:

            CPU 1                               CPU 2
    mask = rcu_rnp_online_cpus(rnp);
    ...

                                       mask = rcu_rnp_online_cpus(rnp);
                                       ...
                                       set_cpus_allowed_ptr(t, cm);

    set_cpus_allowed_ptr(t, cm);

    This results in CPU2's update being overwritten by that of CPU1, and
    thus the possibility of ->boost_kthread_task continuing to run on a
    to-be-offlined CPU.

    This commit therefore eliminates this race by relying on the pre-existing
    acquisition of ->boost_kthread_mutex to serialize the full process of
    changing the affinity of ->boost_kthread_task.

    Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:12 -04:00
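The essence of the fix above is that the read of the online mask and the affinity update must form one critical section, so a concurrent hotplug path cannot interleave between them. A minimal userspace sketch with a pthread mutex; all names are stand-ins for the kernel objects:

```c
/* Hypothetical model of the fix: hold one mutex across both the read
 * of the online-CPU mask and the affinity write, so two racing callers
 * cannot overwrite each other's update with a stale mask. */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t boost_kthread_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t qsmaskinitnext = 0x3; /* stand-in for rnp->qsmaskinitnext */
static uint64_t kthread_affinity;     /* stand-in for the kthread's cpumask */

static void boost_kthread_setaffinity(void)
{
    pthread_mutex_lock(&boost_kthread_mutex);
    uint64_t mask = qsmaskinitnext; /* read the online mask ...          */
    kthread_affinity = mask;        /* ... and apply it, without a gap a */
                                    /* racing hotplug caller could use.  */
    pthread_mutex_unlock(&boost_kthread_mutex);
}
```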
Waiman Long 9cc21271ea rcu-tasks: Make RCU Tasks Trace check for userspace execution
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 528262f50274079740b53e29bcaaabf219aa7417
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Tue, 19 Jul 2022 12:39:00 +0800

    rcu-tasks: Make RCU Tasks Trace check for userspace execution

    Userspace execution is a valid quiescent state for RCU Tasks Trace,
    but the scheduling-clock interrupt does not currently report such
    quiescent states.

    Of course, the scheduling-clock interrupt is not strictly speaking
    userspace execution.  However, the only way that this code is not
    in a quiescent state is if something invoked rcu_read_lock_trace(),
    and that would be reflected in the ->trc_reader_nesting field in
    the task_struct structure.  Furthermore, this field is checked by
    rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in
    turn invoked by rcu_note_voluntary_context_switch() in kernels building
    at least one of the RCU Tasks flavors.  It is therefore safe to invoke
    rcu_tasks_trace_qs() from the rcu_sched_clock_irq().

    But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU
    Tasks, which lacks the read-side markers provided by RCU Tasks Trace.
    This raises the possibility that an RCU Tasks grace period could start
    after the interrupt from userspace execution, but before the call to
    rcu_sched_clock_irq().  However, it turns out that this is safe because
    the RCU Tasks grace period waits for an RCU grace period, which will
    wait for the entire scheduling-clock interrupt handler, including any
    RCU Tasks read-side critical section that this handler might contain.

    This commit therefore updates the rcu_sched_clock_irq() function's
    check for usermode execution and its call to rcu_tasks_classic_qs()
    to instead check for both usermode execution and interrupt from idle,
    and to call rcu_note_voluntary_context_switch().  This consolidates
    code and provides faster RCU Tasks Trace reporting of quiescent
    states in kernels that do scheduling-clock interrupts for userspace
    execution.

    [ paulmck: Consolidate checks into rcu_sched_clock_irq(). ]

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:15 -04:00
Waiman Long b6ffe74fc1 rcu: Exclude outgoing CPU when it is the last to leave
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 7634b1eaa0cd135d5eedadb04ad3c91b1ecf28a9
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 24 Aug 2022 14:46:56 -0700

    rcu: Exclude outgoing CPU when it is the last to leave

    The rcu_boost_kthread_setaffinity() function removes the outgoing CPU
    from the set_cpus_allowed() mask for the corresponding leaf rcu_node
    structure's rcub priority-boosting kthread.  Except that if the outgoing
    CPU will leave that structure without any online CPUs, the mask is set
    to the housekeeping CPU mask from housekeeping_cpumask().  Which is fine
    unless the outgoing CPU happens to be a housekeeping CPU.

    This commit therefore removes the outgoing CPU from the housekeeping mask.
    This would of course be problematic if the outgoing CPU was the last
    online housekeeping CPU, but in that case you are in a world of hurt
    anyway.  If someone comes up with a valid use case for a system needing
    all the housekeeping CPUs to be offline, further adjustments can be made.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:59 -04:00
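The mask computation described in the entry above can be sketched with plain bitmasks; the helper is hypothetical and the kernel uses the cpumask API instead:

```c
/* Hypothetical sketch of the affinity mask for the rcub kthread when a
 * CPU goes offline: remove the outgoing CPU from the node's online
 * mask, and when that leaves the node empty, fall back to the
 * housekeeping mask -- with the fix, minus the outgoing CPU as well. */
#include <assert.h>
#include <stdint.h>

static inline uint64_t boost_affinity_mask(uint64_t node_online,
                                           uint64_t housekeeping,
                                           int outgoing_cpu)
{
    uint64_t out = UINT64_C(1) << outgoing_cpu;
    uint64_t mask = node_online & ~out;

    if (!mask)                      /* node left without online CPUs */
        mask = housekeeping & ~out; /* the fix: exclude outgoing CPU here too */
    return mask;
}
```

Note the "world of hurt" case from the commit message: if the outgoing CPU is the last online housekeeping CPU, the resulting mask is empty.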
Waiman Long 67a20d4628 rcu: Avoid triggering strict-GP irq-work when RCU is idle
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 621189a1fe93cb2b34d62c5cdb9e258bca044813
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Mon, 8 Aug 2022 10:26:26 +0800

    rcu: Avoid triggering strict-GP irq-work when RCU is idle

    Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger
    irq-work from rcu_read_unlock(), and the resulting irq-work handler
    invokes rcu_preempt_deferred_qs_handle().  The point of this triggering
    is to force grace periods to end quickly in order to give tools like KASAN
    a better chance of detecting RCU usage bugs such as leaking RCU-protected
    pointers out of an RCU read-side critical section.

    However, this irq-work triggering is unconditional.  This works, but
    there is no point in doing this irq-work unless the current grace period
    is waiting on the running CPU or task, which is not the common case.
    After all, in the common case there are many rcu_read_unlock() calls
    per CPU per grace period.

    This commit therefore triggers the irq-work only when the current grace
    period is waiting on the running CPU or task.

    This change was tested as follows on a four-CPU system:

            echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
            echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
            insmod rcutorture.ko
            sleep 20
            rmmod rcutorture.ko
            echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
            echo > /sys/kernel/debug/tracing/set_ftrace_filter

    This procedure produces results in this per-CPU set of files:

            /sys/kernel/debug/tracing/trace_stat/function*

    Sample output from one of these files is as follows:

      Function                               Hit    Time            Avg             s^2
      --------                               ---    ----            ---             ---
      rcu_preempt_deferred_qs_handle      838746    182650.3 us     0.217 us        0.004 us

    The baseline sum of the "Hit" values (the number of calls to this
    function) was 3,319,015.  With this commit, that sum was 1,140,359,
    for a 2.9x reduction.  The worst-case variance across the CPUs was less
    than 25%, so this large effect size is statistically significant.

    The raw data is available in the Link: URL.

    Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:58 -04:00
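The condition added by the commit above can be expressed as a simple predicate. The structure below is a simplified stand-in for the relevant per-CPU and per-task state, not the real rcu_data layout:

```c
/* Hypothetical predicate mirroring the change above: trigger the
 * strict-GP irq-work only when the current grace period is actually
 * waiting on this CPU or on this task, rather than unconditionally on
 * every rcu_read_unlock(). */
#include <assert.h>
#include <stdbool.h>

struct rdp_sketch {
    bool cpu_no_qs_norm;   /* the GP still needs a QS from this CPU  */
    bool task_blocking_gp; /* this task is blocking the current GP   */
};

static inline bool need_strict_qs_irq_work(const struct rdp_sketch *rdp)
{
    return rdp->cpu_no_qs_norm || rdp->task_blocking_gp;
}
```

In the common case both flags are false, which is what produced the roughly 2.9x reduction in handler invocations reported above.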
Waiman Long dc93a1a75e rcu: Document reason for rcu_all_qs() call to preempt_disable()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 089254fd386eb6800dd7d7863f12a04ada0c35fa
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 3 Aug 2022 08:48:12 -0700

    rcu: Document reason for rcu_all_qs() call to preempt_disable()

    Given that rcu_all_qs() is in non-preemptible kernels, why on earth should
    it invoke preempt_disable()?  This commit adds the reason, which is to
    work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels.

    Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Reported-by: Boqun Feng <boqun.feng@gmail.com>
    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:56 -04:00
Waiman Long 84cca2f288 rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit bca4fa8cb0f4c096b515952f64e560fd784a0514
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Mon, 20 Jun 2022 14:42:24 +0800

    rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels

    In non-preemptible kernels, tasks never do context switches within
    RCU read-side critical sections.  Therefore, in such kernels, each
    leaf rcu_node structure's ->blkd_tasks list will always be empty.
    The comment on the non-preemptible version of rcu_preempt_deferred_qs()
    confuses this point, so this commit fixes it.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:55 -04:00
Waiman Long 58978c68c8 rcu: Fix rcu_read_unlock_strict() strict QS reporting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 6d60ea03ac2d3dcf6ddee6b45aa7213d8b0461c5
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 16 Jun 2022 21:53:47 +0800

    rcu: Fix rcu_read_unlock_strict() strict QS reporting

    Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y
    report the quiescent state directly from the outermost rcu_read_unlock().
    However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm
    might still be set, in which case rcu_report_qs_rdp() will exit early,
    thus failing to report quiescent state.

    This commit therefore causes rcu_read_unlock_strict() to clear
    CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking
    rcu_report_qs_rdp().

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:55 -04:00
Waiman Long 3cd6c37180 rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 5103850654fdc651f0a7076ac753b958f018bb85
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Fri, 29 Apr 2022 20:42:22 +0800

    rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()

    Callbacks are invoked in RCU kthreads when callbacks are offloaded
    (rcu_nocbs boot parameter) or when RCU's softirq handler has been
    offloaded to rcuc kthreads (use_softirq==0).  The current code allows
    for the rcu_nocbs case but not the use_softirq case.  This commit adds
    support for the use_softirq case.

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:22 -04:00
Waiman Long 64478f9fce rcu: Immediately boost preempted readers for strict grace periods
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 70a82c3c55c8665d3996dcb9968adcf24d52bbc4
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Fri, 13 May 2022 08:42:55 +0800

    rcu: Immediately boost preempted readers for strict grace periods

    The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option is to
    cause normal grace periods to complete quickly in order to better catch
    errors resulting from improperly leaking pointers from RCU read-side
    critical sections.  However, kernels built with this option enabled still
    wait for some hundreds of milliseconds before boosting RCU readers that
    have been preempted within their current critical section.  The value
    of this delay is set by the CONFIG_RCU_BOOST_DELAY Kconfig option,
    which defaults to 500 milliseconds.

    This commit therefore causes kernels built with strict grace periods
    to ignore CONFIG_RCU_BOOST_DELAY.  This causes rcu_initiate_boost()
    to start boosting immediately after all CPUs on a given leaf rcu_node
    structure have passed through their quiescent states.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:20 -04:00
Waiman Long 845a0ce6d6 rcu: Avoid tracing a few functions executed in stop machine
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
Conflicts: 2 merge conflicts due to upstream merge conflict. Apply the
	   same resolutions as in upstream merge commit 34bc7b454dc3
	   ("Merge branch 'ctxt.2022.07.05a' into HEAD").

commit 48f8070f5dd8e13148ae4647780a452d53c457a2
Author: Patrick Wang <patrick.wang.shcn@gmail.com>
Date:   Tue, 26 Apr 2022 18:45:02 +0800

    rcu: Avoid tracing a few functions executed in stop machine

    Stop-machine recently started calling additional functions while waiting:

    ----------------------------------------------------------------
    Former stop machine wait loop:
    do {
        cpu_relax(); => macro
        ...
    } while (curstate != STOPMACHINE_EXIT);
    -----------------------------------------------------------------
    Current stop machine wait loop:
    do {
        stop_machine_yield(cpumask); => function (notraced)
        ...
        touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
        ...
        rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
    } while (curstate != MULTI_STOP_EXIT);
    ------------------------------------------------------------------

    These functions (and the functions that they call) must be marked
    notrace to prevent them from being updated while they are executing.
    The consequences of failing to mark these functions can be severe:

      rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
      rcu:  1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
      rcu:  3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
            (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
      Task dump for CPU 1:
      task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
      Call Trace:
      Task dump for CPU 3:
      task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
      Call Trace:
      rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
      rcu:  Possible timer handling issue on cpu=2 timer-softirq=594
      rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
      rcu:  Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      rcu: RCU grace-period kthread stack dump:
      task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
      Call Trace:
        schedule+0x56/0xc2
        schedule_timeout+0x82/0x184
        rcu_gp_fqs_loop+0x19a/0x318
        rcu_gp_kthread+0x11a/0x140
        kthread+0xee/0x118
        ret_from_exception+0x0/0x14
      rcu: Stack dump where RCU GP kthread last ran:
      Task dump for CPU 2:
      task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
      Call Trace:

    This commit therefore marks these functions notrace:
     rcu_preempt_deferred_qs()
     rcu_preempt_need_deferred_qs()
     rcu_preempt_deferred_qs_irqrestore()

    [ paulmck: Apply feedback from Neeraj Upadhyay. ]

    Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
    Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:19 -04:00
Waiman Long 5b925bf582 rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 172114552701b85d5c3b1a089a73ee85d0d7786b
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:33 +0200

    rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking

    Move the core RCU eqs/dynticks functions to context tracking so that
    we can later merge all that code within context tracking.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long fe0d176f60 rcu-tasks: Make rcu_note_context_switch() unconditionally call rcu_tasks_qs()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 6a694411977a6d57ff76a896a745c2f717372dac
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 24 May 2022 20:33:17 -0700

    rcu-tasks: Make rcu_note_context_switch() unconditionally call rcu_tasks_qs()

    This commit makes rcu_note_context_switch() unconditionally invoke the
    rcu_tasks_qs() function, as opposed to doing so only when RCU (as opposed
    to RCU Tasks Trace) urgently needs a grace period to end.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Cc: Martin KaFai Lau <kafai@fb.com>
    Cc: KP Singh <kpsingh@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:08 -04:00
Waiman Long bc9106a9da rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit f596e2ce1c0f250bb3ecc179f611be37e862635f
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Mon, 4 Apr 2022 07:59:32 +0800

    rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs

    When booting kernels built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y
    and CONFIG_PREEMPT_RT=y, the rcu_read_unlock_special() function's
    invocation of irq_work_queue_on(), on irq-work set up with
    init_irq_work(), causes the rcu_preempt_deferred_qs_handler() function
    to execute in SCHED_FIFO irq_work kthreads.  Because
    rcu_read_unlock_special() is invoked on each rcu_read_unlock() in such
    kernels, the amount of work just keeps piling up, resulting in a
    boot-time hang.

    This commit therefore avoids this hang by using IRQ_WORK_INIT_HARD()
    instead of init_irq_work(), but only in kernels built with both
    CONFIG_PREEMPT_RT=y and CONFIG_RCU_STRICT_GRACE_PERIOD=y.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:14 -04:00
Waiman Long 9a95f382ca rcu: Check for successful spawn of ->boost_kthread_task
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 88ca472f80604c070526eb58b977ea0a9c3c2e1f
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 24 Mar 2022 19:15:15 +0800

    rcu: Check for successful spawn of ->boost_kthread_task

    The spawning of the priority-boost kthreads can fail, improbable
    though this might seem.  This commit therefore refrains from
    attempting to initiate RCU priority boosting when the
    ->boost_kthread_task pointer is NULL.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:13 -04:00
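The guard described above amounts to a NULL check before initiating boosting. A minimal sketch with stand-in names:

```c
/* Hypothetical guard mirroring the commit above: do not initiate RCU
 * priority boosting when the boost kthread failed to spawn, i.e. when
 * the task pointer is still NULL. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rnp_sketch { void *boost_kthread_task; };

static inline bool can_initiate_boost(const struct rnp_sketch *rnp)
{
    return rnp->boost_kthread_task != NULL;
}
```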
Waiman Long f0eaee2a3d rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 90d2efe7bdbde5371b6122174af0718843f805c6
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 16 Feb 2022 09:54:56 -0800

    rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting

    Suppose we have a kernel built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y
    and CONFIG_PREEMPT=y.  Suppose further that an RCU reader from which RCU
    core needs a quiescent state ends in rcu_preempt_deferred_qs_irqrestore().
    This function will then invoke rcu_report_qs_rdp() in order to immediately
    report that quiescent state.  Unfortunately, it will not have cleared
    that reader's CPU's rcu_data structure's ->cpu_no_qs.b.norm field.
    As a result, rcu_report_qs_rdp() will take an early exit because it
    will believe that this CPU has not yet encountered a quiescent state,
    and there will be no reporting of the current quiescent state.

    This commit therefore causes rcu_preempt_deferred_qs_irqrestore() to
    clear the ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp().

    Kudos to Boqun Feng and Neeraj Upadhyay for helping with analysis of
    this issue!

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:10 -04:00
Waiman Long b19ed13b34 rcu: Initialize boost kthread only for boot node prior SMP initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 3352911fa9b47a90165e5c6fed440048c55146d1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:07 +0100

    rcu: Initialize boost kthread only for boot node prior SMP initialization

    The rcu_spawn_gp_kthread() function is called as an early initcall,
    which means that SMP initialization hasn't happened yet and only the
    boot CPU is online.  Therefore, create only the boost kthread for the
    leaf node of the boot CPU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long a9408fae13 rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c9515875850fefcc79492c5189fe8431e75ddec5
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Tue, 25 Jan 2022 10:47:44 +0800

    rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings

    When the rcutree.use_softirq kernel boot parameter is set to zero, all
    RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads.
    If these kthreads are being starved, quiescent states will not be
    reported, which in turn means that the grace period will not end, which
    can in turn trigger RCU CPU stall warnings.  This commit therefore dumps
    stack traces of stalled CPUs' rcuc kthreads, which can help identify
    what is preventing those kthreads from running.

    Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:30:04 -04:00
Waiman Long 35f8e0f336 rcu: Replace cpumask_weight with cpumask_empty where appropriate
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: A merge conflict in rcu_boost_kthread_setaffinity() of
	   kernel/rcu/tree_plugin.h due to the presence of a later
	   upstream commit 04d4e665a609 ("sched/isolation: Use single
	   feature type while referring to housekeeping cpumask").

commit 6a2c1d450a6a328027280a854019c55de989e14e
Author: Yury Norov <yury.norov@gmail.com>
Date:   Sun, 23 Jan 2022 10:38:53 -0800

    rcu: Replace cpumask_weight with cpumask_empty where appropriate

    In some places, RCU code calls cpumask_weight() to check if any bit of a
    given cpumask is set. We can do it more efficiently with cpumask_empty()
    because cpumask_empty() stops traversing the cpumask as soon as it finds
    first set bit, while cpumask_weight() counts all bits unconditionally.

    Signed-off-by: Yury Norov <yury.norov@gmail.com>
    Acked-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:29:02 -04:00
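The efficiency point made by the commit above can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: a 64-bit word stands in for struct cpumask, and mask_weight()/mask_empty() are hypothetical analogues of cpumask_weight() and cpumask_empty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical analogue of cpumask_weight(): counts every set bit
 * unconditionally, even when the caller only asks "is any bit set?". */
static inline int mask_weight(uint64_t mask)
{
	int w = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		w++;
	}
	return w;
}

/* Hypothetical analogue of cpumask_empty(): a single comparison.
 * (The kernel version stops at the first set bit it finds.) */
static inline bool mask_empty(uint64_t mask)
{
	return mask == 0;
}
```

A check written as `mask_weight(m) == 0` gives the same answer as `mask_empty(m)`, but the latter need not scan the whole mask, which is the gain the commit describes for large cpumasks.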
Waiman Long 6e2345a90d rcu: Don't deboost before reporting expedited quiescent state
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 10c535787436d62ea28156a4b91365fd89b5a432
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 21 Jan 2022 12:40:08 -0800

    rcu: Don't deboost before reporting expedited quiescent state

    Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx
    before reporting the expedited quiescent state.  Under heavy real-time
    load, this can result in this function being preempted before the
    quiescent state is reported, which can in turn prevent the expedited grace
    period from completing.  Tim Murray reports that the resulting expedited
    grace periods can take hundreds of milliseconds and even more than one
    second, when they should normally complete in less than a millisecond.

    This was fine given that there were no particular response-time
    constraints for synchronize_rcu_expedited(), as it was designed
    for throughput rather than latency.  However, some users now need
    sub-100-millisecond response-time constraints.

    This patch therefore follows Neeraj's suggestion (seconded by Tim and
    by Uladzislau Rezki) of simply reversing the two operations.

    Reported-by: Tim Murray <timmurray@google.com>
    Reported-by: Joel Fernandes <joelaf@google.com>
    Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Tested-by: Tim Murray <timmurray@google.com>
    Cc: Todd Kjos <tkjos@google.com>
    Cc: Sandeep Patil <sspatil@google.com>
    Cc: <stable@vger.kernel.org> # 5.4.x
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:14 -04:00
Waiman Long 5bef7666bb rcu: Remove unused rcu_state.boost
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: Fuzz in rcu_spawn_one_boost_kthread() due to upstream commit
	   conflict as shown in merge commit d5578190bed3.

commit eae9f147a4b02e132187a2d88a403b9ccc28212a
Author: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Date:   Mon, 13 Dec 2021 12:32:09 +0530

    rcu: Remove unused rcu_state.boost

    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:55 -04:00
Waiman Long 9f48f77ccc rcu: Create and use an rcu_rdp_cpu_online()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 5ae0f1b58b28b53f4ab3708ef9337a2665e79664
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 10 Dec 2021 13:44:17 -0800

    rcu: Create and use an rcu_rdp_cpu_online()

    The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently
    in RCU code in order to determine whether rdp->cpu is online from an
    RCU perspective.  This commit therefore creates an rcu_rdp_cpu_online()
    function to replace it.

    [ paulmck: Apply kernel test robot unused-variable feedback. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:49 -04:00
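The open-coded test that the commit above wraps can be sketched in isolation. This is a simplified userspace model, not the kernel's actual types: grpmask and qsmaskinitnext are reduced to plain 64-bit words, and rcu_rnp_online_cpus() to a field read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's rcu_node and rcu_data. */
struct rcu_node {
	uint64_t qsmaskinitnext;	/* bits of CPUs this node sees as online */
};

struct rcu_data {
	uint64_t grpmask;		/* this CPU's bit within its leaf node */
	struct rcu_node *mynode;
};

/* Model of rcu_rnp_online_cpus(): the CPUs the node considers online. */
static uint64_t rcu_rnp_online_cpus(struct rcu_node *rnp)
{
	return rnp->qsmaskinitnext;
}

/* The helper introduced by the commit: one named function replacing the
 * repeated "rdp->grpmask & rcu_rnp_online_cpus(rnp)" pattern. */
static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
{
	return rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode);
}
```

Naming the predicate documents intent at each call site and leaves a single place to change if the online-detection rule ever does.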
Waiman Long ba1bfcb746 rcu: Add mutex for rcu boost kthread spawning and affinity setting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: A fuzz in rcu_boost_kthread_setaffinity() of
	   kernel/rcu/tree_plugin.h due to the presence of a later
	   upstream commit 04d4e665a609 ("sched/isolation: Use single
	   feature type while referring to housekeeping cpumask").

commit 218b957a6959a2fb5b3967fc824072bb89ac2611
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Wed, 8 Dec 2021 23:41:53 +0000

    rcu: Add mutex for rcu boost kthread spawning and affinity setting

    As we handle parallel CPU bringup, we will need to take care to avoid
    spawning multiple boost threads, or race conditions when setting their
    affinity. Spotted by Paul McKenney.

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:17 -04:00
Patrick Talbert d46e36b09c Merge: sched/isolation: Split housekeeping cpumask per isolation features
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/671

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065222
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2065994
Tested: Setup isolation and ran scheduler tests, checked that housekeeping
looked right (tasks offloaded from isolated cpus to HK ones etc).

Split the housekeeping flags into finer granularity in preparation
for allowing them to be configured dynamically. There should not be
much functional change.

Signed-off-by: Phil Auld <pauld@redhat.com>

Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Paolo Bonzini <bonzini@gnu.org>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-05-11 08:42:56 +02:00
Patrick Talbert ea38048f36 Merge: rcu: Backport upstream RCU related commits up to v5.17
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/602

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

This patch series backports upstream RCU and various torture-test commits
up to the v5.17 kernel. Aside from patch 10, which has a merge conflict due
to an upstream merge conflict, all patches applied cleanly without any issue.

Signed-off-by: Waiman Long <longman@redhat.com>
~~~
Waiman Long (112):
  torture: Apply CONFIG_KCSAN_STRICT to kvm.sh --kcsan argument
  torture: Make torture.sh print the number of files to be compressed
  rcu-nocb: Fix a couple of tree_nocb code-style nits
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable rnhqp
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable ruqp
  doc: Add another stall-warning root cause in stallwarn.rst
  rcu: Fix undefined Kconfig macros
  rcu: Comment rcu_gp_init() code waiting for CPU-hotplug operations
  rcu-tasks: Simplify trc_read_check_handler() atomic operations
  rcu-tasks: Add trc_inspect_reader() checks for exiting critical
    section
  rcu-tasks: Remove second argument of rcu_read_unlock_trace_special()
  rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
  rcu: Simplify rcu_report_dead() call to rcu_report_exp_rdp()
  rcu: Make rcutree_dying_cpu() use its "cpu" parameter
  rcu-tasks: Wait for trc_read_check_handler() IPIs
  rcutorture: Suppressing read-exit testing is not an error
  rcu-tasks: Fix s/instruction/instructions/ typo in comment
  rcutorture: Warn on individual rcu_torture_init() error conditions
  locktorture: Warn on individual lock_torture_init() error conditions
  rcuscale: Warn on individual rcu_scale_init() error conditions
  rcutorture: Don't cpuhp_remove_state() if cpuhp_setup_state() failed
  rcu: Make rcu_normal_after_boot writable again
  rcu: Make rcu update module parameters world-readable
  rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
  rcu-tasks: Fix s/rcu_add_holdout/trc_add_holdout/ typo in comment
  rcu-tasks: Correct firstreport usage in check_all_holdout_tasks_trace
  rcu-tasks: Correct comparisons for CPU numbers in
    show_stalled_task_trace
  rcu-tasks: Clarify read side section info for rcu_tasks_rude GP
    primitives
  rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
  rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
  rcu-tasks: Fix read-side primitives comment for call_rcu_tasks_trace
  rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader
  rcu: Replace ________p1 and _________p1 with __UNIQUE_ID(rcu)
  rcu-tasks: Update comments to cond_resched_tasks_rcu_qs()
  rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
  rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()
  rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu
    no_qs.b.exp
  rcu-tasks: Don't remove tasks with pending IPIs from holdout list
  torture: Catch kvm.sh help text up with actual options
  rcutorture: Sanitize RCUTORTURE_RDR_MASK
  rcutorture: More thoroughly test nested readers
  srcu: Prevent redundant __srcu_read_unlock() wakeup
  rcutorture: Suppress pi-lock-across read-unlock testing for Tiny SRCU
  doc: Remove obsolete kernel-per-CPU-kthreads RCU_FAST_NO_HZ advice
  rcu: in_irq() cleanup
  rcu: Always inline rcu_dynticks_task*_{enter,exit}()
  rcu: Mark sync_sched_exp_online_cleanup() ->cpu_no_qs.b.exp load
  rcu: Prevent expedited GP from enabling tick on offline CPU
  rcu: Make idle entry report expedited quiescent states
  rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent
    deoffloading
  rcu/nocb: Prepare state machine for a new step
  rcu/nocb: Invoke rcu_core() at the start of deoffloading
  rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
  rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
  rcu/nocb: Check a stable offloaded state to manipulate
    qlen_last_fqs_check
  rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
  rcu/nocb: Limit number of softirq callbacks only on softirq
  rcu: Fix callbacks processing time limit retaining cond_resched()
  rcu: Apply callbacks processing time limit only on softirq
  rcu/nocb: Don't invoke local rcu core on callback overload from nocb
    kthread
  rcu: Improve tree_plugin.h comments and add code cleanups
  refscale: Simplify the errexit checkpoint
  refscale: Prevent buffer to pr_alert() being too long
  refscale: Always log the error message
  doc: Add refcount analogy to What is RCU
  refscale: Add missing '\n' to flush message
  scftorture: Add missing '\n' to flush message
  scftorture: Remove unused SCFTORTOUT
  scftorture: Account for weight_resched when checking for all zeroes
  rcuscale: Always log error message
  doc: RCU: Avoid 'Symbol' font-family in SVG figures
  scftorture: Always log error message
  locktorture,rcutorture,torture: Always log error message
  rcu-tasks: Create per-CPU callback lists
  rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue
    selection
  rcu-tasks: Convert grace-period counter to grace-period sequence
    number
  rcu_tasks: Convert bespoke callback list to rcu_segcblist structure
  rcu-tasks: Use spin_lock_rcu_node() and friends
  rcu-tasks: Inspect stalled task's trc state in locked state
  rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure
  rcu-tasks: Abstract checking of callback lists
  rcu-tasks: Abstract invocations of callbacks
  rcutorture: Avoid soft lockup during cpu stall
  torture: Make kvm-find-errors.sh report link-time undefined symbols
  rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs()
    invocations
  rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues
  rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing
  rcutorture: Test RCU-tasks multiqueue callback queueing
  rcu: Avoid running boost kthreads on isolated CPUs
  rcu: Avoid alloc_pages() when recording stack
  rcutorture: Add CONFIG_PREEMPT_DYNAMIC=n to tiny scenarios
  torture: Retry download once before giving up
  rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention
  rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
  rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
  rcu/nocb: Optimize kthreads and rdp initialization
  rcu/nocb: Create kthreads on all CPUs if "rcu_nocbs=" or "nohz_full="
    are passed
  rcu/nocb: Allow empty "rcu_nocbs" kernel parameter
  rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and
    rcu_spawn_one_nocb_kthread()
  rcutorture: Enable multiple concurrent callback-flood kthreads
  rcutorture: Cause TREE02 and TREE10 scenarios to do more callback
    flooding
  rcutorture: Add ability to limit callback-flood intensity
  rcutorture: Combine n_max_cbs from all kthreads in a callback flood
  rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()
  rcu-tasks: Use more callback queues if contention encountered
  rcutorture: Test RCU Tasks lock-contention detection
  rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing
  rcu-tasks: Use fewer callbacks queues if callback flood ends
  rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
  torture: Fix incorrectly redirected "exit" in kvm-remote.sh
  torture: Properly redirect kvm-remote.sh "echo" commands
  rcu-tasks: Fix computation of CPU-to-list shift counts

 .../Expedited-Grace-Periods/Funnel0.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel1.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel2.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel3.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel4.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel5.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel6.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel7.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel8.svg       |   4 +-
 .../Tree-RCU-Memory-Ordering.rst              |  69 +--
 .../Requirements/GPpartitionReaders1.svg      |  36 +-
 .../Requirements/ReadersPartitionGP1.svg      |  62 +-
 Documentation/RCU/stallwarn.rst               |  10 +
 Documentation/RCU/whatisRCU.rst               |  90 ++-
 .../admin-guide/kernel-parameters.txt         |  66 +-
 .../admin-guide/kernel-per-CPU-kthreads.rst   |   2 +-
 arch/sh/configs/sdk7786_defconfig             |   1 -
 arch/xtensa/configs/nommu_kc705_defconfig     |   1 -
 include/linux/rcu_segcblist.h                 |  51 +-
 include/linux/rcupdate.h                      |  50 +-
 include/linux/rcupdate_trace.h                |   5 +-
 include/linux/rcutiny.h                       |   2 +-
 include/linux/srcu.h                          |   3 +-
 include/linux/torture.h                       |  17 +-
 kernel/locking/locktorture.c                  |  18 +-
 kernel/rcu/Kconfig                            |   2 +-
 kernel/rcu/rcu_segcblist.c                    |  10 +-
 kernel/rcu/rcu_segcblist.h                    |  12 +-
 kernel/rcu/rcuscale.c                         |  24 +-
 kernel/rcu/rcutorture.c                       | 320 +++++++---
 kernel/rcu/refscale.c                         |  50 +-
 kernel/rcu/srcutiny.c                         |   2 +-
 kernel/rcu/tasks.h                            | 583 ++++++++++++++----
 kernel/rcu/tree.c                             | 119 ++--
 kernel/rcu/tree.h                             |  24 +-
 kernel/rcu/tree_exp.h                         |  15 +-
 kernel/rcu/tree_nocb.h                        | 162 +++--
 kernel/rcu/tree_plugin.h                      |  61 +-
 kernel/rcu/update.c                           |   8 +-
 kernel/scftorture.c                           |  20 +-
 kernel/torture.c                              |   4 +-
 .../rcutorture/bin/kvm-find-errors.sh         |   4 +-
 .../rcutorture/bin/kvm-recheck-rcu.sh         |   2 +-
 .../selftests/rcutorture/bin/kvm-remote.sh    |  23 +-
 tools/testing/selftests/rcutorture/bin/kvm.sh |  11 +-
 .../selftests/rcutorture/bin/parse-build.sh   |   3 +-
 .../selftests/rcutorture/bin/torture.sh       |   9 +-
 .../selftests/rcutorture/configs/rcu/SRCU-T   |   1 +
 .../selftests/rcutorture/configs/rcu/SRCU-U   |   1 +
 .../rcutorture/configs/rcu/TASKS01.boot       |   1 +
 .../selftests/rcutorture/configs/rcu/TINY01   |   1 +
 .../selftests/rcutorture/configs/rcu/TINY02   |   1 +
 .../rcutorture/configs/rcu/TRACE01.boot       |   1 +
 .../rcutorture/configs/rcu/TRACE02.boot       |   1 +
 .../rcutorture/configs/rcu/TREE02.boot        |   1 +
 .../rcutorture/configs/rcu/TREE10.boot        |   1 +
 .../rcutorture/configs/rcuscale/TINY          |   1 +
 57 files changed, 1360 insertions(+), 637 deletions(-)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE02.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE10.boot

Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-04-19 12:23:21 +02:00
Phil Auld 1cf795c344 sched/isolation: Use single feature type while referring to housekeeping cpumask
Bugzilla: http://bugzilla.redhat.com/2065222

commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Mon Feb 7 16:59:06 2022 +0100

    sched/isolation: Use single feature type while referring to housekeeping cpumask

    Refer to housekeeping APIs using single feature types instead of flags.
    This prevents passing multiple isolation features at once to
    housekeeping interfaces, which soon won't be possible anymore as each
    isolation feature will have its own cpumask.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
    Reviewed-by: Phil Auld <pauld@redhat.com>
    Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-03-31 10:40:39 -04:00
Waiman Long 1a6798ec33 rcu: Avoid running boost kthreads on isolated CPUs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit c2cf0767e98eb4487444e5c7ebba491a866811ce
Author: Zqiang <qiang.zhang1211@gmail.com>
Date:   Mon, 15 Nov 2021 13:15:46 +0800

    rcu: Avoid running boost kthreads on isolated CPUs

    When the boost kthreads are created on systems with nohz_full CPUs,
    the cpus_allowed_ptr is set to housekeeping_cpumask(HK_FLAG_KTHREAD).
    However, when the rcu_boost_kthread_setaffinity() is called, the original
    affinity will be changed and these kthreads can subsequently run on
    nohz_full CPUs.  This commit makes rcu_boost_kthread_setaffinity()
    restrict these boost kthreads to housekeeping CPUs.

    Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:17 -04:00
Waiman Long 235acef905 rcu: Improve tree_plugin.h comments and add code cleanups
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 17ea3718824912e773b0fd78579694b2e75ee597
Author: Zhouyi Zhou <zhouzhouyi@gmail.com>
Date:   Sun, 24 Oct 2021 08:36:34 +0800

    rcu: Improve tree_plugin.h comments and add code cleanups

    This commit cleans up some comments and code in kernel/rcu/tree_plugin.h.

    Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:04 -04:00
Waiman Long aa5e9f7836 rcu: Make idle entry report expedited quiescent states
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 790da248978a0722d92d1471630c881704f7eb0d
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 29 Sep 2021 11:09:34 -0700

    rcu: Make idle entry report expedited quiescent states

    In non-preemptible kernels, an unfortunately timed expedited grace period
    can result in the rcu_exp_handler() IPI handler setting the rcu_data
    structure's cpu_no_qs.b.exp field just as the target CPU enters idle.
    There are situations in which this field will not be checked until after
    that CPU exits idle.  The resulting grace-period latency does not qualify
    as "expedited".

    This commit therefore checks this field upon non-preemptible idle entry in
    the rcu_preempt_deferred_qs() function.  It also qualifies the rcu_core()
    preempt_count() check with IS_ENABLED(CONFIG_PREEMPT_COUNT) to prevent
    false-positive quiescent states from count-free kernels.

    Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:59 -04:00
Waiman Long 8b492f5404 rcu: Always inline rcu_dynticks_task*_{enter,exit}()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 7663ad9a5dbcc27f3090e6bfd192c7e59222709f
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Tue, 28 Sep 2021 10:40:21 +0200

    rcu: Always inline rcu_dynticks_task*_{enter,exit}()

    RCU managed to grow a few noinstr violations:

      vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x0: call to rcu_dynticks_task_trace_enter() leaves .noinstr.text section
      vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0xe: call to rcu_dynticks_task_trace_exit() leaves .noinstr.text section

    Fix them by adding __always_inline to the relevant trivial functions.

    Also replace the noinstr with __always_inline for the existing
    rcu_dynticks_task_*() functions since noinstr would force noinline
    them, even when empty, which seems silly.

    Fixes: 7d0c9c50c5 ("rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built")
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:57 -04:00
Waiman Long 5dac0f1d20 rcu: in_irq() cleanup
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 2407a64f8045552203ee5cb9904ce75ce2fceef4
Author: Changbin Du <changbin.du@intel.com>
Date:   Tue, 28 Sep 2021 08:21:28 +0800

    rcu: in_irq() cleanup

    This commit replaces the obsolete and ambiguous macro in_irq() with its
    shiny new in_hardirq() equivalent.

    Signed-off-by: Changbin Du <changbin.du@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:57 -04:00
Waiman Long c9b4dd21b8 rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu no_qs.b.exp
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 6120b72e25e195b6fa15b0a674479a38166c392a
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu, 16 Sep 2021 14:10:48 +0200

    rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu no_qs.b.exp

    Having two fields for the same purpose with subtle differences on
    different RCU flavours is confusing, especially when both fields always
    exist on both RCU flavours.

    Fortunately, it is now safe for preemptible RCU to rely on the rcu_data
    structure's ->cpu_no_qs.b.exp field, just like non-preemptible RCU.
    This commit therefore removes the ad-hoc ->exp_deferred_qs field.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:53 -04:00
Waiman Long c58e6fd8a6 rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 6e16b0f7bae3817ea67f4bef4f84298e880fbf66
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu, 16 Sep 2021 14:10:47 +0200

    rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()

    On non-preemptible RCU, move clearing of the rcu_data structure's
    ->cpu_no_qs.b.exp field to the actual expedited quiescent state report
    function, matching how preemptible RCU handles the ->exp_deferred_qs field.

    This prepares for removing ->exp_deferred_qs in favor of ->cpu_no_qs.b.exp
    for both preemptible and non-preemptible RCU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:53 -04:00
Waiman Long f88081bad1 rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit a4382659487f84c00b5fbb61df25a9ad59396789
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu, 16 Sep 2021 14:10:45 +0200

    rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()

    Preemptible RCU does not use the rcu_data structure's ->cpu_no_qs.b.exp,
    instead using a separate ->exp_deferred_qs field to record the need for
    an expedited quiescent state.

    In fact ->cpu_no_qs.b.exp should never be set in preemptible RCU because
    preemptible RCU's expedited grace periods use other mechanisms to record
    quiescent states.

    This commit therefore removes the implicit rcu_qs() reference to
    ->cpu_no_qs.b.exp in favor of a direct reference to ->cpu_no_qs.b.norm.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:15:52 -04:00
Desnes A. Nunes do Rosario 3a7d6d5b49 rcu: Move rcu_needs_cpu() to tree.c
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=bc849e9192c75833a85f2e9376a265ab31f8eec7

commit bc849e9192c75833a85f2e9376a265ab31f8eec7
Author: "Paul E. McKenney" <paulmck@kernel.org>
Date: Mon, 27 Sep 2021 14:30:20 -0700

  Now that RCU_FAST_NO_HZ is no more, there is but one implementation of
  the rcu_needs_cpu() function.  This commit therefore moves this function
  from kernel/rcu/tree_plugin.h to kernel/rcu/tree.c.

  Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
2022-03-24 14:39:57 -04:00
Desnes A. Nunes do Rosario 9814a162d4 rcu: Remove the RCU_FAST_NO_HZ Kconfig option
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1

commit e2c73a6860bdf54f2c6bf8cddc34ddc91a1343e1
Author: "Paul E. McKenney" <paulmck@kernel.org>
Date: Mon, 27 Sep 2021 14:18:51 -0700

  All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve
  systems with RCU callbacks offloaded.  In this situation, all that this
  Kconfig option does is slow down idle entry/exit with an additional
  always-taken early exit.  If this is the only use case, then this
  Kconfig option is nothing but an attractive nuisance that needs to go away.

  This commit therefore removes the RCU_FAST_NO_HZ Kconfig option.

  Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
2022-03-24 14:39:57 -04:00
Waiman Long c33d095f30 rcu: Print human-readable message for schedule() in RCU reader
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit 521c89b3a4022269c75b35062358d1dae4ebfa79
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 19 Jul 2021 11:52:12 -0700

    rcu: Print human-readable message for schedule() in RCU reader

    The WARN_ON_ONCE() invocation within the CONFIG_PREEMPT=y version of
    rcu_note_context_switch() triggers when there is a voluntary context
    switch in an RCU read-side critical section, but there is quite a gap
    between the output of that WARN_ON_ONCE() and this RCU-usage error.
    This commit therefore converts the WARN_ON_ONCE() to a WARN_ONCE()
    that explicitly describes the problem in its message.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:16 -05:00
Waiman Long e65c9cc4e1 rcu: Fix macro name CONFIG_TASKS_RCU_TRACE
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit fed31a4dd3adb5455df7c704de2abb639a1dc1c0
Author: Zhouyi Zhou <zhouzhouyi@gmail.com>
Date:   Tue, 13 Jul 2021 08:56:45 +0800

    rcu: Fix macro name CONFIG_TASKS_RCU_TRACE

    This commit fixes several typos where CONFIG_TASKS_RCU_TRACE should
    instead be CONFIG_TASKS_TRACE_RCU.  Among other things, these typos
    could cause CONFIG_TASKS_TRACE_RCU_READ_MB=y kernels to suffer from
    memory-ordering bugs that could result in false-positive quiescent
    states and too-short grace periods.

    Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:23:13 -05:00
Waiman Long a48351713d rcu: Mark accesses to ->rcu_read_lock_nesting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit 5fcb3a5f04ee6422714adb02f5364042228bfc2e
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 20 May 2021 13:35:50 -0700

    rcu: Mark accesses to ->rcu_read_lock_nesting

    KCSAN flags accesses to ->rcu_read_lock_nesting as data races, but
    in the past, the overhead of marked accesses was excessive.  However,
    that was long ago, and much has changed since then, both in terms of
    hardware and of compilers.  Here is data taken on an eight-core laptop
    using Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz with a kernel built
    using gcc version 9.3.0, with all data in nanoseconds.

    Unmarked accesses (status quo), measured by three refscale runs:

            Minimum reader duration:  3.286  2.851  3.395
            Median reader duration:   3.698  3.531  3.4695
            Maximum reader duration:  4.481  5.215  5.157

    Marked accesses, also measured by three refscale runs:

            Minimum reader duration:  3.501  3.677  3.580
            Median reader duration:   4.053  3.723  3.895
            Maximum reader duration:  7.307  4.999  5.511

    This focused microbenchmark shows only sub-nanosecond differences which
    are unlikely to be visible at the system level.  This commit therefore
    marks data-racing accesses to ->rcu_read_lock_nesting.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:22:54 -05:00
Waiman Long 91e2081a69 rcu/nocb: Start moving nocb code to its own plugin file
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2022806

commit dfcb27540213e8061ecffacd4bd8ed54a310a7b0
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 19 May 2021 02:09:28 +0200

    rcu/nocb: Start moving nocb code to its own plugin file

    The kernel/rcu/tree_plugin.h file contains not only the plugins for
    preemptible RCU, but also many other features including rcu_nocbs
    callback offloading.  This offloading has become large and complex,
    so it is time to put it in its own file.

    This commit starts that process.

    Suggested-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    [ paulmck: Rename to tree_nocb.h, add Frederic as author. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-11-12 14:22:52 -05:00
Waiman Long 3c29e6cff1 locking/rtmutex: Split out the inner parts of 'struct rtmutex'
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 830e6acc8a1cafe153a0d88f9b2455965b396131
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sun, 15 Aug 2021 23:27:58 +0200

    locking/rtmutex: Split out the inner parts of 'struct rtmutex'

    RT builds substitutions for rwsem, mutex, spinlock and rwlock around
    rtmutexes. Split the inner working out so each lock substitution can use
    them with the appropriate lockdep annotations. This avoids having an extra
    unused lockdep map in the wrapped rtmutex.

    No functional change.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:45 -04:00
Waiman Long 7f0d9a6f21 rcu: Avoid unneeded function call in rcu_read_unlock()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1998549
Upstream Status: linux-rcu commit fd07d7b373a8e7c8406a04b206bed89ec3cc2b52
		 https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
Tested: Before this patch, the symbol rcu_read_unlock_strict is found
	in 100s of kernel modules. After this patch, the symbol is no longer
	found in any of the kernel modules.

commit fd07d7b373a8e7c8406a04b206bed89ec3cc2b52
Author: Waiman Long <longman@redhat.com>
Date:   Thu, 26 Aug 2021 22:21:22 -0400

    rcu: Avoid unneeded function call in rcu_read_unlock()

    Since commit aa40c138cc ("rcu: Report QS for outermost PREEMPT=n
    rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict()
    is invoked by the inlined rcu_read_unlock() function.  However,
    rcu_read_unlock_strict() is an empty function in production kernels,
    which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n.

    There is a mention of rcu_read_unlock_strict() in the BPF verifier,
    but this is in a deny-list, meaning that BPF does not care whether
    rcu_read_unlock_strict() is ever called.

    This commit therefore provides a slight performance improvement
    by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from
    rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the
    pointless call to an empty function.

    Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Waiman Long <longman@redhat.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-08 15:45:20 -04:00
Linus Torvalds 28e92f9903 Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Bitmap parsing support for "all" as an alias for all bits

 - Documentation updates

 - Miscellaneous fixes, including some that overlap into mm and lockdep

 - kvfree_rcu() updates

 - mem_dump_obj() updates, with acks from one of the slab-allocator
   maintainers

 - RCU NOCB CPU updates, including limited deoffloading

 - SRCU updates

 - Tasks-RCU updates

 - Torture-test updates

* 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
  tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
  rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
  rcu: Add missing __releases() annotation
  rcu: Remove obsolete rcu_read_unlock() deadlock commentary
  rcu: Improve comments describing RCU read-side critical sections
  rcu: Create an unrcu_pointer() to remove __rcu from a pointer
  srcu: Early test SRCU polling start
  rcu: Fix various typos in comments
  rcu/nocb: Unify timers
  rcu/nocb: Prepare for fine-grained deferred wakeup
  rcu/nocb: Only cancel nocb timer if not polling
  rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
  rcu/nocb: Allow de-offloading rdp leader
  rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
  rcu: Don't penalize priority boosting when there is nothing to boost
  rcu: Point to documentation of ordering guarantees
  rcu: Make rcu_gp_cleanup() be noinline for tracing
  rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
  rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
  ...
2021-07-04 12:58:33 -07:00
Peter Zijlstra b03fbd4ff2 sched: Introduce task_is_running()
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
2021-06-18 11:43:07 +02:00