Commit Graph

1008 Commits

Author SHA1 Message Date
Waiman Long 87fa04a04f rcu: Employ jiffies-based backstop to callback time limit
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit f51164a808b5bf1d81fc37eb53ab1eae59c79f2d
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 31 Mar 2023 09:05:56 -0700

    rcu: Employ jiffies-based backstop to callback time limit

    Currently, if there are more than 100 ready-to-invoke RCU callbacks queued
    on a given CPU, the rcu_do_batch() function sets a timeout for invocation
    of the series.  This timeout defaulting to three milliseconds, and may
    be adjusted using the rcutree.rcu_resched_ns kernel boot parameter.
    This timeout is checked using local_clock(), but the overhead of this
    function combined with the common-case very small callback-invocation
    overhead means that local_clock() is checked every 32nd invocation.

    This works well except for longer-than-average callbacks.  For example,
    a series of 500-microsecond-duration callbacks means that local_clock()
    is checked only once every 16 milliseconds, which makes it difficult to
    enforce a three-millisecond timeout.

    This commit therefore adds a Kconfig option RCU_DOUBLE_CHECK_CB_TIME
    that enables backup timeout checking using the coarser grained but
    lighter weight jiffies.  If the jiffies counter detects a timeout,
    then local_clock() is consulted even if this is not the 32nd callback.
    This prevents the aforementioned 16-millisecond latency blowout.

    Reported-by: Domas Mituzas <dmituzas@meta.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:12 -04:00
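
A hedged C sketch of the two-level time check described above; the
next_ready_callback() helper and the local variable names are illustrative,
not the upstream code:

    struct rcu_head *rhp;
    unsigned long jlimit = jiffies + 2;           /* coarse jiffies backstop */
    u64 tlimit = local_clock() + rcu_resched_ns;  /* precise 3 ms budget */
    long count = 0;

    while ((rhp = next_ready_callback()) != NULL) {
            rhp->func(rhp);                       /* invoke the callback */
            /* The cheap jiffies test backstops the every-32nd check. */
            if ((++count & 31) == 0 ||
                (IS_ENABLED(CONFIG_RCU_DOUBLE_CHECK_CB_TIME) &&
                 time_after(jiffies, jlimit))) {
                    if (local_clock() >= tlimit)
                            break;                /* over budget, defer rest */
            }
    }
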
Waiman Long 05bce2c628 rcu: Check callback-invocation time limit for rcuc kthreads
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit fea1c1f0101783f24d00e065ecd3d6e90292f887
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 21 Mar 2023 16:43:54 -0700

    rcu: Check callback-invocation time limit for rcuc kthreads

    Currently, a callback-invocation time limit is enforced only for
    callbacks invoked from the softirq environment, the rationale being
    that when callbacks are instead invoked from rcuc and rcuoc kthreads,
    these callbacks cannot be holding up other softirq vectors.

    Which is in fact true.  However, if an rcuc kthread spends too much time
    invoking callbacks, it can delay quiescent-state reports from its CPU,
    which can also be a problem.

    This commit therefore applies the callback-invocation time limit to
    callback invocation from the rcuc kthreads as well as from softirq.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:12 -04:00
Phil Auld 0ded1c9222 rcu: Introduce rcu_cpu_online()
JIRA: https://issues.redhat.com/browse/RHEL-25535

commit 2be4686d866ad5896f2bb94d82fe892197aea9c7
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Fri Oct 27 16:40:47 2023 +0200

    rcu: Introduce rcu_cpu_online()

    Export the RCU point of view as to when a CPU is considered offline
    (i.e., when RCU considers a CPU to be far enough along in the hotplug
    offlining process that no read-side critical sections are possible).

    This will be used by RCU-tasks whose vision of an offline CPU should
    reasonably match the one of RCU core.

    Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Phil Auld <pauld@redhat.com>
2024-04-08 15:47:15 -04:00
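
The exported helper itself is tiny.  A sketch consistent with the
description above (field names taken from RCU core as an assumption, not
verified against this backport):

    bool rcu_cpu_online(int cpu)
    {
            struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
            struct rcu_node *rnp = rdp->mynode;

            /* Online, from RCU's viewpoint, while the CPU's bit is still
             * set in its leaf rcu_node's ->qsmaskinitnext mask. */
            return !!(READ_ONCE(rnp->qsmaskinitnext) & rdp->grpmask);
    }
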
Prarit Bhargava 6481524e83 rcu: Remove rcu_is_idle_cpu()
JIRA: https://issues.redhat.com/browse/RHEL-25415

commit fdbdb868454a3e83996cf1500c6f7ba73c07a03c
Author: Yipeng Zou <zouyipeng@huawei.com>
Date:   Mon Sep 26 09:58:27 2022 +0800

    rcu: Remove rcu_is_idle_cpu()

    The commit 3fcd6a230f ("x86/cpu: Avoid cpuinfo-induced IPIing of
    idle CPUs") introduced rcu_is_idle_cpu() in order to identify the
    current CPU idle state.  But commit f3eca381bd49 ("x86/aperfmperf:
    Replace arch_freq_get_on_cpu()") switched to using MAX_SAMPLE_AGE,
    so rcu_is_idle_cpu() is no longer used.  This commit therefore removes it.

    Fixes: f3eca381bd49 ("x86/aperfmperf: Replace arch_freq_get_on_cpu()")
    Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2024-03-20 09:42:38 -04:00
Čestmír Kalina 9ca123909b Revert "rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early"
JIRA: https://issues.redhat.com/browse/RHEL-14709
Upstream Status: RHEL only

Commit beb6958344 results in a hung
boot on qemu with TCG enabled on an RT debug kernel following
RCU self-test initiation.

Revert it for now, until a better solution is determined.

Signed-off-by: Čestmír Kalina <ckalina@redhat.com>
2023-10-30 19:36:02 +01:00
Waiman Long 646491ffa8 rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 5da7cb193db32da783a3f3e77d8b639989321d48
Author: Ziwei Dai <ziwei.dai@unisoc.com>
Date:   Fri, 31 Mar 2023 20:42:09 +0800

    rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period

    Memory passed to kvfree_rcu() that is to be freed is tracked by a
    per-CPU kfree_rcu_cpu structure, which in turn contains pointers
    to kvfree_rcu_bulk_data structures that contain pointers to memory
    that has not yet been handed to RCU, along with a kfree_rcu_cpu_work
    structure that tracks the memory that has already been handed to RCU.
    These structures track three categories of memory: (1) Memory for
    kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
    during an OOM episode.  The first two categories are tracked in a
    cache-friendly manner involving a dynamically allocated page of pointers
    (the aforementioned kvfree_rcu_bulk_data structures), while the third
    uses a simple (but decidedly cache-unfriendly) linked list through the
    rcu_head structures in each block of memory.

    On a given CPU, these three categories are handled as a unit, with that
    CPU's kfree_rcu_cpu_work structure having one pointer for each of the
    three categories.  Clearly, new memory for a given category cannot be
    placed in the corresponding kfree_rcu_cpu_work structure until any old
    memory has had its grace period elapse and thus has been removed.  And
    the kfree_rcu_monitor() function does in fact check for this.

    Except that the kfree_rcu_monitor() function checks these pointers one
    at a time.  This means that if the previous kfree_rcu() memory passed
    to RCU had only category 1 and the current one has only category 2, the
    kfree_rcu_monitor() function will send that current category-2 memory
    along immediately.  This can result in memory being freed too soon,
    that is, out from under unsuspecting RCU readers.

    To see this, consider the following sequence of events, in which:

    o       Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
            then is preempted.

    o       CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
            after a later grace period.  Because of the bug, however,
            "from_cset" is freed right after the previous grace period
            ends, that is, immediately.  Task A resumes and references
            "from_cset"'s member, after which nothing good happens.

    In full detail:

    CPU 0                                   CPU 1
    ----------------------                  ----------------------
    count_memcg_event_mm()
    |rcu_read_lock()  <---
    |mem_cgroup_from_task()
     |// css_set_ptr is the "from_cset" mentioned on CPU 1
     |css_set_ptr = rcu_dereference((task)->cgroups)
     |// Hard irq comes, current task is scheduled out.

                                            cgroup_attach_task()
                                            |cgroup_migrate()
                                            |cgroup_migrate_execute()
                                            |css_set_move_task(task, from_cset, to_cset, true)
                                            |cgroup_move_task(task, to_cset)
                                            |rcu_assign_pointer(.., to_cset)
                                            |...
                                            |cgroup_migrate_finish()
                                            |put_css_set_locked(from_cset)
                                            |from_cset->refcount return 0
                                            |kfree_rcu(cset, rcu_head) // free from_cset after new gp
                                            |add_ptr_to_bulk_krc_lock()
                                            |schedule_delayed_work(&krcp->monitor_work, ..)

                                            kfree_rcu_monitor()
                                            |krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
                                            |queue_rcu_work(system_wq, &krwp->rcu_work)
                                            |if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
                                            |call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp

                                            // There is a previous call_rcu(.., rcu_work_rcufn)
                                            // gp end, rcu_work_rcufn() is called.
                                            rcu_work_rcufn()
                                            |__queue_work(.., rwork->wq, &rwork->work);

                                            |kfree_rcu_work()
                                            |krwp->bulk_head_free[0] bulk is freed before new gp end!!!
                                            |The "from_cset" is freed before new gp end.

    // the task resumes some time later.
     |css_set_ptr->subsys[subsys_id] <--- Caused kernel crash, because css_set_ptr is freed.

    This commit therefore causes kfree_rcu_monitor() to refrain from moving
    kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
    grace period has completed for all three categories.

    v2: Use a helper function instead of an inline code block in
    kfree_rcu_monitor().

    Fixes: 34c8817455 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
    Fixes: 5f3c8d6204 ("rcu/tree: Maintain separate array for vmalloc ptrs")
    Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
    Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:25 -04:00
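
The shape of the fix, as a hedged sketch: before attaching any new memory,
the monitor must see that all three channels of the target
kfree_rcu_cpu_work structure have drained (the helper name and exact
fields here are illustrative):

    /* Illustrative: true only when the previous grace period has elapsed
     * and the resulting frees have completed for all three categories. */
    static bool krwp_channels_all_free(struct kfree_rcu_cpu_work *krwp)
    {
            int i;

            for (i = 0; i < FREE_N_CHANNELS; i++)     /* kfree + kvfree bulk */
                    if (!list_empty(&krwp->bulk_head_free[i]))
                            return false;
            return !krwp->head_free;                  /* OOM-episode list */
    }
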
Waiman Long b571f50b2a rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 7a29fb4a4771124bc61de397dbfc1554dbbcc19c
Author: Zheng Yejian <zhengyejian1@huawei.com>
Date:   Fri, 6 Jan 2023 15:09:34 +0800

    rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed

    Registering a kprobe on __rcu_irq_enter_check_tick() can cause kernel
    stack overflow as shown below. This issue can be reproduced by enabling
    CONFIG_NO_HZ_FULL and booting the kernel with argument "nohz_full=",
    and then giving the following commands at the shell prompt:

      # cd /sys/kernel/tracing/
      # echo 'p:mp1 __rcu_irq_enter_check_tick' >> kprobe_events
      # echo 1 > events/kprobes/enable

    This commit therefore adds __rcu_irq_enter_check_tick() to the kprobes
    blacklist using NOKPROBE_SYMBOL().

    Insufficient stack space to handle exception!
    ESR: 0x00000000f2000004 -- BRK (AArch64)
    FAR: 0x0000ffffccf3e510
    Task stack:     [0xffff80000ad30000..0xffff80000ad38000]
    IRQ stack:      [0xffff800008050000..0xffff800008058000]
    Overflow stack: [0xffff089c36f9f310..0xffff089c36fa0310]
    CPU: 5 PID: 190 Comm: bash Not tainted 6.2.0-rc2-00320-g1f5abbd77e2c #19
    Hardware name: linux,dummy-virt (DT)
    pstate: 400003c5 (nZcv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : __rcu_irq_enter_check_tick+0x0/0x1b8
    lr : ct_nmi_enter+0x11c/0x138
    sp : ffff80000ad30080
    x29: ffff80000ad30080 x28: ffff089c82e20000 x27: 0000000000000000
    x26: 0000000000000000 x25: ffff089c02a8d100 x24: 0000000000000000
    x23: 00000000400003c5 x22: 0000ffffccf3e510 x21: ffff089c36fae148
    x20: ffff80000ad30120 x19: ffffa8da8fcce148 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000000 x15: ffffa8da8e44ea6c
    x14: ffffa8da8e44e968 x13: ffffa8da8e03136c x12: 1fffe113804d6809
    x11: ffff6113804d6809 x10: 0000000000000a60 x9 : dfff800000000000
    x8 : ffff089c026b404f x7 : 00009eec7fb297f7 x6 : 0000000000000001
    x5 : ffff80000ad30120 x4 : dfff800000000000 x3 : ffffa8da8e3016f4
    x2 : 0000000000000003 x1 : 0000000000000000 x0 : 0000000000000000
    Kernel panic - not syncing: kernel stack overflow
    CPU: 5 PID: 190 Comm: bash Not tainted 6.2.0-rc2-00320-g1f5abbd77e2c #19
    Hardware name: linux,dummy-virt (DT)
    Call trace:
     dump_backtrace+0xf8/0x108
     show_stack+0x20/0x30
     dump_stack_lvl+0x68/0x84
     dump_stack+0x1c/0x38
     panic+0x214/0x404
     add_taint+0x0/0xf8
     panic_bad_stack+0x144/0x160
     handle_bad_stack+0x38/0x58
     __bad_stack+0x78/0x7c
     __rcu_irq_enter_check_tick+0x0/0x1b8
     arm64_enter_el1_dbg.isra.0+0x14/0x20
     el1_dbg+0x2c/0x90
     el1h_64_sync_handler+0xcc/0xe8
     el1h_64_sync+0x64/0x68
     __rcu_irq_enter_check_tick+0x0/0x1b8
     arm64_enter_el1_dbg.isra.0+0x14/0x20
     el1_dbg+0x2c/0x90
     el1h_64_sync_handler+0xcc/0xe8
     el1h_64_sync+0x64/0x68
     __rcu_irq_enter_check_tick+0x0/0x1b8
     arm64_enter_el1_dbg.isra.0+0x14/0x20
     el1_dbg+0x2c/0x90
     el1h_64_sync_handler+0xcc/0xe8
     el1h_64_sync+0x64/0x68
     __rcu_irq_enter_check_tick+0x0/0x1b8
     [...]
     el1_dbg+0x2c/0x90
     el1h_64_sync_handler+0xcc/0xe8
     el1h_64_sync+0x64/0x68
     __rcu_irq_enter_check_tick+0x0/0x1b8
     arm64_enter_el1_dbg.isra.0+0x14/0x20
     el1_dbg+0x2c/0x90
     el1h_64_sync_handler+0xcc/0xe8
     el1h_64_sync+0x64/0x68
     __rcu_irq_enter_check_tick+0x0/0x1b8
     arm64_enter_el1_dbg.isra.0+0x14/0x20
     el1_dbg+0x2c/0x90
     el1h_64_sync_handler+0xcc/0xe8
     el1h_64_sync+0x64/0x68
     __rcu_irq_enter_check_tick+0x0/0x1b8
     el1_interrupt+0x28/0x60
     el1h_64_irq_handler+0x18/0x28
     el1h_64_irq+0x64/0x68
     __ftrace_set_clr_event_nolock+0x98/0x198
     __ftrace_set_clr_event+0x58/0x80
     system_enable_write+0x144/0x178
     vfs_write+0x174/0x738
     ksys_write+0xd0/0x188
     __arm64_sys_write+0x4c/0x60
     invoke_syscall+0x64/0x180
     el0_svc_common.constprop.0+0x84/0x160
     do_el0_svc+0x48/0xe8
     el0_svc+0x34/0xd0
     el0t_64_sync_handler+0xb8/0xc0
     el0t_64_sync+0x190/0x194
    SMP: stopping secondary CPUs
    Kernel Offset: 0x28da86000000 from 0xffff800008000000
    PHYS_OFFSET: 0xfffff76600000000
    CPU features: 0x00000,01a00100,0000421b
    Memory Limit: none

    Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Link: https://lore.kernel.org/all/20221119040049.795065-1-zhengyejian1@huawei.com/
    Fixes: aaf2bc50df ("rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()")
    Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:24 -04:00
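
The fix is a one-line annotation.  The pattern, for reference
(NOKPROBE_SYMBOL() is the real kprobes-blacklist macro; the function body
is elided here):

    #include <linux/kprobes.h>

    void __rcu_irq_enter_check_tick(void)
    {
            /* ... runs on kernel-entry paths; must not itself be probed ... */
    }
    /* Add the symbol to the kprobes blacklist. */
    NOKPROBE_SYMBOL(__rcu_irq_enter_check_tick);
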
Waiman Long beb6958344 rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 7ea91307ad2dbdd15ed0b762a2d994f816039b9d
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 12 Jan 2023 10:48:29 -0800

    rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early

    According to the commit log of the patch that added it to the kernel,
    start_poll_synchronize_rcu_expedited() can be invoked very early, as
    in long before rcu_init() has been invoked.  But before rcu_init(),
    the rcu_data structure's ->mynode field has not yet been initialized.
    This means that the start_poll_synchronize_rcu_expedited() function's
    attempt to set the CPU's leaf rcu_node structure's ->exp_seq_poll_rq
    field will result in a segmentation fault.

    This commit therefore causes start_poll_synchronize_rcu_expedited() to
    set ->exp_seq_poll_rq only after rcu_init() has initialized all CPUs'
    rcu_data structures' ->mynode fields.  It also removes the check from
    the rcu_init() function so that start_poll_synchronize_rcu_expedited()
    is unconditionally invoked.  Yes, this might result in an unnecessary
    boot-time grace period, but this is down in the noise.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:23 -04:00
Waiman Long 6af56f1d8f rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 46103fe01b02169150f16d8d6e028217c5a7abe5
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Wed, 18 Jan 2023 15:30:14 +0800

    rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()

    The rcu_accelerate_cbs() function is invoked by rcu_report_qs_rdp()
    only if there is a grace period in progress that is still blocked
    by at least one CPU on this rcu_node structure.  This means that
    rcu_accelerate_cbs() should never return the value true, and thus that
    this function should never set the needwake variable and in turn never
    invoke rcu_gp_kthread_wake().

    This commit therefore removes the needwake variable and the invocation
    of rcu_gp_kthread_wake() in favor of a WARN_ON_ONCE() on the call to
    rcu_accelerate_cbs().  The purpose of this new WARN_ON_ONCE() is to
    detect situations where the system's opinion differs from ours.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:22 -04:00
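
After the change, the call site reduces to roughly the following sketch:
the return value that can never be true is asserted rather than acted upon:

    /* Formerly: needwake = rcu_accelerate_cbs(rnp, rdp);
     *           if (needwake) rcu_gp_kthread_wake();
     * Now: complain (once) if the "impossible" wakeup is ever requested. */
    WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
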
Waiman Long 629ce1cda6 rcu: Add comment to rcu_do_batch() identifying rcuoc code path
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 09853fb89f6bc46727ccd325c0f6f266df51d155
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat, 4 Mar 2023 13:40:55 -0800

    rcu: Add comment to rcu_do_batch() identifying rcuoc code path

    This commit adds a comment to help explain why the "else" clause of the
    in_serving_softirq() "if" statement does not need to enforce a time limit.
    The reason is that this "else" clause handles rcuoc kthreads that do not
    block handlers for other softirq vectors.

    Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:20 -04:00
Waiman Long 6a698557f5 rcu: Disable laziness if lazy-tracking says so
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit cf7066b97e27b2319af1ae2ef6889c4a1704312d
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Thu, 12 Jan 2023 00:52:23 +0000

    rcu: Disable laziness if lazy-tracking says so

    During suspend testing, we see a failure once in every 300-500 suspends.
    Looking closer, it appears that asynchronous RCU callbacks are being
    queued as lazy even though synchronous callbacks are expedited. These
    delays appear to not be very welcome by the suspend/resume code as
    evidenced by these occasional suspend failures.

    This commit modifies call_rcu() to check if rcu_async_should_hurry(),
    which will return true if we are in suspend or in-kernel boot.

    [ paulmck: Alphabetize local variables. ]

    Ignoring the lazy hint makes the 3000 suspend/resume cycles pass
    reliably on a 12th gen 12-core Intel CPU, and there is some evidence
    that it also slightly speeds up boot performance.

    Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:41 -04:00
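
A sketch of where the hint gets overridden, assuming the
rcu_async_should_hurry() helper from the companion patch below (the
surrounding function and flag names are illustrative):

    static void __call_rcu_common(struct rcu_head *head, rcu_callback_t func,
                                  bool lazy_in)
    {
            /* Ignore the lazy hint during boot and suspend/resume. */
            bool lazy = IS_ENABLED(CONFIG_RCU_LAZY) && lazy_in &&
                        !rcu_async_should_hurry();

            /* ... enqueue the callback, lazily or not ... */
    }
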
Waiman Long d0e55bfb7b rcu: Track laziness during boot and suspend
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 6efdda8bec2900ce5166ee4ff4b1844b47b529cd
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Thu, 12 Jan 2023 00:52:22 +0000

    rcu: Track laziness during boot and suspend

    Boot and suspend/resume should not be slowed down in kernels built with
    CONFIG_RCU_LAZY=y.  In particular, suspend can sometimes fail in such
    kernels.

    This commit therefore adds rcu_async_hurry(), rcu_async_relax(), and
    rcu_async_should_hurry() functions that track whether or not either
    a boot or a suspend/resume operation is in progress.  This will
    enable a later commit to refrain from laziness during those times.

    Export rcu_async_should_hurry(), rcu_async_hurry(), and rcu_async_relax()
    for later use by rcutorture.

    [ paulmck: Apply feedback from Steve Rostedt. ]

    Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:40 -04:00
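
A plausible shape for these three helpers, assuming a simple nesting
counter initialized to 1 so that boot itself counts as a hurry period
(a sketch, not the verified backport):

    static atomic_t rcu_async_hurry_nesting = ATOMIC_INIT(1); /* boot */

    void rcu_async_hurry(void)              /* e.g. suspend entry */
    {
            if (IS_ENABLED(CONFIG_RCU_LAZY))
                    atomic_inc(&rcu_async_hurry_nesting);
    }

    void rcu_async_relax(void)              /* boot complete, resume done */
    {
            if (IS_ENABLED(CONFIG_RCU_LAZY))
                    atomic_dec(&rcu_async_hurry_nesting);
    }

    bool rcu_async_should_hurry(void)
    {
            return !IS_ENABLED(CONFIG_RCU_LAZY) ||
                   atomic_read(&rcu_async_hurry_nesting);
    }
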
Waiman Long 8e4ee1e6a8 rcu: Remove redundant call to rcu_boost_kthread_setaffinity()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit ccfe1fef9409ca80ffad6ce822a6d15eaee67c91
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Wed, 21 Dec 2022 11:15:43 -0800

    rcu: Remove redundant call to rcu_boost_kthread_setaffinity()

    The rcu_boost_kthread_setaffinity() function is invoked at
    rcutree_online_cpu() and rcutree_offline_cpu() time, early in the online
    timeline and late in the offline timeline, respectively.  It is also
    invoked from rcutree_dead_cpu().  However, in the absence of userspace
    manipulations (for which userspace must take responsibility), this call
    is redundant with that from rcutree_offline_cpu().  This redundancy can
    be demonstrated by printing out the relevant cpumasks.

    This commit therefore removes the call to rcu_boost_kthread_setaffinity()
    from rcutree_dead_cpu().

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:40 -04:00
Waiman Long 576c46a4e4 rcu: Add RCU stall diagnosis information
JIRA: https://issues.redhat.com/browse/RHEL-5228
Conflicts: Upstream merge conflicts in kernel/rcu/{rcu.h,update.c}
	   and Documentation/admin-guide/kernel-parameters.txt with
	   commit 92987fe8bdd1 ("rcu: Allow expedited RCU CPU stall
	   warnings to dump task stacks").  Resolved according
	   to upstream merge commit bba8d3d17dc2 ("Merge branch
	   'stall.2023.01.09a' into HEAD").

commit be42f00b73a0f50710d16eb7cb4efda0cce062dd
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat, 19 Nov 2022 17:25:06 +0800

    rcu: Add RCU stall diagnosis information

    Because RCU CPU stall warnings are driven from the scheduling-clock
    interrupt handler, a workload consisting of a very large number of
    short-duration hardware interrupts can result in misleading stall-warning
    messages.  On systems supporting only a single level of interrupts,
    that is, where interrupts handlers cannot be interrupted, this can
    produce misleading diagnostics.  The stack traces will show the
    innocent-bystander interrupted task, not the interrupts that are
    at the very least exacerbating the stall.

    This situation can be improved by displaying the number of interrupts
    and the CPU time that they have consumed.  Diagnosing other types
    of stalls can be eased by also providing the count of softirqs and
    the CPU time that they consumed as well as the number of context
    switches and the task-level CPU time consumed.

    Consider the following output given this change:

    rcu: INFO: rcu_preempt self-detected stall on CPU
    rcu:     0-....: (1250 ticks this GP) <omitted>
    rcu:          hardirqs   softirqs   csw/system
    rcu:  number:      624         45            0
    rcu: cputime:       69          1         2425   ==> 2500(ms)

    This output shows that the number of hard and soft interrupts is small,
    there are no context switches, and nearly all of the CPU time is
    consumed at system level.  This indicates that the current task is
    looping with preemption disabled.

    The impact on system performance is negligible because the snapshot
    is recorded only once for each continuous RCU stall.

    This added debugging information is suppressed by default and can be
    enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
    by booting with rcupdate.rcu_cpu_stall_cputime=1.

    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:38 -04:00
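
To try this out, enable it at boot or, assuming the module parameter is
writable on the running kernel, through sysfs, in the spirit of the shell
examples elsewhere in this log:

      # boot with: rcupdate.rcu_cpu_stall_cputime=1
      # or, if the parameter is runtime-writable:
      # echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_cputime
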
Waiman Long 4f25f6707a rcu/kvfree: Split ready for reclaim objects from a batch
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 2ca836b1da1777c75b7363a7ca2973e8ab11fc21
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Wed, 14 Dec 2022 13:06:30 +0100

    rcu/kvfree: Split ready for reclaim objects from a batch

    This patch splits the lists of objects so as to avoid sending any
    through RCU that have already been queued for more than one grace
    period.  These long-term-resident objects are immediately freed.
    The remaining short-term-resident objects are queued for later freeing
    using queue_rcu_work().

    This change avoids delaying workqueue handlers with synchronize_rcu()
    invocations.  Yes, workqueue handlers are designed to handle blocking,
    but avoiding blocking when unnecessary improves performance during
    low-memory situations.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:29 -04:00
Waiman Long fc9133f118 rcu/kvfree: Carefully reset number of objects in krcp
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 4c33464ae85e59cba3f8048a34d571edf229823a
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Wed, 14 Dec 2022 13:06:29 +0100

    rcu/kvfree: Carefully reset number of objects in krcp

    The schedule_delayed_monitor_work() function relies on the count of
    objects queued into any given kfree_rcu_cpu structure.  This count is
    used to determine how quickly to schedule passing these objects to RCU.

    There are three pipes where pointers can be placed.  When any pipe is
    offloaded, the kfree_rcu_cpu structure's ->count counter is set to zero,
    which is wrong because the other pipes might still be non-empty.

    This commit therefore maintains per-pipe counters, and introduces a
    krc_count() helper to access the aggregate value of those counters.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:28 -04:00
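
A sketch of the aggregation helper implied by the description, with the
per-pipe counter fields as assumptions:

    /* Illustrative: total objects queued on this CPU across all pipes. */
    static int krc_count(struct kfree_rcu_cpu *krcp)
    {
            int i, sum = atomic_read(&krcp->head_count);  /* linked list */

            for (i = 0; i < FREE_N_CHANNELS; i++)
                    sum += atomic_read(&krcp->bulk_count[i]); /* bulk pipes */
            return sum;
    }
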
Waiman Long 4b31e8117b rcu/kvfree: Use READ_ONCE() when access to krcp->head
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 9627456101ec9bb502daae7276e5141f66a9ddd1
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Fri, 2 Dec 2022 14:18:37 +0100

    rcu/kvfree: Use READ_ONCE() when access to krcp->head

    The need_offload_krc() function is now lock-free, which gives the
    compiler freedom to tear or refetch plain C-language loads from
    the kfree_rcu_cpu structure's ->head pointer.  This commit therefore
    applies READ_ONCE() to these loads.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:27 -04:00
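
The resulting pattern, sketched generically: once one side reads the
pointer without the lock, both sides must use the _ONCE() accessors so the
compiler can neither tear nor refetch the access:

    /* Writer (the companion patch below made this WRITE_ONCE()): */
    WRITE_ONCE(krcp->head, head);

    /* Lock-free reader, e.g. in need_offload_krc(): */
    if (READ_ONCE(krcp->head))
            return true;    /* objects are pending */
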
Waiman Long 4b9780e14d rcu/kvfree: Use a polled API to speedup a reclaim process
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit cc37d52076a91d8391bbd16249a5790a35292b85
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Tue, 29 Nov 2022 16:58:22 +0100

    rcu/kvfree: Use a polled API to speedup a reclaim process

    Currently all objects placed into a batch wait for a full grace period
    to elapse after that batch is ready to send to RCU.  However, this
    can unnecessarily delay freeing of the first objects that were added
    to the batch.  After all, several RCU grace periods might have elapsed
    since those objects were added, and if so, there is no point in further
    deferring their freeing.

    This commit therefore adds per-page grace-period snapshots which are
    obtained from get_state_synchronize_rcu().  When the batch is ready
    to be passed to call_rcu(), each page's snapshot is checked by passing
    it to poll_state_synchronize_rcu().  If a given page's RCU grace period
    has already elapsed, its objects are freed immediately by kvfree_rcu_bulk().
    Otherwise, these objects are freed after a call to synchronize_rcu().

    This approach requires that the pages be traversed in reverse order,
    that is, the oldest ones first.

    Test example:

    kvm.sh --memory 10G --torture rcuscale --allcpus --duration 1 \
      --kconfig CONFIG_NR_CPUS=64 \
      --kconfig CONFIG_RCU_NOCB_CPU=y \
      --kconfig CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y \
      --kconfig CONFIG_RCU_LAZY=n \
      --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 \
      rcuscale.holdoff=20 rcuscale.kfree_loops=10000 \
      torture.disable_onoff_at_boot" --trust-make

    Before this commit:

    Total time taken by all kfree'ers: 8535693700 ns, loops: 10000, batches: 1188, memory footprint: 2248MB
    Total time taken by all kfree'ers: 8466933582 ns, loops: 10000, batches: 1157, memory footprint: 2820MB
    Total time taken by all kfree'ers: 5375602446 ns, loops: 10000, batches: 1130, memory footprint: 6502MB
    Total time taken by all kfree'ers: 7523283832 ns, loops: 10000, batches: 1006, memory footprint: 3343MB
    Total time taken by all kfree'ers: 6459171956 ns, loops: 10000, batches: 1150, memory footprint: 6549MB

    After this commit:

    Total time taken by all kfree'ers: 8560060176 ns, loops: 10000, batches: 1787, memory footprint: 61MB
    Total time taken by all kfree'ers: 8573885501 ns, loops: 10000, batches: 1777, memory footprint: 93MB
    Total time taken by all kfree'ers: 8320000202 ns, loops: 10000, batches: 1727, memory footprint: 66MB
    Total time taken by all kfree'ers: 8552718794 ns, loops: 10000, batches: 1790, memory footprint: 75MB
    Total time taken by all kfree'ers: 8601368792 ns, loops: 10000, batches: 1724, memory footprint: 62MB

    The reduction in memory footprint is well in excess of an order of
    magnitude.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:27 -04:00
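
The polled grace-period API used here is public; a minimal usage sketch
(the gp_snap field and the kvfree_rcu_bulk() arguments follow the
description above and are assumptions):

    /* At queue time, stamp the page with the current GP state. */
    page->gp_snap = get_state_synchronize_rcu();

    /* When flushing, walk the pages oldest-first: */
    if (poll_state_synchronize_rcu(page->gp_snap)) {
            kvfree_rcu_bulk(krcp, page);   /* GP already elapsed: free now */
    } else {
            synchronize_rcu();             /* wait out the remaining GP */
            kvfree_rcu_bulk(krcp, page);
    }
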
Waiman Long 52e5c38923 rcu/kvfree: Move need_offload_krc() out of krcp->lock
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 8fc5494ad5face62747a3937db66b00db1e5d80b
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Tue, 29 Nov 2022 16:58:21 +0100

    rcu/kvfree: Move need_offload_krc() out of krcp->lock

    The need_offload_krc() function currently holds the krcp->lock in order
    to safely check krcp->head.  This commit removes the need for this lock
    in that function by updating the krcp->head pointer using the
    WRITE_ONCE() macro so that readers can carry out lockless loads of
    that pointer.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:26 -04:00
Waiman Long 9549ecdf4f rcu/kvfree: Move bulk/list reclaim to separate functions
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 8c15a9e8086508962b2b69456ed22dc517d91b15
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Tue, 29 Nov 2022 16:58:20 +0100

    rcu/kvfree: Move bulk/list reclaim to separate functions

    The kvfree_rcu() code maintains lists of pages of pointers, but also a
    singly linked list, with the latter being used when memory allocation
    fails.  Traversal of these two types of lists is currently open coded.
    This commit simplifies the code by providing kvfree_rcu_bulk() and
    kvfree_rcu_list() functions, respectively, to traverse these two types
    of lists.  This patch does not introduce any functional change.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:26 -04:00
Waiman Long d470bd0c70 rcu/kvfree: Switch to a generic linked list API
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 27538e18b62fa38d38c593e8c9e050a31b6c8cea
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Tue, 29 Nov 2022 16:58:19 +0100

    rcu/kvfree: Switch to a generic linked list API

    This commit improves the readability and maintainability of the
    kvfree_rcu() code by switching from an open-coded linked list to
    the standard Linux-kernel circular doubly linked list.  This patch
    does not introduce any functional change.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:26 -04:00
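
For reference, the standard list API the code switched to, in a generic
sketch (the struct, field names, and alloc_bulk_page() helper are
illustrative):

    #include <linux/list.h>

    struct kvfree_rcu_bulk_data {
            struct list_head list;   /* links this page into a bulk list */
            /* ... array of pointers to free ... */
    };

    struct kvfree_rcu_bulk_data *bnode = alloc_bulk_page(); /* hypothetical */
    struct kvfree_rcu_bulk_data *n;
    LIST_HEAD(bulk_head);                    /* empty circular list */

    list_add(&bnode->list, &bulk_head);      /* push the new page */
    list_for_each_entry_safe(bnode, n, &bulk_head, list) {
            list_del(&bnode->list);          /* safe while iterating */
            /* ... reclaim bnode's objects ... */
    }
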
Waiman Long 4cb9cac60d rcu: Refactor kvfree_call_rcu() and high-level helpers
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 04a522b7da3dbc083f8ae0aa1a6184b959a8f81c
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Tue, 25 Oct 2022 16:46:12 +0200

    rcu: Refactor kvfree_call_rcu() and high-level helpers

    Currently kvfree_call_rcu() takes an offset within a structure as
    its second parameter, so a helper such as kvfree_rcu_arg_2() has to
    convert an rcu_head and a freed pointer to an offset in order to pass
    it.  That leads to an extra conversion on macro entry.

    Instead of converting, refactor the code in such a way that the pointer
    to be freed is passed directly to kvfree_call_rcu().

    This patch does not make any functional change and is transparent to
    all kvfree_rcu() users.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:25 -04:00
Waiman Long 171845fc7b rcu: Test synchronous RCU grace periods at the end of rcu_init()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 748bf47a89d722c7e77f8700705e2189be14e99e
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 19 Dec 2022 17:02:20 -0800

    rcu: Test synchronous RCU grace periods at the end of rcu_init()

    This commit tests synchronize_rcu() and synchronize_rcu_expedited()
    at the end of rcu_init(), in addition to the test already at the
    beginning of that function.  These tests are run only in kernels built
    with CONFIG_PROVE_RCU=y.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:24 -04:00
Waiman Long 9f2c6d19fa rcu: Make rcu_blocking_is_gp() stop early-boot might_sleep()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 3d1adf7ada352b80e037509d26cdca156f75e830
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 15 Dec 2022 11:57:55 +0800

    rcu: Make rcu_blocking_is_gp() stop early-boot might_sleep()

    Currently, rcu_blocking_is_gp() invokes might_sleep() even during early
    boot when interrupts are disabled and before the scheduler is scheduling.
    This is at best an accident waiting to happen.  Therefore, this commit
    moves that might_sleep() under an rcu_scheduler_active check in order
    to ensure that might_sleep() is not invoked unless sleeping might actually
    happen.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:24 -04:00
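
The fix has roughly this shape (a sketch of the gating, not the verbatim
diff):

    /* Only assert might_sleep() once early boot is over and the
     * scheduler could actually put us to sleep. */
    if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
            might_sleep();
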
Waiman Long 80cf97d1a0 rcu: Upgrade header comment for poll_state_synchronize_rcu()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 95ff24ee7b809ff8d253cd5edf196f137ae08c44
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 25 Nov 2022 08:43:10 -0800

    rcu: Upgrade header comment for poll_state_synchronize_rcu()

    This commit emphasizes the possibility of concurrent calls to
    synchronize_rcu() and synchronize_rcu_expedited() causing one or
    the other of the two grace periods being lost from the viewpoint of
    poll_state_synchronize_rcu().

    If you cannot afford to lose grace periods this way, you should
    instead use the _full() variants of the polled RCU API, for
    example, poll_state_synchronize_rcu_full().

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:22 -04:00
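
For completeness, the _full() variants it points readers at; both are
public APIs, and this minimal usage sketch is illustrative:

    struct rcu_gp_oldstate gos;

    get_state_synchronize_rcu_full(&gos);   /* snapshot full GP state */
    /* ... time passes, possibly overlapping normal/expedited GPs ... */
    if (poll_state_synchronize_rcu_full(&gos))
            pr_info("a full grace period has elapsed\n");
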
Waiman Long 268ded33a8 rcu: Throttle callback invocation based on number of ready callbacks
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 253cbbff621407a6265ce7a6a03c3766f8846f02
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 14 Nov 2022 09:40:19 -0800

    rcu: Throttle callback invocation based on number of ready callbacks

    Currently, rcu_do_batch() sizes its batches based on the total number
    of callbacks in the callback list.  This can result in some strange
    choices, for example, if there were 12,800 callbacks in the list, but
    only 200 were ready to invoke, RCU would invoke 100 at a time (12,800
    shifted down by seven bits).

    A more measured approach would use the number that were actually ready
    to invoke, an approach that has become feasible only recently given the
    per-segment ->seglen counts in ->cblist.

    This commit therefore bases the batch limit on the number of callbacks
    ready to invoke instead of on the total number of callbacks.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:21 -04:00
Waiman Long b2f1e8fb35 rcu: Consolidate initialization and CPU-hotplug code
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 5a04848d005e051b8c063206b1a03363aca8ade4
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sun, 6 Nov 2022 16:33:38 -0800

    rcu: Consolidate initialization and CPU-hotplug code

    This commit consolidates the initialization and CPU-hotplug code at
    the end of kernel/rcu/tree.c.  This is strictly a code-motion commit.
    No functionality has changed.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:21 -04:00
Waiman Long dbe91e0313 rcu: Don't assert interrupts enabled too early in boot
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 3f6c3d29df58f391cf487b50a24ebd24045ba569
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 15 Dec 2022 09:26:09 -0800

    rcu: Don't assert interrupts enabled too early in boot

    The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() functions
    both check that interrupts are enabled, as they normally should be when
    waiting for
    an RCU grace period.  Except that it is legal to wait for grace periods
    during early boot, before interrupts have been enabled for the first time,
    and polling for grace periods is required to work during this time.
    This can result in false-positive lockdep splats in the presence of
    boot-time-initiated tracing.

    This commit therefore conditions those interrupts-enabled checks on
    rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
    which time interrupts have been enabled.

    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:20 -04:00
Waiman Long 3f47536644 rcu: Make call_rcu() lazy to save power
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 3cb278e73be58bfb780ecd55129296d2f74c1fb7
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sun, 16 Oct 2022 16:22:54 +0000

    rcu: Make call_rcu() lazy to save power

    Implement timer-based RCU callback batching (also known as lazy
    callbacks). With this we save about 5-10% of the power consumed due
    to RCU requests that happen when the system is lightly loaded or idle.

    By default, all async callbacks (queued via call_rcu) are marked
    lazy. An alternate API call_rcu_hurry() is provided for the few users,
    for example synchronize_rcu(), that need the old behavior.

    The batch is flushed whenever a certain amount of time has passed, or
    the batch on a particular CPU grows too big. Also memory pressure will
    flush it in a future patch.

    To handle several corner cases automagically (such as rcu_barrier() and
    hotplug), we re-use bypass lists which were originally introduced to
    address lock contention, to handle lazy CBs as well. The bypass list
    length has the lazy CB length included in it. A separate lazy CB length
    counter is also introduced to keep track of the number of lazy CBs.

    [ paulmck: Fix formatting of inline call_rcu_lazy() definition. ]
    [ paulmck: Apply Zqiang feedback. ]
    [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

    Suggested-by: Paul McKenney <paulmck@kernel.org>
    Acked-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:15 -04:00
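
From a caller's point of view, the split looks like this; call_rcu() and
call_rcu_hurry() are the real entry points, while the surrounding struct
and pointers are illustrative:

    struct foo {
            struct rcu_head rh;
            /* ... payload ... */
    };

    static void foo_reclaim(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct foo, rh));
    }

    /* p, q: struct foo pointers being retired. */
    call_rcu(&p->rh, foo_reclaim);        /* may be batched lazily */
    call_rcu_hurry(&q->rh, foo_reclaim);  /* old, prompt behavior */
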
Waiman Long 703b79599b rcu: Fix missing nocb gp wake on rcu_barrier()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit b8f7aca3f0e0e6223094ba2662bac90353674b04
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Sun, 16 Oct 2022 16:22:53 +0000

    rcu: Fix missing nocb gp wake on rcu_barrier()

    In preparation for RCU lazy changes, wake up the RCU nocb gp thread if
    needed after an entrain.  This change prevents the RCU barrier callback
    from waiting in the queue for several seconds before the lazy callbacks
    in front of it are serviced.

    Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:10 -04:00
Waiman Long fdc0d8e219 rcu: Use READ_ONCE() for lockless read of rnp->qsmask
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit aba9645bd10bd9f793732b06495b1312ee44865e
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sat, 17 Sep 2022 16:41:58 +0000

    rcu: Use READ_ONCE() for lockless read of rnp->qsmask

    The rnp->qsmask field is accessed locklessly from rcutree_dying_cpu().
    Marking these accesses with READ_ONCE() may help avoid load tearing due
    to concurrent access and KCSAN issues, and preserves the sanity of
    people reading the mask in tracing.

    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:08 -04:00
Waiman Long 31732955b5 rcu: Remove duplicate RCU exp QS report from rcu_report_dead()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit d6fd907a95a73251bd8494e1ba5350342e05e74a
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Tue, 30 Aug 2022 16:31:51 +0800

    rcu: Remove duplicate RCU exp QS report from rcu_report_dead()

    The rcu_report_dead() function invokes rcu_report_exp_rdp() in order
    to force an immediate expedited quiescent state on the outgoing
    CPU, and then it invokes rcu_preempt_deferred_qs() to provide any
    required deferred quiescent state of either sort.  Because the call to
    rcu_preempt_deferred_qs() provides the expedited RCU quiescent state if
    requested, the call to rcu_report_exp_rdp() is potentially redundant.

    One possible issue is a concurrent start of a new expedited RCU
    grace period, but this situation is already handled correctly
    by __sync_rcu_exp_select_node_cpus().  This function will detect
    that the CPU is going offline via the error return from its call
    to smp_call_function_single().  In that case, it will retry, and
    eventually stop retrying due to rcu_report_exp_rdp() clearing the
    ->qsmaskinitnext bit corresponding to the target CPU.  As a result,
    __sync_rcu_exp_select_node_cpus() will report the necessary quiescent
    state after dealing with any remaining CPU.

    This change assumes that control does not enter rcu_report_dead() within
    an RCU read-side critical section, but then again, the surviving call
    to rcu_preempt_deferred_qs() has always made this assumption.

    This commit therefore removes the call to rcu_report_exp_rdp(), thus
    relying on rcu_preempt_deferred_qs() to handle both normal and expedited
    quiescent states.

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:07 -04:00
Waiman Long 27af014085 rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit ceb1c8c9b8aa9199da46a0f29d2d5f08d9b44c15
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 13 Oct 2022 12:41:48 +0800

    rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()

    Running rcutorture with a non-zero fqs_duration module parameter in a
    kernel built with CONFIG_PREEMPTION=y results in the following splat:

    BUG: using __this_cpu_read() in preemptible [00000000]
    code: rcu_torture_fqs/398
    caller is __this_cpu_preempt_check+0x13/0x20
    CPU: 3 PID: 398 Comm: rcu_torture_fqs Not tainted 6.0.0-rc1-yoctodev-standard+
    Call Trace:
    <TASK>
    dump_stack_lvl+0x5b/0x86
    dump_stack+0x10/0x16
    check_preemption_disabled+0xe5/0xf0
    __this_cpu_preempt_check+0x13/0x20
    rcu_force_quiescent_state.part.0+0x1c/0x170
    rcu_force_quiescent_state+0x1e/0x30
    rcu_torture_fqs+0xca/0x160
    ? rcu_torture_boost+0x430/0x430
    kthread+0x192/0x1d0
    ? kthread_complete_and_exit+0x30/0x30
    ret_from_fork+0x22/0x30
    </TASK>

    The problem is that rcu_force_quiescent_state() uses __this_cpu_read()
    in preemptible code instead of the proper raw_cpu_read().  This commit
    therefore changes __this_cpu_read() to raw_cpu_read().

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:16 -04:00
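
The distinction, in a generic sketch: __this_cpu_read() debug-checks that
preemption is disabled, while raw_cpu_read() performs the same load
without the check, for callers that can tolerate whichever CPU's value
they happen to get:

    DEFINE_PER_CPU(unsigned long, my_stat);

    /* Preemptible context, an approximate value is fine: */
    unsigned long v = raw_cpu_read(my_stat);   /* no preemption splat */

    /* __this_cpu_read(my_stat) here would emit the BUG splat above. */
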
Waiman Long 552ac953d1 rcu: Keep synchronize_rcu() from enabling irqs in early boot
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 31d8aaa87fcef1be5932f3813ea369e21bd3b11d
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 20 Oct 2022 10:58:14 -0700

    rcu: Keep synchronize_rcu() from enabling irqs in early boot

    Making polled RCU grace periods account for expedited grace periods
    required acquiring the leaf rcu_node structure's lock during early boot,
    but after rcu_init() was called.  This lock is irq-disabled, but the
    code incorrectly assumes that irqs are always disabled when invoking
    synchronize_rcu().  The exception is early boot before the scheduler has
    started, which means that upon return from synchronize_rcu(), irqs will
    be incorrectly enabled.

    This commit fixes this bug by using irqsave/irqrestore locking primitives.

    Fixes: bf95b2bc3e42 ("rcu: Switch polled grace-period APIs to ->gp_seq_polled")

    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:16 -04:00
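
The resulting pattern, sketched with the RCU-internal rcu_node lock
wrappers (that the fix uses these exact wrappers is an assumption; the
save/restore pair is the point):

    unsigned long flags;

    /* Works whether or not interrupts were already disabled: */
    raw_spin_lock_irqsave_rcu_node(rnp, flags);
    /* ... update polled grace-period state ... */
    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);  /* restores flags */
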
Waiman Long 9cc21271ea rcu-tasks: Make RCU Tasks Trace check for userspace execution
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 528262f50274079740b53e29bcaaabf219aa7417
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Tue, 19 Jul 2022 12:39:00 +0800

    rcu-tasks: Make RCU Tasks Trace check for userspace execution

    Userspace execution is a valid quiescent state for RCU Tasks Trace,
    but the scheduling-clock interrupt does not currently report such
    quiescent states.

    Of course, the scheduling-clock interrupt is not strictly speaking
    userspace execution.  However, the only way that this code is not
    in a quiescent state is if something invoked rcu_read_lock_trace(),
    and that would be reflected in the ->trc_reader_nesting field in
    the task_struct structure.  Furthermore, this field is checked by
    rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in
    turn invoked by rcu_note_voluntary_context_switch() in kernels building
    at least one of the RCU Tasks flavors.  It is therefore safe to invoke
    rcu_tasks_trace_qs() from the rcu_sched_clock_irq().

    But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU
    Tasks, which lacks the read-side markers provided by RCU Tasks Trace.
    This raises the possibility that an RCU Tasks grace period could start
    after the interrupt from userspace execution, but before the call to
    rcu_sched_clock_irq().  However, it turns out that this is safe because
    the RCU Tasks grace period waits for an RCU grace period, which will
    wait for the entire scheduling-clock interrupt handler, including any
    RCU Tasks read-side critical section that this handler might contain.

    This commit therefore updates the rcu_sched_clock_irq() function's
    check for usermode execution and its call to rcu_tasks_classic_qs()
    to instead check for both usermode execution and interrupt from idle,
    and to instead call rcu_note_voluntary_context_switch().  This
    consolidates code and provides faster RCU Tasks Trace
    reporting of quiescent states in kernels that do scheduling-clock
    interrupts for userspace execution.

    [ paulmck: Consolidate checks into rcu_sched_clock_irq(). ]

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:15 -04:00
Waiman Long 60efae9a37 rcu: Make synchronize_rcu() fastpath update only boot-CPU counters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit d761de8a7dcef8e8e9e20a543f85a2c079ae3d0d
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 5 Aug 2022 15:42:25 -0700

    rcu: Make synchronize_rcu() fastpath update only boot-CPU counters

    Large systems can have hundreds of rcu_node structures, and updating
    counters in each of them might slow down booting.  This commit therefore
    updates only the counters in those rcu_node structures corresponding
    to the boot CPU, up to and including the root rcu_node structure.

    The counters for the remaining rcu_node structures are updated by the
    rcu_scheduler_starting() function, which executes just before the first
    non-boot kthread is spawned.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:10 -04:00
Waiman Long 0e3b334bf3 rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 7ecef0871dd9a879038dbe8a681ab48bd0c92988
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 4 Aug 2022 17:54:53 -0700

    rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure

    Because both normal and expedited grace periods increment their respective
    counters on their pre-scheduler early boot fastpaths, the rcu_gp_oldstate
    structure no longer needs its ->rgos_polled field.  This commit therefore
    removes this field, shrinking this structure so that it is the same size
    as an rcu_head structure.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:09 -04:00
Waiman Long b698a047d6 rcu: Make synchronize_rcu() fast path update ->gp_seq counters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 910e12092eac8a9f19b507ed0fdc1c21d8da9483
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 4 Aug 2022 17:28:01 -0700

    rcu: Make synchronize_rcu() fast path update ->gp_seq counters

    This commit causes the early boot single-CPU synchronize_rcu() fastpath to
    update the rcu_state and rcu_node structures' ->gp_seq and ->gp_seq_needed
    counters.  This will allow the full-state polled grace-period APIs to
    detect all normal grace periods without the need to track the special
    combined polling-only counter, which is a step towards removing the
    ->rgos_polled field from the rcu_gp_oldstate, thereby reducing its size
    by one third.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:08 -04:00
Waiman Long 78f541c306 rcu-tasks: Remove grace-period fast-path rcu-tasks helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 5f11bad6b7228858e06729de6dd4079dfc082648
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 4 Aug 2022 17:16:24 -0700

    rcu-tasks: Remove grace-period fast-path rcu-tasks helper

    Now that the grace-period fast path can only happen during the
    pre-scheduler portion of early boot, this fast path can no longer block
    run-time RCU Tasks and RCU Tasks Trace grace periods.  This commit
    therefore removes the conditional cond_resched_tasks_rcu_qs() invocation.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:07 -04:00
Waiman Long 35bf40e27d rcu: Set rcu_data structures' initial ->gpwrap value to true
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit a5d1b0b68a62afb1bce0b36cc9a1875acf8a6dff
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 4 Aug 2022 17:01:55 -0700

    rcu: Set rcu_data structures' initial ->gpwrap value to true

    It would be good to reduce the size of the rcu_gp_oldstate structure
    from three unsigned long instances to two, but this requires that the
    boot-time optimized grace periods update the various ->gp_seq fields.
    Updating these fields in the rcu_state structure and in all of the
    rcu_node structures is at least semi-reasonable, but updating them in
    all of the rcu_data structures is a bridge too far.  This means that if
    there are too many early boot-time grace periods, the ->gp_seq field in
    the rcu_data structure cannot be trusted.  This commit therefore sets
    each rcu_data structure's initial ->gpwrap field to true to provide the
    necessary impetus
    for a suitable level of distrust.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:07 -04:00
Waiman Long 5b4d8ec003 rcu: Disable run-time single-CPU grace-period optimization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 258f887aba60c8fc7946a9f379f9a3889f92fc85
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 4 Aug 2022 16:07:04 -0700

    rcu: Disable run-time single-CPU grace-period optimization

    The run-time single-CPU grace-period optimization applies only to
    kernels built with CONFIG_SMP=y && CONFIG_PREEMPTION=y that are running
    on a single-CPU system.  But a kernel intended for a single-CPU system
    should instead be built with CONFIG_SMP=n, and in any case, single-CPU
    systems running Linux no longer appear to be the common case.  Plus this
    optimization results in the rcu_gp_oldstate structure being half again
    larger than it needs to be.

    This commit therefore disables the run-time single-CPU grace-period
    optimization, so that this optimization applies only during the
    pre-scheduler portion of the boot sequence.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:06 -04:00
Waiman Long 9922daa4a0 rcu: Add full-sized polling for cond_sync_full()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit b6fe4917ae4353b397079902cb024ae01f20dfb2
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 4 Aug 2022 13:46:05 -0700

    rcu: Add full-sized polling for cond_sync_full()

    The cond_synchronize_rcu() API compresses the combined expedited and
    normal grace-period states into a single unsigned long, which conserves
    storage, but can miss grace periods in certain cases involving overlapping
    normal and expedited grace periods.  Missing the occasional grace period
    is usually not a problem, but there are use cases that care about each
    and every grace period.

    This commit therefore adds yet another member of the full-state RCU
    grace-period polling API, which is the cond_synchronize_rcu_full()
    function.  This uses up to three times the storage (rcu_gp_oldstate
    structure instead of unsigned long), but is guaranteed not to miss
    grace periods.
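
    A minimal usage sketch of the snapshot-then-conditionally-wait
    pattern:

    <snip>
    struct rcu_gp_oldstate rgos;

    /* Update side: snapshot the current grace-period state. */
    get_state_synchronize_rcu_full(&rgos);

    /* Teardown side: block only if a full grace period has not */
    /* already elapsed since the snapshot. */
    cond_synchronize_rcu_full(&rgos);
    <snip>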

    [ paulmck: Apply feedback from kernel test robot and Julia Lawall. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:05 -04:00
Waiman Long e0d4dbffcc rcu: Remove blank line from poll_state_synchronize_rcu() docbook header
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit f21e014345e0abf11fdc2e59fb6eb6d6aa6ae4eb
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 3 Aug 2022 16:57:47 -0700

    rcu: Remove blank line from poll_state_synchronize_rcu() docbook header

    This commit removes the blank line preceding the oldstate parameter to
    the docbook header for the poll_state_synchronize_rcu() function and
    marks uses of this parameter later in that header.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:05 -04:00
Waiman Long 5e6a0e33ba rcu: Add full-sized polling for start_poll()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 76ea364161e72b1878126edf8d507d2a62fb47b0
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 2 Aug 2022 17:04:54 -0700

    rcu: Add full-sized polling for start_poll()

    The start_poll_synchronize_rcu() API compresses the combined expedited and
    normal grace-period states into a single unsigned long, which conserves
    storage, but can miss grace periods in certain cases involving overlapping
    normal and expedited grace periods.  Missing the occasional grace period
    is usually not a problem, but there are use cases that care about each
    and every grace period.

    This commit therefore adds the next member of the full-state RCU
    grace-period polling API, namely the start_poll_synchronize_rcu_full()
    function.  This uses up to three times the storage (rcu_gp_oldstate
    structure instead of unsigned long), but is guaranteed not to miss
    grace periods.
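
    A minimal sketch of the start-then-poll pattern:

    <snip>
    struct rcu_gp_oldstate rgos;
    bool done;

    /* Snapshot the state and kick off a grace period if needed. */
    start_poll_synchronize_rcu_full(&rgos);

    /* ... do other work ... */

    /* Nonblocking completion check. */
    done = poll_state_synchronize_rcu_full(&rgos);
    <snip>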

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:04 -04:00
Waiman Long e6a0ce88c8 rcu: Add full-sized polling for get_state()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 3fdefca9b42c8bebe3beea5c1a067c9718ca0fc7
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 28 Jul 2022 19:58:13 -0700

    rcu: Add full-sized polling for get_state()

    The get_state_synchronize_rcu() API compresses the combined expedited and
    normal grace-period states into a single unsigned long, which conserves
    storage, but can miss grace periods in certain cases involving overlapping
    normal and expedited grace periods.  Missing the occasional grace period
    is usually not a problem, but there are use cases that care about each
    and every grace period.

    This commit therefore adds the next member of the full-state RCU
    grace-period polling API, namely the get_state_synchronize_rcu_full()
    function.  This uses up to three times the storage (rcu_gp_oldstate
    structure instead of unsigned long), but is guaranteed not to miss
    grace periods.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:01 -04:00
Waiman Long 4926ab6d4f rcu: Add full-sized polling for get_completed*() and poll_state*()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 91a967fd6934abc0c7e4b0d26728e38674278707
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 28 Jul 2022 15:37:05 -0700

    rcu: Add full-sized polling for get_completed*() and poll_state*()

    The get_completed_synchronize_rcu() and poll_state_synchronize_rcu()
    APIs compress the combined expedited and normal grace-period states into a
    single unsigned long, which conserves storage, but can miss grace periods
    in certain cases involving overlapping normal and expedited grace periods.
    Missing the occasional grace period is usually not a problem, but there
    are use cases that care about each and every grace period.

    This commit therefore adds the first members of the full-state RCU
    grace-period polling API, namely the get_completed_synchronize_rcu_full()
    and poll_state_synchronize_rcu_full() functions.  These use up to three
    times the storage (rcu_gp_oldstate structure instead of unsigned long),
    but are guaranteed not to miss grace periods, at least in situations
    where the single-CPU grace-period optimization does not apply.
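
    A minimal sketch of the pair in action:

    <snip>
    struct rcu_gp_oldstate rgos;

    /*
     * A cookie for which poll_state_synchronize_rcu_full() always
     * returns true, e.g., for objects not yet exposed to readers.
     */
    get_completed_synchronize_rcu_full(&rgos);
    WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos));
    <snip>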

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:01 -04:00
Waiman Long 40145cdc1e rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 51824b780b719c53113dc39e027fbf670dc66028
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Thu, 30 Jun 2022 18:33:35 +0200

    rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval

    Currently the monitor work is scheduled with a fixed interval of HZ/20,
    which is roughly 50 milliseconds. The drawback of this approach is
    low utilization of the 512 page slots in scenarios with infrequent
    kvfree_rcu() calls.  For example on an Android system:

    <snip>
      kworker/3:3-507     [003] ....   470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6
      kworker/6:1-76      [006] ....   470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1
      kworker/6:1-76      [006] ....   470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9
      kworker/3:3-507     [003] ....   471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48
      kworker/1:1-73      [001] ....   471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3
      kworker/1:1-73      [001] ....   471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76
      kworker/0:4-1411    [000] ....   472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1
      kworker/0:4-1411    [000] ....   472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5
      kworker/6:1-76      [006] ....   472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102
    <snip>

    In many cases, out of 512 slots, fewer than 10 were actually used.
    In order to improve batching and make utilization more efficient, this
    commit sets the drain interval to a fixed 5 seconds. Floods are
    detected when a page fills quickly, and in that case, the reclaim work
    is re-scheduled for the next scheduling-clock tick (jiffy).

    After this change:

    <snip>
      kworker/7:1-371     [007] ....  5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121
      kworker/7:1-371     [007] ....  5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47
      kworker/7:1-371     [007] ....  5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510
      kworker/7:1-371     [007] ....  5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169
      kworker/7:1-371     [007] ....  5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510
      kworker/5:6-9428    [005] ....  5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123
      kworker/4:7-9434    [004] ....  5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322
      kworker/4:7-9434    [004] ....  5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185
      kworker/7:1-371     [007] ....  5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510
    <snip>

    Here, in all but one of the cases, more than one hundred slots were
    used, representing an order-of-magnitude improvement.
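
    A sketch of the resulting scheduling logic (the interval matches the
    patch; the flood-detection helper name is illustrative):

    <snip>
    /* Old: HZ/20 (~50 ms).  New: a fixed 5-second drain interval. */
    #define KFREE_DRAIN_JIFFIES (5 * HZ)

    /* Re-arm for the next jiffy when a page fills quickly. */
    delay = page_filled_quickly(krcp) ? 1 : KFREE_DRAIN_JIFFIES;
    schedule_delayed_work(&krcp->monitor_work, delay);
    <snip>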

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:48:00 -04:00
Waiman Long daffab757b rcu/kfree: Fix kfree_rcu_shrink_count() return value
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 38269096351806bf7315f971c53205b676ada259
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Wed, 22 Jun 2022 22:51:02 +0000

    rcu/kfree: Fix kfree_rcu_shrink_count() return value

    As per the comments in include/linux/shrinker.h, .count_objects callback
    should return the number of freeable items, but if there are no objects
    to free, SHRINK_EMPTY should be returned. The only time 0 is returned
    should be when we are unable to determine the number of objects, or the
    cache should be skipped for another reason.
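
    A sketch of the corrected .count_objects semantics (the counting
    helper is hypothetical):

    <snip>
    static unsigned long
    kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
    {
            unsigned long count = count_freeable_objects();  /* hypothetical */

            /* Nothing to free: say so explicitly rather than returning 0. */
            return count ? count : SHRINK_EMPTY;
    }
    <snip>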

    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:59 -04:00
Waiman Long fe46f6317d rcu: Back off upon fill_page_cache_func() allocation failure
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 093590c16b447f53e66771c8579ae66c96f6ef61
Author: Michal Hocko <mhocko@suse.com>
Date:   Wed, 22 Jun 2022 13:47:11 +0200

    rcu: Back off upon fill_page_cache_func() allocation failure

    The fill_page_cache_func() function allocates a couple of pages to store
    kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY)
    allocation which can fail under memory pressure. The function will,
    however, keep retrying even when the previous attempt has failed.

    This retrying is in theory correct, but in practice the allocation is
    invoked from workqueue context, which means that if the memory reclaim
    gets stuck, these retries can hog the worker for quite some time.
    Although the workqueues subsystem automatically adjusts concurrency, such
    adjustment is not guaranteed to happen until the worker context sleeps.
    And the fill_page_cache_func() function's retry loop is not guaranteed
    to sleep (see the should_reclaim_retry() function).

    And we have seen this function cause workqueue lockups:

    kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
    [...]
    kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
    kernel:   pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
    kernel:     in-flight: 1917:fill_page_cache_func
    kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor

    Originally, we thought that the root cause of this lockup was several
    retries with direct reclaim, but this is not yet confirmed.  Furthermore,
    we have seen similar lockups without any heavy memory pressure.  This
    suggests that there are other factors contributing to these lockups.
    However, it is not really clear that endless retries are desirable.

    So let's make the fill_page_cache_func() function back off after
    allocation failure.
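
    A sketch of the back-off, assuming the existing GFP_NORETRY
    allocation (loop details simplified):

    <snip>
    for (i = 0; i < nr_pages; i++) {
            bnode = (struct kvfree_rcu_bulk_data *)
                    __get_free_page(GFP_KERNEL | __GFP_NORETRY |
                                    __GFP_NOMEMALLOC | __GFP_NOWARN);
            if (!bnode)
                    break;  /* Back off; a later invocation can refill. */
            /* ... add the page to the cache ... */
    }
    <snip>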

    Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:59 -04:00
Waiman Long e580bb0d98 rcu: Add polled expedited grace-period primitives
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit d96c52fe4907c68adc5e61a0bef7aec0933223d5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 15 Apr 2022 10:55:42 -0700

    rcu: Add polled expedited grace-period primitives

    This commit adds expedited grace-period functionality to RCU's polled
    grace-period API, adding start_poll_synchronize_rcu_expedited() and
    cond_synchronize_rcu_expedited(), which are similar to the existing
    start_poll_synchronize_rcu() and cond_synchronize_rcu() functions,
    respectively.

    Note that although start_poll_synchronize_rcu_expedited() can be invoked
    very early, the resulting expedited grace periods are not guaranteed
    to start until after workqueues are fully initialized.  On the other
    hand, both synchronize_rcu() and synchronize_rcu_expedited() can also
    be invoked very early, and the resulting grace periods will be taken
    into account as they occur.
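
    A minimal usage sketch:

    <snip>
    unsigned long oldstate;

    /* Snapshot the state and start an expedited GP if needed. */
    oldstate = start_poll_synchronize_rcu_expedited();

    /* ... later, block only if that GP has not yet completed. */
    cond_synchronize_rcu_expedited(oldstate);
    <snip>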

    [ paulmck: Apply feedback from Neeraj Upadhyay. ]

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:52 -04:00
Waiman Long ce330fc3bc rcu: Make polled grace-period API account for expedited grace periods
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit dd04140531b5d38b77ad9ff7b18117654be5bf5c
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 14 Apr 2022 06:56:35 -0700

    rcu: Make polled grace-period API account for expedited grace periods

    Currently, this code could splat:

            oldstate = get_state_synchronize_rcu();
            synchronize_rcu_expedited();
            WARN_ON_ONCE(!poll_state_synchronize_rcu(oldstate));

    This situation is counter-intuitive and user-unfriendly.  After all, there
    really was a perfectly valid full grace period right after the call to
    get_state_synchronize_rcu(), so why shouldn't poll_state_synchronize_rcu()
    know about it?

    This commit therefore makes the polled grace-period API aware of expedited
    grace periods in addition to the normal grace periods that it is already
    aware of.  With this change, the above code is guaranteed not to splat.

    Please note that the above code can still splat due to counter wrap on the
    one hand and situations involving partially overlapping normal/expedited
    grace periods on the other.  On 64-bit systems, the second is of course
    much more likely than the first.  It is possible to modify this approach
    to prevent overlapping grace periods from causing splats, but only at
    the expense of greatly increasing the probability of counter wrap, as
    in within milliseconds on 32-bit systems and within minutes on 64-bit
    systems.

    This commit is in preparation for polled expedited grace periods.

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:51 -04:00
Waiman Long 7df8a78b55 rcu: Switch polled grace-period APIs to ->gp_seq_polled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit bf95b2bc3e42f11f4d7a5e8a98376c2b4a2aa82f
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 13 Apr 2022 17:46:15 -0700

    rcu: Switch polled grace-period APIs to ->gp_seq_polled

    This commit switches the existing polled grace-period APIs to use a
    new ->gp_seq_polled counter in the rcu_state structure.  An additional
    ->gp_seq_polled_snap counter in that same structure allows the normal
    grace period kthread to interact properly with the !SMP !PREEMPT fastpath
    through synchronize_rcu().  The first of the two to note the end of a
    given grace period will make knowledge of this transition available to
    the polled API.

    This commit is in preparation for polled expedited grace periods.

    [ paulmck: Fix use of rcu_state.gp_seq_polled to start normal grace period. ]

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:51 -04:00
Waiman Long 9b02706bd3 rcu/nocb: Add option to opt rcuo kthreads out of RT priority
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 8f489b4da5278fc6e5fc8f0029ae7fb51c060215
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Wed, 11 May 2022 10:57:03 +0200

    rcu/nocb: Add option to opt rcuo kthreads out of RT priority

    This commit introduces a RCU_NOCB_CPU_CB_BOOST Kconfig option that
    prevents rcuo kthreads from running at real-time priority, even in
    kernels built with RCU_BOOST.  This capability is important to devices
    needing low-latency (as in a few milliseconds) response from expedited
    RCU grace periods, but which are not running a classic real-time workload.
    On such devices, permitting the rcuo kthreads to run at real-time priority
    results in unacceptable latencies imposed on the application tasks,
    which run as SCHED_OTHER.

    See for example the following trace output:

    <snip>
    <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
    <snip>

    If that rcuop kthread were permitted to run at real-time SCHED_FIFO
    priority, it would monopolize its CPU for hundreds of milliseconds
    while invoking those 34619 RCU callback functions, which would cause an
    unacceptably long latency spike for many application stacks on Android
    platforms.

    However, some existing real-time workloads require that callback
    invocation run at SCHED_FIFO priority, for example, those running on
    systems with heavy SCHED_OTHER background loads.  (It is the real-time
    system's administrator's responsibility to make sure that important
    real-time tasks run at a higher priority than do RCU's kthreads.)

    Therefore, this new RCU_NOCB_CPU_CB_BOOST Kconfig option defaults to
    "y" on kernels built with PREEMPT_RT and defaults to "n" otherwise.
    The effect is to preserve current behavior for real-time systems, but for
    other systems to allow expedited RCU grace periods to run with real-time
    priority while continuing to invoke RCU callbacks as SCHED_OTHER.

    As you would expect, this RCU_NOCB_CPU_CB_BOOST Kconfig option has no
    effect except on CPUs with offloaded RCU callbacks.
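
    A sketch of the option's shape (prompt text approximate):

    <snip>
    # Default to "y" only on PREEMPT_RT, preserving current RT behavior.
    config RCU_NOCB_CPU_CB_BOOST
            bool "Offload RCU callback processing from boosted kthreads"
            depends on RCU_NOCB_CPU && RCU_BOOST
            default y if PREEMPT_RT
    <snip>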

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:23 -04:00
Waiman Long 3cd6c37180 rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 5103850654fdc651f0a7076ac753b958f018bb85
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Fri, 29 Apr 2022 20:42:22 +0800

    rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()

    Callbacks are invoked in RCU kthreads when callbacks are offloaded
    (rcu_nocbs boot parameter) or when RCU's softirq handler has been
    offloaded to rcuc kthreads (use_softirq==0).  The current code allows
    for the rcu_nocbs case but not the use_softirq case.  This commit adds
    support for the use_softirq case.

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:22 -04:00
Waiman Long 0cdca8e4a1 rcu/tree: Add comment to describe GP-done condition in fqs loop
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit a03ae49c4785c1bc7b940e38bbdf2e63d79d1470
Author: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Date:   Thu, 9 Jun 2022 12:43:40 +0530

    rcu/tree: Add comment to describe GP-done condition in fqs loop

    Add a comment explaining why the !rcu_preempt_blocked_readers_cgp()
    condition is required on the root rnp node for the GP completion check
    in rcu_gp_fqs_loop().

    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:21 -04:00
Waiman Long cbaed607bb rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 9bdb5b3a8d8ad1c92db309219859fe1c87c95351
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 8 Jun 2022 09:34:10 -0700

    rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()

    This commit saves a line of code by initializing the rcu_gp_fqs()
    function's first_gp_fqs local variable in its declaration.

    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:21 -04:00
Waiman Long b9bc6190d1 rcu/kvfree: Remove useless monitor_todo flag
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 82d26c36cc68e781400eb4e541f943008208f2d6
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Thu, 2 Jun 2022 10:06:43 +0200

    rcu/kvfree: Remove useless monitor_todo flag

    The monitor_todo flag is not needed, as the work struct already tracks
    whether work is pending: the schedule_delayed_work() helper is a no-op
    when the work is already queued.
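
    That is, the scheduling site can simply be:

    <snip>
    /*
     * schedule_delayed_work() returns false and does nothing if the
     * work is already pending, so no separate monitor_todo flag is
     * needed.
     */
    schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
    <snip>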

    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:21 -04:00
Waiman Long 4ebe041e32 rcu: Cleanup RCU urgency state for offline CPU
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit e2bb1288a381e9239aaf606ae8c1e20ea71c20bd
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 26 May 2022 09:55:12 +0800

    rcu: Cleanup RCU urgency state for offline CPU

    When a CPU is slow to provide a quiescent state for a given grace
    period, RCU takes steps to encourage that CPU to get with the
    quiescent-state program in a more timely fashion.  These steps
    include these flags in the rcu_data structure:

    1.      ->rcu_urgent_qs, which causes the scheduling-clock interrupt to
            request an otherwise pointless context switch from the scheduler.

    2.      ->rcu_need_heavy_qs, which causes both cond_resched() and RCU's
            context-switch hook to do an immediate momentary quiescent state.

    3.      ->rcu_forced_tick, which causes the scheduler-clock tick to
            be enabled even on nohz_full CPUs with only one runnable task.

    These flags are of course cleared once the corresponding CPU has passed
    through a quiescent state.  Unless that quiescent state is the CPU
    going offline, which means that when the CPU comes back online, it will
    needlessly consume additional CPU time and incur additional latency,
    which constitutes a minor but very real performance bug.

    This commit therefore adds the call to rcu_disable_urgency_upon_qs()
    that clears these flags to the CPU-hotplug offlining code path.
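
    For reference, the helper being wired into the offline path clears
    roughly the following state (a sketch of rcu_disable_urgency_upon_qs()):

    <snip>
    WRITE_ONCE(rdp->rcu_urgent_qs, false);
    WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
    if (rdp->rcu_forced_tick) {
            rdp->rcu_forced_tick = false;
            tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
    }
    <snip>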

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:20 -04:00
Waiman Long a371ed853d rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 52c1d81ee2911ef592048582c6d07975b7399726
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Thu, 5 May 2022 23:52:36 +0800

    rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()

    Currently, the rcu_node structure's ->cbovlmask field is set in call_rcu()
    when a given CPU is suffering from callback overload.  But if that CPU
    goes offline, the outgoing CPU's callbacks are migrated to the running
    CPU, which is likely to overload the running CPU.  However, that CPU's
    bit in its leaf rcu_node structure's ->cbovlmask field remains zero.

    Initially, this is OK because the outgoing CPU's bit remains set.
    However, that bit will be cleared at the next end of a grace period,
    at which time it is quite possible that the running CPU will still
    be overloaded.  If the running CPU invokes call_rcu(), then overload
    will be checked for and the bit will be set.  Except that there is no
    guarantee that the running CPU will invoke call_rcu(), in which case the
    next grace period will fail to take the running CPU's overload condition
    into account.  Plus, because the bit is not set, the end of the grace
    period won't check for overload on this CPU.

    This commit therefore adds a call to check_cb_ovld_locked() in
    rcutree_migrate_callbacks() to set the running CPU's ->cbovlmask bit
    appropriately.
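
    A sketch of the added call, with my_rdp and my_rnp denoting the
    surviving CPU's rcu_data and leaf rcu_node structures:

    <snip>
    /* With my_rnp's lock held, update its ->cbovldmask bit. */
    check_cb_ovld_locked(my_rdp, my_rnp);
    <snip>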

    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:19 -04:00
Waiman Long 199fb66385 rcu: Decrease FQS scan wait time in case of callback overloading
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit fb77dccfc701b6ebcc232574c828bc69146cf90a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 12 Apr 2022 15:08:14 -0700

    rcu: Decrease FQS scan wait time in case of callback overloading

    The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
    callback overloading and does an immediate initial scan for idle CPUs
    if so.  However, subsequent rescans will be carried out at as leisurely a
    rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
    module parameter.  It might be tempting to just continue immediately
    rescanning, but this turns the RCU grace-period kthread into a CPU hog.
    It might also be tempting to reduce the time between rescans to a single
    jiffy, but this can be problematic on larger systems.

    This commit therefore divides the normal time between rescans by three,
    rounding up.  Thus a small system running at HZ=1000 that is suffering
    from callback overload will wait only one jiffy instead of the normal
    three between rescans.
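
    A sketch of the arithmetic (divide by three, rounding up):

    <snip>
    if (READ_ONCE(rcu_state.cbovld))
            j = (j + 2) / 3;  /* E.g., HZ=1000: j=3 jiffies becomes 1. */
    <snip>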

    [ paulmck: Apply Neeraj Upadhyay feedback. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:19 -04:00
Waiman Long 3436a57e93 context_tracking: Convert state to atomic_t
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 171476775d32a40bfebf83250136c19b2e842672
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:35 +0200

    context_tracking: Convert state to atomic_t

    Context tracking's state and dynticks counter are going to be merged
    into a single field so that both updates can happen atomically and at
    the same time. Prepare for that by converting the state into an atomic_t.
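
    A sketch of the conversion (other fields elided):

    <snip>
    struct context_tracking {
            atomic_t state;         /* Previously a plain, non-atomic field. */
            /* ... */
    };

    /* Readers now use atomic_read(&ct->state). */
    <snip>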

    [ paulmck: Apply kernel test robot feedback. ]

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long 5b925bf582 rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 172114552701b85d5c3b1a089a73ee85d0d7786b
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:33 +0200

    rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking

    Move the core RCU eqs/dynticks functions to context tracking so that
    we can later merge all that code within context tracking.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long 166bdb926e rcu/context-tracking: Move deferred nocb resched to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 564506495ca96a6e66d077d3d5b9f02d4b9b0f45
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:32 +0200

    rcu/context-tracking: Move deferred nocb resched to context tracking

    To prepare for migrating the RCU eqs accounting code to context tracking,
    split the last-resort deferred nocb resched from rcu_user_enter() and
    move it into a separate call from context tracking.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:18 -04:00
Waiman Long e0440c243a rcu/context_tracking: Move dynticks_nmi_nesting to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 95e04f48ec0a634e2f221081f5fa1a904755f326
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:31 +0200

    rcu/context_tracking: Move dynticks_nmi_nesting to context tracking

    The RCU eqs tracking is going to be performed by the context tracking
    subsystem. The related nesting counters thus need to be moved to the
    context tracking structure.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long c1013cee1d rcu/context_tracking: Move dynticks_nesting to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 904e600e60f46f92eb4bcfb95788b1fedf7e8237
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:30 +0200

    rcu/context_tracking: Move dynticks_nesting to context tracking

    The RCU eqs tracking is going to be performed by the context tracking
    subsystem. The related nesting counters thus need to be moved to the
    context tracking structure.

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long 8640b64310 rcu/context_tracking: Move dynticks counter to context tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 62e2412df4b90ae6706ce1f1a9649b789b2e44ef
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:29 +0200

    rcu/context_tracking: Move dynticks counter to context tracking

    In order to prepare for merging RCU dynticks counter into the context
    tracking state, move the rcu_data's dynticks field to the context
    tracking structure. It will later be mixed within the context tracking
    state itself.

    [ paulmck: Move enum ctx_state into global scope. ]

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long 887bd73cb2 rcu/context-tracking: Remove rcu_irq_enter/exit()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 3864caafe7c66f01b188ffccb6a4215f3bf56292
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:28 +0200

    rcu/context-tracking: Remove rcu_irq_enter/exit()

    Now rcu_irq_enter/exit() is an unnecessary middle call between
    ct_irq_enter/exit() and nmi_irq_enter/exit(). Take this opportunity
    to remove the former functions and move the comments above them to the
    new entrypoints.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:17 -04:00
Waiman Long 034dc8d70a context_tracking: Take idle eqs entrypoints over RCU
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit e67198cc05b8ecbb7b8e2d8ef9fb5c8d26821873
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:25 +0200

    context_tracking: Take idle eqs entrypoints over RCU

    The RCU dynticks counter is going to be merged into the context tracking
    subsystem. Start with moving the idle extended quiescent states
    entrypoints to context tracking. For now those are dumb redirections to
    existing RCU calls.
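
    A sketch of the new entrypoints as plain redirections (function
    attributes elided):

    <snip>
    void ct_idle_enter(void)
    {
            rcu_idle_enter();
    }

    void ct_idle_exit(void)
    {
            rcu_idle_exit();
    }
    <snip>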

    [ paulmck: Apply kernel test robot feedback. ]

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:16 -04:00
Waiman Long 37eb1b0bb2 rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit ed4ae5eff4b38797607cbdd80da394149110fb37
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 17 May 2022 21:00:04 -0700

    rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()

    This commit applies the "noinstr" tag to the rcu_idle_enter() and
    rcu_idle_exit() functions, which are invoked from portions of the idle
    loop that cannot be instrumented.  These tags require reworking the
    rcu_eqs_enter() and rcu_eqs_exit() functions that these two functions
    invoke in order to cause them to use normal assertions rather than
    lockdep.  In addition, within rcu_idle_exit(), the raw versions of
    local_irq_save() and local_irq_restore() are used, again to avoid issues
    with lockdep in uninstrumented code.
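
    A sketch of the reworked shape (body simplified):

    <snip>
    void noinstr rcu_idle_exit(void)
    {
            unsigned long flags;

            /* Raw variants avoid lockdep in uninstrumented code. */
            raw_local_irq_save(flags);
            rcu_eqs_exit(false);
            raw_local_irq_restore(flags);
    }
    <snip>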

    This patch is based in part on an earlier patch by Jiri Olsa, discussions
    with Peter Zijlstra and Frederic Weisbecker, earlier changes by Thomas
    Gleixner, and off-list discussions with Yonghong Song.

    Link: https://lore.kernel.org/lkml/20220515203653.4039075-1-jolsa@kernel.org/
    Reported-by: Jiri Olsa <jolsa@kernel.org>
    Reported-by: Alexei Starovoitov <ast@kernel.org>
    Reported-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Yonghong Song <yhs@fb.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:11 -04:00
Waiman Long bbdc7c0871 rcu: Provide a get_completed_synchronize_rcu() function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 414c12385d4741e35d88670c6cc2f40a77809734
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 13 Apr 2022 15:17:25 -0700

    rcu: Provide a get_completed_synchronize_rcu() function

    It is currently up to the caller to handle stale return values from
    get_state_synchronize_rcu().  If poll_state_synchronize_rcu() returned
    true once, a grace period has elapsed, regardless of the fact that counter
    wrap might cause some future poll_state_synchronize_rcu() invocation to
    return false.  For example, the caller might store a separate flag that
    indicates whether some previous call to poll_state_synchronize_rcu()
    determined that the relevant grace period had already ended.

    This approach works, but it requires extra storage and is easy to get
    wrong.  This commit therefore introduces a get_completed_synchronize_rcu()
    that returns a cookie that causes poll_state_synchronize_rcu() to always
    return true.  This already-completed cookie can be stored in place of the
    cookie that previously caused poll_state_synchronize_rcu() to return true.
    It can also be used to flag a given structure as not having been exposed
    to readers, and thus not requiring a grace period to elapse.
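
    A minimal usage sketch (the object, its gp_cookie field, and the
    reclaim path are hypothetical):

    <snip>
    /* Mark a not-yet-exposed object as needing no grace period. */
    obj->gp_cookie = get_completed_synchronize_rcu();

    /* Later, the very first poll already reports completion. */
    if (poll_state_synchronize_rcu(obj->gp_cookie))
            reclaim(obj);   /* hypothetical */
    <snip>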

    This commit is in preparation for polled expedited grace periods.

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:06 -04:00
Waiman Long 67aa89a8ff rcu: Make normal polling GP be more precise about sequence numbers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 2403e8044f222e7c816fb2416661f5f469662973
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 21 Mar 2022 18:41:46 -0700

    rcu: Make normal polling GP be more precise about sequence numbers

    Currently, poll_state_synchronize_rcu() uses rcu_seq_done() to check
    whether the specified grace period has completed.  However, rcu_seq_done()
    does a simple comparison that reserves half of the sequence-number space
    for uncompleted grace periods.  This has the unfortunate side-effect
    of not handling sequence-number wrap gracefully.  Of course, one can
    argue that if someone has already waited for half of the full range of
    grace periods, they can wait for the other half, but why wait at all in
    this case?

    This commit therefore creates an rcu_seq_done_exact() that counts as
    uncompleted only the two grace periods during which the sequence number
    might have been handed out.  This way,
    if sequence-number wrap happens to hit that range, at most two additional
    grace periods need be waited for.
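
    The check looks roughly as follows (see rcu_seq_done_exact() in
    kernel/rcu/rcu.h):

    <snip>
    static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
    {
            unsigned long cur_s = READ_ONCE(*sp);

            return ULONG_CMP_GE(cur_s, s) ||
                   ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
    }
    <snip>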

    This commit is in preparation for polled expedited grace periods.

    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:05 -04:00
Chris von Recklinghausen 8dced2b153 mm: shrinkers: provide shrinkers with names
Bugzilla: https://bugzilla.redhat.com/2160210

commit e33c267ab70de4249d22d7eab1cc7d68a889bac2
Author: Roman Gushchin <roman.gushchin@linux.dev>
Date:   Tue May 31 20:22:24 2022 -0700

    mm: shrinkers: provide shrinkers with names

    Currently shrinkers are anonymous objects.  For debugging purposes they
    can be identified by count/scan function names, but it's not always
    useful: e.g.  for superblock's shrinkers it's nice to have at least an
    idea of to which superblock the shrinker belongs.

    This commit adds names to shrinkers.  The register_shrinker() and
    prealloc_shrinker() functions are extended to take a format string and
    arguments from which to generate a name.

    In some cases it's not possible to determine a good name at the time when
    a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
    provided.

    The expected format is:
        <subsystem>-<shrinker_type>[:<instance>]-<id>
    For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

    After this change the shrinker debugfs directory looks like:
      $ cd /sys/kernel/debug/shrinker/
      $ ls
        dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
        mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
        mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
        rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
        sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
        sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
        sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
        sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
        sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
        sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
        sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
        sb-dax-11           sb-proc-45       sb-tmpfs-35
        sb-debugfs-7        sb-proc-46       sb-tmpfs-40
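
    A sketch of the extended registration calls (the shrinker and its
    arguments are illustrative):

    <snip>
    int ret;

    ret = register_shrinker(&my_shrinker, "mm-mycache:%s", instance_name);

    /* Or, for the two-step variant: */
    ret = prealloc_shrinker(&sb->s_shrink, "sb-%s", type->name);
    <snip>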

    [roman.gushchin@linux.dev: fix build warnings]
      Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
      Reported-by: kernel test robot <lkp@intel.com>
    Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
    Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Kent Overstreet <kent.overstreet@gmail.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:17 -04:00
Waiman Long d45fbffb5b rcu: Move expedited grace period (GP) work to RT kthread_worker
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491
Conflicts:
 1) A merge conflict in kernel/rcu/rcu.h due to upstream merge conflict
    with commit 99d6a2acb895 ("rcutorture: Suppress debugging grace
    period delays during flooding"). Manually merge according to upstream
    merge commit ce13389053a3.
 2) A fuzz in kernel/rcu/tree.c due to upstream merge conflict with
    commit 87c5adf06bfb ("rcu/nocb: Initialize nocb kthreads only
    for boot CPU prior SMP initialization") and commit 3352911fa9b4
    ("rcu: Initialize boost kthread only for boot node prior SMP
    initialization"). See upstream merge commit ce13389053a3.

commit 9621fbee44df940e2e1b94b0676460a538dffefa
Author: Kalesh Singh <kaleshsingh@google.com>
Date:   Fri, 8 Apr 2022 17:35:27 -0700

    rcu: Move expedited grace period (GP) work to RT kthread_worker

    Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
    latency because its workqueues run at SCHED_OTHER, and thus can be
    delayed by normal processes.  This commit avoids these delays by moving
    the expedited GP work items to a real-time-priority kthread_worker.

    This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
    default on PREEMPT_RT=y kernels which disable expedited grace periods
    after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.
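
    A sketch of the switch (worker name illustrative):

    <snip>
    struct kthread_worker *kworker;

    /* Create a dedicated worker and give it real-time priority. */
    kworker = kthread_create_worker(0, "rcu_exp_gp_kthread_worker");
    if (!IS_ERR(kworker))
            sched_set_fifo(kworker->task);

    /* Expedited-GP work items are then queued via kthread_queue_work() */
    /* rather than queue_work(). */
    <snip>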

    The results were evaluated on arm64 Android devices (6GB ram) running
    5.10 kernel, and capturing trace data in critical user-level code.

    The table below shows the resulting order-of-magnitude improvements
    in synchronize_rcu_expedited() latency:

    ------------------------------------------------------------------------
    |                          |   workqueues  |  kthread_worker |  Diff   |
    ------------------------------------------------------------------------
    | Count                    |          725  |            688  |         |
    ------------------------------------------------------------------------
    | Min Duration       (ns)  |          326  |            447  |  37.12% |
    ------------------------------------------------------------------------
    | Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
    ------------------------------------------------------------------------
    | Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
    ------------------------------------------------------------------------
    | Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
    ------------------------------------------------------------------------
    | Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
    ------------------------------------------------------------------------
    | Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
    ------------------------------------------------------------------------
    | Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
    ------------------------------------------------------------------------

    The table below shows the range of maximums/minimums for
    synchronize_rcu_expedited() latency from all experiments:

    ------------------------------------------------------------------------
    |                          |   workqueues  |  kthread_worker |  Diff   |
    ------------------------------------------------------------------------
    | Total No. of Experiments |           25  |             23  |         |
    ------------------------------------------------------------------------
    | Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
    ------------------------------------------------------------------------
    | Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
    ------------------------------------------------------------------------
    | Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
    ------------------------------------------------------------------------
    | Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
    ------------------------------------------------------------------------
    | Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
    ------------------------------------------------------------------------
    | Range of Minimums  (ns)  |       88,297  |         27,141  |         |
    ------------------------------------------------------------------------

    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Reported-by: Tim Murray <timmurray@google.com>
    Reported-by: Wei Wang <wvw@google.com>
    Tested-by: Kyle Lin <kylelin@google.com>
    Tested-by: Chunwei Lu <chunweilu@google.com>
    Tested-by: Lulu Wang <luluw@google.com>
    Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:38:28 -04:00
Waiman Long da53e146fe rcu: Fix preemption mode check on synchronize_rcu[_expedited]()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 70ae7b0ce03347fab35d6d8df81e1165d7ea8045
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Mon, 14 Mar 2022 14:37:38 +0100

    rcu: Fix preemption mode check on synchronize_rcu[_expedited]()

    An early check on synchronize_rcu[_expedited]() tries to determine if
    the current CPU is in UP mode on an SMP no-preempt kernel, in which case
    there is no need to start a grace period since the current assumed
    quiescent state is all we need.

    However, the preemption-mode check doesn't take into account the
    boot-selected preemption mode under CONFIG_PREEMPT_DYNAMIC=y, missing a
    possible early return if the running flavour is "none" or "voluntary".

    Use the shiny new preempt mode accessors to fix this.  However,
    avoid invoking them during early boot because doing so triggers a
    WARN_ON_ONCE().
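
    A sketch of the intended check (helper name hypothetical; the
    accessors themselves are the real preempt_model_*() family):

    <snip>
    static bool preemption_precludes_fastpath(void)
    {
            /* The accessors WARN_ON_ONCE() if used during early boot. */
            if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
                    return false;
            return preempt_model_full() || preempt_model_rt();
    }
    <snip>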

    [ paulmck: Update for mainlined API. ]

    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:13 -04:00
Waiman Long 2ceaa01398 rcu: Add comments to final rcu_gp_cleanup() "if" statement
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 75182a4eaaf8b697f66d68ad039f021f461dd2a4
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 2 Mar 2022 11:01:37 -0800

    rcu: Add comments to final rcu_gp_cleanup() "if" statement

    The final "if" statement in rcu_gp_cleanup() has proven to be rather
    confusing, straightforward though it might have seemed when initially
    written.  This commit therefore adds comments to its "then" and "else"
    clauses to at least provide a more elevated form of confusion.

    Reported-by: Boqun Feng <boqun.feng@gmail.com>
    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Reported-by: Uladzislau Rezki <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:12 -04:00
Waiman Long f12dfd4e5c rcu: Check for jiffies going backwards
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit c708b08c65a0dfae127b9ee33b0fb73535a5e066
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 23 Feb 2022 17:29:37 -0800

    rcu: Check for jiffies going backwards

    A report of a 12-jiffy normal RCU CPU stall warning raises interesting
    questions about the nature of time on the offending system.  This commit
    instruments rcu_sched_clock_irq(), which is RCU's hook into the
    scheduling-clock interrupt, checking for the jiffies counter going
    backwards.
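
    The instrumentation amounts to a few lines at the top of the
    scheduling-clock hook; roughly (a sketch gated on CONFIG_PROVE_RCU,
    with the per-CPU field name taken as illustrative):

        void rcu_sched_clock_irq(int user)
        {
            unsigned long j;

            if (IS_ENABLED(CONFIG_PROVE_RCU)) {
                j = jiffies;
                /* Did jiffies move backwards since the last tick? */
                WARN_ON_ONCE(time_before(j, __this_cpu_read(rcu_data.last_sched_clock)));
                __this_cpu_write(rcu_data.last_sched_clock, j);
            }
            /* ... existing scheduling-clock processing ... */
        }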

    Reported-by: Saravanan D <sarvanand@fb.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:11 -04:00
Waiman Long b0678da638 rcutorture: Suppress debugging grace period delays during flooding
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 99d6a2acb8955f12489bfba04f2db22bc0b57726
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 4 Feb 2022 12:45:18 -0800

    rcutorture: Suppress debugging grace period delays during flooding

    Tree RCU supports grace-period delays using the rcutree.gp_cleanup_delay,
    rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot
    parameters.  These delays are strictly for debugging purposes, and have
    proven quite effective at exposing bugs involving races with CPU-hotplug
    operations.  However, these delays can result in false positives when
    used in conjunction with callback flooding, for example, those generated
    by the rcutorture.fwd_progress kernel boot parameter.

    This commit therefore suppresses grace-period delays while callback
    flooding is in progress.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:06 -04:00
Waiman Long bc54b27cee rcu-tasks: Make Tasks RCU account for userspace execution
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 5d90070816534882b9158f14154b7e2cdef1194a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 4 Mar 2022 10:41:44 -0800

    rcu-tasks: Make Tasks RCU account for userspace execution

    The main Tasks RCU quiescent state is voluntary context switch.  However,
    userspace execution is also a valid quiescent state, and is a valuable one
    for userspace applications that spin repeatedly executing light-weight
    non-sleeping system calls.  Currently, such an application can delay a
    Tasks RCU grace period for many tens of seconds.

    This commit therefore enlists the aid of the scheduler-clock interrupt to
    provide a Tasks RCU quiescent state when it interrupts a task executing
    in userspace.
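
    Conceptually, the hook looks like this (a sketch only; the exact
    callsite and helper in the mainline patch differ in detail):

        void rcu_sched_clock_irq(int user)
        {
            /* ... */
            if (user)  /* Tick interrupted a task running in userspace, */
                rcu_tasks_qs(current, false);  /* so report a Tasks RCU QS. */
            /* ... */
        }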

    [ paulmck: Apply feedback from kernel test robot. ]

    Cc: Martin KaFai Lau <kafai@fb.com>
    Cc: Neil Spring <ntspring@fb.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:03 -04:00
Waiman Long c54a776b65 rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 87c5adf06bfbf14c9d13e59d5d174ff5f2aafc0e
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:08 +0100

    rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization

    The rcu_spawn_gp_kthread() function is called as an early initcall, which
    means that SMP initialization hasn't happened yet and only the boot CPU is
    online. Therefore, create only the NOCB kthreads related to the boot CPU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long b19ed13b34 rcu: Initialize boost kthread only for boot node prior SMP initialization
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 3352911fa9b47a90165e5c6fed440048c55146d1
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:07 +0100

    rcu: Initialize boost kthread only for boot node prior SMP initialization

    The rcu_spawn_gp_kthread() function is called as an early initcall,
    which means that SMP initialization hasn't happened yet and only the
    boot CPU is online.  Therefore, create only the boost kthread for the
    leaf node of the boot CPU.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long 5779af3081 rcu: Assume rcu_init() is called before smp
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 2eed973adc6e749439730e53e6220b122398d319
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 16 Feb 2022 16:42:06 +0100

    rcu: Assume rcu_init() is called before smp

    The rcu_init() function is called way before SMP is initialized and
    therefore only the boot CPU should be online at this stage.

    Simplify the boot per-cpu initialization accordingly.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:01 -04:00
Waiman Long a9408fae13 rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c9515875850fefcc79492c5189fe8431e75ddec5
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Tue, 25 Jan 2022 10:47:44 +0800

    rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings

    When the rcutree.use_softirq kernel boot parameter is set to zero, all
    RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads.
    If these kthreads are being starved, quiescent states will not be
    reported, which in turn means that the grace period will not end, which
    can in turn trigger RCU CPU stall warnings.  This commit therefore dumps
    stack traces of stalled CPUs' rcuc kthreads, which can help identify
    what is preventing those kthreads from running.
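
    A sketch of the idea (helper name, activity field, and threshold here
    are illustrative, not the exact mainline code):

        static void rcu_maybe_dump_rcuc(struct rcu_data *rdp)
        {
            struct task_struct *rcuc = rdp->rcu_cpu_kthread_task;

            /* If the rcuc kthread hasn't run for a full second, dump it. */
            if (rcuc && time_after(jiffies, READ_ONCE(rdp->rcuc_activity) + HZ))
                sched_show_task(rcuc);
        }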

    Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:30:04 -04:00
Waiman Long 22f9156241 rcu: Elevate priority of offloaded callback threads
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c8b16a65267e35ecc5621dbc81cbe7e5b0992fce
Author: Alison Chaiken <achaiken@aurora.tech>
Date:   Tue, 11 Jan 2022 15:32:52 -0800

    rcu: Elevate priority of offloaded callback threads

    When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line
    parameter signals initialization code to boost the priority of rcuc
    callbacks to the designated value.  With the additional
    CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs
    command-line parameter, the callbacks on the listed cores are
    offloaded to new rcuop kthreads that are not pinned to the cores whose
    post-grace-period work is performed.  While the rcuop kthreads perform
    the same function as the rcuc kthreads they offload, the kthread_prio
    parameter only boosts the priority of the rcuc kthreads.  Fix this
    inconsistency by elevating rcuop kthreads to the same priority as the rcuc
    kthreads.
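
    A sketch of the fix at the point where each offloaded-callback ("rcuo")
    kthread is spawned; "t" is the just-created kthread (fragment,
    surrounding code abridged):

        struct sched_param sp = { .sched_priority = kthread_prio };

        /* Give offloaded-callback kthreads the same boost as rcuc. */
        if (kthread_prio)
            sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);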

    Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:09 -04:00
Waiman Long 3dc8452aa5 rcu: Move kthread_prio bounds-check to a separate function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c8db27dd0ea8071d2ea29a1a401c4ccc611ec6c1
Author: Alison Chaiken <achaiken@aurora.tech>
Date:   Tue, 11 Jan 2022 15:32:50 -0800

    rcu: Move kthread_prio bounds-check to a separate function

    Move the bounds-check of the kthread_prio cmdline parameter to a new
    function in order to facilitate a different callsite.
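
    The extracted function simply clamps the parameter (a sketch; mainline
    calls it sanitize_kthread_prio()):

        static void sanitize_kthread_prio(void)
        {
            int kthread_prio_in = kthread_prio;

            /* RCU priority boosting requires at least priority 2. */
            if (IS_ENABLED(CONFIG_RCU_BOOST) && kthread_prio < 2)
                kthread_prio = 2;
            else if (kthread_prio < 0)
                kthread_prio = 0;
            else if (kthread_prio > 99)
                kthread_prio = 99;
            if (kthread_prio != kthread_prio_in)
                pr_alert("%s: Limited prio to %d from %d\n",
                         __func__, kthread_prio, kthread_prio_in);
        }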

    Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:08 -04:00
Waiman Long f3300badb5 rcu: Create per-cpu rcuc kthreads only when rcutree.use_softirq=0
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 4b4399b2450de38916718ba9947e6cdb69c99c55
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Wed, 29 Dec 2021 00:05:10 +0800

    rcu: Create per-cpu rcuc kthreads only when rcutree.use_softirq=0

    The per-CPU "rcuc" kthreads are used only by kernels booted with
    rcutree.use_softirq=0, but they are nevertheless unconditionally created
    by kernels built with CONFIG_RCU_BOOST=y.  This results in "rcuc"
    kthreads being created that are never actually used.  This commit
    therefore refrains from creating these kthreads unless the kernel
    is actually booted with rcutree.use_softirq=0.
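
    The gist of the change is an early exit when the rcuc kthreads would
    go unused (sketch, with the kthread-creation details elided):

        static int __init rcu_spawn_core_kthreads(void)
        {
            if (use_softirq)
                return 0;  /* Callbacks run in softirq; no rcuc needed. */
            /* ... create the per-CPU rcuc kthreads as before ... */
            return 0;
        }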

    Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:03 -04:00
Waiman Long 7e57b41b6e kasan: Record work creation stack trace with interrupts enabled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit d818cc76e2b4d5f6cebf8c7ce1160d652d7e572b
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Sun, 26 Dec 2021 08:52:04 +0800

    kasan: Record work creation stack trace with interrupts enabled

    Recording the work creation stack trace for KASAN reports in
    call_rcu() is expensive, due to unwinding the stack, but also
    due to acquiring depot_lock inside stackdepot (which may be contended).
    Because calling kasan_record_aux_stack_noalloc() does not require
    interrupts to already be disabled, this may unnecessarily extend
    the time with interrupts disabled.

    Therefore, move calling kasan_record_aux_stack() before the section
    with interrupts disabled.
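
    A sketch of the reordering inside call_rcu() (abridged):

        void call_rcu(struct rcu_head *head, rcu_callback_t func)
        {
            unsigned long flags;

            /* Record the aux stack while interrupts are still enabled;
             * stack unwinding plus depot_lock can take a while. */
            kasan_record_aux_stack_noalloc(head);

            local_irq_save(flags);
            /* ... enqueue the callback ... */
            local_irq_restore(flags);
        }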

    Acked-by: Marco Elver <elver@google.com>
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:03 -04:00
Waiman Long 07e0b8909d rcu: Inline __call_rcu() into call_rcu()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 1fe09ebe7a9c9907f516779fbe4954165dd01529
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat, 18 Dec 2021 09:30:33 -0800

    rcu: Inline __call_rcu() into call_rcu()

    Because __call_rcu() is invoked only by call_rcu(), this commit inlines
    the former into the latter.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:26:02 -04:00
Waiman Long 4c01b1af26 rcu: Make rcu_barrier() no longer block CPU-hotplug operations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 80b3fd474c91b3ecfd845b4a0bfb58706b877ba5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 14 Dec 2021 13:35:17 -0800

    rcu: Make rcu_barrier() no longer block CPU-hotplug operations

    This commit removes the cpus_read_lock() and cpus_read_unlock() calls
    from rcu_barrier(), thus allowing CPUs to come and go during the course
    of rcu_barrier() execution.  Posting of the ->barrier_head callbacks does
    synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
    are held for short time periods on both sides.  Thus, full CPU-hotplug
    operations could both start and finish during the execution of a given
    rcu_barrier() invocation.

    Additional synchronization is provided by a global ->barrier_lock.
    Since the ->barrier_lock is only used during rcu_barrier() execution and
    during onlining/offlining a CPU, the contention for this lock should
    be low.  It might be tempting to make use of a per-CPU lock just on
    general principles, but straightforward attempts to do this have the
    problems shown below.

    Initial state: 3 CPUs present, CPU 0 and CPU1 do not have
    any callback and CPU2 has callbacks.

    1. CPU0 calls rcu_barrier().

    2. CPU1 starts offlining for CPU2. CPU1 calls
       rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
       from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock held.
       It does not entrain ->barrier_head for CPU2, as rcu_barrier()
       on CPU0 hasn't started the barrier sequence (by calling
       rcu_seq_start(&rcu_state.barrier_sequence)) yet.

    3. CPU0 starts new barrier sequence. It iterates over
       CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock
       and finds 0 segcblist length. It updates ->barrier_seq_snap
       for CPU0 and CPU1 and continues loop iteration to CPU2.

        for_each_possible_cpu(cpu) {
            raw_spin_lock_irqsave(&rdp->barrier_lock, flags);
            if (!rcu_segcblist_n_cbs(&rdp->cblist)) {
                WRITE_ONCE(rdp->barrier_seq_snap, gseq);
                raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags);
                rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
                continue;
            }

    4. rcutree_migrate_callbacks() completes execution on CPU1.
       Segcblist len for CPU2 becomes 0.

    5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&rdp->cblist)
       for CPU2 and completes the loop iteration after setting
       ->barrier_seq_snap.

    6. As there isn't any ->barrier_head callback entrained; at
       this point, rcu_barrier() in CPU0 returns.

    7. The callbacks, which migrated from CPU2 to CPU1, execute.

    Straightforward per-CPU locking is also subject to the following race
    condition noted by Boqun Feng:

    1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
       rcu_seq_start() and init_completion(), but does not yet initialize
       rcu_state.barrier_cpu_count.

    2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
       which in turn calls rcu_barrier_entrain() holding CPU2's
       rdp->barrier_lock.  It then entrains ->barrier_head for CPU2
       and atomically increments rcu_state.barrier_cpu_count, which is
       unfortunately not yet initialized to the value 2.

    3. The just-entrained RCU callback is invoked.  It atomically
       decrements rcu_state.barrier_cpu_count and sees that it is
       now zero.  This callback therefore invokes complete().

    4. CPU0 continues executing rcu_barrier(), but is not blocked
       by its call to wait_for_completion().  This results in rcu_barrier()
       returning before all pre-existing callbacks have been invoked,
       which is a bug.

    Therefore, synchronization is provided by rcu_state.barrier_lock,
    which is also held across the initialization sequence, especially the
    rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
    to the value 2.  In addition, this lock is held when entraining the
    rcu_barrier() callback, when deciding whether or not a CPU has callbacks
    that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for
    incoming CPUs, and when migrating callbacks from a CPU that is going
    offline.

    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:57 -04:00
Waiman Long 6d38f5233d rcu: Rework rcu_barrier() and callback-migration logic
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit a16578dd5e3a44b53ca0699ac2971679dab97484
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 14 Dec 2021 13:15:18 -0800

    rcu: Rework rcu_barrier() and callback-migration logic

    This commit reworks rcu_barrier() and callback-migration logic to
    permit allowing rcu_barrier() to run concurrently with CPU-hotplug
    operations.  The key trick is for callback migration to check to see if
    an rcu_barrier() is in flight, and, if so, enqueue the ->barrier_head
    callback on its behalf.

    This commit adds synchronization with RCU's CPU-hotplug notifiers.  Taken
    together, this will permit a later commit to remove the cpus_read_lock()
    and cpus_read_unlock() calls from rcu_barrier().
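
    The key trick reduces to a check like the following in the entrain
    path (a sketch; locking, tracing, and the actual entrainment elided):

        static void rcu_barrier_entrain(struct rcu_data *rdp)
        {
            unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
            unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);

            if (!rcu_seq_state(gseq) ||                 /* No barrier in flight, */
                rcu_seq_ctr(lseq) == rcu_seq_ctr(gseq)) /* or already handled. */
                return;
            /* ... entrain ->barrier_head and bump barrier_cpu_count ... */
            smp_store_release(&rdp->barrier_seq_snap, gseq);
        }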

    [ paulmck: Updated per kbuild test robot feedback. ]
    [ paulmck: Updated per reviews session with Neeraj, Frederic, Uladzislau, and Boqun. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:56 -04:00
Waiman Long e65be485f7 rcu: Refactor rcu_barrier() empty-list handling
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 0cabb47af3cfaeb6007ba3868379bbd4daee64cc
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 10 Dec 2021 16:25:20 -0800

    rcu: Refactor rcu_barrier() empty-list handling

    This commit saves a few lines by checking first for an empty callback
    list.  If the callback list is empty, then that CPU is taken care of,
    regardless of its online or nocb state.  Also simplify tracing accordingly
    and fold a few lines together.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:54 -04:00
Waiman Long 9f48f77ccc rcu: Create and use an rcu_rdp_cpu_online()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 5ae0f1b58b28b53f4ab3708ef9337a2665e79664
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 10 Dec 2021 13:44:17 -0800

    rcu: Create and use an rcu_rdp_cpu_online()

    The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently
    in RCU code in order to determine whether rdp->cpu is online from an
    RCU perspective.  This commit therefore creates an rcu_rdp_cpu_online()
    function to replace it.
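
    The replacement helper is essentially a one-liner wrapping that pattern:

        /*
         * Is this rcu_data structure's CPU online from RCU's perspective?
         */
        static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
        {
            return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
        }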

    [ paulmck: Apply kernel test robot unused-variable feedback. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:49 -04:00
Waiman Long ba1bfcb746 rcu: Add mutex for rcu boost kthread spawning and affinity setting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: A fuzz in rcu_boost_kthread_setaffinity() of
	   kernel/rcu/tree_plugin.h due to the presence of a later
	   upstream commit 04d4e665a609 ("sched/isolation: Use single
	   feature type while referring to housekeeping cpumask").

commit 218b957a6959a2fb5b3967fc824072bb89ac2611
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Wed, 8 Dec 2021 23:41:53 +0000

    rcu: Add mutex for rcu boost kthread spawning and affinity setting

    As we handle parallel CPU bringup, we will need to take care to avoid
    spawning multiple boost threads, or race conditions when setting their
    affinity. Spotted by Paul McKenney.

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:25:17 -04:00
Waiman Long 5824fc0262 rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 82980b1622d97017053c6792382469d7dc26a486
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Tue, 16 Feb 2021 15:04:34 +0000

    rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion

    If we allow architectures to bring APs online in parallel, then we end
    up requiring rcu_cpu_starting() to be reentrant. But currently, the
    manipulation of rnp->ofl_seq is not thread-safe.

    However, rnp->ofl_seq is also fairly much pointless anyway since both
    rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
    fairly much the whole time that rnp->ofl_seq is set to an odd number
    to indicate that an operation is in progress.

    So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

    This has a couple of minor complexities: lockdep will complain when we
    take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
    an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
    avoid that false positive complaint. Since we're killing rnp->ofl_seq
    of course that 'excuse' has to be changed too, so make it check for
    arch_spin_is_locked(rcu_state.ofl_lock).

    There's no arch_spin_lock_irqsave() so we have to manually save and
    restore local interrupts around the locking.

    At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
    wait but *exclude* any CPU online/offline activity, which was fairly
    much true already by virtue of it holding rcu_state.ofl_lock.
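
    The resulting locking pattern looks like this (sketch):

        /* No arch_spin_lock_irqsave(), so save/restore interrupts by hand. */
        local_irq_save(flags);
        arch_spin_lock(&rcu_state.ofl_lock);
        /* ... online/offline bookkeeping, e.g. ->qsmaskinitnext updates ... */
        arch_spin_unlock(&rcu_state.ofl_lock);
        local_irq_restore(flags);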

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:19:35 -04:00
Patrick Talbert ea38048f36 Merge: rcu: Backport upstream RCU related commits up to v5.17
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/602

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

This patch series backports upstream RCU and various torture-test commits
up to the v5.17 kernel. Aside from patch 10, which has a merge conflict due
to an upstream merge conflict, the other patches all applied cleanly without
any issue.

Signed-off-by: Waiman Long <longman@redhat.com>
~~~
Waiman Long (112):
  torture: Apply CONFIG_KCSAN_STRICT to kvm.sh --kcsan argument
  torture: Make torture.sh print the number of files to be compressed
  rcu-nocb: Fix a couple of tree_nocb code-style nits
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable rnhqp
  rcu: Eliminate rcu_implicit_dynticks_qs() local variable ruqp
  doc: Add another stall-warning root cause in stallwarn.rst
  rcu: Fix undefined Kconfig macros
  rcu: Comment rcu_gp_init() code waiting for CPU-hotplug operations
  rcu-tasks: Simplify trc_read_check_handler() atomic operations
  rcu-tasks: Add trc_inspect_reader() checks for exiting critical
    section
  rcu-tasks: Remove second argument of rcu_read_unlock_trace_special()
  rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
  rcu: Simplify rcu_report_dead() call to rcu_report_exp_rdp()
  rcu: Make rcutree_dying_cpu() use its "cpu" parameter
  rcu-tasks: Wait for trc_read_check_handler() IPIs
  rcutorture: Suppressing read-exit testing is not an error
  rcu-tasks: Fix s/instruction/instructions/ typo in comment
  rcutorture: Warn on individual rcu_torture_init() error conditions
  locktorture: Warn on individual lock_torture_init() error conditions
  rcuscale: Warn on individual rcu_scale_init() error conditions
  rcutorture: Don't cpuhp_remove_state() if cpuhp_setup_state() failed
  rcu: Make rcu_normal_after_boot writable again
  rcu: Make rcu update module parameters world-readable
  rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
  rcu-tasks: Fix s/rcu_add_holdout/trc_add_holdout/ typo in comment
  rcu-tasks: Correct firstreport usage in check_all_holdout_tasks_trace
  rcu-tasks: Correct comparisons for CPU numbers in
    show_stalled_task_trace
  rcu-tasks: Clarify read side section info for rcu_tasks_rude GP
    primitives
  rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
  rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
  rcu-tasks: Fix read-side primitives comment for call_rcu_tasks_trace
  rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader
  rcu: Replace ________p1 and _________p1 with __UNIQUE_ID(rcu)
  rcu-tasks: Update comments to cond_resched_tasks_rcu_qs()
  rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
  rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()
  rcu: Remove rcu_data.exp_deferred_qs and convert to
    rcu_data.cpu_no_qs.b.exp
  rcu-tasks: Don't remove tasks with pending IPIs from holdout list
  torture: Catch kvm.sh help text up with actual options
  rcutorture: Sanitize RCUTORTURE_RDR_MASK
  rcutorture: More thoroughly test nested readers
  srcu: Prevent redundant __srcu_read_unlock() wakeup
  rcutorture: Suppress pi-lock-across read-unlock testing for Tiny SRCU
  doc: Remove obsolete kernel-per-CPU-kthreads RCU_FAST_NO_HZ advice
  rcu: in_irq() cleanup
  rcu: Always inline rcu_dynticks_task*_{enter,exit}()
  rcu: Mark sync_sched_exp_online_cleanup() ->cpu_no_qs.b.exp load
  rcu: Prevent expedited GP from enabling tick on offline CPU
  rcu: Make idle entry report expedited quiescent states
  rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent
    deoffloading
  rcu/nocb: Prepare state machine for a new step
  rcu/nocb: Invoke rcu_core() at the start of deoffloading
  rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
  rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
  rcu/nocb: Check a stable offloaded state to manipulate
    qlen_last_fqs_check
  rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
  rcu/nocb: Limit number of softirq callbacks only on softirq
  rcu: Fix callbacks processing time limit retaining cond_resched()
  rcu: Apply callbacks processing time limit only on softirq
  rcu/nocb: Don't invoke local rcu core on callback overload from nocb
    kthread
  rcu: Improve tree_plugin.h comments and add code cleanups
  refscale: Simplify the errexit checkpoint
  refscale: Prevent buffer to pr_alert() being too long
  refscale: Always log the error message
  doc: Add refcount analogy to What is RCU
  refscale: Add missing '\n' to flush message
  scftorture: Add missing '\n' to flush message
  scftorture: Remove unused SCFTORTOUT
  scftorture: Account for weight_resched when checking for all zeroes
  rcuscale: Always log error message
  doc: RCU: Avoid 'Symbol' font-family in SVG figures
  scftorture: Always log error message
  locktorture,rcutorture,torture: Always log error message
  rcu-tasks: Create per-CPU callback lists
  rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue
    selection
  rcu-tasks: Convert grace-period counter to grace-period sequence
    number
  rcu_tasks: Convert bespoke callback list to rcu_segcblist structure
  rcu-tasks: Use spin_lock_rcu_node() and friends
  rcu-tasks: Inspect stalled task's trc state in locked state
  rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure
  rcu-tasks: Abstract checking of callback lists
  rcu-tasks: Abstract invocations of callbacks
  rcutorture: Avoid soft lockup during cpu stall
  torture: Make kvm-find-errors.sh report link-time undefined symbols
  rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs()
    invocations
  rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues
  rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing
  rcutorture: Test RCU-tasks multiqueue callback queueing
  rcu: Avoid running boost kthreads on isolated CPUs
  rcu: Avoid alloc_pages() when recording stack
  rcutorture: Add CONFIG_PREEMPT_DYNAMIC=n to tiny scenarios
  torture: Retry download once before giving up
  rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention
  rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
  rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
  rcu/nocb: Optimize kthreads and rdp initialization
  rcu/nocb: Create kthreads on all CPUs if "rcu_nocbs=" or "nohz_full="
    are passed
  rcu/nocb: Allow empty "rcu_nocbs" kernel parameter
  rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and
    rcu_spawn_one_nocb_kthread()
  rcutorture: Enable multiple concurrent callback-flood kthreads
  rcutorture: Cause TREE02 and TREE10 scenarios to do more callback
    flooding
  rcutorture: Add ability to limit callback-flood intensity
  rcutorture: Combine n_max_cbs from all kthreads in a callback flood
  rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()
  rcu-tasks: Use more callback queues if contention encountered
  rcutorture: Test RCU Tasks lock-contention detection
  rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing
  rcu-tasks: Use fewer callbacks queues if callback flood ends
  rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
  torture: Fix incorrectly redirected "exit" in kvm-remote.sh
  torture: Properly redirect kvm-remote.sh "echo" commands
  rcu-tasks: Fix computation of CPU-to-list shift counts

 .../Expedited-Grace-Periods/Funnel0.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel1.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel2.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel3.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel4.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel5.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel6.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel7.svg       |   4 +-
 .../Expedited-Grace-Periods/Funnel8.svg       |   4 +-
 .../Tree-RCU-Memory-Ordering.rst              |  69 +--
 .../Requirements/GPpartitionReaders1.svg      |  36 +-
 .../Requirements/ReadersPartitionGP1.svg      |  62 +-
 Documentation/RCU/stallwarn.rst               |  10 +
 Documentation/RCU/whatisRCU.rst               |  90 ++-
 .../admin-guide/kernel-parameters.txt         |  66 +-
 .../admin-guide/kernel-per-CPU-kthreads.rst   |   2 +-
 arch/sh/configs/sdk7786_defconfig             |   1 -
 arch/xtensa/configs/nommu_kc705_defconfig     |   1 -
 include/linux/rcu_segcblist.h                 |  51 +-
 include/linux/rcupdate.h                      |  50 +-
 include/linux/rcupdate_trace.h                |   5 +-
 include/linux/rcutiny.h                       |   2 +-
 include/linux/srcu.h                          |   3 +-
 include/linux/torture.h                       |  17 +-
 kernel/locking/locktorture.c                  |  18 +-
 kernel/rcu/Kconfig                            |   2 +-
 kernel/rcu/rcu_segcblist.c                    |  10 +-
 kernel/rcu/rcu_segcblist.h                    |  12 +-
 kernel/rcu/rcuscale.c                         |  24 +-
 kernel/rcu/rcutorture.c                       | 320 +++++++---
 kernel/rcu/refscale.c                         |  50 +-
 kernel/rcu/srcutiny.c                         |   2 +-
 kernel/rcu/tasks.h                            | 583 ++++++++++++++----
 kernel/rcu/tree.c                             | 119 ++--
 kernel/rcu/tree.h                             |  24 +-
 kernel/rcu/tree_exp.h                         |  15 +-
 kernel/rcu/tree_nocb.h                        | 162 +++--
 kernel/rcu/tree_plugin.h                      |  61 +-
 kernel/rcu/update.c                           |   8 +-
 kernel/scftorture.c                           |  20 +-
 kernel/torture.c                              |   4 +-
 .../rcutorture/bin/kvm-find-errors.sh         |   4 +-
 .../rcutorture/bin/kvm-recheck-rcu.sh         |   2 +-
 .../selftests/rcutorture/bin/kvm-remote.sh    |  23 +-
 tools/testing/selftests/rcutorture/bin/kvm.sh |  11 +-
 .../selftests/rcutorture/bin/parse-build.sh   |   3 +-
 .../selftests/rcutorture/bin/torture.sh       |   9 +-
 .../selftests/rcutorture/configs/rcu/SRCU-T   |   1 +
 .../selftests/rcutorture/configs/rcu/SRCU-U   |   1 +
 .../rcutorture/configs/rcu/TASKS01.boot       |   1 +
 .../selftests/rcutorture/configs/rcu/TINY01   |   1 +
 .../selftests/rcutorture/configs/rcu/TINY02   |   1 +
 .../rcutorture/configs/rcu/TRACE01.boot       |   1 +
 .../rcutorture/configs/rcu/TRACE02.boot       |   1 +
 .../rcutorture/configs/rcu/TREE02.boot        |   1 +
 .../rcutorture/configs/rcu/TREE10.boot        |   1 +
 .../rcutorture/configs/rcuscale/TINY          |   1 +
 57 files changed, 1360 insertions(+), 637 deletions(-)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE02.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE10.boot

Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-04-19 12:23:21 +02:00
Waiman Long bcf6cd7df4 rcu: Avoid alloc_pages() when recording stack
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 300c0c5e721834f484b03fa3062602dd8ff48413
Author: Jun Miao <jun.miao@intel.com>
Date:   Tue, 16 Nov 2021 07:23:02 +0800

    rcu: Avoid alloc_pages() when recording stack

    The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT,
    which in turn can then call alloc_pages(GFP_NOWAIT, ...).  In general, however,
    it is not even possible to use either GFP_ATOMIC or GFP_NOWAIT in certain
    non-preemptive contexts, including on the RT kernel and under
    raw_spin_locks (see gfp.h and ab00db216c).
    Fix it by instructing stackdepot to not expand stack storage via alloc_pages()
    in case it runs out by using kasan_record_aux_stack_noalloc().

    Jianwei Hu reported:
    BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
    in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3
    INFO: lockdep is turned off.
    irq event stamp: 0
      hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
      softirqs last  enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
      softirqs last disabled at (0): [<0000000000000000>] 0x0
      CPU: 6 PID: 15319 Comm: python3 Tainted: G        W  O 5.15-rc7-preempt-rt #1
      Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018
      Call Trace:
        show_stack+0x52/0x58
        dump_stack+0xa1/0xd6
        ___might_sleep.cold+0x11c/0x12d
        rt_spin_lock+0x3f/0xc0
        rmqueue+0x100/0x1460
        rmqueue+0x100/0x1460
        mark_usage+0x1a0/0x1a0
        ftrace_graph_ret_addr+0x2a/0xb0
        rmqueue_pcplist.constprop.0+0x6a0/0x6a0
         __kasan_check_read+0x11/0x20
         __zone_watermark_ok+0x114/0x270
         get_page_from_freelist+0x148/0x630
         is_module_text_address+0x32/0xa0
         __alloc_pages_nodemask+0x2f6/0x790
         __alloc_pages_slowpath.constprop.0+0x12d0/0x12d0
         create_prof_cpu_mask+0x30/0x30
         alloc_pages_current+0xb1/0x150
         stack_depot_save+0x39f/0x490
         kasan_save_stack+0x42/0x50
         kasan_save_stack+0x23/0x50
         kasan_record_aux_stack+0xa9/0xc0
         __call_rcu+0xff/0x9c0
         call_rcu+0xe/0x10
         put_object+0x53/0x70
         __delete_object+0x7b/0x90
         kmemleak_free+0x46/0x70
         slab_free_freelist_hook+0xb4/0x160
         kfree+0xe5/0x420
         kfree_const+0x17/0x30
         kobject_cleanup+0xaa/0x230
         kobject_put+0x76/0x90
         netdev_queue_update_kobjects+0x17d/0x1f0
         ... ...
         ksys_write+0xd9/0x180
         __x64_sys_write+0x42/0x50
         do_syscall_64+0x38/0x50
         entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642
    Fixes: 84109ab585 ("rcu: Record kvfree_call_rcu() call stack for KASAN")
    Fixes: 26e760c9a7 ("rcu: kasan: record and print call_rcu() call stack")
    Reported-by: Jianwei Hu <jianwei.hu@windriver.com>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Acked-by: Marco Elver <elver@google.com>
    Tested-by: Juri Lelli <juri.lelli@redhat.com>
    Signed-off-by: Jun Miao <jun.miao@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:18 -04:00
Waiman Long dee4fbd239 rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 0598a4d4429c0a952ac0e99e5280354cf4ccc01c
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:16 +0200

    rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread

    rcu_core() tries to ensure that its self-invocation in case of callbacks
    overload only happens in softirq/rcuc mode. Indeed it doesn't make sense
    to trigger local RCU core from nocb_cb kthread since it can execute
    on a CPU different from the target rdp. Also in case of overload, the
    nocb_cb kthread simply iterates a new loop of callbacks processing.

    However the "offloaded" check that aims at preventing misplaced
    rcu_core() invocations is wrong. First of all, that state is volatile,
    and second, softirq/rcuc can execute while the target rdp is offloaded.
    As a result rcu_core() can be invoked on the wrong CPU while in the
    process of (de-)offloading.

    Fix that with moving the rcu_core() self-invocation to rcu_core() itself,
    irrespective of the rdp offloaded state.

    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:04 -04:00
Waiman Long 27dd5723e4 rcu: Apply callbacks processing time limit only on softirq
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit a554ba288845fd3f6f12311fd76a51694233458a
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:15 +0200

    rcu: Apply callbacks processing time limit only on softirq

    Time limit only makes sense when callbacks are serviced in softirq mode
    because:

    _ In case we need to get back to the scheduler,
      cond_resched_tasks_rcu_qs() is called after each callback.

    _ In case some other softirq vector needs the CPU, the call to
      local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
      them via a call to do_softirq().

    Therefore, make sure the time limit only applies to softirq mode.
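
    An abridged sketch of the resulting tail of the rcu_do_batch() loop:

        if (in_serving_softirq()) {
            /* The time limit applies only here. */
            if (unlikely(tlimit) && local_clock() >= tlimit)
                break;
        } else {
            /* rcuc/NOCB kthread: defer to the scheduler instead. */
            local_bh_enable();
            lockdep_assert_irqs_enabled();
            cond_resched_tasks_rcu_qs();
            lockdep_assert_irqs_enabled();
            local_bh_disable();
        }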

    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:03 -04:00
Waiman Long fb8f304925 rcu: Fix callbacks processing time limit retaining cond_resched()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 3e61e95e2d095e308616cba4ffb640f95a480e01
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:14 +0200

    rcu: Fix callbacks processing time limit retaining cond_resched()

    The callbacks processing time limit makes sure we are not exceeding a
    given amount of time executing the queue.

    However its "continue" clause bypasses the cond_resched() call on
    rcuc and NOCB kthreads, delaying it until we reach the limit, which can
    be very long...

    Make sure the scheduler has a higher priority than the time limit.

    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:03 -04:00
Waiman Long 4c81879303 rcu/nocb: Limit number of softirq callbacks only on softirq
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 78ad37a2c50dfdb9a60e42bb9ee1da86d1fe770c
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:13 +0200

    rcu/nocb: Limit number of softirq callbacks only on softirq

    The current condition to limit the number of callbacks executed in a
    row checks the offloaded state of the rdp. Not only is it volatile
    but it is also misleading: the rcu_core() may well be executing
    callbacks concurrently with NOCB kthreads, and the offloaded state
    would then be verified in both cases. As a result the limit would
    spuriously not apply anymore on softirq while in the middle of
    (de-)offloading process.

    Fix and clarify the condition with those constraints in mind:

    _ If callbacks are processed either by rcuc or NOCB kthread, the call
      to cond_resched_tasks_rcu_qs() is enough to take care of the overload.

    _ If instead callbacks are processed by softirqs:
      * If need_resched(), exit the callbacks processing
      * Otherwise if CPU is idle we can continue
      * Otherwise exit because a softirq shouldn't interrupt a task for too
        long nor deprive other pending softirq vectors of the CPU.
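
    An abridged sketch of the fixed break condition in rcu_do_batch()
    ("count" and "bl" stand for callbacks invoked so far and the batch
    limit, respectively):

        if (in_serving_softirq() && count >= bl &&
            (need_resched() ||
             (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
            break;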

    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:03 -04:00
Waiman Long 548b78a98f rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065994

commit 7b65dfa32dca1be0400d43a3d5bb80ed6e04958e
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue, 19 Oct 2021 02:08:12 +0200

    rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()

    Instead of hardcoding IRQ save and nocb lock, use the consolidated
    API (and fix a comment as per Valentin Schneider's suggestion).

    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Valentin Schneider <valentin.schneider@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-03-24 17:16:02 -04:00