Commit Graph

150 Commits

Author SHA1 Message Date
Waiman Long dfd6ba19b1 rcutorture: Make rcutorture support print rcu-tasks gp state
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit dddcddef1414be3ebc37a40d13fcc0f6a672ba9f
Author: Zqiang <qiang.zhang1211@gmail.com>
Date:   Mon, 18 Mar 2024 17:34:11 +0800

    rcutorture: Make rcutorture support print rcu-tasks gp state

    This commit makes the rcu-tasks related rcutorture tests support printing
    the rcu-tasks gp state when a writer stall occurs or at the end of the
    rcutorture test, and introduces the rcu_ops->get_gp_data() operation to
    simplify the acquisition of gp state for different types of rcutorture
    tests.

    Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:47 -04:00
Waiman Long 890d600997 srcu: Improve comments about acceleration leak
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 67050837ec14fc20a26b237ce965c50c85a318b7
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Wed, 27 Dec 2023 12:47:38 -0500

    srcu: Improve comments about acceleration leak

    The comments added in commit 1ef990c4b36b ("srcu: No need to
    advance/accelerate if no callback enqueued") are a bit confusing.
    The comments are describing a scenario for code that was moved and is
    no longer the way it was (snapshot after advancing). Improve the code
    comments to reflect this and also document why acceleration can never
    fail.

    Cc: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:22 -04:00
Waiman Long 52dceb8212 srcu: Explain why callbacks invocations can't run concurrently
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit c21357e4461f3f9c8ff93302906b5372411ee108
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 4 Oct 2023 01:29:03 +0200

    srcu: Explain why callbacks invocations can't run concurrently

    If an SRCU barrier is queued while callbacks are running and a new
    callbacks invocator for the same sdp were to run concurrently, the
    RCU barrier might execute too early. As this requirement is non-obvious,
    make sure to keep a record.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:05 -04:00
Waiman Long 9f330005fd srcu: No need to advance/accelerate if no callback enqueued
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 94c55b9e21979daa88e190bf971c47432a818ebe
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 4 Oct 2023 01:29:02 +0200

    srcu: No need to advance/accelerate if no callback enqueued

    While in grace period start, there is nothing to accelerate and
    therefore no need to advance the callbacks either if no callback is
    to be enqueued.

    Spare these needless operations in this case.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:04 -04:00
Waiman Long fbd1964642 srcu: Remove superfluous callbacks advancing from srcu_gp_start()
JIRA: https://issues.redhat.com/browse/RHEL-55557

commit 20eb4142397cf3ec221de43f10ea149af462c572
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 4 Oct 2023 01:29:01 +0200

    srcu: Remove superfluous callbacks advancing from srcu_gp_start()

    Callbacks advancing on SRCU must be performed in two specific places:

    1) At enqueue time in order to make room for the acceleration of the
       new callback.

    2) At invocation time in order to move the callbacks ready to invoke.

    Any other callback advancing callsite is needless. Remove the remaining
    one in srcu_gp_start().

    Co-developed-by: Yong He <zhuangel570@gmail.com>
    Signed-off-by: Yong He <zhuangel570@gmail.com>
    Co-developed-by: Joel Fernandes <joel@joelfernandes.org>
    Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Co-developed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
    Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-08-26 10:57:03 -04:00
Waiman Long 46ae167ae6 srcu: Only accelerate on enqueue time
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit 8a77f38bcd28d3c22ab7dd8eff3f299d43c00411
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 4 Oct 2023 01:29:00 +0200

    srcu: Only accelerate on enqueue time

    Acceleration in SRCU happens on enqueue time for each new callback. This
    operation is expected not to fail and therefore any similar attempt
    from other places shouldn't find any remaining callbacks to accelerate.

    Moreover accelerations performed beyond enqueue time are error prone
    because rcu_seq_snap() then may return the snapshot for a new grace
    period that is not going to be started.

    Remove these dangerous and needless accelerations and introduce instead
    assertions reporting leaking unaccelerated callbacks beyond enqueue
    time.

    Co-developed-by: Yong He <alexyonghe@tencent.com>
    Signed-off-by: Yong He <alexyonghe@tencent.com>
    Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Co-developed-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
    Signed-off-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
    Reviewed-by: Like Xu <likexu@tencent.com>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:17 -04:00
Waiman Long 3eedd5a2b3 srcu: Fix callbacks acceleration mishandling
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit 4a8e65b0c348e42107c64381e692e282900be361
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 4 Oct 2023 01:28:59 +0200

    srcu: Fix callbacks acceleration mishandling

    SRCU callbacks acceleration might fail if the preceding callbacks
    advance also fails. This can happen when the following steps are met:

    1) The RCU_WAIT_TAIL segment has callbacks (say for gp_num 8) and the
       RCU_NEXT_READY_TAIL also has callbacks (say for gp_num 12).

    2) The grace period for RCU_WAIT_TAIL is observed as started but not yet
       completed so rcu_seq_current() returns 4 + SRCU_STATE_SCAN1 = 5.

    3) This value is passed to rcu_segcblist_advance() which can't move
       any segment forward and fails.

    4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
       But then the call to rcu_seq_snap() observes the grace period for the
       RCU_WAIT_TAIL segment (gp_num 8) as completed and the subsequent one
       for the RCU_NEXT_READY_TAIL segment as started
       (ie: 8 + SRCU_STATE_SCAN1 = 9) so it returns a snapshot of the
       next grace period, which is 16.

    5) The value of 16 is passed to rcu_segcblist_accelerate() but the
       freshly enqueued callback in RCU_NEXT_TAIL can't move to
       RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
       period (gp_num = 12). So acceleration fails.

    6) Note that in all these steps, srcu_invoke_callbacks() hasn't had a
       chance to run.

    A very bad outcome may then follow if these further steps occur:

    7) Some other CPU races and starts the grace period number 16 before the
       CPU handling previous steps had a chance. Therefore srcu_gp_start()
       isn't called on the latter sdp to fix the acceleration leak from
       previous steps with a new pair of call to advance/accelerate.

    8) The grace period 16 completes and srcu_invoke_callbacks() is finally
       called. All the callbacks from previous grace periods (8 and 12) are
       correctly advanced and executed but callbacks in RCU_NEXT_READY_TAIL
       still remain. Then rcu_segcblist_accelerate() is called with a
       snapshot of 20.

    9) Since nothing started the grace period number 20, callbacks stay
       unhandled.
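
The sequence numbers quoted in the steps above (5, 9, 12, 16, 20) can be
reproduced with a small user-space model. This is only a sketch: the name
seq_snap_model() and the SEQ_STATE_MASK macro are made up for illustration,
and the formula is assumed to match the arithmetic of the kernel's
rcu_seq_snap() with the READ_ONCE() and the memory barrier left out.

```c
/*
 * Simplified user-space model of the grace-period sequence arithmetic.
 * Assumption: the low two bits hold the grace-period state and the upper
 * bits hold the counter, so one full grace period advances the value by 4.
 * No READ_ONCE() and no memory barrier are modelled here.
 */
#define SEQ_STATE_MASK 3UL

/*
 * Return the sequence value that must be reached before a full grace
 * period is guaranteed to have elapsed since "seq" was observed.
 */
static unsigned long seq_snap_model(unsigned long seq)
{
	return (seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}
```

With this model, a snapshot taken while the sequence reads 5 yields 12, which
fits next to the callbacks already queued for 12, whereas a snapshot taken
once the sequence has moved on to 9 yields 16, the value that step 5 fails
to accelerate to.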

    This has been reported in real load:

            [3144162.608392] INFO: task kworker/136:12:252684 blocked for more
            than 122 seconds.
            [3144162.615986]       Tainted: G           O  K   5.4.203-1-tlinux4-0011.1 #1
            [3144162.623053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
            disables this message.
            [3144162.631162] kworker/136:12  D    0 252684      2 0x90004000
            [3144162.631189] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
            [3144162.631192] Call Trace:
            [3144162.631202]  __schedule+0x2ee/0x660
            [3144162.631206]  schedule+0x33/0xa0
            [3144162.631209]  schedule_timeout+0x1c4/0x340
            [3144162.631214]  ? update_load_avg+0x82/0x660
            [3144162.631217]  ? raw_spin_rq_lock_nested+0x1f/0x30
            [3144162.631218]  wait_for_completion+0x119/0x180
            [3144162.631220]  ? wake_up_q+0x80/0x80
            [3144162.631224]  __synchronize_srcu.part.19+0x81/0xb0
            [3144162.631226]  ? __bpf_trace_rcu_utilization+0x10/0x10
            [3144162.631227]  synchronize_srcu+0x5f/0xc0
            [3144162.631236]  irqfd_shutdown+0x3c/0xb0 [kvm]
            [3144162.631239]  ? __schedule+0x2f6/0x660
            [3144162.631243]  process_one_work+0x19a/0x3a0
            [3144162.631244]  worker_thread+0x37/0x3a0
            [3144162.631247]  kthread+0x117/0x140
            [3144162.631247]  ? process_one_work+0x3a0/0x3a0
            [3144162.631248]  ? __kthread_cancel_work+0x40/0x40
            [3144162.631250]  ret_from_fork+0x1f/0x30

    Fix this by taking the snapshot for acceleration _before_ the read
    of the current grace period number.

    The only side effect of this solution is that callbacks advancing happen
    then _after_ the full barrier in rcu_seq_snap(). This is not a problem
    because that barrier only cares about:

    1) Ordering accesses of the update side before call_srcu() so they don't
       bleed.
    2) See all the accesses prior to the grace period of the current gp_num

    The only things callbacks advancing need to be ordered against are
    carried by snp locking.

    Reported-by: Yong He <alexyonghe@tencent.com>
    Co-developed-by: Yong He <alexyonghe@tencent.com>
    Signed-off-by: Yong He <alexyonghe@tencent.com>
    Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Co-developed-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
    Signed-off-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
    Link: http://lore.kernel.org/CANZk6aR+CqZaqmMWrC2eRRPY12qAZnDZLwLnHZbNi=xXMB401g@mail.gmail.com
    Fixes: da915ad5cf ("srcu: Parallelize callback handling")
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:17 -04:00
Waiman Long faa9279f82 srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit d8d5b7bf6f2105883bbd91bbd4d5b67e4e3dff71
Author: Denis Arefev <arefev@swemel.ru>
Date:   Mon, 4 Sep 2023 15:21:14 +0300

    srcu: Fix srcu_struct node grpmask overflow on 64-bit systems

    The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
    is subject to overflow due to a failure to cast operands to a larger
    data type before performing the bitwise operation.

    The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
    Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
    maximum shift of 15), but which can be set up as high as 64 (resulting
    in a maximum shift of 63).  A value of 31 can result in sign extension,
    resulting in 0xffffffff80000000 instead of the desired 0x80000000.
    A value of 32 or greater triggers undefined behavior per the C standard.

    This bug has not been known to cause issues because almost all kernels
    take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as a
    given compiler gives a deterministic non-zero result for 1<<N for N>=32,
    the code correctly invokes all SRCU callbacks, albeit wasting CPU time
    along the way.

    This commit therefore substitutes the correct 1UL for the buggy 1.
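
The difference between the two forms can be illustrated in user space. The
sketch below is not the kernel code: the function names are hypothetical, and
it assumes a 32-bit int and a 64-bit unsigned long, as on typical 64-bit
Linux targets. Because evaluating 1 << 31 in int is itself not well defined,
the buggy case is modelled by widening INT_MIN, which is the value common
compilers produce for it.

```c
#include <limits.h>

/* Fixed form: the shift is carried out in unsigned long. */
static unsigned long grpmask_fixed(unsigned int shift)
{
	return 1UL << shift;
}

/*
 * Model of the int-typed form for a shift of 31: the int result is
 * INT_MIN, and widening it to unsigned long sign-extends the value.
 * INT_MIN is used directly so that the model itself stays well defined.
 */
static unsigned long grpmask_buggy_at_31(void)
{
	return (unsigned long)INT_MIN;
}
```

Under those assumptions grpmask_fixed(31) sets only bit 31, while the modelled
buggy value also sets bits 32 through 63.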

    Found by Linux Verification Center (linuxtesting.org) with SVACE.

    Signed-off-by: Denis Arefev <arefev@swemel.ru>
    Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: David Laight <David.Laight@aculab.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:16 -04:00
Waiman Long 2ef5742a82 rcu: Dump memory object info if callback function is invalid
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit 2cbc482d325ee58001472c4359b311958c4efdd1
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat, 5 Aug 2023 11:17:26 +0800

    rcu: Dump memory object info if callback function is invalid

    When a structure containing an RCU callback rhp is (incorrectly) freed
    and reallocated after rhp is passed to call_rcu(), it is not unusual for
    rhp->func to be set to NULL. This defeats the debugging prints used by
    __call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y,
    which expect to identify the offending code using the identity of this
    function.

    And in kernels built without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things
    are even worse, as can be seen from this splat:

    Unable to handle kernel NULL pointer dereference at virtual address 0
    ... ...
    PC is at 0x0
    LR is at rcu_do_batch+0x1c0/0x3b8
    ... ...
     (rcu_do_batch) from (rcu_core+0x1d4/0x284)
     (rcu_core) from (__do_softirq+0x24c/0x344)
     (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
     (__irq_exit_rcu) from (irq_exit+0x8/0x10)
     (irq_exit) from (__handle_domain_irq+0x74/0x9c)
     (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
     (gic_handle_irq) from (__irq_svc+0x5c/0x94)
     (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
     (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
     (default_idle_call) from (do_idle+0xf8/0x150)
     (do_idle) from (cpu_startup_entry+0x18/0x20)
     (cpu_startup_entry) from (0xc01530)

    This commit therefore adds calls to mem_dump_obj(rhp) to output some
    information, for example:

      slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256

    This provides the rough size of the memory block and the offset of the
    rcu_head structure, which provides at least a few clues to help
    locate the problem. If the problem is reproducible, additional slab
    debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can
    provide significantly more information.

    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:16 -04:00
Waiman Long 1cb8f685e9 srcu: Fix error handling in init_srcu_struct_fields()
JIRA: https://issues.redhat.com/browse/RHEL-34076

commit f0a31b26be1f725e3f56beed76486c1b034120b3
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sat, 29 Jul 2023 14:27:32 +0000

    srcu: Fix error handling in init_srcu_struct_fields()

    The current error handling in init_srcu_struct_fields() is a bit
    inconsistent.  If init_srcu_struct_nodes() fails, the function either
    returns -ENOMEM or 0 depending on whether ssp->sda_is_static is true or
    false. This can make init_srcu_struct_fields() return 0 even if memory
    allocation failed!

    Simplify the error handling by always returning -ENOMEM if either
    init_srcu_struct_nodes() or the per-CPU allocation fails. This makes the
    control flow easier to follow and avoids the inconsistent return values.

    Add goto labels to avoid duplicating the error cleanup code.
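
The goto-label pattern described above might look roughly like the following
user-space sketch. All names here (demo_srcu, demo_alloc(), demo_init_fields()
and the fail_at parameter) are hypothetical stand-ins rather than the actual
kernel code; the point is only that every allocation failure funnels into one
cleanup path and one -ENOMEM return.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the allocated pieces; not kernel types. */
struct demo_srcu {
	void *sup;   /* models the usage structure */
	void *sda;   /* models the per-CPU data */
	void *nodes; /* models the node tree */
};

/* fail_at selects which allocation step (1..3) fails; 0 means none. */
static void *demo_alloc(int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(16);
}

static int demo_init_fields(struct demo_srcu *ssp, int fail_at)
{
	ssp->sup = ssp->sda = ssp->nodes = NULL;

	ssp->sup = demo_alloc(1, fail_at);
	if (!ssp->sup)
		return -ENOMEM;
	ssp->sda = demo_alloc(2, fail_at);
	if (!ssp->sda)
		goto err_free_sup;
	ssp->nodes = demo_alloc(3, fail_at);
	if (!ssp->nodes)
		goto err_free_sda;
	return 0;

	/* Labels unwind in reverse order of allocation. */
err_free_sda:
	free(ssp->sda);
	ssp->sda = NULL;
err_free_sup:
	free(ssp->sup);
	ssp->sup = NULL;
	return -ENOMEM;
}
```

Any failing step returns -ENOMEM with nothing left allocated, so a caller
never sees a 0 return after a failed allocation.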

    Link: https://lore.kernel.org/r/20230404003508.GA254019@google.com
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-31 10:56:16 -04:00
Waiman Long 5e683ba142 srcu: Clarify comments on memory barrier "E"
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 754aa6427efeb8a059233e18e810263a108fdd71
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sat, 28 Jan 2023 03:59:01 +0000

    srcu: Clarify comments on memory barrier "E"

    There is an smp_mb() named "E" in srcu_flip() immediately before the
    increment (flip) of the srcu_struct structure's ->srcu_idx.

    The purpose of E is to order the preceding scan's read of lock counters
    against the flipping of the ->srcu_idx, in order to prevent new readers
    from continuing to use the old ->srcu_idx value, which might needlessly
    extend the grace period.

    However, this ordering is already enforced because of the control
    dependency between the preceding scan and the ->srcu_idx flip.
    This control dependency exists because atomic_long_read() is used
    to scan the counts, because WRITE_ONCE() is used to flip ->srcu_idx,
    and because ->srcu_idx is not flipped until the ->srcu_lock_count[] and
    ->srcu_unlock_count[] counts match.  And such a match cannot happen when
    there is an in-flight reader that started before the flip (observation
    courtesy Mathieu Desnoyers).

    The litmus test below (courtesy of Frederic Weisbecker, with changes
    for ctrldep by Boqun and Joel) shows this:

    C srcu
    (*
     * bad condition: P0's first scan (SCAN1) saw P1's idx=0 LOCK count inc, though P1 saw flip.
     *
     * So basically, the ->po ordering on both P0 and P1 is enforced via ->ppo
     * (control deps) on both sides, and both P0 and P1 are interconnected by ->rf
     * relations. Combining the ->ppo with ->rf, a cycle is impossible.
     *)

    {}

    // updater
    P0(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
    {
            int lock1;
            int unlock1;
            int lock0;
            int unlock0;

            // SCAN1
            unlock1 = READ_ONCE(*UNLOCK1);
            smp_mb(); // A
            lock1 = READ_ONCE(*LOCK1);

            // FLIP
            if (lock1 == unlock1) {   // Control dep
                    smp_mb(); // E    // Remove E and still passes.
                    WRITE_ONCE(*IDX, 1);
                    smp_mb(); // D

                    // SCAN2
                    unlock0 = READ_ONCE(*UNLOCK0);
                    smp_mb(); // A
                    lock0 = READ_ONCE(*LOCK0);
            }
    }

    // reader
    P1(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
    {
            int tmp;
            int idx1;
            int idx2;

            // 1st reader
            idx1 = READ_ONCE(*IDX);
            if (idx1 == 0) {         // Control dep
                    tmp = READ_ONCE(*LOCK0);
                    WRITE_ONCE(*LOCK0, tmp + 1);
                    smp_mb(); /* B and C */
                    tmp = READ_ONCE(*UNLOCK0);
                    WRITE_ONCE(*UNLOCK0, tmp + 1);
            } else {
                    tmp = READ_ONCE(*LOCK1);
                    WRITE_ONCE(*LOCK1, tmp + 1);
                    smp_mb(); /* B and C */
                    tmp = READ_ONCE(*UNLOCK1);
                    WRITE_ONCE(*UNLOCK1, tmp + 1);
            }
    }

    exists (0:lock1=1 /\ 1:idx1=1)

    More complicated litmus tests with multiple SRCU readers also show that
    memory barrier E is not needed.

    This commit therefore clarifies the comment on memory barrier E.

    Why not also remove that redundant smp_mb()?

    Because control dependencies are quite fragile due to their not being
    recognized by most compilers and tools.  Control dependencies therefore
    exact an ongoing maintenance burden, and such a burden cannot be justified
    in this slowpath.  Therefore, that smp_mb() stays until such time as
    its overhead becomes a measurable problem in a real workload running on
    a real production system, or until such time as compilers start paying
    attention to this sort of control dependency.

    Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:20 -04:00
Waiman Long de3564aca0 srcu: Fix long lines in srcu_funnel_gp_start()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit cefc0a599b19d8dd0e26d0b2e43311bae7530ca1
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat, 18 Mar 2023 12:31:53 -0700

    srcu: Fix long lines in srcu_funnel_gp_start()

    This commit creates an srcu_usage pointer named "sup" as a shorter
    synonym for the "ssp->srcu_sup" that was bloating several lines of code.

    Cc: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:19 -04:00
Waiman Long 0756127b1a srcu: Fix long lines in srcu_gp_end()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 6c366522e10f2b5e916b3f08ac042df1a1cd512a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat, 18 Mar 2023 10:52:48 -0700

    srcu: Fix long lines in srcu_gp_end()

    This commit creates an srcu_usage pointer named "sup" as a shorter
    synonym for the "ssp->srcu_sup" that was bloating several lines of code.

    Cc: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:18 -04:00
Waiman Long e916f43a12 srcu: Fix long lines in cleanup_srcu_struct()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 5ff8319f07db11c3b9347ce1dc0a6d951ae96d29
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat, 18 Mar 2023 10:51:50 -0700

    srcu: Fix long lines in cleanup_srcu_struct()

    This commit creates an srcu_usage pointer named "sup" as a shorter
    synonym for the "ssp->srcu_sup" that was bloating several lines of code.

    Cc: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:17 -04:00
Waiman Long 5725498a44 srcu: Fix long lines in srcu_get_delay()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit eabe7625f053050e4cecbbe98bd944f7e8eb14f5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat, 18 Mar 2023 09:34:52 -0700

    srcu: Fix long lines in srcu_get_delay()

    This commit creates an srcu_usage pointer named "sup" as a shorter
    synonym for the "ssp->srcu_sup" that was bloating several lines of code.

    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:17 -04:00
Waiman Long 57aa7f8ce0 srcu: Check for readers at module-exit time
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit a7bf4d7c16c1cf9753873879630a5d5169eb3206
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 24 Mar 2023 09:05:50 -0700

    srcu: Check for readers at module-exit time

    If a given statically allocated in-module srcu_struct structure was ever
    used for updates, srcu_module_going() will invoke cleanup_srcu_struct()
    at module-exit time.  This will check for the error case of SRCU readers
    persisting past module-exit time.  On the other hand, if this srcu_struct
    structure never went through a grace period, srcu_module_going() only
    invokes free_percpu(), which would result in strange failures if SRCU
    readers persisted past module-exit time.

    This commit therefore adds a srcu_readers_active() check to
    srcu_module_going(), splatting if readers have persisted and refraining
    from invoking free_percpu() in that case.  Better to leak memory than
    to suffer silent memory corruption!
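
The leak-rather-than-free decision can be sketched in user space as follows.
This is a hypothetical model, not the kernel implementation: demo_sda,
DEMO_NR_CPUS, demo_readers_active() and demo_module_going() are invented
names, and the per-CPU data is reduced to one lock/unlock counter pair per
CPU.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model: one lock/unlock counter pair per CPU. */
#define DEMO_NR_CPUS 4

struct demo_sda {
	unsigned long lock_count[DEMO_NR_CPUS];
	unsigned long unlock_count[DEMO_NR_CPUS];
};

/*
 * Readers are active when the summed locks differ from the summed
 * unlocks. A reader may unlock on a different CPU than it locked on,
 * so only the totals are compared (unsigned wraparound is intended).
 */
static bool demo_readers_active(const struct demo_sda *sda)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += sda->lock_count[cpu] - sda->unlock_count[cpu];
	return sum != 0;
}

/* Returns true if the data was freed, false if it was deliberately leaked. */
static bool demo_module_going(struct demo_sda *sda)
{
	if (demo_readers_active(sda)) {
		fprintf(stderr, "readers still active, leaking per-CPU data\n");
		return false;
	}
	free(sda);
	return true;
}
```

In the model, a reader that is still inside its critical section keeps the
memory alive, which trades a bounded leak for the absence of a use-after-free.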

    [ paulmck: Apply Zhang, Qiang1 feedback on memory leak. ]

    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:16 -04:00
Waiman Long 7b446e704b srcu: Move work-scheduling fields from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit fd1b3f8e097b7fbbab8ac4a802b24fc23c703dcf
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 21:30:32 -0700

    srcu: Move work-scheduling fields from srcu_struct to srcu_usage

    This commit moves the ->reschedule_jiffies, ->reschedule_count, and
    ->work fields from the srcu_struct structure to the srcu_usage structure
    to reduce the size of the former in order to improve cache locality.

    However, this means that the container_of() calls cannot get a pointer
    to the srcu_struct because they are no longer in the srcu_struct.
    This issue is addressed by adding a ->srcu_ssp field in the srcu_usage
    structure that references the corresponding srcu_struct structure.
    And given the presence of the sup pointer to the srcu_usage structure,
    replace some ssp->srcu_usage-> instances with sup->.
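
A minimal user-space sketch of the back-pointer arrangement follows. The
demo_ types, the demo_container_of() macro and demo_work_to_ssp() are
hypothetical; only the ->work, ->srcu_ssp and ->srcu_sup field names are
taken from this series. It shows how a handler that receives only the
embedded work item can still reach the owning structure.

```c
#include <stddef.h>

/* Hypothetical miniature types standing in for the kernel structures. */
struct demo_work {
	int pending;
};

struct demo_srcu_struct;

struct demo_srcu_usage {
	struct demo_work work;             /* moved out of the main structure */
	struct demo_srcu_struct *srcu_ssp; /* back-pointer added by the move */
};

struct demo_srcu_struct {
	int id;
	struct demo_srcu_usage *srcu_sup;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A work handler only receives the work pointer. */
static struct demo_srcu_struct *demo_work_to_ssp(struct demo_work *wp)
{
	struct demo_srcu_usage *sup =
		demo_container_of(wp, struct demo_srcu_usage, work);

	/* container_of() now lands in the usage structure, so hop back. */
	return sup->srcu_ssp;
}
```

The extra pointer costs one word in the less frequently accessed structure
while keeping the frequently accessed one small.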

    [ paulmck: Apply feedback from kernel test robot. ]

    Link: https://lore.kernel.org/oe-kbuild-all/202303191400.iO5BOqka-lkp@intel.com/
    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:16 -04:00
Waiman Long b201447229 srcu: Move srcu_barrier() fields from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit d20162e0bfc222183a7c94cd00e74b6bbf1a605b
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 21:08:18 -0700

    srcu: Move srcu_barrier() fields from srcu_struct to srcu_usage

    This commit moves the ->srcu_barrier_seq, ->srcu_barrier_mutex,
    ->srcu_barrier_completion, and ->srcu_barrier_cpu_cnt fields from the
    srcu_struct structure to the srcu_usage structure to reduce the size of
    the former in order to improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:15 -04:00
Waiman Long 9c5b51a047 srcu: Move ->sda_is_static from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 660349ac79cb22bb64c44b026d879069783e97d5
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 20:22:58 -0700

    srcu: Move ->sda_is_static from srcu_struct to srcu_usage

    This commit moves the ->sda_is_static field from the srcu_struct structure
    to the srcu_usage structure to reduce the size of the former in order
    to improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:15 -04:00
Waiman Long 340cb0f7a8 srcu: Move heuristics fields from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 3b46679c623c2766f4c56fd3f9ce8edbb38c5d20
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 20:01:02 -0700

    srcu: Move heuristics fields from srcu_struct to srcu_usage

    This commit moves the ->srcu_size_jiffies, ->srcu_n_lock_retries,
    and ->srcu_n_exp_nodelay fields from the srcu_struct structure to the
    srcu_usage structure to reduce the size of the former in order to improve
    cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:14 -04:00
Waiman Long 871bc36c0e srcu: Move grace-period fields from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 03200b5ca3b4d4edf634dc052bf3b8eb8dc8bbbc
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 19:30:50 -0700

    srcu: Move grace-period fields from srcu_struct to srcu_usage

    This commit moves the ->srcu_gp_seq, ->srcu_gp_seq_needed,
    ->srcu_gp_seq_needed_exp, ->srcu_gp_start, and ->srcu_last_gp_end fields
    from the srcu_struct structure to the srcu_usage structure to reduce
    the size of the former in order to improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:14 -04:00
Waiman Long 9238b6fe23 srcu: Move ->srcu_gp_mutex from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit e3a6ab25cfa0fcdcb31c346b9871a566d440980d
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 19:13:16 -0700

    srcu: Move ->srcu_gp_mutex from srcu_struct to srcu_usage

    This commit moves the ->srcu_gp_mutex field from the srcu_struct structure
    to the srcu_usage structure to reduce the size of the former in order
    to improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:13 -04:00
Waiman Long 464974c24d srcu: Move ->lock from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit b3fb11f7e9c3c64dd86403409a070c996d8ac081
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 18:29:38 -0700

    srcu: Move ->lock from srcu_struct to srcu_usage

    This commit moves the ->lock field from the srcu_struct structure to
    the srcu_usage structure to reduce the size of the former in order to
    improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:13 -04:00
Waiman Long a9b4dbe54a srcu: Move ->lock initialization after srcu_usage allocation
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 0839ade94bdef395bab02b3a579416c112062026
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 17:35:21 -0700

    srcu: Move ->lock initialization after srcu_usage allocation

    Currently, both __init_srcu_struct() in CONFIG_DEBUG_LOCK_ALLOC=y kernels
    and init_srcu_struct() in CONFIG_DEBUG_LOCK_ALLOC=n kernels initialize
    the srcu_struct structure's ->lock before the srcu_usage structure has
    been allocated.  This of course prevents the ->lock from being moved
    to the srcu_usage structure, so this commit moves the initialization
    into init_srcu_struct_fields() after the srcu_usage structure has
    been allocated.

    Cc: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:12 -04:00
Waiman Long c47e914c6c srcu: Move ->srcu_cb_mutex from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 574dc1a7efe490dffe5c1ce0285306feec16a880
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 17:22:27 -0700

    srcu: Move ->srcu_cb_mutex from srcu_struct to srcu_usage

    This commit moves the ->srcu_cb_mutex field from the srcu_struct structure
    to the srcu_usage structure to reduce the size of the former in order
    to improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:11 -04:00
Waiman Long e6644842d9 srcu: Move ->srcu_size_state from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit a0d8cbd3821369dc9478cabd605417afb9eb24dc
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 17:16:30 -0700

    srcu: Move ->srcu_size_state from srcu_struct to srcu_usage

    This commit moves the ->srcu_size_state field from the srcu_struct
    structure to the srcu_usage structure to reduce the size of the former
    in order to improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:11 -04:00
Waiman Long 906311b6eb srcu: Move ->level from srcu_struct to srcu_usage
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 208f41b1312443401353bec0c1939e2bfc28adce
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 14:43:08 -0700

    srcu: Move ->level from srcu_struct to srcu_usage

    This commit moves the ->level[] array from the srcu_struct structure to
    the srcu_usage structure to reduce the size of the former in order to
    improve cache locality.

    Suggested-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:10 -04:00
Waiman Long 175ca90f08 srcu: Begin offloading srcu_struct fields to srcu_update
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 95433f7263011e0e6e83caef85d98896dd99cab7
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 16 Mar 2023 17:58:51 -0700

    srcu: Begin offloading srcu_struct fields to srcu_update

    The current srcu_struct structure is on the order of 200 bytes in size
    (depending on architecture and .config), which is much better than the
    old-style 26K bytes, but still all too inconvenient when one is trying
    to achieve good cache locality on a fastpath involving SRCU readers.

    However, only a few fields in srcu_struct are used by SRCU readers.
    The remaining fields could be offloaded to a new srcu_update
    structure, thus shrinking the srcu_struct structure down to a few
    tens of bytes.  This commit begins this noble quest, a quest that is
    complicated by open-coded initialization of the srcu_struct within the
    srcu_notifier_head structure.  This complication is addressed by updating
    the srcu_notifier_head structure's open coding, given that there does
    not appear to be a straightforward way of abstracting that initialization.

    This commit moves only the ->node pointer to srcu_update.  Later commits
    will move additional fields.
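
The split described above can be sketched in userspace C. This is only an illustration under stated assumptions: the `demo_*` structure and field names are hypothetical and do not reproduce the kernel's actual layout. The idea is that update-side state moves behind a pointer, so the structure that readers touch shrinks.

```c
#include <stddef.h>

/* Hypothetical stand-in for update-side state; not the kernel's layout. */
struct demo_srcu_usage {
	void *node;                  /* combining-tree pointer */
	unsigned long gp_seq;        /* grace-period bookkeeping */
	unsigned long gp_seq_needed;
	unsigned long size_jiffies;  /* heuristics */
	char locks_placeholder[64];  /* placeholder for locks and mutexes */
};

/* Before: reader-side and update-side fields share one large structure. */
struct demo_srcu_struct_old {
	unsigned int idx;                     /* read by readers */
	void *per_cpu_data;                   /* read by readers */
	struct demo_srcu_usage update_state;  /* embedded update-side state */
};

/* After: readers see only a few fields plus one pointer. */
struct demo_srcu_struct_new {
	unsigned int idx;
	void *per_cpu_data;
	struct demo_srcu_usage *sup;          /* update-side state lives elsewhere */
};

_Static_assert(sizeof(struct demo_srcu_struct_new) <
	       sizeof(struct demo_srcu_struct_old),
	       "moving update-side state behind a pointer shrinks the struct");

/* Bytes removed from the reader-visible structure in this toy layout. */
size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_srcu_struct_old) -
	       sizeof(struct demo_srcu_struct_new);
}
```

The cost of this design is one extra pointer dereference on the update side, which is acceptable because updates are not the fastpath.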

    [ paulmck: Fold in qiang1.zhang@intel.com's memory-leak fix. ]

    Link: https://lore.kernel.org/all/20230320055751.4120251-1-qiang1.zhang@intel.com/
    Suggested-by: Christoph Hellwig <hch@lst.de>
    Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
    Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
    Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:08 -04:00
Waiman Long 713d6e5730 srcu: Use static init for statically allocated in-module srcu_struct
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit f4d01a259374ef358cd6b00a96b4dfc0fb05a844
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri, 17 Mar 2023 13:28:04 -0700

    srcu: Use static init for statically allocated in-module srcu_struct

    Further shrinking the srcu_struct structure is eased by requiring
    that in-module srcu_struct structures rely more heavily on static
    initialization.  In particular, this preserves the property that
    a module-load-time srcu_struct initialization can fail only due
    to memory-allocation failure of the per-CPU srcu_data structures.
    It might also slightly improve robustness by keeping the number of memory
    allocations that must succeed down to a single percpu_alloc() call.

    This is in preparation for splitting an srcu_usage structure out
    of the srcu_struct structure.

    [ paulmck: Fold in qiang1.zhang@intel.com feedback. ]

    Cc: Christoph Hellwig <hch@lst.de>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-10-06 21:44:07 -04:00
Waiman Long d8689d7dfb rcu: Annotate SRCU's update-side lockdep dependencies
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit f0f44752f5f61ee4e3bd88ae033fdb888320aafe
Author: Boqun Feng <boqun.feng@gmail.com>
Date:   Thu, 12 Jan 2023 22:59:54 -0800

    rcu: Annotate SRCU's update-side lockdep dependencies

    Although all flavors of RCU readers are annotated correctly with
    lockdep as recursive read locks, they do not set the lock_acquire
    'check' parameter.  This means that RCU read locks are not added to
    the lockdep dependency graph, which in turn means that lockdep cannot
    detect RCU-based deadlocks.  This is not a problem for RCU flavors having
    atomic read-side critical sections because context-based annotations can
    catch these deadlocks, see for example the RCU_LOCKDEP_WARN() statement
    in synchronize_rcu().  But context-based annotations are not helpful
    for sleepable RCU, especially given that it is perfectly legal to do
    synchronize_srcu(&srcu1) within an srcu_read_lock(&srcu2).

    However, we can detect SRCU-based deadlocks by: (1) Making srcu_read_lock() a
    'check'ed recursive read lock and (2) Making synchronize_srcu() an empty
    write lock critical section.  Even better, with the newly introduced
    lock_sync(), we can avoid false positives about irq-unsafe/safe.
    This commit therefore makes it so.

    Note that NMI-safe SRCU read side critical sections are currently not
    annotated, but might be annotated in the future.

    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    [ boqun: Add comments for annotation per Waiman's suggestion ]
    [ boqun: Fix comment warning reported by Stephen Rothwell ]
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:46 -04:00
Waiman Long 793924bf53 srcu: Update comment after the index flip
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit dafc4d1603c27671adc2b41eb7e7827f8cc18961
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 21 Dec 2022 08:32:51 -0800

    srcu: Update comment after the index flip

    Because there is not guaranteed to be a full memory barrier between
    the ->srcu_unlock_count increment of an srcu_read_unlock() and the
    ->srcu_lock_count increment of the next srcu_read_lock(), this next
    srcu_read_lock() is not guaranteed to see the effect of the index flip
    just prior to this comment.  However, this next srcu_read_lock() will
    execute a full memory barrier, so the srcu_read_lock() after that is
    guaranteed to see that index flip.

    This guarantee is illustrated by the following diagram of events and
    the litmus test following that.

    ------------------------------------------------------------------------

    READER                  UPDATER
    -------------           ----------
                               // idx is initially 0.

                               srcu_flip() {
                                  smp_mb();
    // RSCS

    srcu_read_unlock() {
      smp_mb();
                                  idx++;    // P
                                  smp_mb(); // QQ
                               }

                               srcu_readers_unlock_idx(0) {
            ,--counted------------ count all unlock[0]; // Q
            |
      unlock[0]++;  // X

    }
                                   smp_mb();
    srcu_read_lock() {
      READ(idx) = 0;         ,---- count all lock[0]; // contributes imbalance of 1.
      lock[0]++;  ----counted              |
      smp_mb(); // PP          }           |
    }                                      |
                                           |
    // RSCS                             not going to affect above scan
                                           |
    srcu_read_unlock() {                   |
      smp_mb();                            |
      unlock[0]++;                         |
    }                                      |
                                          /
                                         /
    srcu_read_lock() {                  |
      READ(idx);  // Y  -----cannot be counted because of P (has to sample idx as 1)
      lock[1]++;
      ...
    }

    ------------------------------------------------------------------------

    This makes it similar to the store buffer pattern. Using X, Y, P and Q
    annotated above, we get:

    ------------------------------------------------------------------------

    READER                    UPDATER
    X (write)                 P (write)

    smp_mb(); //PP            smp_mb(); //QQ

    Y (read)                  Q (read)

    ------------------------------------------------------------------------
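
The store-buffer argument above can be checked with a small userspace sketch. It rests on a simplifying assumption: each smp_mb() is modelled as keeping a thread's two accesses in program order, so only sequentially consistent interleavings are enumerated. The function name and variables are hypothetical, and this is not a substitute for a real memory-model tool.

```c
#include <assert.h>

/*
 * Enumerate every interleaving of the reader (X then Y) and the updater
 * (P then Q).  Returns the number of interleavings in which Y reads the
 * old idx AND Q misses the unlock increment; *total receives the number
 * of interleavings examined.
 */
int demo_sb_forbidden(int *total)
{
	int forbidden = 0;

	*total = 0;
	for (unsigned int mask = 0; mask < 16; mask++) {
		int bits = 0;

		for (int b = 0; b < 4; b++)
			bits += (mask >> b) & 1;
		if (bits != 2)
			continue;           /* exactly two slots go to the reader */

		int idx = 0, unlock0 = 0;   /* shared state */
		int y = -1, q = -1;         /* values observed by the two reads */
		int rstep = 0, ustep = 0;

		for (int slot = 0; slot < 4; slot++) {
			if ((mask >> slot) & 1) {
				if (rstep++ == 0)
					unlock0 = 1;    /* X: unlock[0]++ */
				else
					y = idx;        /* Y: READ(idx) */
			} else {
				if (ustep++ == 0)
					idx = 1;        /* P: idx++ */
				else
					q = unlock0;    /* Q: count unlock[0] */
			}
		}
		assert(y != -1 && q != -1);
		(*total)++;
		if (y == 0 && q == 0)
			forbidden++;
	}
	return forbidden;
}
```

In this model the function examines six interleavings and finds none in which both reads miss the other side's write, which matches the guarantee stated above.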

    ASCII art courtesy of Joel Fernandes.

    Reported-by: Joel Fernandes <joel@joelfernandes.org>
    Reported-by: Boqun Feng <boqun.feng@gmail.com>
    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:32 -04:00
Waiman Long 2d5559c0c7 srcu: Yet more detail for srcu_readers_active_idx_check() comments
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 0cd4b50b12d96d668b0627c149b19b5784ad4898
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed, 14 Dec 2022 10:50:30 -0800

    srcu: Yet more detail for srcu_readers_active_idx_check() comments

    The comment in srcu_readers_active_idx_check() following the smp_mb()
    is out of date, hailing from a simpler time when preemption was disabled
    across the bulk of __srcu_read_lock().  The fact that preemption was
    disabled meant that the number of tasks that had fetched the old index
    but not yet incremented counters was limited by the number of CPUs.

    In our more complex modern times, the number of CPUs is no longer a limit.
    This commit therefore updates this comment, additionally giving more
    memory-ordering detail.

    [ paulmck: Apply Nt->Nc feedback from Joel Fernandes. ]

    Reported-by: Boqun Feng <boqun.feng@gmail.com>
    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
    Reported-by: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
    Reported-by: Uladzislau Rezki <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:31 -04:00
Waiman Long a3d3ec08ee srcu: Remove needless rcu_seq_done() check while holding read lock
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 1bafbfb3e1a18af7f404977ed0d218dc4f176f8e
Author: Pingfan Liu <kernelfans@gmail.com>
Date:   Wed, 23 Nov 2022 21:56:37 +0800

    srcu: Remove needless rcu_seq_done() check while holding read lock

    The srcu_gp_start_if_needed() function now read-holds the srcu_struct
    whose grace period is being started, which means that the corresponding
    SRCU grace period cannot end.  This in turn means that the SRCU
    grace-period sequence number returned by rcu_seq_snap() cannot expire
    during this time.  And that means that the calls to rcu_seq_done() in
    srcu_funnel_exp_start() and srcu_funnel_gp_start() can never return true.

    This commit therefore removes these rcu_seq_done() checks, but adds checks
    in kernels built with CONFIG_PROVE_RCU=y that splat if rcu_seq_done()
    does somehow return true.
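
The snapshot-and-done arithmetic behind this reasoning can be sketched in userspace C. The `demo_*` names and the two-bit state width are assumptions modelled on my understanding of the kernel's rcu_seq helpers, not a copy of them.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_SEQ_CTR_SHIFT   2
#define DEMO_SEQ_STATE_MASK  ((1UL << DEMO_SEQ_CTR_SHIFT) - 1)

/* Earliest sequence value by which a full grace period starting now has ended. */
unsigned long demo_seq_snap(unsigned long cur)
{
	return (cur + 2 * DEMO_SEQ_STATE_MASK + 1) & ~DEMO_SEQ_STATE_MASK;
}

/* Wraparound-tolerant test for "cur has reached snap". */
int demo_seq_done(unsigned long cur, unsigned long snap)
{
	return ULONG_MAX / 2 >= cur - snap;
}

/*
 * Stand-in for the debug check: if the sequence number cannot advance
 * while the snapshot is in use, the snapshot must not already be done.
 */
unsigned long demo_start_if_needed(unsigned long cur)
{
	unsigned long s = demo_seq_snap(cur);

	assert(!demo_seq_done(cur, s));
	return s;
}
```

In this single-threaded sketch the assertion holds trivially. In the kernel the interesting part is that holding the read lock keeps the sequence number from advancing concurrently, which this model does not attempt to show.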

    [ paulmck: Rearrange checks to handle kernels built with lockdep. ]

    Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Frederic Weisbecker <frederic@kernel.org>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    To: rcu@vger.kernel.org
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:31 -04:00
Waiman Long 133e4893e5 srcu: Fix the comparision in srcu_invl_snp_seq()
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 50be0c0439fc1d8bda733ff26f6526e49970857a
Author: Pingfan Liu <kernelfans@gmail.com>
Date:   Wed, 16 Nov 2022 09:52:44 +0800

    srcu: Fix the comparision in srcu_invl_snp_seq()

    A grace-period sequence number contains two fields: counter and
    state.  SRCU_SNP_INIT_SEQ provides a guaranteed invalid value for
    grace-period sequence numbers in newly allocated srcu_node structures'
    ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp fields.  The point of the
    comparison in srcu_invl_snp_seq() is not to detect invalid grace-period
    sequence numbers in general, but rather to detect a newly allocated
    srcu_node structure whose ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp
    fields need to be brought into line with the srcu_struct structure's
    ->srcu_gp_seq field.

    This commit therefore causes srcu_invl_snp_seq() to compare both fields
    of the specified grace-period sequence number.
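
A userspace sketch of why comparing both fields matters. The sentinel value, the two-bit state width, and all `demo_*` names are assumptions made for illustration. The state-only variant is a hypothetical contrast, not a claim about the exact code being replaced.

```c
#define DEMO_CTR_SHIFT   2
#define DEMO_STATE_MASK  0x3UL
#define DEMO_INIT_SEQ    0x2UL   /* assumed sentinel: counter 0, state 2 */

unsigned long demo_ctr(unsigned long s)   { return s >> DEMO_CTR_SHIFT; }
unsigned long demo_state(unsigned long s) { return s & DEMO_STATE_MASK; }

/* State-only test: also matches live values whose state happens to be 2. */
int demo_invl_state_only(unsigned long s)
{
	return demo_state(s) == DEMO_INIT_SEQ;
}

/* Both-fields test: matches only the sentinel itself. */
int demo_invl_both(unsigned long s)
{
	return s == DEMO_INIT_SEQ;
}
```

For example, a value with counter 5 and state 2 is flagged by the state-only test but not by the both-fields test, so only the latter singles out a freshly initialized field.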

    Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: <rcu@vger.kernel.org>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:30 -04:00
Waiman Long 3cbc967ed1 srcu: Debug NMI safety even on archs that don't require it
JIRA: https://issues.redhat.com/browse/RHEL-5228
Conflicts: A merge conflict in the srcu_barrier() hunk of
	   kernel/rcu/srcutree.c due to the presence of a later upstream
	   commit 7f24626d6dd8 ("srcu: Delegate work to the boot cpu
	   if using SRCU_SIZE_SMALL").

commit e29a4915db1480f96e0bc2e928699d086a71f43c
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu, 13 Oct 2022 19:22:44 +0200

    srcu: Debug NMI safety even on archs that don't require it

    Currently the NMI safety debugging is only performed on architectures
    that don't support NMI-safe this_cpu_inc().

    Reorder the code so that other architectures like x86 also detect bad
    uses.

    [ paulmck: Apply kernel test robot, Stephen Rothwell, and Zqiang feedback. ]

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:14 -04:00
Waiman Long 9511be47c8 srcu: Explain the reason behind the read side critical section on GP start
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit ae3c0706160b60ac5e7d36aac428ae6e572dc932
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu, 13 Oct 2022 19:22:43 +0200

    srcu: Explain the reason behind the read side critical section on GP start

    Explain the need to protect against concurrent updaters that may
    overflow the GP counter behind the current update.

    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:14 -04:00
Waiman Long 8cfa692f14 srcu: Warn when NMI-unsafe API is used in NMI
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 6b77bb9b99c66c6596c58e7a25169bc2ea6b82dd
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu, 13 Oct 2022 19:22:42 +0200

    srcu: Warn when NMI-unsafe API is used in NMI

    Using the NMI-unsafe reader API from within an NMI handler is very likely
    to be buggy for three reasons:

    1) NMIs aren't strictly re-entrant (a pending nested NMI will execute at
       the end of the current one) so it should be fine to use a non-atomic
       increment here. However, breakpoints can still interrupt NMIs and if
       a breakpoint callback has a reader on that same ssp, a racy increment
       can happen.

    2) If the only reader site for a given srcu_struct structure is in an
       NMI handler, then RCU should be used instead of SRCU.

    3) Because of the previous reason (2), an srcu_struct structure having
       an SRCU read side critical section in an NMI handler is likely to
       have another one from a task context.

    For all these reasons, warn if an NMI-unsafe reader API is used from an
    NMI handler.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:13 -04:00
Waiman Long c68d820dda srcu: Check for consistent global per-srcu_struct NMI safety
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 36f65f1d1553e35cd9e6b281271f40d639a128c3
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 20 Sep 2022 14:54:41 -0700

    srcu: Check for consistent global per-srcu_struct NMI safety

    This commit adds runtime checks to verify that a given srcu_struct uses
    consistent NMI-safe (or not) read-side primitives globally, but based
    on the per-CPU data.  These global checks are made by the grace-period
    code that must scan the srcu_data structures anyway, and are done only
    in kernels built with CONFIG_PROVE_RCU=y.

    Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: John Ogness <john.ogness@linutronix.de>
    Cc: Petr Mladek <pmladek@suse.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:11 -04:00
Waiman Long 8fb0ae0f42 srcu: Check for consistent per-CPU per-srcu_struct NMI safety
JIRA: https://issues.redhat.com/browse/RHEL-5228
Conflicts: A merge conflict in the srcu_barrier() hunk of
	   kernel/rcu/srcutree.c due to the presence of a later upstream
	   commit 7f24626d6dd8 ("srcu: Delegate work to the boot cpu
	   if using SRCU_SIZE_SMALL").

commit 27120e7d2c4d5c438b76f9c6330037a52ad0722e
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 19 Sep 2022 14:03:07 -0700

    srcu: Check for consistent per-CPU per-srcu_struct NMI safety

    This commit adds runtime checks to verify that a given srcu_struct uses
    consistent NMI-safe (or not) read-side primitives on a per-CPU basis.
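
A per-CPU consistency check of this kind might look roughly like the following userspace sketch. The structure, enum values, and function name are hypothetical. Where this sketch returns false, the kernel would emit a warning.

```c
#include <stdbool.h>

enum { DEMO_NMI_UNKNOWN = 0, DEMO_NMI_UNSAFE = 1, DEMO_NMI_SAFE = 2 };

struct demo_cpu_state {
	int nmi_safety;   /* flavor first seen on this CPU, or DEMO_NMI_UNKNOWN */
};

/* Returns true if this use is consistent with earlier uses on this CPU. */
bool demo_check_nmi_safety(struct demo_cpu_state *cs, bool nmi_safe)
{
	int flavor = nmi_safe ? DEMO_NMI_SAFE : DEMO_NMI_UNSAFE;

	if (cs->nmi_safety == DEMO_NMI_UNKNOWN) {
		cs->nmi_safety = flavor;   /* first reader records its flavor */
		return true;
	}
	return cs->nmi_safety == flavor;   /* mismatch means mixed flavors */
}
```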

    Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: John Ogness <john.ogness@linutronix.de>
    Cc: Petr Mladek <pmladek@suse.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:11 -04:00
Waiman Long 2cd5fec95b srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe()
JIRA: https://issues.redhat.com/browse/RHEL-5228
Conflicts: A merge conflict in the srcu_barrier() hunk of
	   kernel/rcu/srcutree.c due to the presence of a later upstream
	   commit 7f24626d6dd8 ("srcu: Delegate work to the boot cpu
	   if using SRCU_SIZE_SMALL").

commit 2e83b879fb91dafe995967b46a1d38a5b0889242
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 15 Sep 2022 14:29:07 -0700

    srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe()

    On strict load-store architectures, the use of this_cpu_inc() by
    srcu_read_lock() and srcu_read_unlock() is not NMI-safe in TREE SRCU.
    To see this, suppose that an NMI arrives in the middle of srcu_read_lock(),
    just after it has read ->srcu_lock_count, but before it has written
    the incremented value back to memory.  If that NMI handler also does
    srcu_read_lock() and srcu_read_unlock() on that same srcu_struct structure,
    then upon return from that NMI handler, the interrupted srcu_read_lock()
    will overwrite the NMI handler's update to ->srcu_lock_count, but
    leave unchanged the NMI handler's update by srcu_read_unlock() to
    ->srcu_unlock_count.

    This can result in a too-short SRCU grace period, which can in turn
    result in arbitrary memory corruption.
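
The lost update described above can be replayed deterministically in userspace C. This is a single-threaded model in which the "NMI" is simply a function call placed between the load and the store. All names are hypothetical.

```c
struct demo_counts {
	long lock;     /* stands in for ->srcu_lock_count */
	long unlock;   /* stands in for ->srcu_unlock_count */
};

/* A complete reader inside the NMI handler: lock, then unlock. */
static void demo_nmi_handler(struct demo_counts *c)
{
	c->lock = c->lock + 1;
	c->unlock = c->unlock + 1;
}

/*
 * A non-atomic increment split into load and store, with the NMI arriving
 * in between.  Returns the number of readers that appear to be active
 * (lock - unlock) while the interrupted reader is still in its critical
 * section.
 */
long demo_interrupted_lock(struct demo_counts *c)
{
	long tmp = c->lock;    /* load */

	demo_nmi_handler(c);   /* NMI arrives here */
	c->lock = tmp + 1;     /* store overwrites the handler's increment */
	return c->lock - c->unlock;
}
```

Starting from zeroed counters, the function returns 0 even though one reader is still active, so a grace-period scan comparing the two sums would conclude that no readers remain.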

    If the NMI handler instead interrupts the srcu_read_unlock(), this
    can result in eternal SRCU grace periods, which is not much better.

    This commit therefore creates a pair of new srcu_read_lock_nmisafe()
    and srcu_read_unlock_nmisafe() functions, which allow SRCU readers in
    NMI handlers as well as in process and IRQ context.  It is bad practice
    to mix the existing and the new _nmisafe() primitives on the same
    srcu_struct structure.  Use one set or the other, not both.

    Just to underline that "bad practice" point, using srcu_read_lock() at
    process level and srcu_read_lock_nmisafe() in your NMI handler will not,
    repeat NOT, work.  If you do not immediately understand why this is the
    case, please review the earlier paragraphs in this commit log.

    [ paulmck: Apply kernel test robot feedback. ]
    [ paulmck: Apply feedback from Randy Dunlap. ]
    [ paulmck: Apply feedback from John Ogness. ]
    [ paulmck: Apply feedback from Frederic Weisbecker. ]

    Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: John Ogness <john.ogness@linutronix.de>
    Cc: Petr Mladek <pmladek@suse.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:10 -04:00
Waiman Long 0b57e9b754 srcu: Convert ->srcu_lock_count and ->srcu_unlock_count to atomic
JIRA: https://issues.redhat.com/browse/RHEL-5228

commit 5d0f5953b60f5f7a278085b55ddc73e2932f4c33
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 15 Sep 2022 12:09:30 -0700

    srcu: Convert ->srcu_lock_count and ->srcu_unlock_count to atomic

    NMI-safe variants of srcu_read_lock() and srcu_read_unlock() are needed
    by printk(), which on many architectures entails read-modify-write
    atomic operations.  This commit prepares Tree SRCU for this change by
    making both ->srcu_lock_count and ->srcu_unlock_count atomic_long_t.

    [ paulmck: Apply feedback from John Ogness. ]

    Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: John Ogness <john.ogness@linutronix.de>
    Cc: Petr Mladek <pmladek@suse.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-09-22 13:21:08 -04:00
Pingfan Liu 1e3afd647b srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2129726
Upstream: Linux tree
Conflicts: Minor context diffs that do not affect this simple substitution

commit 7f24626d6dd844bfc6d1f492d214d29c86d02550
Author: Pingfan Liu <kernelfans@gmail.com>
Date:   Mon Oct 31 09:52:37 2022 +0800

    srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL

    Commit 994f706872e6 ("srcu: Make Tree SRCU able to operate without
    snp_node array") assumes that cpu 0 is always online.  However, there
    really are situations when some other CPU is the boot CPU, for example,
    when booting a kdump kernel with the maxcpus=1 boot parameter.

    On PowerPC, the kdump kernel can hang as follows:
    ...
    [    1.740036] systemd[1]: Hostname set to <xyz.com>
    [  243.686240] INFO: task systemd:1 blocked for more than 122 seconds.
    [  243.686264]       Not tainted 6.1.0-rc1 #1
    [  243.686272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [  243.686281] task:systemd         state:D stack:0     pid:1     ppid:0      flags:0x00042000
    [  243.686296] Call Trace:
    [  243.686301] [c000000016657640] [c000000016657670] 0xc000000016657670 (unreliable)
    [  243.686317] [c000000016657830] [c00000001001dec0] __switch_to+0x130/0x220
    [  243.686333] [c000000016657890] [c000000010f607b8] __schedule+0x1f8/0x580
    [  243.686347] [c000000016657940] [c000000010f60bb4] schedule+0x74/0x140
    [  243.686361] [c0000000166579b0] [c000000010f699b8] schedule_timeout+0x168/0x1c0
    [  243.686374] [c000000016657a80] [c000000010f61de8] __wait_for_common+0x148/0x360
    [  243.686387] [c000000016657b20] [c000000010176bb0] __flush_work.isra.0+0x1c0/0x3d0
    [  243.686401] [c000000016657bb0] [c0000000105f2768] fsnotify_wait_marks_destroyed+0x28/0x40
    [  243.686415] [c000000016657bd0] [c0000000105f21b8] fsnotify_destroy_group+0x68/0x160
    [  243.686428] [c000000016657c40] [c0000000105f6500] inotify_release+0x30/0xa0
    [  243.686440] [c000000016657cb0] [c0000000105751a8] __fput+0xc8/0x350
    [  243.686452] [c000000016657d00] [c00000001017d524] task_work_run+0xe4/0x170
    [  243.686464] [c000000016657d50] [c000000010020e94] do_notify_resume+0x134/0x140
    [  243.686478] [c000000016657d80] [c00000001002eb18] interrupt_exit_user_prepare_main+0x198/0x270
    [  243.686493] [c000000016657de0] [c00000001002ec60] syscall_exit_prepare+0x70/0x180
    [  243.686505] [c000000016657e10] [c00000001000bf7c] system_call_vectored_common+0xfc/0x280
    [  243.686520] --- interrupt: 3000 at 0x7fffa47d5ba4
    [  243.686528] NIP:  00007fffa47d5ba4 LR: 0000000000000000 CTR: 0000000000000000
    [  243.686538] REGS: c000000016657e80 TRAP: 3000   Not tainted  (6.1.0-rc1)
    [  243.686548] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42044440  XER: 00000000
    [  243.686572] IRQMASK: 0
    [  243.686572] GPR00: 0000000000000006 00007ffffa606710 00007fffa48e7200 0000000000000000
    [  243.686572] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000001
    [  243.686572] GPR08: 000001000c172dd0 0000000000000000 0000000000000000 0000000000000000
    [  243.686572] GPR12: 0000000000000000 00007fffa4ff4bc0 0000000000000000 0000000000000000
    [  243.686572] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [  243.686572] GPR20: 0000000132dfdc50 000000000000000e 0000000000189375 0000000000000000
    [  243.686572] GPR24: 00007ffffa606ae0 0000000000000005 000001000c185490 000001000c172570
    [  243.686572] GPR28: 000001000c172990 000001000c184850 000001000c172e00 00007fffa4fedd98
    [  243.686683] NIP [00007fffa47d5ba4] 0x7fffa47d5ba4
    [  243.686691] LR [0000000000000000] 0x0
    [  243.686698] --- interrupt: 3000
    [  243.686708] INFO: task kworker/u16:1:24 blocked for more than 122 seconds.
    [  243.686717]       Not tainted 6.1.0-rc1 #1
    [  243.686724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [  243.686733] task:kworker/u16:1   state:D stack:0     pid:24    ppid:2      flags:0x00000800
    [  243.686747] Workqueue: events_unbound fsnotify_mark_destroy_workfn
    [  243.686758] Call Trace:
    [  243.686762] [c0000000166736e0] [c00000004fd91000] 0xc00000004fd91000 (unreliable)
    [  243.686775] [c0000000166738d0] [c00000001001dec0] __switch_to+0x130/0x220
    [  243.686788] [c000000016673930] [c000000010f607b8] __schedule+0x1f8/0x580
    [  243.686801] [c0000000166739e0] [c000000010f60bb4] schedule+0x74/0x140
    [  243.686814] [c000000016673a50] [c000000010f699b8] schedule_timeout+0x168/0x1c0
    [  243.686827] [c000000016673b20] [c000000010f61de8] __wait_for_common+0x148/0x360
    [  243.686840] [c000000016673bc0] [c000000010210840] __synchronize_srcu.part.0+0xa0/0xe0
    [  243.686855] [c000000016673c30] [c0000000105f2c64] fsnotify_mark_destroy_workfn+0xc4/0x1a0
    [  243.686868] [c000000016673ca0] [c000000010174ea8] process_one_work+0x2a8/0x570
    [  243.686882] [c000000016673d40] [c000000010175208] worker_thread+0x98/0x5e0
    [  243.686895] [c000000016673dc0] [c0000000101828d4] kthread+0x124/0x130
    [  243.686908] [c000000016673e10] [c00000001000cd40] ret_from_kernel_thread+0x5c/0x64
    [  366.566274] INFO: task systemd:1 blocked for more than 245 seconds.
    [  366.566298]       Not tainted 6.1.0-rc1 #1
    [  366.566305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [  366.566314] task:systemd         state:D stack:0     pid:1     ppid:0      flags:0x00042000
    [  366.566329] Call Trace:
    ...

    The above splat occurs because PowerPC really does use maxcpus=1
    instead of nr_cpus=1 in the kernel command line.  Consequently, the
    (quite possibly non-zero) kdump CPU is the only online CPU in the kdump
    kernel.  SRCU unconditionally queues sdp->work on CPU 0, for which no
    worker thread has been created, so sdp->work will never be executed and
    __synchronize_srcu() will never complete.

    This commit therefore replaces CPU ID 0 with get_boot_cpu_id() in key
    places in Tree SRCU.  Since the CPU indicated by get_boot_cpu_id()
    is guaranteed to be online, this avoids the above splat.
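
The failure mode can be illustrated with a small user-space model. This is only a sketch: `struct cpu_model`, `pick_work_cpu_old()`, `pick_work_cpu_new()` and `work_will_run()` are hypothetical names invented for the illustration, and the `boot_cpu` field merely stands in for the value that get_boot_cpu_id() would return.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 8	/* hypothetical size of this toy model */

/* Toy stand-in for the kernel's view of which CPUs are online. */
struct cpu_model {
	bool online[NR_CPUS_MODEL];
	int boot_cpu;	/* models the value returned by get_boot_cpu_id() */
};

/* Old behaviour: always target CPU 0, whether or not it is online. */
static int pick_work_cpu_old(const struct cpu_model *m)
{
	(void)m;
	return 0;
}

/* New behaviour: target the boot CPU, which is expected to be online. */
static int pick_work_cpu_new(const struct cpu_model *m)
{
	assert(m->online[m->boot_cpu]);
	return m->boot_cpu;
}

/* In this model, work queued on an offline CPU never runs. */
static bool work_will_run(const struct cpu_model *m, int cpu)
{
	return cpu >= 0 && cpu < NR_CPUS_MODEL && m->online[cpu];
}
```

With a model in which only CPU 3 is online and is also the boot CPU (as in a kdump kernel booted with maxcpus=1), `work_will_run()` is false for the CPU chosen by the old picker and true for the one chosen by the new picker.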

    Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    To: rcu@vger.kernel.org
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2023-03-13 08:36:27 +08:00
Waiman Long 929a1c06d5 srcu: Make expedited RCU grace periods block even less frequently
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 4f2bfd9494a072d58203600de6bedd72680e612a
Author: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Date:   Fri, 1 Jul 2022 08:45:45 +0530

    srcu: Make expedited RCU grace periods block even less frequently

    The purpose of commit 282d8998e997 ("srcu: Prevent expedited GPs
    and blocking readers from consuming CPU") was to prevent a long
    series of never-blocking expedited SRCU grace periods from blocking
    kernel-live-patching (KLP) progress.  Although it was successful, it also
    resulted in excessive boot times on certain embedded workloads running
    under qemu with the "-bios QEMU_EFI.fd" command line.  Here "excessive"
    means increasing the boot time up into the three-to-four minute range.
    This increase in boot time was due to the more than 6000 back-to-back
    invocations of synchronize_rcu_expedited() within the KVM host OS, which
    in turn resulted from qemu's emulation of a long series of MMIO accesses.

    Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
    periods") did not significantly help this particular use case.

    Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
    value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
    of non-sleeping per phase counts on a system with preemption enabled,
    and observed the following boot times:

    +──────────────────────────+────────────────+
    | SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
    +──────────────────────────+────────────────+
    | 100                      | 30.053         |
    | 150                      | 25.151         |
    | 200                      | 20.704         |
    | 250                      | 15.748         |
    | 500                      | 11.401         |
    | 1000                     | 11.443         |
    | 10000                    | 11.258         |
    | 1000000                  | 11.154         |
    +──────────────────────────+────────────────+

    Analysis of the experiment results shows additional improvements with
    CPU-bound delays approaching one jiffy in duration. This improvement was
    also seen when the number of per-phase iterations was scaled to one jiffy.

    This commit therefore scales per-grace-period phase number of non-sleeping
    polls so that non-sleeping polls extend for about one jiffy. In addition,
    the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
    replaced with a simple check for an expedited grace period.  This change
    schedules callback invocation immediately after expedited grace periods
    complete, which results in greatly improved boot times.  Testing done
    by Marc and Zhangfei confirms that this change recovers most of the
    boot-time degradation.  With CONFIG_HZ_250 specifically, boot times
    improve from 3m50s to 41s on Marc's setup and from 2m40s to ~9.7s on
    Zhangfei's setup.
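
The scaling idea can be sketched as simple arithmetic: divide the length of one jiffy by the delay between successive polls, then clamp the result. The code below is an illustrative model, not the kernel's implementation. The function names are hypothetical, and the per-poll delay and clamp bounds are caller-supplied assumptions rather than values taken from this commit message.

```c
#include <assert.h>

/* Clamp val into the inclusive range [lo, hi]. */
static unsigned long clamp_ul(unsigned long val, unsigned long lo,
			      unsigned long hi)
{
	assert(lo <= hi);
	return val < lo ? lo : (val > hi ? hi : val);
}

/*
 * Number of non-sleeping polls that together span roughly one jiffy.
 *   hz:             timer ticks per second (e.g. 250 for CONFIG_HZ_250)
 *   retry_delay_us: assumed busy-wait between polls, in microseconds
 *   lo, hi:         assumed bounds on the result
 */
static unsigned long polls_per_jiffy(unsigned long hz,
				     unsigned long retry_delay_us,
				     unsigned long lo, unsigned long hi)
{
	unsigned long jiffy_us;

	assert(hz > 0 && retry_delay_us > 0);
	jiffy_us = 1000000UL / hz;	/* length of one jiffy in microseconds */
	return clamp_ul(jiffy_us / retry_delay_us, lo, hi);
}
```

For example, assuming a 5-microsecond delay between polls and bounds of 3 and 1000, `polls_per_jiffy(250, 5, 3, 1000)` evaluates to 800, because one jiffy at HZ=250 lasts 4000 microseconds.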

    In addition to the changes to default per phase delays, this
    change adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
    srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
    This allows users to configure the srcu grace period scanning delays in
    order to more quickly react to additional use cases.

    Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods")
    Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
    Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
    Reported-by: yueluck <yueluck@163.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
    Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:38:29 -04:00
Waiman Long 9456a65490 srcu: Block less aggressively for expedited grace periods
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 8f870e6eb8c0c3f9869bf3fcf9db39f86cfcea49
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sun, 12 Jun 2022 15:00:06 -0700

    srcu: Block less aggressively for expedited grace periods

    Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
    from consuming CPU") fixed a problem where a long-running expedited SRCU
    grace period could block kernel live patching.  It did so by giving up
    on expediting once a given SRCU expedited grace period grew too old.

    Unfortunately, this added excessive delays to boots of virtual embedded
    systems specifying "-bios QEMU_EFI.fd" to qemu.  This commit therefore
    makes the transition away from expediting less aggressive, increasing
    the per-grace-period phase number of non-sleeping polls of readers from
    one to three and increasing the required grace-period age from one jiffy
    (actually from zero to one jiffies) to two jiffies (actually from one
    to two jiffies).

    Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
    Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
    Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:38:28 -04:00
Waiman Long 95af8dfe11 srcu: Drop needless initialization of sdp in srcu_gp_start()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 586e31d59c436cda65a2e8ac04ff954bed247023
Author: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Date:   Tue, 15 Mar 2022 09:55:49 +0100

    srcu: Drop needless initialization of sdp in srcu_gp_start()

    Commit 9c7ef4c30f12 ("srcu: Make Tree SRCU able to operate without
    snp_node array") initializes the local variable sdp differently depending
    on the srcu's state in srcu_gp_start().  Either way, this initialization
    overwrites the value used when sdp is defined.

    This commit therefore drops this pointless definition-time initialization.
    Although there is no functional change, compiler code generation may
    be affected.

    Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:23 -04:00
Waiman Long cc172221e3 srcu: Prevent expedited GPs and blocking readers from consuming CPU
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 282d8998e9979c2186af7f7d22366f2fc3149838
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue, 8 Mar 2022 15:45:33 -0800

    srcu: Prevent expedited GPs and blocking readers from consuming CPU

    If an SRCU reader blocks while a synchronize_srcu_expedited() waits for
    that same reader, then that grace period will spawn an endless series of
    workqueue handlers, consuming a full CPU.  This quickly gets pointless
    because consuming more CPU isn't going to make that reader get done
    faster, especially if it is blocked waiting for an external event.

    This commit therefore spawns at most one pair of back-to-back workqueue
    handlers per expedited grace period phase, instead inserting increasing
    delays as that grace period phase grows older, but capped at 10 jiffies.
    In any case, if there have been at least 100 back-to-back workqueue
    handlers within a single jiffy, regardless of grace period or grace-period
    phase, then a one-jiffy delay is inserted.
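
The delay policy described above can be modelled roughly as follows. This is a simplified user-space sketch, not the kernel code: `struct gp_model`, `next_delay()` and the constant names are hypothetical, and the caller is assumed to reset `nodelay_phase` at the start of each grace-period phase and `nodelay_jiffy` whenever the jiffy counter advances.

```c
#include <assert.h>

#define MAX_DELAY_JIFFIES 10UL	/* cap on the inserted delay, per the text */
#define MAX_NODELAY_PHASE 1UL	/* assumed: one zero-delay requeue per phase */
#define MAX_NODELAY_JIFFY 100UL	/* back-to-back handlers allowed per jiffy */

struct gp_model {
	unsigned long phase_age;	/* jiffies since the current phase began */
	unsigned long nodelay_phase;	/* zero-delay requeues in this phase */
	unsigned long nodelay_jiffy;	/* zero-delay requeues in this jiffy */
};

/* Return the delay, in jiffies, before the next workqueue handler runs. */
static unsigned long next_delay(struct gp_model *gp)
{
	unsigned long delay;

	assert(gp);
	delay = gp->phase_age;	/* delay grows as the phase gets older */

	if (delay == 0) {
		if (gp->nodelay_phase >= MAX_NODELAY_PHASE ||
		    gp->nodelay_jiffy >= MAX_NODELAY_JIFFY) {
			delay = 1;	/* budget exhausted: wait one jiffy */
		} else {
			gp->nodelay_phase++;	/* spend one zero-delay requeue */
			gp->nodelay_jiffy++;
		}
	}
	return delay > MAX_DELAY_JIFFIES ? MAX_DELAY_JIFFIES : delay;
}
```

In this model a fresh phase yields one zero-delay requeue (giving the pair of back-to-back handlers), after which the delay is one jiffy and then tracks the age of the phase up to the 10-jiffy cap.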

    [ paulmck:  Apply feedback from kernel test robot. ]

    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Reported-by: Song Liu <song@kernel.org>
    Tested-by: kernel test robot <oliver.sang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:23 -04:00
Waiman Long babac7a730 srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit c2445d38785086422e56dcbe049b73a53b2ba81f
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 31 Jan 2022 13:27:15 -0800

    srcu: Add contention check to call_srcu() srcu_data ->lock acquisition

    This commit increases the sensitivity of contention detection by adding
    checks to the acquisition of the srcu_data structure's lock on the
    call_srcu() code path.

    Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:22 -04:00
Waiman Long 3538999436 srcu: Automatically determine size-transition strategy at boot
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit a57ffb3c6b67e59e8632f731414b792eacc6cca0
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 31 Jan 2022 11:21:30 -0800

    srcu: Automatically determine size-transition strategy at boot

    This commit adds a srcutree.convert_to_big option of zero that causes
    SRCU to decide at boot whether to wait for contention (small systems) or
    immediately expand to large (large systems).  A new srcutree.big_cpu_lim
    (defaulting to 128) defines how many CPUs constitute a large system.
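
A minimal sketch of the boot-time decision, under stated assumptions: the enum and function names are hypothetical, and treating a CPU count equal to the limit as "large" is an assumption, since the text does not say whether the comparison is inclusive.

```c
#include <assert.h>

/* Hypothetical labels for the two strategies named in the text. */
enum sizing_strategy {
	SIZING_WAIT_FOR_CONTENTION,	/* small system: stay small until contended */
	SIZING_EXPAND_AT_INIT,		/* large system: expand immediately */
};

/*
 * Choose a strategy from the CPU count and the configured limit
 * (srcutree.big_cpu_lim, which the text says defaults to 128).
 */
static enum sizing_strategy auto_sizing(unsigned int nr_cpus,
					unsigned int big_cpu_lim)
{
	assert(nr_cpus > 0);
	return nr_cpus >= big_cpu_lim ? SIZING_EXPAND_AT_INIT
				      : SIZING_WAIT_FOR_CONTENTION;
}
```

Under these assumptions, a 4-CPU system with the default limit waits for contention, while a 256-CPU system expands at initialization time.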

    Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:22 -04:00
Waiman Long 6e9dff95d2 srcu: Add contention-triggered addition of srcu_node tree
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 9f2e91d94c91558e3764fe4e01c5e6281a90f239
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 27 Jan 2022 20:32:05 -0800

    srcu: Add contention-triggered addition of srcu_node tree

    This commit instruments the acquisitions of the srcu_struct structure's
    ->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL
    to SRCU_SIZE_BIG when sufficient contention is experienced.  The
    instrumentation counts the number of trylock failures within the confines
    of a single jiffy.  If that number exceeds the value specified by the
    srcutree.small_contention_lim kernel boot parameter (which defaults to
    100), and if the value specified by the srcutree.convert_to_big kernel
    boot parameter has the 0x10 bit set (defaults to 0), then a transition
    will be automatically initiated.
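
The counting scheme can be sketched as below. This is an illustrative user-space model with hypothetical names (`struct contention_model`, `note_trylock_failure()`). The real instrumentation lives in the lock-acquisition paths and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define CONVERT_ON_CONTENTION 0x10u	/* bit named in the text */

struct contention_model {
	unsigned long last_jiffy;	/* jiffy whose failures are being counted */
	unsigned long n_failures;	/* trylock failures seen in that jiffy */
};

/*
 * Record one trylock failure observed at jiffy `now`.  Returns true when
 * the failure count for this jiffy exceeds `contention_lim` and the
 * contention bit is set in `convert_to_big`.
 */
static bool note_trylock_failure(struct contention_model *c,
				 unsigned long now,
				 unsigned long contention_lim,
				 unsigned int convert_to_big)
{
	assert(c);
	if (now != c->last_jiffy) {	/* new jiffy: restart the count */
		c->last_jiffy = now;
		c->n_failures = 0;
	}
	c->n_failures++;

	if (!(convert_to_big & CONVERT_ON_CONTENTION))
		return false;
	return c->n_failures > contention_lim;
}
```

With a limit of 100 and the 0x10 bit set, the 101st failure within one jiffy is the first to return true. Failures spread across different jiffies never accumulate.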

    By default, there will never be any transitions, so that none of the
    srcu_struct structures ever gains an srcu_node array.

    The useful values for srcutree.convert_to_big are:

    0x00:  Never convert.
    0x01:  Always convert at init_srcu_struct() time.
    0x02:  Convert when rcutorture prints its first round of statistics.
    0x03:  Decide conversion approach at boot given system size.
    0x10:  Convert if contention is encountered.
    0x12:  Convert if contention is encountered or when rcutorture prints
            its first round of statistics, whichever comes first.

    The value 0x11 acts the same as 0x01 because the conversion happens
    before there is any chance of contention.
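
One way to read the value list above is as a mode in the low bits plus a separate contention flag. The decoder below is a sketch of that reading. The constant and function names are illustrative, and only the numeric values come from the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; the numeric values are the ones listed above. */
enum {
	SIZING_NONE    = 0x00,	/* never convert */
	SIZING_INIT    = 0x01,	/* convert at init_srcu_struct() time */
	SIZING_TORTURE = 0x02,	/* convert at rcutorture's first statistics */
	SIZING_AUTO    = 0x03,	/* decide at boot given system size */
	SIZING_CONTEND = 0x10,	/* convert if contention is encountered */
};

#define SIZING_MODE_MASK 0x0fu

static bool converts_at_init(unsigned int v)
{
	return (v & SIZING_MODE_MASK) == SIZING_INIT;
}

static bool converts_on_torture_stats(unsigned int v)
{
	return (v & SIZING_MODE_MASK) == SIZING_TORTURE;
}

static bool converts_on_contention(unsigned int v)
{
	/* 0x11 behaves like 0x01: converting at init leaves nothing to wait for. */
	return (v & SIZING_CONTEND) && !converts_at_init(v);
}
```

Under this reading, 0x12 reports both the contention and rcutorture triggers, whereas 0x11 reports only conversion at initialization.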

    [ paulmck: Apply "static" feedback from kernel test robot. ]

    Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:21:59 -04:00
Waiman Long 0d2a81fe27 srcu: Create concurrency-safe helper for initiating size transition
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 99659f64b14e55cfa48980f5396f83820bafd028
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu, 27 Jan 2022 14:56:39 -0800

    srcu: Create concurrency-safe helper for initiating size transition

    Once there are contention-initiated size transitions, it will be
    possible for rcutorture to initiate a transition at the same time
    as a contention-initiated transition.  This commit therefore creates
    a concurrency-safe helper function named srcu_transition_to_big() to
    safely initiate size transitions.
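
The requirement is that when two initiators race, exactly one of them starts the transition. The text does not say how srcu_transition_to_big() achieves this, so the sketch below substitutes a C11 compare-and-swap purely as a stand-in. The type, state names and function name are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states for this model. */
enum { SIZE_SMALL, SIZE_TRANSITIONING };

struct srcu_model {
	atomic_int size_state;
};

/*
 * Try to start the small-to-big transition.  Returns true only for the
 * single caller that moved the state out of SIZE_SMALL.  Every other
 * caller, concurrent or later, sees false and does nothing.
 */
static bool transition_to_big(struct srcu_model *s)
{
	int expected = SIZE_SMALL;

	assert(s);
	return atomic_compare_exchange_strong(&s->size_state, &expected,
					      SIZE_TRANSITIONING);
}
```

Calling `transition_to_big()` twice on the same structure returns true the first time and false the second, which is the idempotence that lets rcutorture and the contention path both call it without coordination.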

    Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:21:59 -04:00