Commit Graph

169 Commits

Author SHA1 Message Date
Leonardo Bras 18b31478bc trace,smp: Add tracepoints for scheduling remotelly called functions
JIRA: https://issues.redhat.com/browse/RHEL-13876
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit bf5a8c26ad7caf0772a1cd48c8a0924e48bdbaf0
Author: Leonardo Bras <leobras@redhat.com>
Date:   2023-06-15 03:59:47 -0300

    trace,smp: Add tracepoints for scheduling remotelly called functions

    Add a tracepoint for when a CSD is queued to a remote CPU's
    call_single_queue. This allows finding exactly which CPU queued a given CSD
    when looking at a csd_function_{entry,exit} event, and also enables us to
    accurately measure IPI delivery time with e.g. a synthetic event:

      $ echo 'hist:keys=cpu,csd.hex:ts=common_timestamp.usecs' >\
          /sys/kernel/tracing/events/smp/csd_queue_cpu/trigger
      $ echo 'csd_latency unsigned int dst_cpu; unsigned long csd; u64 time' >\
          /sys/kernel/tracing/synthetic_events
      $ echo \
      'hist:keys=common_cpu,csd.hex:'\
      'time=common_timestamp.usecs-$ts:'\
      'onmatch(smp.csd_queue_cpu).trace(csd_latency,common_cpu,csd,$time)' >\
          /sys/kernel/tracing/events/smp/csd_function_entry/trigger

      $ trace-cmd record -e 'synthetic:csd_latency' hackbench
      $ trace-cmd report
      <...>-467   [001]    21.824263: csd_queue_cpu:        cpu=0 callsite=try_to_wake_up+0x2ea func=sched_ttwu_pending csd=0xffff8880076148b8
      <...>-467   [001]    21.824280: ipi_send_cpu:         cpu=0 callsite=try_to_wake_up+0x2ea callback=generic_smp_call_function_single_interrupt+0x0
      <...>-489   [000]    21.824299: csd_function_entry:   func=sched_ttwu_pending csd=0xffff8880076148b8
      <...>-489   [000]    21.824320: csd_latency:          dst_cpu=0, csd=18446612682193848504, time=36
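
    For orientation, the queueing-side hook can be pictured roughly as
    follows; this is an illustrative sketch rather than the exact upstream
    diff, and the surrounding function name is made up:

      /* sketch: queue a CSD on a remote CPU and kick it if needed */
      static void queue_csd_sketch(int cpu, call_single_data_t *csd)
      {
              /* record which CPU queued which CSD, and for which callback */
              trace_csd_queue_cpu(cpu, _RET_IP_, csd->func, csd);

              /* llist_add() returns true only when the list was empty, so
               * only the first entry has to send the IPI to the target */
              if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
                      send_call_function_single_ipi(cpu);
      }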

    Suggested-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Leonardo Bras <leobras@redhat.com>
    Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20230615065944.188876-7-leobras@redhat.com

Signed-off-by: Leonardo Bras <leobras@redhat.com>
2024-06-17 12:58:33 -03:00
Leonardo Bras e8c63e9673 trace,smp: Add tracepoints around remotelly called functions
JIRA: https://issues.redhat.com/browse/RHEL-13876
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 949fa3f11ced2a5c8e3737e73b09676adf4b322b
Author: Leonardo Bras <leobras@redhat.com>
Date:   2023-06-15 03:59:45 -0300

    trace,smp: Add tracepoints around remotelly called functions

    The recently added ipi_send_{cpu,cpumask} tracepoints allow finding sources
    of IPIs targeting CPUs running latency-sensitive applications.

    For NOHZ_FULL CPUs, all IPIs are interference, and those tracepoints are
    sufficient to find them and work on getting rid of them. In some setups
    however, not *all* IPIs are to be suppressed, but long-running IPI
    callbacks can still be problematic.

    Add a pair of tracepoints to mark the start and end of processing a CSD IPI
    callback, similar to what exists for softirq, workqueue or timer callbacks.
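
    Conceptually, the execution side then brackets every dequeued callback
    with an entry/exit pair; a rough sketch (not the literal diff):

      /* sketch: run one dequeued CSD callback with entry/exit tracepoints */
      static void run_csd_callback_sketch(call_single_data_t *csd)
      {
              smp_call_func_t func = csd->func;
              void *info = csd->info;

              trace_csd_function_entry(func, csd);
              func(info);
              trace_csd_function_exit(func, csd);
      }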

    Signed-off-by: Leonardo Bras <leobras@redhat.com>
    Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20230615065944.188876-5-leobras@redhat.com

Signed-off-by: Leonardo Bras <leobras@redhat.com>
2024-06-17 12:58:32 -03:00
Leonardo Bras 7482210cbc trace,smp: Trace all smp_function_call*() invocations
JIRA: https://issues.redhat.com/browse/RHEL-13876
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 5c3124975e15c1fadd5af1c61e4d627cf6d97ba2
Author: Peter Zijlstra <peterz@infradead.org>
Date:   2023-03-22 14:58:36 +0100

    trace,smp: Trace all smp_function_call*() invocations

    (Ab)use the trace_ipi_send_cpu*() family to trace all
    smp_function_call*() invocations, not only those that result in an
    actual IPI.

    The queued entries log their callback function while the actual IPIs
    are traced on generic_smp_call_function_single_interrupt().

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Signed-off-by: Leonardo Bras <leobras@redhat.com>
2024-06-17 12:58:17 -03:00
Leonardo Bras bd63f8635f locking/csd_lock: Remove added data from CSD lock debugging
JIRA: https://issues.redhat.com/browse/RHEL-13876
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflicts: Fixes (some) conflicts introduced by downstream commit
aa5786b04d ("sched, smp: Trace smp callback causing an IPI")
by applying the original dependency commit, which also makes it easier
to cherry-pick the following upstream commits without conflicts.

commit 1771257cb447a7b27a15ed9aaf332726c47fcbcf
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   2023-03-20 17:55:14 -0700

    locking/csd_lock: Remove added data from CSD lock debugging

    The diagnostics added by this commit were extremely useful in one instance:

    a5aabace5f ("locking/csd_lock: Add more data to CSD lock debugging")

    However, they have not seen much action since, and there have been some
    concerns expressed that the complexity is not worth the benefit.

    Therefore, manually revert this commit, but leave a comment telling
    people where to find these diagnostics.

    [ paulmck: Apply Juergen Gross feedback. ]

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Juergen Gross <jgross@suse.com>
    Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org

Signed-off-by: Leonardo Bras <leobras@redhat.com>
2024-06-17 12:58:15 -03:00
Leonardo Bras 6e00a94924 locking/csd_lock: Add Kconfig option for csd_debug default
JIRA: https://issues.redhat.com/browse/RHEL-13876
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit c52198601695851622f361d3f16456e9fc857629
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   2023-03-20 17:55:13 -0700

    locking/csd_lock: Add Kconfig option for csd_debug default

    The csd_debug kernel parameter works well, but is inconvenient in cases
    where it is more closely associated with boot loaders or automation than
    with a particular kernel version or release.  Therefore, provide a new
    CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to
    1 when selected and 0 otherwise, with this latter being the default.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Juergen Gross <jgross@suse.com>
    Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org

Signed-off-by: Leonardo Bras <leobras@redhat.com>
2024-06-17 12:58:14 -03:00
Prarit Bhargava bc05ba3f2b smp: don't declare nr_cpu_ids if NR_CPUS == 1
JIRA: https://issues.redhat.com/browse/RHEL-25415

commit 53fc190cc6771c5494d782210334d4ebb50c7103
Author: Yury Norov <yury.norov@gmail.com>
Date:   Mon Sep 5 16:08:16 2022 -0700

    smp: don't declare nr_cpu_ids if NR_CPUS == 1

    SMP and NR_CPUS are independent options, hence nr_cpu_ids may be
    declared even if NR_CPUS == 1, which is useless.

    Signed-off-by: Yury Norov <yury.norov@gmail.com>

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2024-03-20 09:42:41 -04:00
Prarit Bhargava 4606616e7d smp: add set_nr_cpu_ids()
JIRA: https://issues.redhat.com/browse/RHEL-25415

Conflicts: Not worried about unsupported arches.

commit 38bef8e57f2149acd2c910a98f57dd6291d2e0ec
Author: Yury Norov <yury.norov@gmail.com>
Date:   Mon Sep 5 16:08:17 2022 -0700

    smp: add set_nr_cpu_ids()

    In preparation to support compile-time nr_cpu_ids, add a setter for
    the variable.

    This is a no-op for all arches.
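
    As a sketch (modulo the exact config guards used upstream), the setter
    is essentially:

      static inline void set_nr_cpu_ids(unsigned int nr)
      {
      #if NR_CPUS == 1
              /* nr_cpu_ids is a compile-time constant here, nothing to set */
              WARN_ON(nr != nr_cpu_ids);
      #else
              nr_cpu_ids = nr;
      #endif
      }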

    Signed-off-by: Yury Norov <yury.norov@gmail.com>

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2024-03-20 09:42:40 -04:00
David Arcari d572cf6194 cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init
JIRA: https://issues.redhat.com/browse/RHEL-15512

commit ba831b7b1a517ba7f25d6fa9736a8092d07b0c74
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri May 12 23:07:00 2023 +0200

    cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init

    No point in keeping them around.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Tested-by: Michael Kelley <mikelley@microsoft.com>
    Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Tested-by: Helge Deller <deller@gmx.de> # parisc
    Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
    Link: https://lore.kernel.org/r/20230512205255.551974164@linutronix.de

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-05 11:56:51 -05:00
Jerome Marchand 04a26afde2 trace: Add trace_ipi_send_cpu()
Bugzilla: https://bugzilla.redhat.com/2192613

Conflicts: context change due to missing commit ed29b0b4fd83
("io_uring: move to separate directory")

commit 68e2d17c9eb311ab59aeb6d0c38aad8985fa2596
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Mar 22 11:28:36 2023 +0100

    trace: Add trace_ipi_send_cpu()

    Because copying cpumasks around when targeting a single CPU is a bit
    daft...

    Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20230322103004.GA571242%40hirez.programming.kicks-ass.net

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-09-14 15:36:30 +02:00
Jerome Marchand aa5786b04d sched, smp: Trace smp callback causing an IPI
Bugzilla: https://bugzilla.redhat.com/2192613

Conflicts: Need to modify __smp_call_single_queue_debug() too. It was
removed upstream by commit 1771257cb447 ("locking/csd_lock: Remove
added data from CSD lock debugging")

commit 68f4ff04dbada18dad79659c266a8e5e29e458cd
Author: Valentin Schneider <vschneid@redhat.com>
Date:   Tue Mar 7 14:35:58 2023 +0000

    sched, smp: Trace smp callback causing an IPI

    Context
    =======

    The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
    which so far has only been fed with NULL.

    While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
    struct layout (meaning their callback func can be accessed without caring
    about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
    attached to its struct. This means we need to check the type of a CSD
    before eventually dereferencing its associated callback.

    This isn't as trivial as it sounds: the CSD type is stored in
    __call_single_node.u_flags, which gets cleared right before the callback is
    executed via csd_unlock(). This implies checking the CSD type before it is
    enqueued on the call_single_queue, as the target CPU's queue can be flushed
    before we get to sending an IPI.

    Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
    would need to have an additional argument to trickle down the invoked
    function. This is somewhat silly, as the extra argument will always be
    pushed down to the function even when nothing is being traced, which is
    unnecessary overhead.

    Changes
    =======

    send_call_function_single_ipi() is only used by smp.c, and is defined in
    sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
    a CPU's idle task).

    Split it into two parts: the scheduler bits remain in sched/core.c, and the
    actual IPI emission is moved into smp.c. This lets us define an
    __always_inline helper function that can take the related callback as
    parameter without creating useless register pressure in the non-traced path
    which only gains a (disabled) static branch.

    Do the same thing for the multi IPI case.
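
    The resulting single-CPU helper looks roughly like this (a sketch;
    call_function_single_prep_ipi() stands for the scheduler-side part
    that stays in sched/core.c):

      static __always_inline void
      send_call_function_single_ipi(int cpu, smp_call_func_t func)
      {
              /* the scheduler part may avoid the IPI for a polling idle CPU */
              if (call_function_single_prep_ipi(cpu)) {
                      trace_ipi_send_cpumask(cpumask_of(cpu), _RET_IP_, func);
                      arch_send_call_function_single_ipi(cpu);
              }
      }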

    Signed-off-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20230307143558.294354-8-vschneid@redhat.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-09-14 15:36:30 +02:00
Jerome Marchand ba855c7efc smp: reword smp call IPI comment
Bugzilla: https://bugzilla.redhat.com/2192613

Conflicts: context change from missing commit 1771257cb447
("locking/csd_lock: Remove added data from CSD lock debugging")

commit 253a0fb4c62827cdcaf43afcea5d675507eaf7a3
Author: Valentin Schneider <vschneid@redhat.com>
Date:   Tue Mar 7 14:35:57 2023 +0000

    smp: reword smp call IPI comment

    Accessing the call_single_queue hasn't involved a spinlock since 2014:

      6897fc22ea ("kernel: use lockless list for smp_call_function_single")

    The llist operations (namely cmpxchg() and xchg()) provide similar ordering
    guarantees; update the comment to lessen confusion.
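
    In code terms, the two paths the comment describes boil down to roughly
    the following (sketch with made-up function names):

      /* producer side: cmpxchg()-based push; returns true when the list
       * was empty, so only the first producer has to send the IPI */
      static void enqueue_sketch(int cpu, call_single_data_t *csd)
      {
              if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
                      send_call_function_single_ipi(cpu);
      }

      /* consumer side (IPI handler): xchg()-based grab of the whole list */
      static struct llist_node *dequeue_all_sketch(void)
      {
              return llist_del_all(this_cpu_ptr(&call_single_queue));
      }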

    Signed-off-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20230307143558.294354-7-vschneid@redhat.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-09-14 15:36:30 +02:00
Jerome Marchand 29a8dded56 smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
Bugzilla: https://bugzilla.redhat.com/2192613

Conflicts: context change from missing commit 1771257cb447
("locking/csd_lock: Remove added data from CSD lock debugging")

commit 08407b5f61c1bbd4ebb26a76474df4354fd76fb7
Author: Valentin Schneider <vschneid@redhat.com>
Date:   Tue Mar 7 14:35:54 2023 +0000

    smp: Trace IPIs sent via arch_send_call_function_ipi_mask()

    This simply wraps around the arch function and prepends it with a
    tracepoint, similar to send_call_function_single_ipi().
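
    The wrapper itself is tiny; roughly (sketch):

      static void send_call_function_ipi_mask(const struct cpumask *mask)
      {
              /* the callback argument is still NULL at this point in the series */
              trace_ipi_send_cpumask(mask, _RET_IP_, NULL);
              arch_send_call_function_ipi_mask(mask);
      }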

    Signed-off-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Link: https://lore.kernel.org/r/20230307143558.294354-4-vschneid@redhat.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-09-14 15:36:30 +02:00
Jerome Marchand 160dc2ad5b sched, smp: Trace IPIs sent via send_call_function_single_ipi()
Bugzilla: https://bugzilla.redhat.com/2192613

Conflicts: context change due to missing commit ed29b0b4fd83
("io_uring: move to separate directory")

commit cc9cb0a71725aa8dd8d8f534a9b562bbf7981f75
Author: Valentin Schneider <vschneid@redhat.com>
Date:   Tue Mar 7 14:35:53 2023 +0000

    sched, smp: Trace IPIs sent via send_call_function_single_ipi()

    send_call_function_single_ipi() is the thing that sends IPIs at the bottom
    of smp_call_function*() via either generic_exec_single() or
    smp_call_function_many_cond(). Give it an IPI-related tracepoint.

    Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
    which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

    Signed-off-by: Valentin Schneider <vschneid@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230307143558.294354-3-vschneid@redhat.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-09-14 15:36:30 +02:00
Waiman Long c6babad818 sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit e73dfe30930b75c98746152e7a2f6a8ab6067b51
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Thu, 4 Aug 2022 10:34:19 +0800

    sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()

    The trigger_all_cpu_backtrace() function attempts to send an NMI to the
    target CPU, which usually provides much better stack traces than the
    dump_cpu_task() function's approach of dumping that stack from some other
    CPU.  So much so that most calls to dump_cpu_task() only happen after
    a call to trigger_all_cpu_backtrace() has failed.  And the exception to
    this rule really should attempt to use trigger_all_cpu_backtrace() first.

    Therefore, move the trigger_all_cpu_backtrace() invocation into
    dump_cpu_task().
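
    In other words, dump_cpu_task() now tries the NMI backtrace first and
    only falls back to dumping the remote stack from this CPU when that
    fails; roughly (sketch):

      void dump_cpu_task(int cpu)
      {
              if (trigger_single_cpu_backtrace(cpu))
                      return;         /* NMI backtrace already printed */

              /* fallback: dump the target CPU's stack from this CPU */
              pr_info("Task dump for CPU %d:\n", cpu);
              sched_show_task(cpu_curr(cpu));
      }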

    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Juri Lelli <juri.lelli@redhat.com>
    Cc: Vincent Guittot <vincent.guittot@linaro.org>
    Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Cc: Ben Segall <bsegall@google.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
    Cc: Valentin Schneider <vschneid@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:47:57 -04:00
Waiman Long 53355fad00 locking/csd_lock: Change csdlock_debug from early_param to __setup
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516

commit 9c9b26b0df270d4f9246e483a44686fca951a29c
Author: Chen Zhongjin <chenzhongjin@huawei.com>
Date:   Tue, 10 May 2022 17:46:39 +0800

    locking/csd_lock: Change csdlock_debug from early_param to __setup

    The csdlock_debug kernel-boot parameter is parsed by the
    early_param() function csdlock_debug().  If set, csdlock_debug()
    invokes static_branch_enable() to enable csd_lock_wait feature, which
    triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and
    CONFIG_SPARSEMEM_VMEMMAP=n.

    With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in
    static_key_enable() and returns NULL, resulting in a NULL dereference
    because mem_section is initialized only later in sparse_init().

    This is also a problem for powerpc because early_param() functions
    are invoked earlier than jump_label_init(), also resulting in
    static_key_enable() failures.  These failures cause the warning "static
    key 'xxx' used before call to jump_label_init()".

    Thus, early_param is too early for csd_lock_wait to run
    static_branch_enable(), so change it to __setup to fix these failures.
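
    The change is only about when the handler runs; a sketch of the
    before/after registration (the real handler enables static keys
    rather than setting a plain flag):

      static bool csdlock_debug_enabled;

      static int __init csdlock_debug(char *str)
      {
              /* illustrative stand-in for static_branch_enable() */
              csdlock_debug_enabled = true;
              return 1;               /* __setup convention: 1 == handled */
      }

      /* before: early_param("csdlock_debug", csdlock_debug);
       *         runs before jump_label_init()/sparse_init(), hence the bug */
      __setup("csdlock_debug=", csdlock_debug);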

    Fixes: 8d0968cc6b ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging")
    Cc: stable@vger.kernel.org
    Reported-by: Chen jingwen <chenjingwen6@huawei.com>
    Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:20 -04:00
Frantisek Hrbata 37715a7ab5 Merge: Backport scheduler related v5.19 and earlier commits for kernel-rt
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1319

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2120671
Tested: By me with scheduler stress tests.

Series of prerequisites for the RT patch set that touches scheduler code.

Signed-off-by: Phil Auld <pauld@redhat.com>

Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-09-27 08:47:30 -04:00
Phil Auld eed8502760 smp: Make softirq handling RT safe in flush_smp_call_function_queue()
Bugzilla: https://bugzilla.redhat.com/2120671

commit 1a90bfd220201fbe050dfc15deaac20ca5f15638
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Wed Apr 13 15:31:05 2022 +0200

    smp: Make softirq handling RT safe in flush_smp_call_function_queue()

    flush_smp_call_function_queue() invokes do_softirq() which is not available
    on PREEMPT_RT. flush_smp_call_function_queue() is invoked from the idle
    task and the migration task with preemption or interrupts disabled.

    So RT kernels cannot process soft interrupts in that context as that has to
    acquire 'sleeping spinlocks' which is not possible with preemption or
    interrupts disabled and forbidden from the idle task anyway.

    The currently known SMP function call which raises a soft interrupt is in
    the block layer, but this functionality is not enabled on RT kernels due to
    latency and performance reasons.

    RT could wake up ksoftirqd unconditionally, but this wants to be avoided if
    there were soft interrupts pending already when this is invoked in the
    context of the migration task. The migration task might have preempted a
    threaded interrupt handler which raised a soft interrupt, but did not reach
    the local_bh_enable() to process it. The "running" ksoftirqd might prevent
    the handling in the interrupt thread context which is causing latency
    issues.

    Add a new function which handles this case explicitly for RT and falls
    back to do_softirq() on !RT kernels. In the RT case this warns when one of
    the flushed SMP function calls raised a soft interrupt so this can be
    investigated.
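
    The resulting flow is roughly the following (sketch; names
    approximate):

      void flush_smp_call_function_queue(void)
      {
              unsigned int was_pending;
              unsigned long flags;

              if (llist_empty(this_cpu_ptr(&call_single_queue)))
                      return;

              local_irq_save(flags);
              /* remember what was already pending before the flush */
              was_pending = local_softirq_pending();
              __flush_smp_call_function_queue(true);
              if (local_softirq_pending())
                      /* !RT: plain do_softirq(); RT: wake ksoftirqd and warn
                       * if one of the flushed calls raised the softirq */
                      do_softirq_post_smp_call_flush(was_pending);
              local_irq_restore(flags);
      }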

    [ tglx: Moved the RT part out of SMP code ]

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/YgKgL6aPj8aBES6G@linutronix.de
    Link: https://lore.kernel.org/r/20220413133024.356509586@linutronix.de

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-09-08 11:25:07 -04:00
Phil Auld 035866b87a smp: Rename flush_smp_call_function_from_idle()
Bugzilla: https://bugzilla.redhat.com/2120671

commit 16bf5a5e1ec56474ed2a19d72f272ed09a5d3ea1
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Apr 13 15:31:03 2022 +0200

    smp: Rename flush_smp_call_function_from_idle()

    This is invoked from the stopper thread too, which is definitely not idle.
    Rename it to flush_smp_call_function_queue() and fixup the callers.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220413133024.305001096@linutronix.de

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-09-08 11:25:07 -04:00
Waiman Long c08752787a kernel/smp: Provide boot-time timeout for CSD lock diagnostics
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117491

commit 3791a22374715b36ad806db13d8b2afb1b57fd36
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon, 28 Feb 2022 18:08:33 -0800

    kernel/smp: Provide boot-time timeout for CSD lock diagnostics

    Debugging of problems involving insanely long-running SMI handlers
    proceeds better if the CSD-lock timeout can be adjusted.  This commit
    therefore provides a new smp.csd_lock_timeout kernel boot parameter
    that specifies the timeout in milliseconds.  The default remains at the
    previously hard-coded value of five seconds.

    [ paulmck: Apply feedback from Juergen Gross. ]

    Cc: Rik van Riel <riel@surriel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-08-30 17:22:12 -04:00
Phil Auld 03dd929092 sched: Improve wake_up_all_idle_cpus() take #2
Bugzilla: http://bugzilla.redhat.com/2020279

commit 96611c26dc351c33f73b48756a9feacc109e5bab
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Oct 18 16:41:05 2021 +0200

    sched: Improve wake_up_all_idle_cpus() take #2

    As reported by syzbot and experienced by Pavel, using cpus_read_lock()
    in wake_up_all_idle_cpus() generates lock inversion (against mmap_sem
    and possibly others).

    Instead, shrink the preempt disable region by iterating all CPUs and
    checking the online status for each individual CPU while having
    preemption disabled.

    Fixes: 8850cb663b5c ("sched: Simplify wake_up_*idle*()")
    Reported-by: syzbot+d5b23b18d2f4feae8a67@syzkaller.appspotmail.com
    Reported-by: Pavel Machek <pavel@ucw.cz>
    Reported-by: Qian Cai <quic_qiancai@quicinc.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Tested-by: Qian Cai <quic_qiancai@quicinc.com>

Signed-off-by: Phil Auld <pauld@redhat.com>
2021-12-13 16:07:50 -05:00
Phil Auld 1028c3ee10 sched: Simplify wake_up_*idle*()
Bugzilla: http://bugzilla.redhat.com/2020279

commit 8850cb663b5cda04d33f9cfbc38889d73d3c8e24
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Tue Sep 21 22:16:02 2021 +0200

    sched: Simplify wake_up_*idle*()

    Simplify and make wake_up_if_idle() more robust; also, don't iterate
    the whole machine with preempt_disable() in its caller:
    wake_up_all_idle_cpus().

    This prepares for another wake_up_if_idle() user that needs a full
    do_idle() cycle.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Vasily Gorbik <gor@linux.ibm.com>
    Tested-by: Vasily Gorbik <gor@linux.ibm.com> # on s390
    Link: https://lkml.kernel.org/r/20210929152428.769328779@infradead.org

Signed-off-by: Phil Auld <pauld@redhat.com>
2021-12-13 16:07:48 -05:00
Arnd Bergmann 1139aeb1c5 smp: Fix smp_call_function_single_async prototype
As of commit 966a967116 ("smp: Avoid using two cache lines for struct
call_single_data"), the smp code prefers 32-byte aligned call_single_data
objects for performance reasons, but the block layer includes an instance
of this structure in the main 'struct request' that is more senstive
to size than to performance here, see 4ccafe0320 ("block: unalign
call_single_data in struct request").

The result is a violation of the calling conventions that clang correctly
points out:

block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
                smp_call_function_single_async(cpu, &rq->csd);

It does seem that the usage of the call_single_data without cache line
alignment should still be allowed by the smp code, so just change the
function prototype so it accepts both, but leave the default alignment
unchanged for the other users. This seems better to me than adding
a local hack to shut up an otherwise correct warning in the caller.
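
Roughly, the signature change amounts to the following (sketch):

  /* before: callers had to pass the 32-byte aligned typedef */
  int smp_call_function_single_async(int cpu, call_single_data_t *csd);

  /* after: the underlying struct is accepted as well, so an unaligned
   * instance such as the one embedded in struct request is fine */
  int smp_call_function_single_async(int cpu, struct __call_single_data *csd);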

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.kernel.org/r/20210505211300.3174456-1-arnd@kernel.org
2021-05-06 15:33:49 +02:00
Ingo Molnar a500fc918f Merge branch 'locking/core' into x86/mm, to resolve conflict
There's a non-trivial conflict between the parallel TLB flush
framework and the IPI flush debugging code - merge them
manually.

Conflicts:
	kernel/smp.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-06 13:00:58 +01:00
Peter Zijlstra d43f17a1da smp: Micro-optimize smp_call_function_many_cond()
Call the generic send_call_function_single_ipi() function, which
will avoid the IPI when @last_cpu is idle.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-06 13:00:22 +01:00
Nadav Amit a5aa5ce300 smp: Inline on_each_cpu_cond() and on_each_cpu()
Simplify the code and avoid having an additional function on the stack
by inlining on_each_cpu_cond() and on_each_cpu().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210220231712.2475218-10-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit a32a4d8a81 smp: Run functions concurrently in smp_call_function_many_cond()
Currently, on_each_cpu() and similar functions do not exploit the
potential of concurrency: the function is first executed remotely and
only then it is executed locally. Functions such as TLB flush can take
considerable time, so this provides an opportunity for performance
optimization.

To do so, modify smp_call_function_many_cond() to allow the callers to
provide a function that should be executed (remotely/locally), and run
them concurrently. Keep the other smp_call_function_many() semantics as
they are today for backward compatibility: in that case the called
function is not executed locally.

smp_call_function_many_cond() does not use the optimized version for a
single remote target that smp_call_function_single() implements. For
synchronous function call, smp_call_function_single() keeps a
call_single_data (which is used for synchronization) on the stack.
Interestingly, it seems that not using this optimization provides
greater performance improvements (greater speedup with a single remote
target than with multiple ones). Presumably, holding data structures
that are intended for synchronization on the stack can introduce
overheads due to TLB misses and false-sharing when the stack is used for
other purposes.
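
The resulting control flow can be pictured like this (simplified sketch
with made-up helper names, not the actual function):

  static void smp_call_many_sketch(const struct cpumask *mask,
                                   smp_call_func_t func, void *info,
                                   bool wait, bool run_local)
  {
          int cpu;

          /* 1) queue a CSD on every remote target and send the IPIs */
          for_each_cpu(cpu, mask)
                  queue_remote_csd(cpu, func, info);      /* illustrative */

          /* 2) run the function locally while the remote CPUs work */
          if (run_local)
                  func(info);

          /* 3) only then wait for the remote callbacks to complete */
          if (wait)
                  for_each_cpu(cpu, mask)
                          wait_for_remote_csd(cpu);        /* illustrative */
  }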

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-2-namit@vmware.com
2021-03-06 12:59:09 +01:00
Juergen Gross a5aabace5f locking/csd_lock: Add more data to CSD lock debugging
In order to help identify problems with IPI handling and remote
function execution, add some more data to the IPI debugging code.

There have been multiple reports of CPUs looping long times (many
seconds) in smp_call_function_many() waiting for another CPU executing
a function like tlb flushing. Most of these reports have been for
cases where the kernel was running as a guest on top of KVM or Xen
(there are rumours of that happening under VMWare, too, and even on
bare metal).

Finding the root cause hasn't been successful yet, even after more than
2 years of chasing this bug by different developers.

Commit:

  35feb60474 ("kernel/smp: Provide CSD lock timeout diagnostics")

tried to address this by adding some debug code and by issuing another
IPI when a hang was detected. This helped mitigating the problem
(the repeated IPI unlocks the hang), but the root cause is still unknown.

Current available data suggests that either an IPI wasn't sent when it
should have been, or that the IPI didn't result in the target CPU
executing the queued function (due to the IPI not reaching the CPU,
the IPI handler not being called, or the handler not seeing the queued
request).

Try to add more diagnostic data by introducing a global atomic counter
which is being incremented when doing critical operations (before and
after queueing a new request, when sending an IPI, and when dequeueing
a request). The counter value is stored in percpu variables which can
be printed out when a hang is detected.

The data of the last event (consisting of sequence counter, source
CPU, target CPU, and event type) is stored in a global variable. When
a new event is to be traced, the data of the last event is stored in
the event related percpu location and the global data is updated with
the new event's data. This allows tracking two events in one data
location: one by the value of the event data (the event before the
current one), and one by the location itself (the current event).

A typical printout with a detected hang will look like this:

csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
        csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
        csd: cnt(00008cd): ffff->0006 idle
        csd: cnt(0003668): 0001->0006 queue
        csd: cnt(0003669): 0001->0006 ipi
        csd: cnt(0003e0f): 0007->000a queue
        csd: cnt(0003e10): 0001->ffff ping
        csd: cnt(0003e71): 0003->0000 ping
        csd: cnt(0003e72): ffff->0006 gotipi
        csd: cnt(0003e73): ffff->0006 handle
        csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
        csd: cnt(0003e7f): 0004->0006 ping
        csd: cnt(0003e80): 0001->ffff pinged
        csd: cnt(0003eb2): 0005->0001 noipi
        csd: cnt(0003eb3): 0001->0006 queue
        csd: cnt(0003eb4): 0001->0006 noipi
        csd: cnt now: 0003f00

The idea is to print only relevant entries. Those are all events which
are associated with the hang (so sender side events for the source CPU
of the hanging request, and receiver side events for the target CPU),
and the related events just before those (for adding data needed to
identify a possible race). Printing all available data would be
possible, but this would add large amounts of data printed on larger
configurations.

Signed-off-by: Juergen Gross <jgross@suse.com>
[ Minor readability edits. Breaks col80 but is far more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20210301101336.7797-4-jgross@suse.com
2021-03-06 12:49:48 +01:00
Juergen Gross de7b09ef65 locking/csd_lock: Prepare more CSD lock debugging
In order to be able to easily add more CSD lock debugging data to
struct call_function_data->csd move the call_single_data_t element
into a sub-structure.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210301101336.7797-3-jgross@suse.com
2021-03-06 12:49:48 +01:00
Juergen Gross 8d0968cc6b locking/csd_lock: Add boot parameter for controlling CSD lock debugging
Currently CSD lock debugging can be switched on and off via a kernel
config option only. Unfortunately there is at least one problem with
CSD lock handling pending for about 2 years now, which has been seen
in different environments (mostly when running virtualized under KVM
or Xen, at least once on bare metal). Multiple attempts to catch this
issue have finally led to introduction of CSD lock debug code, but
this code is not in use in most distros as it has some impact on
performance.

In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG
enabled even for production use, add a boot parameter for switching
the debug functionality on. This will reduce any performance impact
of the debug coding to a bare minimum when not being used.

Signed-off-by: Juergen Gross <jgross@suse.com>
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210301101336.7797-2-jgross@suse.com
2021-03-06 12:49:48 +01:00
Sebastian Andrzej Siewior f9d34595ae smp: Process pending softirqs in flush_smp_call_function_from_idle()
send_call_function_single_ipi() may wake an idle CPU without sending an
IPI. The woken up CPU will process the SMP-functions in
flush_smp_call_function_from_idle(). Any raised softirq from within the
SMP-function call will not be processed.
Should the CPU have no tasks assigned, then it will go back to idle with
pending softirqs and the NOHZ will rightfully complain.

Process pending softirqs on return from flush_smp_call_function_queue().

Fixes: b2a02fc43a ("smp: Optimize send_call_function_single_ipi()")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210123201027.3262800-2-bigeasy@linutronix.de
2021-02-17 14:12:42 +01:00
Ingo Molnar a787bdaff8 Merge branch 'linus' into sched/core, to resolve semantic conflict
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-27 11:10:50 +01:00
Peter Zijlstra 545b8c8df4 smp: Cleanup smp_call_function*()
Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2020-11-24 16:47:49 +01:00
Linus Torvalds 41eea65e2a Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:

 - Debugging for smp_call_function()

 - RT raw/non-raw lock ordering fixes

 - Strict grace periods for KASAN

 - New smp_call_function() torture test

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

[ This doesn't actually pull the tag - I've dropped the last merge from
  the RCU branch due to questions about the series.   - Linus ]

* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  smp: Make symbol 'csd_bug_count' static
  kernel/smp: Provide CSD lock timeout diagnostics
  smp: Add source and destination CPUs to __call_single_data
  rcu: Shrink each possible cpu krcp
  rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  torture: Add gdb support
  rcutorture: Allow pointer leaks to test diagnostic code
  rcutorture: Hoist OOM registry up one level
  refperf: Avoid null pointer dereference when buf fails to allocate
  rcutorture: Properly synchronize with OOM notifier
  rcutorture: Properly set rcu_fwds for OOM handling
  torture: Add kvm.sh --help and update help message
  rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
  torture: Update initrd documentation
  rcutorture: Replace HTTP links with HTTPS ones
  locktorture: Make function torture_percpu_rwsem_init() static
  torture: document --allcpus argument added to the kvm.sh script
  rcutorture: Output number of elapsed grace periods
  rcutorture: Remove KCSAN stubs
  rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
  ...
2020-10-18 14:34:50 -07:00
Randy Dunlap 7b7b8a2c95 kernel/: fix repeated words in comments
Fix multiple occurrences of duplicated words in kernel/.

Fix one typo/spello on the same line as a duplicate word.  Change one
instance of "the the" to "that the".  Otherwise just drop one of the
repeated words.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Wei Yongjun 2b722160f1 smp: Make symbol 'csd_bug_count' static
The sparse tool complains as follows:

kernel/smp.c:107:10: warning:
 symbol 'csd_bug_count' was not declared. Should it be static?

Because the variable is not used outside of smp.c, this commit marks it
static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-09-04 11:53:12 -07:00
Paul E. McKenney 35feb60474 kernel/smp: Provide CSD lock timeout diagnostics
This commit causes csd_lock_wait() to emit diagnostics when a CPU
fails to respond quickly enough to one of the smp_call_function()
family of function calls.  These diagnostics are enabled by a new
CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL.

This commit was inspired by an earlier patch by Josef Bacik.

[ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
[ paulmck: Fix KASAN use-after-free issue reported by Qian Cai. ]
[ paulmck: Fix botched nr_cpu_ids comparison per Dan Carpenter. ]
[ paulmck: Apply Peter Zijlstra feedback. ]
Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-04 11:52:50 -07:00
Paul E. McKenney e48c15b796 smp: Add source and destination CPUs to __call_single_data
This commit adds a destination CPU to __call_single_data, and is inspired
by an earlier commit by Peter Zijlstra.  This version adds #ifdef to
permit use by 32-bit systems and supplying the destination CPU for all
smp_call_function*() requests, not just smp_call_function_single().

If need be, 32-bit systems could be accommodated by shrinking the flags
field to 16 bits (the atomic_t variant is currently unused) and by
providing only eight bits for CPU on such systems.

It is not clear that the addition of the fields to __call_single_node
are really needed.
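
Concretely, this amounts to something like the following layout change
(sketch; the exact #ifdef guard differs from what is shown here):

  struct __call_single_data {
          struct __call_single_node node;
          smp_call_func_t func;
          void *info;
  #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
          u16 src, dst;           /* queueing CPU and target CPU */
  #endif
  };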

[ paulmck: Apply Boqun Feng feedback on 32-bit builds. ]
Link: https://lore.kernel.org/lkml/20200615164048.GC2531@hirez.programming.kicks-ass.net/
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-04 11:50:50 -07:00
Muchun Song 589343569d smp: Fix a potential usage of stale nr_cpus
get_option() may return 0, which means that nr_cpus is not
initialized. In that case we would use the stale nr_cpus to
initialize nr_cpu_ids, so fix it.
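
The fix boils down to only trusting nr_cpus when get_option() actually
parsed a value; roughly (sketch):

  static int __init nrcpus(char *str)
  {
          int nr_cpus;

          /* only use nr_cpus if get_option() really parsed an integer */
          if (get_option(&str, &nr_cpus) && nr_cpus > 0 && nr_cpus < nr_cpu_ids)
                  nr_cpu_ids = nr_cpus;

          return 0;
  }
  early_param("nr_cpus", nrcpus);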

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716070457.53255-1-songmuchun@bytedance.com
2020-07-22 10:22:04 +02:00
Peter Zijlstra 8c4890d1c3 smp, irq_work: Continue smp_call_function*() and irq_work*() integration
Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions to make it all automatic.

Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
2020-06-28 17:01:20 +02:00
Linus Torvalds d479c5a191 The changes in this cycle are:
- Optimize the task wakeup CPU selection logic, to improve scalability and
    reduce wakeup latency spikes
 
  - PELT enhancements
 
  - CFS bandwidth handling fixes
 
  - Optimize the wakeup path by removing rq->wake_list and replacing it with ->ttwu_pending
 
  - Optimize IPI cross-calls by making flush_smp_call_function_queue()
    process sync callbacks first.
 
  - Misc fixes and enhancements.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "The changes in this cycle are:

   - Optimize the task wakeup CPU selection logic, to improve
     scalability and reduce wakeup latency spikes

   - PELT enhancements

   - CFS bandwidth handling fixes

   - Optimize the wakeup path by removing rq->wake_list and replacing it
     with ->ttwu_pending

   - Optimize IPI cross-calls by making flush_smp_call_function_queue()
     process sync callbacks first.

   - Misc fixes and enhancements"

* tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
  sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
  sched: Replace rq::wake_list
  sched: Add rq::ttwu_pending
  irq_work, smp: Allow irq_work on call_single_queue
  smp: Optimize send_call_function_single_ipi()
  smp: Move irq_work_run() out of flush_smp_call_function_queue()
  smp: Optimize flush_smp_call_function_queue()
  sched: Fix smp_call_function_single_async() usage for ILB
  sched/core: Offload wakee task activation if it the wakee is descheduling
  sched/core: Optimize ttwu() spinning on p->on_cpu
  sched: Defend cfs and rt bandwidth quota against overflow
  sched/cpuacct: Fix charge cpuacct.usage_sys
  sched/fair: Replace zero-length array with flexible-array
  sched/pelt: Sync util/runnable_sum with PELT window when propagating
  sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
  sched/fair: Optimize enqueue_task_fair()
  sched: Make scheduler_ipi inline
  sched: Clean up scheduler_ipi()
  sched/core: Simplify sched_init()
  ...
2020-06-03 13:06:42 -07:00
Ingo Molnar 25de110d14 irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
Some SMP platforms don't have CONFIG_IRQ_WORK defined, resulting in a link
error at build time.

Define a stub and clean up the prototype definitions.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-02 12:34:45 +02:00
Ingo Molnar 1f8db41505 sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
Move the prototypes for sched_ttwu_pending() and send_call_function_single_ipi()
into the newly created kernel/sched/smp.h header, to make sure they are all
the same, and to make architectures happy that use -Wmissing-prototypes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28 11:03:20 +02:00
Peter Zijlstra a148866489 sched: Replace rq::wake_list
The recent commit: 90b5363acd ("sched: Clean up scheduler_ipi()")
got smp_call_function_single_async() subtly wrong. Even though it will
return -EBUSY when trying to re-use a csd, that condition is not
atomic and still requires external serialization.

The change in ttwu_queue_remote() got this wrong.

While on first reading ttwu_queue_remote() has an atomic test-and-set
that appears to serialize the use, the matching 'release' is not in
the right place to actually guarantee this serialization.

The actual race is vs the sched_ttwu_pending() call in the idle loop;
that can run the wakeup-list without consuming the CSD.

Instead of trying to chain the lists, merge them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161908.129371594@infradead.org
2020-05-28 10:54:16 +02:00
Peter Zijlstra 4b44a21dd6 irq_work, smp: Allow irq_work on call_single_queue
Currently irq_work_queue_on() will issue an unconditional
arch_send_call_function_single_ipi() and has the handler do
irq_work_run().

This is unfortunate in that it makes the IPI handler look at a second
cacheline and it misses the opportunity to avoid the IPI. Instead note
that struct irq_work and struct __call_single_data are very similar in
layout, so use a few bits in the flags word to encode a type and stick
the irq_work on the call_single_queue list.
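
The type encoding lives in the flags word shared by both structures;
roughly (sketch of the relevant constants):

  /* low bits: lock/pending state shared with irq_work */
  #define CSD_FLAG_LOCK           0x01

  /* upper nibble: what kind of entry sits on call_single_queue */
  #define CSD_TYPE_ASYNC          0x00
  #define CSD_TYPE_SYNC           0x10
  #define CSD_TYPE_IRQ_WORK       0x20
  #define CSD_TYPE_TTWU           0x30    /* added by the rq::wake_list replacement */
  #define CSD_FLAG_TYPE_MASK      0xF0

  #define CSD_TYPE(_csd)  ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)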

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161908.011635912@infradead.org
2020-05-28 10:54:15 +02:00
Peter Zijlstra b2a02fc43a smp: Optimize send_call_function_single_ipi()
Just like the ttwu_queue_remote() IPI, make use of _TIF_POLLING_NRFLAG
to avoid sending IPIs to idle CPUs.

[ mingo: Fix UP build bug. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161907.953304789@infradead.org
2020-05-28 10:54:15 +02:00
Peter Zijlstra afaa653c56 smp: Move irq_work_run() out of flush_smp_call_function_queue()
This ensures flush_smp_call_function_queue() is strictly about
call_single_queue.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161907.895109676@infradead.org
2020-05-28 10:54:15 +02:00
Peter Zijlstra 52103be07d smp: Optimize flush_smp_call_function_queue()
The call_single_queue can contain (two) different callbacks,
synchronous and asynchronous. The current interrupt handler runs them
in-order, which means that remote CPUs that are waiting for their
synchronous call can be delayed by running asynchronous callbacks.

Rework the interrupt handler to first run the synchronous callbacks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200526161907.836818381@infradead.org
2020-05-28 10:54:15 +02:00
Kaitao Cheng 58eb7b77ad smp: Use smp_call_func_t in on_each_cpu()
Use smp_call_func_t instead of the open coded function pointer argument.

Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20200417162451.91969-1-pilgrimtao@gmail.com
2020-04-19 17:51:48 +02:00
Qais Yousef b99a26593b cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
This is the last direct user of cpu_up() before it can become an internal
implementation detail of the cpu subsystem.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-17-qais.yousef@arm.com
2020-03-25 12:59:37 +01:00
Peter Xu 5a18ceca63 smp: Allow smp_call_function_single_async() to insert locked csd
Previously we would raise a warning if we wanted to insert a csd object
that already had the LOCK flag set, and if that happened we would also
wait for the lock to be released.  However, this does not match how the
function is named - the "_async" suffix hints that this function should
not block, while we would.

This patch changes that behavior by simply returning -EBUSY instead of
waiting; at the same time we allow this operation to happen without
warning the user, turning it into a feature for callers that want to
"insert a csd object, if it's there, just wait for that one".

This is pretty safe because in flush_smp_call_function_queue() for
async csd objects (where csd->flags&SYNC is zero) we'll first do the
unlock then we call the csd->func().  So if we see the csd->flags&LOCK
is true in smp_call_function_single_async(), then it's guaranteed that
csd->func() will be called after this smp_call_function_single_async()
returns -EBUSY.

Update the function's comment to reflect this as well.
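
With the change, the entry check looks roughly like this (sketch; based
on the code as it was at the time, so flag and helper names are
approximate):

  int smp_call_function_single_async(int cpu, call_single_data_t *csd)
  {
          int err = 0;

          preempt_disable();

          if (csd->flags & CSD_FLAG_LOCK) {
                  /* previous call still in flight: don't wait, report it */
                  err = -EBUSY;
                  goto out;
          }

          csd->flags = CSD_FLAG_LOCK;
          smp_wmb();

          err = generic_exec_single(cpu, csd, csd->func, csd->info);
  out:
          preempt_enable();
          return err;
  }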

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20191216213125.9536-2-peterx@redhat.com
2020-03-06 13:42:28 +01:00