JIRA: https://issues.redhat.com/browse/RHEL-78821
commit b9f2b29b94943b08157e3dfc970baabc7944dbc3
Author: Yafang Shao <laoar.shao@gmail.com>
Date: Wed Feb 5 11:24:38 2025 +0800
sched: Don't define sched_clock_irqtime as static key
The sched_clock_irqtime was defined as a static key in commit 8722903cbb8f
('sched: Define sched_clock_irqtime as static key'). However, this change
introduces a 'sleeping in atomic context' warning, as shown below:
arch/x86/kernel/tsc.c:1214 mark_tsc_unstable()
warn: sleeping in atomic context
As analyzed by Dan, the affected code path is as follows:
vcpu_load() <- disables preempt
-> kvm_arch_vcpu_load()
-> mark_tsc_unstable() <- sleeps
virt/kvm/kvm_main.c
166 void vcpu_load(struct kvm_vcpu *vcpu)
167 {
168 int cpu = get_cpu();
^^^^^^^^^^
This get_cpu() disables preemption.
169
170 __this_cpu_write(kvm_running_vcpu, vcpu);
171 preempt_notifier_register(&vcpu->preempt_notifier);
172 kvm_arch_vcpu_load(vcpu, cpu);
173 put_cpu();
174 }
arch/x86/kvm/x86.c
4979 if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable()) {
4980 s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
4981 rdtsc() - vcpu->arch.last_host_tsc;
4982 if (tsc_delta < 0)
4983 mark_tsc_unstable("KVM discovered backwards TSC");
arch/x86/kernel/tsc.c
1206 void mark_tsc_unstable(char *reason)
1207 {
1208 if (tsc_unstable)
1209 return;
1210
1211 tsc_unstable = 1;
1212 if (using_native_sched_clock())
1213 clear_sched_clock_stable();
--> 1214 disable_sched_clock_irqtime();
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
kernel/jump_label.c
245 void static_key_disable(struct static_key *key)
246 {
247 cpus_read_lock();
^^^^^^^^^^^^^^^^
This lock has a might_sleep() in it which triggers the static checker
warning.
248 static_key_disable_cpuslocked(key);
249 cpus_read_unlock();
250 }
Let's revert this change for now, as {disable,enable}_sched_clock_irqtime
are used in many places, as pointed out by Sean, including the following:
The code path in clocksource_watchdog():
clocksource_watchdog()
|
-> spin_lock(&watchdog_lock);
|
-> __clocksource_unstable()
|
-> clocksource.mark_unstable() == tsc_cs_mark_unstable()
|
-> disable_sched_clock_irqtime()
And the code path in sched_clock_register():
/* Cannot register a sched_clock with interrupts on */
local_irq_save(flags);
...
/* Enable IRQ time accounting if we have a fast enough sched_clock() */
if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
enable_sched_clock_irqtime();
local_irq_restore(flags);
[lkp@intel.com: reported a build error in the prev version]
Closes: https://lore.kernel.org/kvm/37a79ba3-9ce0-479c-a5b0-2bd75d573ed3@stanley.mountain/
Fixes: 8722903cbb8f ("sched: Define sched_clock_irqtime as static key")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Debugged-by: Dan Carpenter <dan.carpenter@linaro.org>
Debugged-by: Sean Christopherson <seanjc@google.com>
Debugged-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20250205032438.14668-1-laoar.shao@gmail.com
Signed-off-by: Phil Auld <pauld@redhat.com>
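For reference, the revert brings sched_clock_irqtime back to a plain flag whose writers never sleep; a minimal sketch of that pre-8722903cbb8f form (placement in kernel/sched/cputime.c assumed):

/* A plain int: safe to flip from atomic context; readers simply load it. */
int sched_clock_irqtime;

void enable_sched_clock_irqtime(void)
{
        sched_clock_irqtime = 1;
}

void disable_sched_clock_irqtime(void)
{
        sched_clock_irqtime = 0;
}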
JIRA: https://issues.redhat.com/browse/RHEL-78821
commit 8722903cbb8f0d51057fbf9ef1c680756b74119e
Author: Yafang Shao <laoar.shao@gmail.com>
Date: Fri Jan 3 10:24:06 2025 +0800
sched: Define sched_clock_irqtime as static key
Since CPU time accounting is a performance-critical path, let's define
sched_clock_irqtime as a static key to minimize potential overhead.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20250103022409.2544-2-laoar.shao@gmail.com
Signed-off-by: Phil Auld <pauld@redhat.com>
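For contrast with the revert above, a minimal sketch of the static-key form this commit introduced (the static key API is the kernel's real one; placement and call sites are approximated):

static DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime);

void enable_sched_clock_irqtime(void)
{
        /* Patches the branch; takes cpus_read_lock() internally, so it may sleep. */
        static_branch_enable(&sched_clock_irqtime);
}

void disable_sched_clock_irqtime(void)
{
        static_branch_disable(&sched_clock_irqtime);
}

The hot path then tests static_branch_likely(&sched_clock_irqtime) instead of loading a flag from memory; the sleeping enable/disable side is what triggered the revert above.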
JIRA: https://issues.redhat.com/browse/RHEL-78821
commit 59297e2093ceced86393a059a4bd36802311f7bb
Author: Harshit Agarwal <harshit@nutanix.com>
Date: Thu Nov 14 14:08:11 2024 -0700
sched: add READ_ONCE to task_on_rq_queued
task_on_rq_queued() reads p->on_rq without READ_ONCE(), though p->on_rq is
set with WRITE_ONCE() in {activate|deactivate}_task and smp_store_release()
in __block_task, and is also read with READ_ONCE() in task_on_rq_migrating().
Make all of these accesses pair together by adding READ_ONCE() in
task_on_rq_queued().
Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20241114210812.1836587-1-jon@nutanix.com
Signed-off-by: Phil Auld <pauld@redhat.com>
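A minimal sketch of the accessors after this change (paraphrasing kernel/sched/sched.h; the migrating variant is shown for comparison):

/* Pairs with the WRITE_ONCE()/smp_store_release() writers mentioned above. */
static inline int task_on_rq_queued(struct task_struct *p)
{
        return READ_ONCE(p->on_rq) == TASK_ON_RQ_QUEUED;
}

static inline int task_on_rq_migrating(struct task_struct *p)
{
        return READ_ONCE(p->on_rq) == TASK_ON_RQ_MIGRATING;
}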
JIRA: https://issues.redhat.com/browse/RHEL-78821
commit 18adad1dac3334ed34f60ad4de2960df03058142
Author: Connor O'Brien <connoro@google.com>
Date: Wed Oct 9 16:53:38 2024 -0700
sched: Consolidate pick_*_task to task_is_pushable helper
This patch consolidates the rt and deadline pick_*_task functions into
a task_is_pushable() helper.
This patch was broken out from a larger chain migration
patch originally by Connor O'Brien.
[jstultz: split out from larger chain migration patch,
renamed helper function]
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Metin Kaya <metin.kaya@arm.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Metin Kaya <metin.kaya@arm.com>
Link: https://lore.kernel.org/r/20241009235352.1614323-6-jstultz@google.com
Signed-off-by: Phil Auld <pauld@redhat.com>
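A hedged sketch of what the consolidated helper looks like (the exact upstream body may differ slightly):

static inline bool task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
{
        /* Pushable: not currently running, allowed on the target CPU, not pinned. */
        if (!task_on_cpu(rq, p) &&
            cpumask_test_cpu(cpu, &p->cpus_mask) &&
            p->nr_cpus_allowed > 1)
                return true;

        return false;
}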
JIRA: https://issues.redhat.com/browse/RHEL-78821
Conflicts: Context diffs in sched.h due to not having eevdf code.
commit 2b05a0b4c08ffd6dedfbd27af8708742cde39b95
Author: Connor O'Brien <connoro@google.com>
Date: Wed Oct 9 16:53:37 2024 -0700
sched: Add move_queued_task_locked helper
Switch logic that deactivates, sets the task cpu,
and reactivates a task on a different rq to use a
helper that will be later extended to push entire
blocked task chains.
This patch was broken out from a larger chain migration
patch originally by Connor O'Brien.
[jstultz: split out from larger chain migration patch]
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Metin Kaya <metin.kaya@arm.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Metin Kaya <metin.kaya@arm.com>
Link: https://lore.kernel.org/r/20241009235352.1614323-5-jstultz@google.com
Signed-off-by: Phil Auld <pauld@redhat.com>
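A sketch of the helper as described (lockdep assertions assumed to match the upstream version):

static inline
void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq,
                             struct task_struct *task)
{
        lockdep_assert_rq_held(src_rq);
        lockdep_assert_rq_held(dst_rq);

        deactivate_task(src_rq, task, 0);
        set_task_cpu(task, cpu_of(dst_rq));
        activate_task(dst_rq, task, 0);
}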
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit 3cd7271987ffd89c2d5eaeea85d3e9a16aec6894
Author: Ingo Molnar <mingo@kernel.org>
Date: Wed Jun 5 13:44:28 2024 +0200
sched/headers: Move struct pre-declarations to the beginning of the header
There's a random number of structure pre-declaration lines in
kernel/sched/sched.h, some of which are unnecessary duplicates.
Move them to the head & order them a bit for readability.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
Conflicts: Left out hunks in mm_cid code which we don't have
in RHEL9.
commit 127f6bf1618868920c1f77e0a427d1f4570e450b
Author: Ingo Molnar <mingo@kernel.org>
Date: Wed Jun 5 13:39:31 2024 +0200
sched/core: Clean up kernel/sched/sched.h a bit
- Fix whitespace noise
- Fix col80 linebreak damage where possible
- Apply CodingStyle consistently
- Use consistent #else and #endif comments
- Use consistent vertical alignment
- Use 'extern' consistently
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
Conflicts: Dropped hunks in mm_cid code which we don't have. Minor
context diffs due to still having IA64 in tree and previous Kabi
workarounds.
commit 402de7fc880fef055bc984957454b532987e9ad0
Author: Ingo Molnar <mingo@kernel.org>
Date: Mon May 27 16:54:52 2024 +0200
sched: Fix spelling in comments
Do a spell-checking pass.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
Conflicts: Worked around RHEL-only commits 9b35f92491 ("sched/core: Make
sched_setaffinity() always return -EINVAL on empty cpumask"), 90f7bb0c1823 ("sched/core:
Don't return -ENODEV from sched_setaffinity()") and 05fddaaaac ("sched/core: Use empty
mask to reset cpumasks in sched_setaffinity()") by removing the changes and re-applying
them to the new syscalls.c file. Reverting and re-applying was not possible since there
have been other changes on top of these as well.
commit 04746ed80bcf3130951ed4d5c1bc5b0bcabdde22
Author: Ingo Molnar <mingo@kernel.org>
Date: Sun Apr 7 10:43:15 2024 +0200
sched/syscalls: Split out kernel/sched/syscalls.c from kernel/sched/core.c
core.c has become rather large, move most scheduler syscall
related functionality into a separate file, syscalls.c.
This is about ~15% of core.c's raw linecount.
Move the alloc_user_cpus_ptr(), __rt_effective_prio(),
rt_effective_prio(), uclamp_none(), uclamp_se_set()
and uclamp_bucket_id() inlines to kernel/sched/sched.h.
Internally export the __sched_setscheduler(), __sched_setaffinity(),
__setscheduler_prio(), set_load_weight(), enqueue_task(), dequeue_task(),
check_class_changed(), splice_balance_callbacks() and balance_callbacks()
methods to better facilitate this.
Move the new file's build to build_policy.c, because it fits there
semantically, but also because it's the smallest of the 4 build units
under an allmodconfig build:
-rw-rw-r-- 1 mingo mingo 7.3M May 27 12:35 kernel/sched/core.i
-rw-rw-r-- 1 mingo mingo 6.4M May 27 12:36 kernel/sched/build_utility.i
-rw-rw-r-- 1 mingo mingo 6.3M May 27 12:36 kernel/sched/fair.i
-rw-rw-r-- 1 mingo mingo 5.8M May 27 12:36 kernel/sched/build_policy.i
This better balances build time for scheduler subsystem rebuilds.
I build-tested this new file as a standalone syscalls.o file for a bit,
to make sure all the encapsulations & abstractions are robust.
Also update/add my copyright notices to these files.
Build time measurements:
# -Before/+After:
kepler:~/tip> perf stat -e 'cycles,instructions,duration_time' --sync --repeat 5 --pre 'rm -f kernel/sched/*.o' m kernel/sched/built-in.a >/dev/null
Performance counter stats for 'm kernel/sched/built-in.a' (5 runs):
- 71,938,508,607 cycles ( +- 0.17% )
+ 71,992,916,493 cycles ( +- 0.22% )
- 106,214,780,964 instructions # 1.48 insn per cycle ( +- 0.01% )
+ 105,450,231,154 instructions # 1.46 insn per cycle ( +- 0.01% )
- 5,878,232,620 ns duration_time ( +- 0.38% )
+ 5,290,085,069 ns duration_time ( +- 0.21% )
- 5.8782 +- 0.0221 seconds time elapsed ( +- 0.38% )
+ 5.2901 +- 0.0111 seconds time elapsed ( +- 0.21% )
Build time improvement of -11.1% (duration_time) is expected: the
parallel build time of the scheduler subsystem is determined by the
largest, slowest to build object file, which is kernel/sched/core.o.
By moving ~15% of its complexity into another build unit, we reduced
build time by -11%.
Measured cycles spent on building is within its ~0.2% stddev noise envelope.
The -0.7% reduction in instructions spent on building the scheduler is
statistically reliable and somewhat surprising - I can only speculate:
maybe compilers aren't that efficient at building & optimizing 10+ KLOC files
(core.c), and it's an overall win to balance the linecount a bit.
Anyway, this might be a data point that suggests that reducing the linecount
of our largest files will improve not just code readability and maintainability,
but might also improve build times a bit.
Code generation got a bit worse, by 0.5kb text on an x86 defconfig build:
# -Before/+After:
kepler:~/tip> size vmlinux
text data bss dec hex filename
-26475475 10439178 1740804 38655457 24dd5e1 vmlinux
+26476003 10439178 1740804 38655985 24dd7f1 vmlinux
kepler:~/tip> size kernel/sched/built-in.a
text data bss dec hex filename
- 76056 30025 489 106570 1a04a kernel/sched/core.o (ex kernel/sched/built-in.a)
+ 63452 29453 489 93394 16cd2 kernel/sched/core.o (ex kernel/sched/built-in.a)
44299 2181 104 46584 b5f8 kernel/sched/fair.o (ex kernel/sched/built-in.a)
- 42764 3424 120 46308 b4e4 kernel/sched/build_policy.o (ex kernel/sched/built-in.a)
+ 55651 4044 120 59815 e9a7 kernel/sched/build_policy.o (ex kernel/sched/built-in.a)
44866 12655 2192 59713 e941 kernel/sched/build_utility.o (ex kernel/sched/built-in.a)
44866 12655 2192 59713 e941 kernel/sched/build_utility.o (ex kernel/sched/built-in.a)
This is primarily due to the extra functions exported, and the size
gets exaggerated somewhat by __pfx CFI function padding:
ffffffff810cc710 <__pfx_enqueue_task>:
ffffffff810cc710: 90 nop
ffffffff810cc711: 90 nop
ffffffff810cc712: 90 nop
ffffffff810cc713: 90 nop
ffffffff810cc714: 90 nop
ffffffff810cc715: 90 nop
ffffffff810cc716: 90 nop
ffffffff810cc717: 90 nop
ffffffff810cc718: 90 nop
ffffffff810cc719: 90 nop
ffffffff810cc71a: 90 nop
ffffffff810cc71b: 90 nop
ffffffff810cc71c: 90 nop
ffffffff810cc71d: 90 nop
ffffffff810cc71e: 90 nop
ffffffff810cc71f: 90 nop
AFAICS the cost is primarily not to core.o and fair.o though (which contain
most performance sensitive scheduler functions), only to syscalls.o
that gets called with much lower frequency - so I think this is an acceptable
trade-off for better code separation.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20240407084319.1462211-2-mingo@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit 97450eb909658573dcacc1063b06d3d08642c0c1
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date: Tue Mar 26 10:16:16 2024 +0100
sched/pelt: Remove shift of thermal clock
The optional shift of the clock used by thermal/hw load avg has been
introduced to handle the case where the signal was not always a high frequency
hw signal. Now that cpufreq provides a signal for firmware and
SW pressure, we can remove this exception and always keep this PELT signal
aligned with other signals.
Mark the sched_thermal_decay_shift boot parameter as deprecated.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20240326091616.3696851-6-vincent.guittot@linaro.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
Conflicts: Minor differences since we already have ddae0ca2a8f
("sched: Move psi_account_irqtime() out of update_rq_clock_task()
hotpath") which changes some nearby code.
commit d4dbc991714eefcbd8d54a3204bd77a0a52bd32d
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date: Tue Mar 26 10:16:15 2024 +0100
sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
Now that cpufreq provides a pressure value to the scheduler, rename
arch_update_thermal_pressure() into HW pressure to reflect that it returns
a pressure applied by HW (i.e. with a high frequency change) and not
always related to thermal mitigation, but for example also generated by max
current limitation. Such a high frequency signal needs filtering to be
smoothed and to provide a value that reflects the average available capacity
on the scheduler's time scale.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20240326091616.3696851-5-vincent.guittot@linaro.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit 4475cd8bfd9bcb898953fcadb2f51b3432eb68a1
Author: Ingo Molnar <mingo@kernel.org>
Date: Thu Mar 28 12:07:48 2024 +0100
sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
SG_OVERLOADED and SG_OVERUTILIZED flags plus the sg_status bitmask are an
unnecessary complication that only make the code harder to read and slower.
We only ever set them separately:
thule:~/tip> git grep SG_OVER kernel/sched/
kernel/sched/fair.c: set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
kernel/sched/fair.c: *sg_status |= SG_OVERLOADED;
kernel/sched/fair.c: *sg_status |= SG_OVERUTILIZED;
kernel/sched/fair.c: *sg_status |= SG_OVERLOADED;
kernel/sched/fair.c: set_rd_overloaded(env->dst_rq->rd, sg_status & SG_OVERLOADED);
kernel/sched/fair.c: sg_status & SG_OVERUTILIZED);
kernel/sched/fair.c: } else if (sg_status & SG_OVERUTILIZED) {
kernel/sched/fair.c: set_rd_overutilized_status(env->dst_rq->rd, SG_OVERUTILIZED);
kernel/sched/sched.h:#define SG_OVERLOADED 0x1 /* More than one runnable task on a CPU. */
kernel/sched/sched.h:#define SG_OVERUTILIZED 0x2 /* One or more CPUs are over-utilized. */
kernel/sched/sched.h: set_rd_overloaded(rq->rd, SG_OVERLOADED);
And use them separately, which results in suboptimal code:
/* update overload indicator if we are at root domain */
set_rd_overloaded(env->dst_rq->rd, sg_status & SG_OVERLOADED);
/* Update over-utilization (tipping point, U >= 0) indicator */
set_rd_overutilized_status(env->dst_rq->rd,
Introduce separate sg_overloaded and sg_overutilized flags in update_sd_lb_stats()
and its lower level functions, and change all of them to 'bool'.
Remove the now unused SG_OVERLOADED and SG_OVERUTILIZED flags.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/ZgVPhODZ8/nbsqbP@gmail.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit 7bda10ba7f453729f210264dd07d38989fb858d9
Author: Ingo Molnar <mingo@kernel.org>
Date: Thu Mar 28 11:44:16 2024 +0100
sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
Follow the rename of the root_domain::overloaded flag.
Note that this also matches the SG_OVERUTILIZED flag better.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/ZgVHq65XKsOZpfgK@gmail.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit dfb83ef7b8b064c15be19cf7fcbde0996712de8f
Author: Ingo Molnar <mingo@kernel.org>
Date: Thu Mar 28 11:33:20 2024 +0100
sched/fair: Rename root_domain::overload to ::overloaded
It is silly to use an ambiguous noun instead of a clear adjective when naming
such a flag ...
Note how root_domain::overutilized already used a proper adjective.
rd->overloaded is now set to 1 when the root domain is overloaded.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/ZgVHq65XKsOZpfgK@gmail.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit caac6291728ed5493d8a53f4b086c270849ce0c4
Author: Shrikanth Hegde <sshegde@linux.ibm.com>
Date: Mon Mar 25 11:15:05 2024 +0530
sched/fair: Use helper functions to access root_domain::overload
Introduce two helper functions to access & set the root_domain::overload flag:
get_rd_overload()
set_rd_overload()
To make sure code is always following READ_ONCE()/WRITE_ONCE() access methods.
No change in functionality intended.
[ mingo: Renamed the accessors to get_/set_rd_overload(), tidied up the changelog. ]
Suggested-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240325054505.201995-3-sshegde@linux.ibm.com
Signed-off-by: Phil Auld <pauld@redhat.com>
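A sketch of the accessors as described (the field is still called ::overload at this point; it is renamed to ::overloaded by the commits above):

static inline int get_rd_overload(struct root_domain *rd)
{
        return READ_ONCE(rd->overload);
}

static inline void set_rd_overload(struct root_domain *rd, int status)
{
        /* Only write on change, to avoid needlessly dirtying the shared cacheline. */
        if (get_rd_overload(rd) != status)
                WRITE_ONCE(rd->overload, status);
}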
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit fa427e8e53d8db15090af7e952a55870dc2a453f
Author: Qais Yousef <qyousef@layalina.io>
Date: Sun Mar 24 00:45:51 2024 +0000
sched/topology: Remove root_domain::max_cpu_capacity
The value is no longer used as we now keep track of max_allowed_capacity
for each task instead.
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240324004552.999936-4-qyousef@layalina.io
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit 77222b0d12e8ae6f082261842174cc2e981bf99c
Author: Qais Yousef <qyousef@layalina.io>
Date: Sun Mar 24 00:45:49 2024 +0000
sched/topology: Export asym_cap_list
So that we can use it to iterate through available capacities in the
system. Sort asym_cap_list in descending order, as expected users are
likely to be interested in the highest capacity first.
Make the list RCU protected to allow for cheap access in hot paths.
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240324004552.999936-2-qyousef@layalina.io
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit a6965b31888501f889261a6783f0de6afff84f8d
Author: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Date: Mon Jan 1 21:16:24 2024 +0530
sched/fair: Add READ_ONCE() and use existing helper function to access ->avg_irq
Use existing helper function cpu_util_irq() instead of open-coding
access to ->avg_irq.
During review it was noted that ->avg_irq could be updated by a
different CPU than the one which is trying to access it.
->avg_irq is updated with WRITE_ONCE(), so use READ_ONCE() to access it
in order to avoid any compiler optimizations.
Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240101154624.100981-3-sshegde@linux.vnet.ibm.com
Signed-off-by: Phil Auld <pauld@redhat.com>
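A sketch of the helper with the added READ_ONCE() (approximating the kernel/sched/sched.h definition):

#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
static inline unsigned long cpu_util_irq(struct rq *rq)
{
        /* ->avg_irq may be written by another CPU with WRITE_ONCE(); pair the read. */
        return READ_ONCE(rq->avg_irq.util_avg);
}
#endif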
JIRA: https://issues.redhat.com/browse/RHEL-48226
Conflicts: Minor context differences in sched/core.c due to
not having scheduler_tick() renamed sched_tick and d4dbc991714e
("sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()").
commit ddae0ca2a8fe12d0e24ab10ba759c3fbd755ada8
Author: John Stultz <jstultz@google.com>
Date: Tue Jun 18 14:58:55 2024 -0700
sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath
It was reported that in moving to 6.1, a larger than 10%
regression was seen in the performance of
clock_gettime(CLOCK_THREAD_CPUTIME_ID,...).
Using a simple reproducer, I found:
5.10:
100000000 calls in 24345994193 ns => 243.460 ns per call
100000000 calls in 24288172050 ns => 242.882 ns per call
100000000 calls in 24289135225 ns => 242.891 ns per call
6.1:
100000000 calls in 28248646742 ns => 282.486 ns per call
100000000 calls in 28227055067 ns => 282.271 ns per call
100000000 calls in 28177471287 ns => 281.775 ns per call
The cause of this was finally narrowed down to the addition of
psi_account_irqtime() in update_rq_clock_task(), in commit
52b1364ba0b1 ("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ
pressure").
In my initial attempt to resolve this, I leaned towards moving
all accounting work out of the clock_gettime() call path, but it
wasn't very pretty, so it will have to wait for a later deeper
rework. Instead, Peter shared this approach:
Rework psi_account_irqtime() to use its own psi_irq_time base
for accounting, and move it out of the hotpath, calling it
instead from sched_tick() and __schedule().
In testing this, we found the importance of ensuring
psi_account_irqtime() is run under the rq_lock, which Johannes
Weiner helpfully explained, so also add some lockdep annotations
to make that requirement clear.
With this change the performance is back in-line with 5.10:
6.1+fix:
100000000 calls in 24297324597 ns => 242.973 ns per call
100000000 calls in 24318869234 ns => 243.189 ns per call
100000000 calls in 24291564588 ns => 242.916 ns per call
Reported-by: Jimmy Shiu <jimmyshiu@google.com>
Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Link: https://lore.kernel.org/r/20240618215909.4099720-1-jstultz@google.com
Signed-off-by: Phil Auld <pauld@redhat.com>
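The reproducer described above boils down to timing a tight loop of clock_gettime(CLOCK_THREAD_CPUTIME_ID) calls. A minimal userspace sketch (not the original test program; the output format mimics the numbers quoted above):

#include <stdio.h>
#include <time.h>

int main(void)
{
        const long iters = 100000000;
        struct timespec ts, start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
                clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        clock_gettime(CLOCK_MONOTONIC, &end);

        long long ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                     + (end.tv_nsec - start.tv_nsec);
        printf("%ld calls in %lld ns => %.3f ns per call\n",
               iters, ns, (double)ns / iters);
        return 0;
}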
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3935
JIRA: https://issues.redhat.com/browse/RHEL-29020
Bring schedutil code up to about v6.8. This includes some fixes for
code in rhel9 from the 5.14 rebase. There are a few pieces in the cpufreq
driver code and the arm architectures needed to make it complete.
Tested: Ran stress tests with schedutil governor. Ran general scheduler
stress and performance tests.
Signed-off-by: Phil Auld <pauld@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3865
JIRA: https://issues.redhat.com/browse/RHEL-29017
Apply the changes using the macros in include/linux/cleanup.h providing
scoped guards. There is no real functional change. We rely on the compiler
to clean up rather than having explicit unwinding with gotos.
Signed-off-by: Phil Auld <pauld@redhat.com>
Approved-by: Juri Lelli <juri.lelli@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
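For illustration, the cleanup.h guards replace unlock labels and goto unwinding with scope-based release; a minimal sketch of the pattern (the lock and function names are hypothetical):

#include <linux/cleanup.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(example_lock);              /* hypothetical lock */

static int example_update(int val)
{
        guard(mutex)(&example_lock);            /* dropped automatically at any return */

        if (val < 0)
                return -EINVAL;                 /* no goto/unlock path needed */

        /* ... update state while holding example_lock ... */
        return 0;
}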
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit b1c3efe07987592c16d5f59ce235e6ddbea65a73
Author: Arnd Bergmann <arnd@arndb.de>
Date: Thu Nov 23 12:05:03 2023 +0100
sched: fair: move unused stub functions to header
These four functions have a normal definition for CONFIG_FAIR_GROUP_SCHED,
and an empty one that is only referenced when FAIR_GROUP_SCHED is disabled
but CGROUP_SCHED is still enabled. If both are turned off, the functions
are still defined but the missing prototype causes a W=1 warning:
kernel/sched/fair.c:12544:6: error: no previous prototype for 'free_fair_sched_group'
kernel/sched/fair.c:12546:5: error: no previous prototype for 'alloc_fair_sched_group'
kernel/sched/fair.c:12553:6: error: no previous prototype for 'online_fair_sched_group'
kernel/sched/fair.c:12555:6: error: no previous prototype for 'unregister_fair_sched_group'
Move the alternatives into the header as static inline functions with the
correct combination of #ifdef checks to avoid the warning without adding
even more complexity.
[A different patch with the same description got applied by accident
and was later reverted, but the original patch is still missing]
Link: https://lkml.kernel.org/r/20231123110506.707903-4-arnd@kernel.org
Fixes: 7aa55f2a5902 ("sched/fair: Move unused stub functions to header")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tudor Ambarus <tudor.ambarus@linaro.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
Conflicts: Context diff in include/linux/sched.h mostly due to not
having fd593511cdfc ("tracing/user_events: Track fork/exec/exit for
mm lifetime").
commit 63ba8422f876e32ee564ea95da9a7313b13ff0a1
Author: Peter Zijlstra <peterz@infradead.org>
Date: Sat Nov 4 11:59:21 2023 +0100
sched/deadline: Introduce deadline servers
Low priority tasks (e.g., SCHED_OTHER) can suffer starvation if tasks
with higher priority (e.g., SCHED_FIFO) monopolize CPU(s).
RT Throttling has been introduced a while ago as a (mostly debug)
countermeasure one can utilize to reserve some CPU time for low priority
tasks (usually background type of work, e.g. workqueues, timers, etc.).
It however has its own problems (see documentation) and the undesired
effect of unconditionally throttling FIFO tasks even when no lower
priority activity needs to run (there are mechanisms to fix this issue
as well, but, again, with their own problems).
Introduce deadline servers to service the needs of low priority tasks under
starvation conditions. Deadline servers are built by extending the SCHED_DEADLINE
implementation to allow 2-level scheduling (a sched_deadline entity
becomes a container for lower priority scheduling entities).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/4968601859d920335cf85822eb573a5f179f04b8.1699095159.git.bristot@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
Conflicts: One hunk applied by hand in sched.h due to not having
eevdf commit d07f09a1f99c ("sched/fair: Propagate enqueue flags
into place_entity()").
commit 2f7a0f58948d8231236e2facecc500f1930fb996
Author: Peter Zijlstra <peterz@infradead.org>
Date: Sat Nov 4 11:59:20 2023 +0100
sched/deadline: Move bandwidth accounting into {en,de}queue_dl_entity
In preparation for introducing a !task sched_dl_entity, move the
bandwidth accounting into {en,de}queue_dl_entity().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lkml.kernel.org/r/a86dccbbe44e021b8771627e1dae01a69b73466d.1699095159.git.bristot@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
Conflicts: Minor fuzz due to unrelated whitespace difference from
upstream.
commit 9e07d45c5210f5dd6701c00d55791983db7320fa
Author: Peter Zijlstra <peterz@infradead.org>
Date: Sat Nov 4 11:59:19 2023 +0100
sched/deadline: Collect sched_dl_entity initialization
Create a single function that initializes a sched_dl_entity.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lkml.kernel.org/r/51acc695eecf0a1a2f78f9a044e11ffd9b316bcf.1699095159.git.bristot@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
Conflicts: Whitespace context difference in removed code in sched.h.
Minor context diff in fair.c due to not having the eevdf scheduler
patches in rhel.
commit 5d69eca542ee17c618f9a55da52191d5e28b435f
Author: Peter Zijlstra <peterz@infradead.org>
Date: Sat Nov 4 11:59:18 2023 +0100
sched: Unify runtime accounting across classes
All classes use sched_entity::exec_start to track runtime and have
copies of the exact same code around to compute runtime.
Collapse all that.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/54d148a144f26d9559698c4dd82d8859038a7380.1699095159.git.bristot@kernel.org
Signed-off-by: Phil Auld <pauld@redhat.com>
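The unified accounting reduces to one delta computation against sched_entity::exec_start; a hedged sketch of the common helper (upstream names it update_curr_se(); details approximated):

static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
{
        u64 now = rq_clock_task(rq);
        s64 delta_exec;

        delta_exec = now - curr->exec_start;
        if (unlikely(delta_exec <= 0))
                return delta_exec;

        curr->exec_start = now;
        curr->sum_exec_runtime += delta_exec;

        return delta_exec;
}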
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit 984ffb6a4366752c949f7b39640aecdce222607f
Author: Peter Zijlstra <peterz@infradead.org>
Date: Fri Oct 20 12:35:33 2023 +0200
sched/fair: Remove SIS_PROP
SIS_UTIL seems to work well, let's remove the old thing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20231020134337.GD33965@noisy.programming.kicks-ass.net
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15622
commit 8881e1639f1f899b64e9bccf6cc14d51c1d3c822
Author: Barry Song <song.bao.hua@hisilicon.com>
Date: Thu Oct 19 11:33:22 2023 +0800
sched/fair: Scan cluster before scanning LLC in wake-up path
For platforms having clusters like Kunpeng920, CPUs within the same cluster
have lower latency when synchronizing and accessing shared resources like
cache. Thus, this patch tries to find an idle cpu within the cluster of the
target CPU before scanning the whole LLC to gain lower latency. This
will be implemented in 2 steps in select_idle_sibling():
1. When the prev_cpu/recent_used_cpu are good wakeup candidates, use them
if they're sharing a cluster with the target CPU. Otherwise, try to
scan for an idle CPU in the target's cluster.
2. Scan the cluster prior to the LLC of the target CPU for an
idle CPU to wake up.
Testing has been done on Kunpeng920 by pinning tasks to one NUMA node and to two
NUMA nodes. On Kunpeng920, each NUMA node has 8 clusters and each cluster has 4 CPUs.
With this patch, we noticed enhancements on tbench and netperf within one
NUMA node or across two NUMA nodes on top of tip-sched-core commit
9b46f1abc6d4 ("sched/debug: Print 'tgid' in sched_show_task()")
tbench results (node 0):
baseline patched
1: 327.2833 372.4623 ( 13.80%)
4: 1320.5933 1479.8833 ( 12.06%)
8: 2638.4867 2921.5267 ( 10.73%)
16: 5282.7133 5891.5633 ( 11.53%)
32: 9810.6733 9877.3400 ( 0.68%)
64: 7408.9367 7447.9900 ( 0.53%)
128: 6203.2600 6191.6500 ( -0.19%)
tbench results (node 0-1):
baseline patched
1: 332.0433 372.7223 ( 12.25%)
4: 1325.4667 1477.6733 ( 11.48%)
8: 2622.9433 2897.9967 ( 10.49%)
16: 5218.6100 5878.2967 ( 12.64%)
32: 10211.7000 11494.4000 ( 12.56%)
64: 13313.7333 16740.0333 ( 25.74%)
128: 13959.1000 14533.9000 ( 4.12%)
netperf results TCP_RR (node 0):
baseline patched
1: 76546.5033 90649.9867 ( 18.42%)
4: 77292.4450 90932.7175 ( 17.65%)
8: 77367.7254 90882.3467 ( 17.47%)
16: 78519.9048 90938.8344 ( 15.82%)
32: 72169.5035 72851.6730 ( 0.95%)
64: 25911.2457 25882.2315 ( -0.11%)
128: 10752.6572 10768.6038 ( 0.15%)
netperf results TCP_RR (node 0-1):
baseline patched
1: 76857.6667 90892.2767 ( 18.26%)
4: 78236.6475 90767.3017 ( 16.02%)
8: 77929.6096 90684.1633 ( 16.37%)
16: 77438.5873 90502.5787 ( 16.87%)
32: 74205.6635 88301.5612 ( 19.00%)
64: 69827.8535 71787.6706 ( 2.81%)
128: 25281.4366 25771.3023 ( 1.94%)
netperf results UDP_RR (node 0):
baseline patched
1: 96869.8400 110800.8467 ( 14.38%)
4: 97744.9750 109680.5425 ( 12.21%)
8: 98783.9863 110409.9637 ( 11.77%)
16: 99575.0235 110636.2435 ( 11.11%)
32: 95044.7250 97622.8887 ( 2.71%)
64: 32925.2146 32644.4991 ( -0.85%)
128: 12859.2343 12824.0051 ( -0.27%)
netperf results UDP_RR (node 0-1):
baseline patched
1: 97202.4733 110190.1200 ( 13.36%)
4: 95954.0558 106245.7258 ( 10.73%)
8: 96277.1958 105206.5304 ( 9.27%)
16: 97692.7810 107927.2125 ( 10.48%)
32: 79999.6702 103550.2999 ( 29.44%)
64: 80592.7413 87284.0856 ( 8.30%)
128: 27701.5770 29914.5820 ( 7.99%)
Note neither Kunpeng920 nor x86 Jacobsville supports SMT, so the SMT branch
in the code has not been tested, but it is supposed to work.
Chen Yu also noticed this will improve the performance of tbench and
netperf on a 24 CPUs Jacobsville machine, there are 4 CPUs in one
cluster sharing L2 Cache.
[https://lore.kernel.org/lkml/Ytfjs+m1kUs0ScSn@worktop.programming.kicks-ass.net]
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-and-reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lkml.kernel.org/r/20231019033323.54147-3-yangyicong@huawei.com
Signed-off-by: Phil Auld <pauld@redhat.com>
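A rough illustration of step 1 described above: a wakeup candidate is only taken early if it is idle and shares both the LLC and the cluster with the target (the helper name is hypothetical; the real select_idle_sibling() checks more conditions):

static bool cluster_wake_candidate(int cand, int target)
{
        return cand != target &&
               cpus_share_cache(cand, target) &&
               cpus_share_resources(cand, target) &&    /* same cluster, see next commit */
               (available_idle_cpu(cand) || sched_idle_cpu(cand));
}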
JIRA: https://issues.redhat.com/browse/RHEL-15622
commit b95303e0aeaf446b65169dd4142cacdaeb7d4c8b
Author: Barry Song <song.bao.hua@hisilicon.com>
Date: Thu Oct 19 11:33:21 2023 +0800
sched: Add cpus_share_resources API
Add cpus_share_resources() API. This is the preparation for the
optimization of select_idle_cpu() on platforms with cluster scheduler
level.
On a machine with clusters, cpus_share_resources() will test whether
two cpus are within the same cluster. On a non-cluster machine it
will behave the same as cpus_share_cache(). So we use "resources"
here for cache resources.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-and-reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20231019033323.54147-2-yangyicong@huawei.com
Signed-off-by: Phil Auld <pauld@redhat.com>
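A hedged sketch of the new API built on a per-CPU cluster id (the per-CPU variable name sd_share_id is assumed):

DECLARE_PER_CPU(int, sd_share_id);      /* id of the lowest cache/cluster sharing domain */

bool cpus_share_resources(int this_cpu, int that_cpu)
{
        if (this_cpu == that_cpu)
                return true;

        /* Same cluster => same share id; degenerates to cpus_share_cache() without clusters. */
        return per_cpu(sd_share_id, this_cpu) == per_cpu(sd_share_id, that_cpu);
}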
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit b19fdb16fb2167c6bc9ee8fbc0c1d2d4fd3e2eb8
Author: Colin Ian King <colin.i.king@gmail.com>
Date: Tue Oct 10 16:57:44 2023 +0100
sched/headers: Remove comment referring to rq::cpu_load, since this has been removed
There is a comment that refers to cpu_load, however, this cpu_load was
removed with:
55627e3cd2 ("sched/core: Remove rq->cpu_load[]")
... back in 2019. The comment does not make sense with respect to this
removed array, so remove the comment.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231010155744.1381065-1-colin.i.king@gmail.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit f2273f4e19e29f7d0be6a2393f18369cd1b496c8
Author: Ingo Molnar <mingo@kernel.org>
Date: Mon Oct 9 17:31:26 2023 +0200
sched/topology: Move the declaration of 'schedutil_gov' to kernel/sched/sched.h
Move it out of the .c file into the shared scheduler-internal header file,
to gain type-checking.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20231009060037.170765-3-sshegde@linux.vnet.ibm.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit 5fe7765997b139e2d922b58359dea181efe618f9
Author: Valentin Schneider <vschneid@redhat.com>
Date: Thu Sep 28 17:02:51 2023 +0200
sched/deadline: Make dl_rq->pushable_dl_tasks update drive dl_rq->overloaded
dl_rq->dl_nr_migratory is increased whenever a DL entity is enqueued and it has
nr_cpus_allowed > 1. Unlike the pushable_dl_tasks tree, dl_rq->dl_nr_migratory
includes a dl_rq's current task. This means a dl_rq can have a migratable
current, N non-migratable queued tasks, and be flagged as overloaded and have
its CPU set in the dlo_mask, despite having an empty pushable_tasks tree.
Make a dl_rq's overload logic be driven by {enqueue,dequeue}_pushable_dl_task();
in other words, make DL RQs only be flagged as overloaded if they have at
least one runnable-but-not-current migratable task.
o push_dl_task() is unaffected, as it is a no-op if there are no pushable
tasks.
o pull_dl_task() now no longer scans runqueues whose sole migratable task is
their current one, which it can't do anything about anyway.
It may also now pull tasks to a DL RQ with dl_nr_running > 1 if only its
current task is migratable.
Since dl_rq->dl_nr_migratory becomes unused, remove it.
RT had the exact same mechanism (rt_rq->rt_nr_migratory) which was dropped
in favour of relying on rt_rq->pushable_tasks, see:
612f769edd06 ("sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask")
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20230928150251.463109-1-vschneid@redhat.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit 612f769edd06a6e42f7cd72425488e68ddaeef0a
Author: Valentin Schneider <vschneid@redhat.com>
Date: Fri Aug 11 12:20:44 2023 +0100
sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask
Sebastian noted that the rto_push_work IRQ work can be queued for a CPU
that has an empty pushable_tasks list, which means nothing useful will be
done in the IPI other than queue the work for the next CPU on the rto_mask.
rto_push_irq_work_func() only operates on tasks in the pushable_tasks list,
but the conditions for that irq_work to be queued (and for a CPU to be
added to the rto_mask) rely on rt_rq->rt_nr_migratory instead.
nr_migratory is increased whenever an RT task entity is enqueued and it has
nr_cpus_allowed > 1. Unlike the pushable_tasks list, nr_migratory includes a
rt_rq's current task. This means a rt_rq can have a migratible current, N
non-migratible queued tasks, and be flagged as overloaded / have its CPU
set in the rto_mask, despite having an empty pushable_tasks list.
Make an rt_rq's overload logic be driven by {enqueue,dequeue}_pushable_task().
Since rt_rq->{rt_nr_migratory,rt_nr_total} become unused, remove them.
Note that the case where the current task is pushed away to make way for a
migration-disabled task remains unchanged: the migration-disabled task has
to be in the pushable_tasks list in the first place, which means it has
nr_cpus_allowed > 1.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20230811112044.3302588-1-vschneid@redhat.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
Conflicts: Minor fuzz in sched.h due to context from kABI additions.
commit 30797bce8ef0c73f0c388148ffac92458533b10e
Author: Josh Don <joshdon@google.com>
Date: Fri Sep 22 16:05:34 2023 -0700
sched/fair: Make cfs_rq->throttled_csd_list available on !SMP
This makes the following patch cleaner by avoiding extra CONFIG_SMP
conditionals on the availability of cfs_rq->throttled_csd_list.
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230922230535.296350-1-joshdon@google.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
Conflicts: Minor fuzz in fair.c due to having RT merged,
specifically: ea622076b76f ("sched: Add support for lazy preemption")
commit e23edc86b09df655bf8963bbcb16647adc787395
Author: Ingo Molnar <mingo@kernel.org>
Date: Tue Sep 19 10:38:21 2023 +0200
sched/fair: Rename check_preempt_curr() to wakeup_preempt()
The name is a bit opaque - make it clear that this is about wakeup
preemption.
Also rename the ->check_preempt_curr() methods similarly.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit 7ad0354d18ae05e9c8885251e234cbcf141f8972
Author: GUO Zihua <guozihua@huawei.com>
Date: Fri Aug 18 09:56:33 2023 +0800
sched/headers: Remove duplicated includes in kernel/sched/sched.h
Remove duplicated includes of linux/cgroup.h and linux/psi.h. Both of
these includes are included regardless of the config and they are all
protected by ifndef, so no point including them again.
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230818015633.18370-1-guozihua@huawei.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
JIRA: https://issues.redhat.com/browse/RHEL-20158
commit 1528c661c24b407e92194426b0adbb43de859ce0
Author: Aaron Lu <aaron.lu@intel.com>
Date: Tue Sep 12 14:58:08 2023 +0800
sched/fair: Ratelimit update to tg->load_avg
When using sysbench to benchmark Postgres in a single docker instance
with sysbench's nr_threads set to nr_cpu, it is observed there are times
update_cfs_group() and update_load_avg() show noticeable overhead on
a 2sockets/112core/224cpu Intel Sapphire Rapids (SPR):
13.75% 13.74% [kernel.vmlinux] [k] update_cfs_group
10.63% 10.04% [kernel.vmlinux] [k] update_load_avg
Annotate shows the cycles are mostly spent on accessing tg->load_avg
with update_load_avg() being the write side and update_cfs_group() being
the read side. tg->load_avg is per task group and when different tasks
of the same taskgroup running on different CPUs frequently access
tg->load_avg, it can be heavily contended.
E.g. when running postgres_sysbench on a 2sockets/112cores/224cpus Intel
Sapphire Rapids, during a 5s window, the wakeup number is 14 million and the
migration number is 11 million, and with each migration the task's load
will transfer from the src cfs_rq to the target cfs_rq, and each change involves
an update to tg->load_avg. Since the workload can trigger that many wakeups
and migrations, the access (both read and write) to tg->load_avg can be
unbounded. As a result, the two mentioned functions showed noticeable
overhead. With netperf/nr_client=nr_cpu/UDP_RR, the problem is worse:
during a 5s window, the wakeup number is 21 million and the migration number is
14 million; update_cfs_group() costs ~25% and update_load_avg() costs ~16%.
Reduce the overhead by limiting updates to tg->load_avg to at most once
per ms. The update frequency is a tradeoff between tracking accuracy and
overhead. 1ms is chosen because PELT window is roughly 1ms and it
delivered good results for the tests that I've done. After this change,
the cost of accessing tg->load_avg is greatly reduced and performance
improved. Detailed test results below.
==============================
postgres_sysbench on SPR:
25%
base: 42382±19.8%
patch: 50174±9.5% (noise)
50%
base: 67626±1.3%
patch: 67365±3.1% (noise)
75%
base: 100216±1.2%
patch: 112470±0.1% +12.2%
100%
base: 93671±0.4%
patch: 113563±0.2% +21.2%
==============================
hackbench on ICL:
group=1
base: 114912±5.2%
patch: 117857±2.5% (noise)
group=4
base: 359902±1.6%
patch: 361685±2.7% (noise)
group=8
base: 461070±0.8%
patch: 491713±0.3% +6.6%
group=16
base: 309032±5.0%
patch: 378337±1.3% +22.4%
=============================
hackbench on SPR:
group=1
base: 100768±2.9%
patch: 103134±2.9% (noise)
group=4
base: 413830±12.5%
patch: 378660±16.6% (noise)
group=8
base: 436124±0.6%
patch: 490787±3.2% +12.5%
group=16
base: 457730±3.2%
patch: 680452±1.3% +48.8%
============================
netperf/udp_rr on ICL
25%
base: 114413±0.1%
patch: 115111±0.0% +0.6%
50%
base: 86803±0.5%
patch: 86611±0.0% (noise)
75%
base: 35959±5.3%
patch: 49801±0.6% +38.5%
100%
base: 61951±6.4%
patch: 70224±0.8% +13.4%
===========================
netperf/udp_rr on SPR
25%
base: 104954±1.3%
patch: 107312±2.8% (noise)
50%
base: 55394±4.6%
patch: 54940±7.4% (noise)
75%
base: 13779±3.1%
patch: 36105±1.1% +162%
100%
base: 9703±3.7%
patch: 28011±0.2% +189%
==============================================
netperf/tcp_stream on ICL (all in noise range)
25%
base: 43092±0.1%
patch: 42891±0.5%
50%
base: 19278±14.9%
patch: 22369±7.2%
75%
base: 16822±3.0%
patch: 17086±2.3%
100%
base: 18216±0.6%
patch: 18078±2.9%
===============================================
netperf/tcp_stream on SPR (all in noise range)
25%
base: 34491±0.3%
patch: 34886±0.5%
50%
base: 19278±14.9%
patch: 22369±7.2%
75%
base: 16822±3.0%
patch: 17086±2.3%
100%
base: 18216±0.6%
patch: 18078±2.9%
Reported-by: Nitin Tekchandani <nitin.tekchandani@intel.com>
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Vernet <void@manifault.com>
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Tested-by: Swapnil Sapkal <Swapnil.Sapkal@amd.com>
Link: https://lkml.kernel.org/r/20230912065808.2530-2-aaron.lu@intel.com
Signed-off-by: Phil Auld <pauld@redhat.com>
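A hedged sketch of the rate limit described above (the timestamp field name and the pre-existing 1/64 contribution-delta filter are approximated):

static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
{
        long delta;
        u64 now;

        /* The root task group does not publish a load_avg. */
        if (cfs_rq->tg == &root_task_group)
                return;

        /* Rate limit updates of the shared tg->load_avg to roughly one PELT window (1ms). */
        now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
        if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
                return;

        delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
        if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
                atomic_long_add(delta, &cfs_rq->tg->load_avg);
                cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
                cfs_rq->last_update_tg_load_avg = now;
        }
}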
JIRA: https://issues.redhat.com/browse/RHEL-29436
commit 089768dfeb3ab294f9ab6a1f2462001f0f879fbb
Author: Yajun Deng <yajun.deng@linux.dev>
Date: Sun Oct 8 10:15:38 2023 +0800
sched/rt: Change the type of 'sysctl_sched_rt_period' from 'unsigned int' to 'int'
Doing this matches the natural 'int'-based calculus
in sched_rt_handler(), and also enables adding a
correct upper bounds check on the sysctl interface.
[ mingo: Rewrote the changelog. ]
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231008021538.3063250-1-yajun.deng@linux.dev
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit ab83f455f04df5b2f7c6d4de03b6d2eaeaa27b8a
Author: Peter Oskolkov <posk@google.com>
Date: Tue Mar 7 23:31:57 2023 -0800
sched: add WF_CURRENT_CPU and externise ttwu
Add a WF_CURRENT_CPU wake flag that advises the scheduler to
move the wakee to the current CPU. This is useful for fast on-CPU
context switching use cases.
In addition, make ttwu external rather than static so that
the flag could be passed to it from outside of sched/core.c.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230308073201.3102738-3-avagin@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
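A hedged sketch of how the flag can be honored on the wakeup path (the wrapper function is purely illustrative):

static int wf_current_cpu_target(struct task_struct *p, int wake_flags, int prev_cpu)
{
        int cpu = smp_processor_id();

        /* WF_CURRENT_CPU is only a hint; honor it when the waking CPU is allowed. */
        if ((wake_flags & WF_CURRENT_CPU) && cpumask_test_cpu(cpu, p->cpus_ptr))
                return cpu;

        return prev_cpu;
}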
JIRA: https://issues.redhat.com/browse/RHEL-25535
commit 677ea015f231aa38b3972aa7be54ecd2637e99fd
Author: Josh Don <joshdon@google.com>
Date: Tue Jun 20 11:32:47 2023 -0700
sched: add throttled time stat for throttled children
We currently export the total throttled time for cgroups that are given
a bandwidth limit. This patch extends this accounting to also account
the total time that each child cgroup has been throttled.
This is useful to understand the degree to which children have been
affected by the throttling control. Children which are not runnable
during the entire throttled period, for example, will not show any
self-throttling time during this period.
Expose this in a new interface, 'cpu.stat.local', which is similar to
how non-hierarchical events are accounted in 'memory.events.local'.
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230620183247.737942-2-joshdon@google.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-29020
commit f12560779f9d734446508f3df17f5632e9aaa2c8
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date: Wed Nov 22 14:39:04 2023 +0100
sched/cpufreq: Rework iowait boost
Use the max value that has already been computed inside sugov_get_util()
to cap the iowait boost and remove the dependency on uclamp_rq_util_with(),
which is not used anymore.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231122133904.446032-3-vincent.guittot@linaro.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-29020
commit 9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date: Wed Nov 22 14:39:03 2023 +0100
sched/cpufreq: Rework schedutil governor performance estimation
The current method to take into account uclamp hints when estimating the
target frequency can end in a situation where the selected target
frequency is finally higher than the uclamp hints, whereas there is no real
need. Such cases mainly happen because we are currently mixing the
traditional scheduler utilization signal with the uclamp performance
hints. By adding these 2 metrics, we lose important information when
it comes to selecting the target frequency, and we have to make some
assumptions which can't fit all cases.
Rework the interface between the scheduler and schedutil governor in order
to propagate all information down to the cpufreq governor.
effective_cpu_util() interface changes and now returns the actual
utilization of the CPU with 2 optional inputs:
- The minimum performance for this CPU; typically the capacity to handle
the deadline task and the interrupt pressure. But also uclamp_min
request when available.
- The maximum targeting performance for this CPU which reflects the
maximum level that we would like to not exceed. By default it will be
the CPU capacity but can be reduced because of some performance hints
set with uclamp. The value can be lower than actual utilization and/or
min performance level.
A new sugov_effective_cpu_perf() interface is also available to compute
the final performance level that is targeted for the CPU, after applying
some cpufreq headroom and taking into account all inputs.
With these 2 functions, schedutil is now able to decide when it must go
above uclamp hints. It now also has a generic way to get the min
performance level.
The dependency between energy model and cpufreq governor and its headroom
policy doesn't exist anymore.
eenv_pd_max_util() asks schedutil for the targeted performance after
applying the impact of the waking task.
[ mingo: Refined the changelog & C comments. ]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231122133904.446032-2-vincent.guittot@linaro.org
Signed-off-by: Phil Auld <pauld@redhat.com>
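For reference, the reworked interface described above looks roughly like the following prototypes (parameter names assumed):

/* Returns the actual CPU utilization and, optionally, the minimum performance
 * (DL bandwidth, IRQ pressure, uclamp_min) and the maximum targeted performance
 * (capacity reduced by uclamp_max) through the two output pointers. */
unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
                                 unsigned long *min, unsigned long *max);

/* Applies the cpufreq headroom to 'actual' and clamps the result into [min, max]. */
unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
                                       unsigned long min, unsigned long max);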
JIRA: https://issues.redhat.com/browse/RHEL-29020
commit 7bc263840bc3377186cb06b003ac287bb2f18ce2
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date: Mon Oct 9 12:36:16 2023 +0200
sched/topology: Consolidate and clean up access to a CPU's max compute capacity
Remove the rq::cpu_capacity_orig field and use arch_scale_cpu_capacity()
instead.
The scheduler uses 3 methods to get access to a CPU's max compute capacity:
- arch_scale_cpu_capacity(cpu) which is the default way to get a CPU's capacity.
- cpu_capacity_orig field which is periodically updated with
arch_scale_cpu_capacity().
- capacity_orig_of(cpu) which encapsulates rq->cpu_capacity_orig.
There is no real need to save the value returned by arch_scale_cpu_capacity()
in struct rq. arch_scale_cpu_capacity() returns:
- either a per_cpu variable.
- or a const value for systems which have only one capacity.
Remove rq::cpu_capacity_orig and use arch_scale_cpu_capacity() everywhere.
No functional changes.
Some performance tests on Arm64:
- small SMP device (hikey): no noticeable changes
- HMP device (RB5): hackbench shows minor improvement (1-2%)
- large smp (thx2): hackbench and tbench shows minor improvement (1%)
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20231009103621.374412-2-vincent.guittot@linaro.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-29017
commit 94b548a15e8ec47dfbf6925bdfb64bb5657dce0c
Author: Peter Zijlstra <peterz@infradead.org>
Date: Fri Jun 9 20:52:55 2023 +0200
sched: Simplify set_user_nice()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
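A hedged sketch of the guard-based shape of set_user_nice() after this change (body heavily abridged; the task_rq_lock guard class follows kernel/sched usage but the details here are approximated):

void set_user_nice(struct task_struct *p, long nice)
{
        if (task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE)
                return;

        /* Locks p->pi_lock and the task's rq; both are released automatically on return. */
        CLASS(task_rq_lock, rq_guard)(p);
        struct rq *rq = rq_guard.rq;

        /* ... dequeue from rq, update static_prio and load weight, requeue ...
         * No explicit unlock or 'goto out_unlock' unwinding is needed. */
}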