linux-kernelorg-stable/arch/ia64/kernel
Linus Torvalds 596ff4a09b cpumask: re-introduce constant-sized cpumask optimizations
Commit aa47a7c215 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.

The optimization was later added back in a limited form by commit
6f9c07be9d ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and is more of a
special case for embedded situations with fixed hardware.

Instead, just re-introduce the optimization, with some changes.

Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":

 - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.

   This is used for situations where we should use the exact size.

 - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
   fits in a single word and the bitmap operations thus end up able
   to trigger the "small_const_nbits()" optimizations.

   This is used for the operations that have optimized single-word
   cases that get inlined, notably the bit find and scanning functions.

 - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
   is a sufficiently small constant that makes simple "copy" and
   "clear" operations more efficient.

   This is arbitrarily set at four words or less.
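
The selection rule for the three sizes can be sketched in plain userspace
C. This is only an illustration of the rule described above, not the
kernel header: the sketch_* function names are hypothetical, and the
kernel expresses the same choice with preprocessor macros so that the
result is a compile-time constant where possible.

```c
#include <assert.h>
#include <limits.h>

/* Width of one bitmap word on the build machine. */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Exact size: always the runtime CPU count (hypothetical helper). */
unsigned int sketch_exact_bits(unsigned int nr_cpus_config,
                               unsigned int nr_cpu_ids)
{
        assert(nr_cpu_ids <= nr_cpus_config);
        return nr_cpu_ids;
}

/* Small size: the configured constant only if it fits in one word. */
unsigned int sketch_small_bits(unsigned int nr_cpus_config,
                               unsigned int nr_cpu_ids)
{
        assert(nr_cpu_ids <= nr_cpus_config);
        return nr_cpus_config <= BITS_PER_LONG ? nr_cpus_config : nr_cpu_ids;
}

/* Large size: the configured constant if it fits in four words or less. */
unsigned int sketch_large_bits(unsigned int nr_cpus_config,
                               unsigned int nr_cpu_ids)
{
        assert(nr_cpu_ids <= nr_cpus_config);
        return nr_cpus_config <= 4 * BITS_PER_LONG ? nr_cpus_config : nr_cpu_ids;
}
```

With a configured limit of exactly one word, all but the exact size are
the constant; with a limit above four words, all three fall back to the
runtime nr_cpu_ids value.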

As an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like

        movl    nr_cpu_ids(%rip), %edx
        addq    $63, %rdx
        shrq    $3, %rdx
        andl    $-8, %edx
        callq   memset@PLT

on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.

In contrast, with this patch, using an NR_CPUS of 64 (which is quite a
reasonable value to use), the above becomes a single

        movq    $0,cpumask

instruction instead, because rather than working out exactly how many
CPUs the system has, it just knows that the cpumask will be a single
word and can just clear it all.
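
The difference between the two clear variants can be reproduced in a
small userspace sketch. The struct and function names here are
hypothetical stand-ins, and SKETCH_NR_CPUS is assumed to be exactly one
word wide; the point is only that a runtime length forces a real
memset() call, while a constant length lets the compiler emit a single
store.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Assumed configuration: the mask fits in exactly one word. */
#define SKETCH_NR_CPUS BITS_PER_LONG

struct sketch_cpumask {
        unsigned long bits[BITS_TO_LONGS(SKETCH_NR_CPUS)];
};

/* Variable-sized clear: the length depends on a runtime value. */
void sketch_clear_exact(struct sketch_cpumask *mask, unsigned int nr_cpu_ids)
{
        assert(nr_cpu_ids <= SKETCH_NR_CPUS);
        memset(mask->bits, 0, BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long));
}

/* Constant-sized clear: the length is known at compile time. */
void sketch_clear_const(struct sketch_cpumask *mask)
{
        memset(mask->bits, 0, sizeof(mask->bits));
}
```

Both functions leave the mask empty; only the generated code differs.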

Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.

But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.
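
One way to see why this split is safe, sketched in userspace C under the
assumption of a single-word mask: if every operation that sets bits
stops at nr_cpu_ids, then the bits above it stay zero, and a scan over
the whole constant-width word gives the same answer as a scan limited to
the exact size. The sketch_* names are hypothetical and are not the
kernel's helpers.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Set all valid CPUs: limited to the runtime nr_cpu_ids, never the full word. */
unsigned long sketch_setall(unsigned int nr_cpu_ids)
{
        assert(nr_cpu_ids <= BITS_PER_LONG);
        if (nr_cpu_ids == BITS_PER_LONG)
                return ~0UL;    /* avoid an undefined full-width shift */
        return (1UL << nr_cpu_ids) - 1;
}

/* Count set bits over the whole word, ignoring nr_cpu_ids entirely. */
unsigned int sketch_weight(unsigned long word)
{
        unsigned int count = 0;

        while (word) {
                word &= word - 1;       /* clear the lowest set bit */
                count++;
        }
        return count;
}
```

Counting over the full word after sketch_setall(8) still reports eight
CPUs, because no bit at or above position 8 was ever set.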

In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'.  Better remove them now than have somebody introduce use
of them later.

Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless.  Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05 14:30:34 -08:00
syscalls ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency 2022-09-11 21:55:07 -07:00
.gitignore
Makefile kbuild: remove --include-dir MAKEFLAG from top Makefile 2023-02-05 18:51:22 +09:00
Makefile.gate
acpi-ext.c
acpi.c cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
asm-offsets.c
audit.c audit: add support for the openat2 syscall 2021-10-01 16:52:48 -04:00
brl_emu.c
crash.c
crash_dump.c vmcore: convert copy_oldmem_page() to take an iov_iter 2022-04-29 14:37:59 -07:00
cyclone.c
dma-mapping.c
efi.c efi: Drop minimum EFI version check at boot 2023-02-03 18:01:07 +01:00
efi_stub.S mm: update legacy flush_tlb_* to use vma 2021-06-29 10:53:52 -07:00
elfcore.c elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size} 2023-01-05 15:12:12 +00:00
entry.S
entry.h
err_inject.c
esi.c
esi_stub.S
fsys.S
fsyscall_gtod_data.h
ftrace.c ftrace: Cleanup ftrace_dyn_arch_init() 2021-10-08 19:41:39 -04:00
gate-data.S
gate.S
gate.lds.S
head.S
iosapic.c genirq: Add and use an irq_data_update_affinity helper 2022-07-07 09:38:04 +01:00
irq.c genirq: Add and use an irq_data_update_affinity helper 2022-07-07 09:38:04 +01:00
irq.h
irq_ia64.c
irq_lsapic.c
ivt.S
kprobes.c ia64: replace comments with C99 initializers 2022-04-28 23:17:25 -07:00
machine_kexec.c
mca.c ia64: mca: use strscpy() is more robust and safer 2022-10-11 18:51:10 -07:00
mca_asm.S
mca_drv.c exit: Add and use make_task_dead. 2021-12-13 12:04:45 -06:00
mca_drv.h
mca_drv_asm.S
minstate.h
module.c ia64: Rename 'ip' to 'addr' in 'struct fdesc' 2022-02-16 23:25:11 +11:00
msi_ia64.c genirq: Add and use an irq_data_update_affinity helper 2022-07-07 09:38:04 +01:00
numa.c
pal.S
palinfo.c ia64: fix typos in comments 2022-04-28 23:17:25 -07:00
patch.c
pci-dma.c
perfmon_itanium.h
process.c arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabled 2023-01-13 11:48:15 +01:00
ptrace.c ia64: ptrace: user_regset_copyin_ignore() always returns 0 2022-11-15 14:30:40 -08:00
relocate_kernel.S
sal.c
salinfo.c proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
setup.c ia64: move from strlcpy with unused retval to strscpy 2022-09-11 21:55:09 -07:00
sigframe.h
signal.c resume_user_mode: Move to resume_user_mode.h 2022-03-10 16:51:50 -06:00
smp.c profile: setup_profiling_timer() is moslty not implemented 2022-07-29 18:12:36 -07:00
smpboot.c ia64: cleanup remove_siblinginfo() 2022-06-03 06:52:58 -07:00
stacktrace.c
sys_ia64.c ia64: fix build error due to switch case label appearing next to declaration 2023-01-31 16:44:08 -08:00
time.c sched/cputime: Fix IA64 build error of missing arch_vtime_task_switch() prototype 2023-01-11 10:31:57 +01:00
topology.c drivers/base/node: consolidate node device subsystem initialization in node_dev_init() 2022-03-22 15:57:10 -07:00
traps.c ia64: fix typos in comments 2022-04-28 23:17:25 -07:00
unaligned.c ia64: remove CONFIG_SET_FS support 2022-02-25 09:36:06 +01:00
uncached.c mm: use for_each_online_node and node_online instead of open coding 2022-04-29 14:36:58 -07:00
unwind.c
unwind_decoder.c
unwind_i.h
vmlinux.lds.S objtool/idle: Validate __cpuidle code as noinstr 2023-01-13 11:48:15 +01:00