glibc

History

Adhemerval Zanella 8a0152b61b math: New generic fmaf implementation	2025-11-27 14:52:25 -03:00

The current implementation relies on setting the rounding mode for
different calculations (FE_TOWARDZERO) to obtain correctly rounded
results. For most CPUs, this adds significant performance overhead
because it requires executing a typically slow instruction (to get/set
the floating-point status), necessitates flushing the pipeline, and
breaks some compiler assumptions/optimizations.

The original implementation adds tests to handle underflow in corner
cases, whereas this implementation uses a different strategy that
checks both the mantissa and the result to determine whether the
result is not subject to double rounding.

I tested this implementation on various targets (x86_64, i686, arm,
aarch64, powerpc), including some by manually disabling the compiler
instructions.

Performance-wise, it shows large improvements:

reciprocal-throughput   master   patched   improvement
x86_64 [1]               58.09      7.96         7.33x
i686 [1]                279.41     16.97        16.46x
aarch64 [2]              26.09      4.10         6.35x
armhf [2]                30.25      4.20         7.18x
powerpc [3]               9.46      1.46         6.45x

latency                 master   patched   improvement
x86_64                   64.50     14.25         4.53x
i686                    304.39     61.04         4.99x
aarch64                  27.71      5.74         4.82x
armhf                    33.46      7.34         4.55x
powerpc                  10.96      2.65         4.13x

Checked on x86_64-linux-gnu and i686-linux-gnu with
--disable-multi-arch, and on arm-linux-gnueabihf.

[1] gcc 15.2.1, Zen3
[2] gcc 15.2.1, Neoverse N1
[3] gcc 15.2.1, POWER10

Signed-off-by: Szabolcs Nagy <nsz@gcc.gnu.org>
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
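
The commit message above describes avoiding rounding-mode changes by checking whether the result is subject to double rounding. The sketch below illustrates that general idea only; it is not the code from this commit. It assumes round-to-nearest mode, a result in the normal float range, and no handling of floating-point exception flags. The function name `sketch_fmaf` is hypothetical. The product of two floats is exact in double, so the only risk is the second rounding from double to float, which goes wrong only when the inexact double sum lands exactly halfway between two floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical illustration, not glibc's implementation.
   Assumes round-to-nearest and a normal-range float result;
   subnormal results and exception flags are NOT handled. */
static float sketch_fmaf(float x, float y, float z)
{
    /* 24 + 24 significand bits fit in a double's 53, so xy is exact. */
    double xy = (double)x * y;
    /* Single rounding, to double precision. */
    double sum = xy + z;

    uint64_t bits;
    memcpy(&bits, &sum, sizeof bits);
    int exponent = (int)((bits >> 52) & 0x7ff);

    /* A float keeps 23 of the double's 52 fraction bits, so the low
       29 bits decide the second rounding. Only the exact halfway
       pattern can be mis-rounded. */
    if ((bits & 0x1fffffff) != 0x10000000      /* not a halfway case */
        || exponent == 0x7ff                   /* infinity or NaN */
        || (sum - xy == z && sum - z == xy))   /* double sum was exact */
        return (float)sum;

    /* Inexact and exactly halfway: recover the rounding error of the
       double addition and nudge the last bit in its direction, so the
       final conversion to float rounds the right way. */
    int negative = (int)(bits >> 63);
    double err = (negative == (z > xy)) ? xy - sum + z : z - sum + xy;
    if (negative == (err < 0))
        bits++;
    else
        bits--;
    memcpy(&sum, &bits, sizeof sum);
    return (float)sum;
}
```

For example, with x = y = 0x1.001p0f the product is 1 + 2^-11 + 2^-24, exactly halfway between two floats. Adding a tiny positive z (such as 0x1p-100f) should round up to 0x1.002002p0f, while a plain `(float)(xy + z)` would round to even and return 0x1.002p0f.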
aarch64	aarch64: make GCS configure checks aarch64-only	2025-11-26 13:50:15 +00:00
alpha	Add gmp-arch and udiv_qrnnd	2025-11-25 14:52:15 -03:00
arc	…
arm	math: New generic fma implementation	2025-11-26 10:10:06 -03:00
csky	…
generic	stdlib: Remove longlong.h	2025-11-26 10:10:06 -03:00
gnu	…
hppa	Add gmp-arch and udiv_qrnnd	2025-11-25 14:52:15 -03:00
htl	htl: move c11 symbols into libc.	2025-11-22 03:28:48 +01:00
hurd	…
i386	math: New generic fmaf implementation	2025-11-27 14:52:25 -03:00
ieee754	math: New generic fmaf implementation	2025-11-27 14:52:25 -03:00
loongarch	Add add_ssaaaa and sub_ssaaaa to gmp-arch.h	2025-11-26 10:10:02 -03:00
m68k	…
mach	htl: move c11 symbols into libc.	2025-11-22 03:28:48 +01:00
microblaze	…
mips	math: Don't redirect inlined builtin math functions	2025-11-17 11:17:07 -03:00
nptl	htl: move c11 symbols into libc.	2025-11-22 03:28:48 +01:00
or1k	Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN	2025-11-15 22:01:07 +01:00
posix	…
powerpc	Add add_ssaaaa and sub_ssaaaa to gmp-arch.h	2025-11-26 10:10:02 -03:00
pthread	htl: move c11 symbols into libc.	2025-11-22 03:28:48 +01:00
riscv	Add add_ssaaaa and sub_ssaaaa to gmp-arch.h	2025-11-26 10:10:02 -03:00
s390	Remove support for lock elision.	2025-11-18 14:21:13 +01:00
sh	…
sparc	Revert __HAVE_64B_ATOMICS configure check	2025-11-14 14:05:20 -03:00
unix	Linux: Ignore PIDFD_GET_INFO in tst-pidfd-consts	2025-11-27 14:34:58 +01:00
wordsize-32	stdlib: Remove longlong.h	2025-11-26 10:10:06 -03:00
wordsize-64	…
x86	stdlib: Remove longlong.h	2025-11-26 10:10:06 -03:00
x86_64	x86: Fix strstr ifunc on clang	2025-11-17 11:17:07 -03:00