Commit Graph

43039 Commits

Author SHA1 Message Date
Adhemerval Zanella 0e1a1178ee math: Remove the SVID error handling from remainder
The optimized i386 version is faster than the generic one, and
gcc implements it through the builtin. This optimization enables
us to migrate the implementation to a C version.  The performance
on a Zen3 chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin
(different than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput           input    master   NO-SVID  improvement
x86_64                     subnormals   18.8522   16.2506       13.80%
x86_64                         normal  421.8260  403.9270        4.24%
x86_64                 close-exponent   21.0579   18.7642       10.89%
i686                       subnormals   21.3443   21.4229       -0.37%
i686                           normal  525.8380   538.807       -2.47%
i686                   close-exponent   21.6589   21.7983       -0.64%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella c4c6c79d70 math: Remove the SVID error handling from remainderf
The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin.  This optimization enables us to
migrate the implementation to a C version.  The performance on a Zen3
chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin (different
than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput          input   master  NO-SVID  improvement
x86_64                    subnormals  17.5349  15.6125       10.96%
x86_64                        normal  53.8134  52.5754        2.30%
x86_64                close-exponent  20.0211  18.6656        6.77%
i686                      subnormals  21.8105  20.1856        7.45%
i686                          normal  73.1945  71.2199        2.70%
i686                  close-exponent  22.2141   20.331        8.48%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Wilco Dijkstra 324c088a18 nptl: Remove ATOMIC_EXCHANGE_USES_CAS usage
The only usage was for pthread_spin_lock, introduced by 12d2dd7060,
as a way to optimize the code for certain architectures. Now that atomic
builtins are used by default, let the compiler use the best code sequence
for the atomic exchange.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Wilco Dijkstra 53807741fb Define __HAVE_64B_ATOMICS from compiler support
Now that atomic builtins are used by default, we can rely on the
compiler to define when to use 64-bit atomic operations.

It allows the use of 64-bit atomic operations on some 32-bit ABIs where
they were not previously enabled due to missing pre-processor handling:
hppa, mips64n32, s390, and sparcv9.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 95a0ad1ea1 atomic: Consolidate atomic_write_barrier implementation
All ABIs, except alpha and sparc, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_RELEASE).

For alpha, it uses a 'wmb' which does not map to any of C11
barriers.

For sparc it uses a stronger 'member #LoadStore | #StoreStore',
where the release barrier maps to just 'membar #StoreLoad'.  The
patch keeps the sparc definition.

For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).

Tested on aarch64-linux-gnu.

Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 304b22d7f9 atomic: Consolidate atomic_read_barrier implementation
All ABIs, except alpha, powerpc, and x86_64, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_SEQ_CST) in most cases, with the
exception of aarch64 (where the acquire fence is generated as
'dmb ishld' instead of 'dmb ish').

For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline
synchronization).

For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).

Tested on aarch64-linux-gnu, where the acquire produces a different
instruction that the current code.

Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 70ee250fb8 atomic: Consolidate atomic_full_barrier implementation
All ABIs save for sparcv9 and s390 defines it to __sync_synchronize,
which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST).

For Sparc, it uses a stricter #StoreStore|#LoadStore|#StoreLoad|#LoadLoad
instead of the #StoreLoad generated by __sync_synchronize.

For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline synchronization).

The barrier is used only in one place (pthread_mutex_setprioceiling),
and using a stricter barrier for s390 is ok performance-wise.

Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella c797303237 microblaze: Remove USE_ATOMIC_COMPILER_BUILTINS definition
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella f6dedc65fd alpha: Remove USE_ATOMIC_COMPILER_BUILTINS definition
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 7e5fe1974c sh: Move atomic-machine to generic sysdep
There is no Linux specific definitions.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 1f5d8663ea riscv: Consolidade atomic-machine.h and remove ununsed atomic macros
The resulting definitions are not Linux specific, so move the header
to generic sysdep folder.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella d76e20791b powerpc: Consolidate atomic-machine.h
The __HAVE_64B_ATOMICS can be define based on __WORDSIZE, and
the __ARCH_ACQ_INSTR, MUTEX_HINT_*, and barriers definition are
defined by the target cpu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 9201eabed8 loongarch: Consolidate atomic-machine.h and remove ununsed atomic macros
These are already provided by the generic include/atomic.h and
the resulting macros are not Linux specific.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 3642bf4800 m68k: Consolidade atomic-machine.h and Remove ununsed atomic macros
Both m68k and m68k-colfire do not support 64 bit atomis.  The
atomic_barrier syscall on m68k is a no-op, so it can use the compiler
builtin.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 6322a325fc hppa: Move atomic-machine to generic sysdep
There is no Linux specific definitions.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 5a7a9a57c2 arm: Consolidate atomic-machine.h and Remove ununsed atomic macros
The libgcc provides the required support to calling the kernel
auxiliary routines for !__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella fd27081d8e x86: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.

Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella ebfd1b9e4d sparc: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 08c345104f s390: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.  Also
remove outdated comment from unsupported gcc versions.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella c0fc170c78 or1k: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella c787f0ec3e mips: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella ba69286641 csky: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella eeeb882c97 arc: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella b299332fb4 aarch64: Remove ununsed atomic macros
These are already provided by the generic include/atomic.h.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
H.J. Lu b93632ede7 Build programs in $(others-noinstall) like tests if libgcc_s is available
Build programs in $(others-noinstall) like tests only if libgcc_s is
available.  Otherwise, "build-many-glibcs.py compilers" will fail to
build the initial glibc with the initial limited gcc due to the missing
libgcc_s.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-11-04 14:48:18 +08:00
Joseph Myers fa7f43a982 Support assert as a variadic macro for C23
C23 makes assert into a variadic macro to handle cases of an argument
that would be interpreted as a single function argument but more than
one macro argument (in particular, compound literals with an
unparenthesized comma in an initializer list); this change was made by
N2829.  Note that this only applies to assert, not to other macros
specified in the C standard with particular numbers of arguments.

Implement this support in glibc.  This change is only for C; C++ would
need a separate change to its separate assert implementations.  It's
also applied only in C23 mode.  It depends on support for (C99)
variadic macros, and also (in order to detect calls where more than
one expression is passed, via an unevaluated function call) a C99
boolean type.  These requirements are encapsulated in the definition
of __ASSERT_VARIADIC.  Tests with -std=c99 and -std=gnu99 (using
implementations continue to work.

I don't think we have a way in the glibc testsuite to validate that
passing more than one expression as an argument does produce the
desired error.

Tested for x86_64.
2025-11-03 19:56:42 +00:00
Frédéric Bérat d4d472366b docs: Add dynamic linker environment variable docs
The Dynamic Linker chapter now includes a new section detailing
environment variables that influence its behavior.

This new section documents the `LD_DEBUG` environment variable,
explaining how to enable debugging output and listing its various
keywords like `libs`, `reloc`, `files`, `symbols`, `bindings`,
`versions`, `scopes`, `tls`, `all`, `statistics`, `unused`, and `help`.

It also documents `LD_DEBUG_OUTPUT`, which controls where the debug
output is written, allowing redirection to a file with the process ID
appended.

This provides users with essential information for controlling and
debugging the dynamic linker.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-11-03 10:47:56 +01:00
Frédéric Bérat 332f8e62af tls: Add debug logging for TLS and TCB management
Introduce the `DL_DEBUG_TLS` debug mask to enable detailed logging for
Thread-Local Storage (TLS) and Thread Control Block (TCB) management.

This change integrates a new `tls` option into the `LD_DEBUG`
environment variable, allowing developers to trace:
- TCB allocation, deallocation, and reuse events in `dl-tls.c`,
  `nptl/allocatestack.c`, and `nptl/nptl-stack.c`.
- Thread startup events, including the TID and TCB address, in
  `nptl/pthread_create.c`.

A new test, `tst-dl-debug-tid`, has been added to validate the
functionality of this new debug logging, ensuring that relevant messages
are correctly generated for both main and worker threads.

This enhances the debugging capabilities for diagnosing issues related
to TLS allocation and thread lifecycle within the dynamic linker.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-11-03 10:47:28 +01:00
Pincheng Wang 720e891637 riscv: Add Zbkb optimized repeat_bytes helper
Introduce a RISC-V specific string-misc.h to provide an optimized
repeat_bytes implementation when the Zbkb extension is available.
The new version uses packh/packw/pack instruction count and avoiding
high latency instructions. This helper is used by several mem and string
functions, and falls back to the generic implementation when Zbkb is not
present.

Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
2025-10-31 16:23:57 -05:00
Wilco Dijkstra 1136c036a3 math: Remove xfail from pow test [BZ #33563]
Remove xfail from pow testcase since pow and powf have been fixed.
Also check float128 maximum value.  See BZ #33563.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 19:13:53 +00:00
Wilco Dijkstra 0212fc23b0 math: Fix pow special case [BZ #33563]
Fix pow (DBL_MAX, 1.0) to return DBL_MAX when rouding upwards without FMA.
This fixes BZ #33563.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 19:13:41 +00:00
Wilco Dijkstra 8917bd3eb3 math: Fix powf special case [BZ #33563]
Fix powf (0x1.fffffep+127, 1.0f) to return 0x1.fffffep+127 when
rouding upwards.  Cleanup the special case code - performance
improves by ~1.2%.  This fixes BZ #33563.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 19:12:47 +00:00
Yury Khrustalev 7d99ff550f debug: mark __libc_message_wrapper as always inline
When building with -Og to enable debugging, there is currently a compiler error
because if __libc_message_wrapper() is not inline, the __va_arg_pack_len macro
cannot be used.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 10:01:33 +00:00
Yury Khrustalev 2f77aec043 aarch64: fix cfi directives around __libc_arm_za_disable
Incorrect CFI directive corrupted call stack information
and prevented debuggers from correctly displaying call
stack information.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 09:48:47 +00:00
Eric Wong 3ac0112b5d cdefs: allow __attribute__ on tcc
According to the tcc (tiny C compiler) Changelog, tcc supports
__attribute__ since 0.9.3.  Looking at history of tcc at
<https://repo.or.cz/tinycc.git>, __attribute__ support was added
in commit 14658993425878be300aae2e879560698e0c6c4c on 2002-01-03,
which also looks like the release of 0.9.3.  While I'm unable to
find release tags for tcc before 0.9.18 (2003-04-14), the next
release (0.9.28) will include __attribute__((cleanup(func)) which
I rely on.

Reviewed-by: Collin Funk <collin.funk1@gmail.com>
2025-10-30 20:03:00 -07:00
Collin Funk 3fe3f62833 Cleanup some recently added whitespace.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-30 18:56:58 -07:00
Yao Zihong 09a94c86ca riscv: memcpy_noalignment: Reorder to store via a3, then bump a3
Rewrite the copy micro-step from:

    REG_L  a4, 0(a5)
    addi   a3, a3, SZREG
    addi   a5, a5, SZREG
    REG_S  a4, -SZREG(a3)

to:

    REG_L  a4, 0(a5)
    addi   a5, a5, SZREG
    REG_S  a4, 0(a3)
    addi   a3, a3, SZREG

Semantics are unchanged: both read *(a5_old), write *(a3_old), and then
increment a3/a5 by SZREG. memcpy assumes non-overlapping regions, so the
reordering preserves correctness.

No functional change.

Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
2025-10-30 17:49:21 -05:00
Yao Zihong 0698fd462a riscv: memcpy_noalignment: Fold SZREG/BLOCK_SIZE alignment to single andi
Simplify the alignment steps for SZREG and BLOCK_SIZE multiples. The previous
three-instruction sequences

    addi    a7, a2, -SZREG
    andi    a7, a7, -SZREG
    addi    a7, a7, SZREG

and

    addi    a7, a2, -BLOCK_SIZE
    andi    a7, a7, -BLOCK_SIZE
    addi    a7, a7, BLOCK_SIZE

are equivalent to a single

    andi    a7, a2, -SZREG
    andi    a7, a2, -BLOCK_SIZE

because SZREG and BLOCK_SIZE are powers of two in this context, making the
surrounding addi steps cancel out. Folding to one instruction reduces code
size with identical semantics.

No functional change.

sysdeps/riscv/multiarch/memcpy_noalignment.S: Remove redundant addi around
alignment; keep a single andi for SZREG/BLOCK_SIZE rounding.

Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
2025-10-30 17:47:24 -05:00
Yao Zihong 444d81284e riscv: memcpy_noalignment: Make register allocation Zca-friendly
Tidy the temporary register allocation to favor registers eligible for
compressed encodings when Zca/Zcb are enabled. This keeps the ABI and
clobber set unchanged and does not alter control flow or memory access
behavior.

No functional change.

sysdeps/riscv/multiarch/memcpy_noalignment.S: Reassign temps to improve
compressed encoding opportunities.

Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
2025-10-30 17:44:58 -05:00
Adhemerval Zanella ee946212fe math: Remove the SVID error handling wrapper from yn/jn
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:35 -03:00
Adhemerval Zanella 8d4815e6d7 math: Remove the SVID error handling wrapper from y1/j1
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:33 -03:00
Adhemerval Zanella b050cb53b0 math: Remove the SVID error handling wrapper from y0/j0
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:31 -03:00
Adhemerval Zanella 03eeeba705 math: Remove the SVID error handling from coshf
It improves latency for about 3-10% and throughput for about 5-15%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:28 -03:00
Adhemerval Zanella 555c39c0fc math: Remove the SVID error handling from atanhf
It improves latency for about 1-10% and throughput for about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:26 -03:00
Adhemerval Zanella 8facb464b4 math: Remove the SVID error handling from acoshf
It improves latency for about 3-7% and throughput for about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:24 -03:00
Adhemerval Zanella f92aba68bc math: Remove the SVID error handling from asinf
It improves latency for about 2% and throughput for about 5%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:22 -03:00
Adhemerval Zanella 9f8dea5b5d math: Remove the SVID error handling from acosf
It improves latency for about 2-10% and throughput for about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:20 -03:00
Adhemerval Zanella 0b484d7b77 math: Remove the SVID error handling from log10f
It improves latency for about 3-10% and throughput for about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:17 -03:00
Adhemerval Zanella 6deadd4eb6 m68k: Remove SVID error handling on fmod
The m68k provided an optimized version through __m81_u(fmod)
(mathimpl.h), and gcc does not implement it through a builtin
(different than i386).

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:15 -03:00
Adhemerval Zanella b19904cfb2 m68k: Avoid include e_fmod.c on fmod/remainder implementation
And open-code each implementation.  It simplifies SVID error handling
removal.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:12 -03:00