glibc

Commit Graph

Author	SHA1	Message	Date
Adhemerval Zanella	0e1a1178ee	math: Remove the SVID error handling from remainder The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. This optimization enables us to migrate the implementation to a C version. The performance on a Zen3 chip is similar to the SVID one. The m68k provided an optimized version through __m81_u(remainderf) (mathimpl.h), and gcc does not implement it through a builtin (different than i386). Performance improves a bit on x86_64 (Zen3, gcc 15.2.1): reciprocal-throughput input master NO-SVID improvement x86_64 subnormals 18.8522 16.2506 13.80% x86_64 normal 421.8260 403.9270 4.24% x86_64 close-exponent 21.0579 18.7642 10.89% i686 subnormals 21.3443 21.4229 -0.37% i686 normal 525.8380 538.807 -2.47% i686 close-exponent 21.6589 21.7983 -0.64% Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	c4c6c79d70	math: Remove the SVID error handling from remainderf The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. This optimization enables us to migrate the implementation to a C version. The performance on a Zen3 chip is similar to the SVID one. The m68k provided an optimized version through __m81_u(remainderf) (mathimpl.h), and gcc does not implement it through a builtin (different than i386). Performance improves a bit on x86_64 (Zen3, gcc 15.2.1): reciprocal-throughput input master NO-SVID improvement x86_64 subnormals 17.5349 15.6125 10.96% x86_64 normal 53.8134 52.5754 2.30% x86_64 close-exponent 20.0211 18.6656 6.77% i686 subnormals 21.8105 20.1856 7.45% i686 normal 73.1945 71.2199 2.70% i686 close-exponent 22.2141 20.331 8.48% Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Wilco Dijkstra	324c088a18	nptl: Remove ATOMIC_EXCHANGE_USES_CAS usage The only usage was for pthread_spin_lock, introduced by `12d2dd7060`, as a way to optimize the code for certain architectures. Now that atomic builtins are used by default, let the compiler use the best code sequence for the atomic exchange. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Wilco Dijkstra	53807741fb	Define __HAVE_64B_ATOMICS from compiler support Now that atomic builtins are used by default, we can rely on the compiler to define when to use 64-bit atomic operations. It allows the use of 64-bit atomic operations on some 32-bit ABIs where they were not previously enabled due to missing pre-processor handling: hppa, mips64n32, s390, and sparcv9. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	95a0ad1ea1	atomic: Consolidate atomic_write_barrier implementation All ABIs, except alpha and sparc, define it to atomic_full_barrier/__sync_synchronize, which can be mapped to __atomic_thread_fence (__ATOMIC_RELEASE). For alpha, it uses a 'wmb' which does not map to any of C11 barriers. For sparc it uses a stronger 'member #LoadStore \| #StoreStore', where the release barrier maps to just 'membar #StoreLoad'. The patch keeps the sparc definition. For PowerPC, it allows the use of lwsync for additional chips (since _ARCH_PWR4 does not cover all chips that support it). Tested on aarch64-linux-gnu. Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	304b22d7f9	atomic: Consolidate atomic_read_barrier implementation All ABIs, except alpha, powerpc, and x86_64, define it to atomic_full_barrier/__sync_synchronize, which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST) in most cases, with the exception of aarch64 (where the acquire fence is generated as 'dmb ishld' instead of 'dmb ish'). For s390x, it defaults to a memory barrier where __sync_synchronize emits a 'bcr 15,0' (which the manual describes as pipeline synchronization). For PowerPC, it allows the use of lwsync for additional chips (since _ARCH_PWR4 does not cover all chips that support it). Tested on aarch64-linux-gnu, where the acquire produces a different instruction that the current code. Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	70ee250fb8	atomic: Consolidate atomic_full_barrier implementation All ABIs save for sparcv9 and s390 defines it to __sync_synchronize, which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST). For Sparc, it uses a stricter #StoreStore\|#LoadStore\|#StoreLoad\|#LoadLoad instead of the #StoreLoad generated by __sync_synchronize. For s390x, it defaults to a memory barrier where __sync_synchronize emits a 'bcr 15,0' (which the manual describes as pipeline synchronization). The barrier is used only in one place (pthread_mutex_setprioceiling), and using a stricter barrier for s390 is ok performance-wise. Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	c797303237	microblaze: Remove USE_ATOMIC_COMPILER_BUILTINS definition Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	f6dedc65fd	alpha: Remove USE_ATOMIC_COMPILER_BUILTINS definition Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	7e5fe1974c	sh: Move atomic-machine to generic sysdep There is no Linux specific definitions. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	1f5d8663ea	riscv: Consolidade atomic-machine.h and remove ununsed atomic macros The resulting definitions are not Linux specific, so move the header to generic sysdep folder. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	d76e20791b	powerpc: Consolidate atomic-machine.h The __HAVE_64B_ATOMICS can be define based on __WORDSIZE, and the __ARCH_ACQ_INSTR, MUTEX_HINT_*, and barriers definition are defined by the target cpu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	9201eabed8	loongarch: Consolidate atomic-machine.h and remove ununsed atomic macros These are already provided by the generic include/atomic.h and the resulting macros are not Linux specific. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	3642bf4800	m68k: Consolidade atomic-machine.h and Remove ununsed atomic macros Both m68k and m68k-colfire do not support 64 bit atomis. The atomic_barrier syscall on m68k is a no-op, so it can use the compiler builtin. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	6322a325fc	hppa: Move atomic-machine to generic sysdep There is no Linux specific definitions. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	5a7a9a57c2	arm: Consolidate atomic-machine.h and Remove ununsed atomic macros The libgcc provides the required support to calling the kernel auxiliary routines for !__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	fd27081d8e	x86: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	ebfd1b9e4d	sparc: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	08c345104f	s390: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Also remove outdated comment from unsupported gcc versions. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	c0fc170c78	or1k: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	c787f0ec3e	mips: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	ba69286641	csky: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	eeeb882c97	arc: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	b299332fb4	aarch64: Remove ununsed atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
H.J. Lu	b93632ede7	Build programs in $(others-noinstall) like tests if libgcc_s is available Build programs in $(others-noinstall) like tests only if libgcc_s is available. Otherwise, "build-many-glibcs.py compilers" will fail to build the initial glibc with the initial limited gcc due to the missing libgcc_s. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2025-11-04 14:48:18 +08:00
Joseph Myers	fa7f43a982	Support assert as a variadic macro for C23 C23 makes assert into a variadic macro to handle cases of an argument that would be interpreted as a single function argument but more than one macro argument (in particular, compound literals with an unparenthesized comma in an initializer list); this change was made by N2829. Note that this only applies to assert, not to other macros specified in the C standard with particular numbers of arguments. Implement this support in glibc. This change is only for C; C++ would need a separate change to its separate assert implementations. It's also applied only in C23 mode. It depends on support for (C99) variadic macros, and also (in order to detect calls where more than one expression is passed, via an unevaluated function call) a C99 boolean type. These requirements are encapsulated in the definition of __ASSERT_VARIADIC. Tests with -std=c99 and -std=gnu99 (using implementations continue to work. I don't think we have a way in the glibc testsuite to validate that passing more than one expression as an argument does produce the desired error. Tested for x86_64.	2025-11-03 19:56:42 +00:00
Frédéric Bérat	d4d472366b	docs: Add dynamic linker environment variable docs The Dynamic Linker chapter now includes a new section detailing environment variables that influence its behavior. This new section documents the `LD_DEBUG` environment variable, explaining how to enable debugging output and listing its various keywords like `libs`, `reloc`, `files`, `symbols`, `bindings`, `versions`, `scopes`, `tls`, `all`, `statistics`, `unused`, and `help`. It also documents `LD_DEBUG_OUTPUT`, which controls where the debug output is written, allowing redirection to a file with the process ID appended. This provides users with essential information for controlling and debugging the dynamic linker. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-11-03 10:47:56 +01:00
Frédéric Bérat	332f8e62af	tls: Add debug logging for TLS and TCB management Introduce the `DL_DEBUG_TLS` debug mask to enable detailed logging for Thread-Local Storage (TLS) and Thread Control Block (TCB) management. This change integrates a new `tls` option into the `LD_DEBUG` environment variable, allowing developers to trace: - TCB allocation, deallocation, and reuse events in `dl-tls.c`, `nptl/allocatestack.c`, and `nptl/nptl-stack.c`. - Thread startup events, including the TID and TCB address, in `nptl/pthread_create.c`. A new test, `tst-dl-debug-tid`, has been added to validate the functionality of this new debug logging, ensuring that relevant messages are correctly generated for both main and worker threads. This enhances the debugging capabilities for diagnosing issues related to TLS allocation and thread lifecycle within the dynamic linker. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-11-03 10:47:28 +01:00
Pincheng Wang	720e891637	riscv: Add Zbkb optimized repeat_bytes helper Introduce a RISC-V specific string-misc.h to provide an optimized repeat_bytes implementation when the Zbkb extension is available. The new version uses packh/packw/pack instruction count and avoiding high latency instructions. This helper is used by several mem and string functions, and falls back to the generic implementation when Zbkb is not present. Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-31 16:23:57 -05:00
Wilco Dijkstra	1136c036a3	math: Remove xfail from pow test [BZ #33563 ] Remove xfail from pow testcase since pow and powf have been fixed. Also check float128 maximum value. See BZ #33563. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 19:13:53 +00:00
Wilco Dijkstra	0212fc23b0	math: Fix pow special case [BZ #33563 ] Fix pow (DBL_MAX, 1.0) to return DBL_MAX when rouding upwards without FMA. This fixes BZ #33563. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 19:13:41 +00:00
Wilco Dijkstra	8917bd3eb3	math: Fix powf special case [BZ #33563 ] Fix powf (0x1.fffffep+127, 1.0f) to return 0x1.fffffep+127 when rouding upwards. Cleanup the special case code - performance improves by ~1.2%. This fixes BZ #33563. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 19:12:47 +00:00
Yury Khrustalev	7d99ff550f	debug: mark __libc_message_wrapper as always inline When building with -Og to enable debugging, there is currently a compiler error because if __libc_message_wrapper() is not inline, the __va_arg_pack_len macro cannot be used. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 10:01:33 +00:00
Yury Khrustalev	2f77aec043	aarch64: fix cfi directives around __libc_arm_za_disable Incorrect CFI directive corrupted call stack information and prevented debuggers from correctly displaying call stack information. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 09:48:47 +00:00
Eric Wong	3ac0112b5d	cdefs: allow __attribute__ on tcc According to the tcc (tiny C compiler) Changelog, tcc supports __attribute__ since 0.9.3. Looking at history of tcc at <https://repo.or.cz/tinycc.git>, __attribute__ support was added in commit 14658993425878be300aae2e879560698e0c6c4c on 2002-01-03, which also looks like the release of 0.9.3. While I'm unable to find release tags for tcc before 0.9.18 (2003-04-14), the next release (0.9.28) will include __attribute__((cleanup(func)) which I rely on. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-30 20:03:00 -07:00
Collin Funk	3fe3f62833	Cleanup some recently added whitespace. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-30 18:56:58 -07:00
Yao Zihong	09a94c86ca	riscv: memcpy_noalignment: Reorder to store via a3, then bump a3 Rewrite the copy micro-step from: REG_L a4, 0(a5) addi a3, a3, SZREG addi a5, a5, SZREG REG_S a4, -SZREG(a3) to: REG_L a4, 0(a5) addi a5, a5, SZREG REG_S a4, 0(a3) addi a3, a3, SZREG Semantics are unchanged: both read (a5_old), write (a3_old), and then increment a3/a5 by SZREG. memcpy assumes non-overlapping regions, so the reordering preserves correctness. No functional change. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:49:21 -05:00
Yao Zihong	0698fd462a	riscv: memcpy_noalignment: Fold SZREG/BLOCK_SIZE alignment to single andi Simplify the alignment steps for SZREG and BLOCK_SIZE multiples. The previous three-instruction sequences addi a7, a2, -SZREG andi a7, a7, -SZREG addi a7, a7, SZREG and addi a7, a2, -BLOCK_SIZE andi a7, a7, -BLOCK_SIZE addi a7, a7, BLOCK_SIZE are equivalent to a single andi a7, a2, -SZREG andi a7, a2, -BLOCK_SIZE because SZREG and BLOCK_SIZE are powers of two in this context, making the surrounding addi steps cancel out. Folding to one instruction reduces code size with identical semantics. No functional change. sysdeps/riscv/multiarch/memcpy_noalignment.S: Remove redundant addi around alignment; keep a single andi for SZREG/BLOCK_SIZE rounding. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:47:24 -05:00
Yao Zihong	444d81284e	riscv: memcpy_noalignment: Make register allocation Zca-friendly Tidy the temporary register allocation to favor registers eligible for compressed encodings when Zca/Zcb are enabled. This keeps the ABI and clobber set unchanged and does not alter control flow or memory access behavior. No functional change. sysdeps/riscv/multiarch/memcpy_noalignment.S: Reassign temps to improve compressed encoding opportunities. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:44:58 -05:00
Adhemerval Zanella	ee946212fe	math: Remove the SVID error handling wrapper from yn/jn Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:35 -03:00
Adhemerval Zanella	8d4815e6d7	math: Remove the SVID error handling wrapper from y1/j1 Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:33 -03:00
Adhemerval Zanella	b050cb53b0	math: Remove the SVID error handling wrapper from y0/j0 Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:31 -03:00
Adhemerval Zanella	03eeeba705	math: Remove the SVID error handling from coshf It improves latency for about 3-10% and throughput for about 5-15%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:28 -03:00
Adhemerval Zanella	555c39c0fc	math: Remove the SVID error handling from atanhf It improves latency for about 1-10% and throughput for about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:26 -03:00
Adhemerval Zanella	8facb464b4	math: Remove the SVID error handling from acoshf It improves latency for about 3-7% and throughput for about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:24 -03:00
Adhemerval Zanella	f92aba68bc	math: Remove the SVID error handling from asinf It improves latency for about 2% and throughput for about 5%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:22 -03:00
Adhemerval Zanella	9f8dea5b5d	math: Remove the SVID error handling from acosf It improves latency for about 2-10% and throughput for about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:20 -03:00
Adhemerval Zanella	0b484d7b77	math: Remove the SVID error handling from log10f It improves latency for about 3-10% and throughput for about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:17 -03:00
Adhemerval Zanella	6deadd4eb6	m68k: Remove SVID error handling on fmod The m68k provided an optimized version through __m81_u(fmod) (mathimpl.h), and gcc does not implement it through a builtin (different than i386). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:15 -03:00
Adhemerval Zanella	b19904cfb2	m68k: Avoid include e_fmod.c on fmod/remainder implementation And open-code each implementation. It simplifies SVID error handling removal. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:12 -03:00

1 2 3 4 5 ...

43039 Commits All Branches Search

43039 Commits

All Branches