glibc

Commit Graph

Author	SHA1	Message	Date
Stefan Liebler	b9579342c6	Remove support for lock elision. The support for lock elision was already deprecated with glibc 2.42: commit `77438db8cf` "Mark support for lock elision as deprecated." See also discussions: https://sourceware.org/pipermail/libc-alpha/2025-July/168492.html This patch removes the architecture specific support for lock elision for x86, powerpc and s390 by removing the elision-conf.h, elision-conf.c, elision-lock.c, elision-timed.c, elision-unlock.c, elide.h, htm.h/hle.h files. Those generic files are also removed. The architecture specific structures are adjusted and the elision fields are marked as unused. See struct_mutex.h files. Furthermore in struct_rwlock.h, the leftover __rwelision was also removed. Those were originally removed with commit `0377a7fde6` "nptl: Remove rwlock elision definitions" and by chance reintroduced with commit `7df8af43ad` "nptl: Add struct_rwlock.h" The common code (e.g. the pthread_mutex-files) are changed back to the time before lock elision was introduced with the x86-support: - commit `1cdbe57948` "Add the low level infrastructure for pthreads lock elision with TSX" - commit `b023e4ca99` "Add new internal mutex type flags for elision." - commit `68cc29355f` "Add minimal test suite changes for elision enabled kernels" - commit `e8c659d74e` "Add elision to pthread_mutex_{try,timed,un}lock" - commit `49186d21ef` "Disable elision for any pthread_mutexattr_settype call" - commit `1717da59ae` "Add a configure option to enable lock elision and disable by default" Elision is removed also from the tunables, the initialization part, the pretty-printers and the manual. Some extra handling in the testsuite is removed as well as the full tst-mutex10 testcase, which tested a race while enabling lock elision. I've also searched the code for "elision", "elide", "transaction" and e.g. cleaned some comments. I've run the testsuite on x86_64 and s390x and run the build-many-glibcs.py script. 
Thanks to Sachin Monga, this patch is also tested on powerpc. A NEWS entry also mentions the removal. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-18 14:21:13 +01:00
Adhemerval Zanella	7fec8a5de6	Revert __HAVE_64B_ATOMICS configure check Commit `53807741fb` added a configure check for 64-bit atomic operations that were not previously enabled on some 32-bit ABIs. However, the NPTL semaphore code casts a sem_t to a new_sem and issues a 64-bit atomic operation for __HAVE_64B_ATOMICS. Since sem_t has 32-bit alignment on 32-bit architectures, this prevents the use of 64-bit atomics even if the ABI supports them. Assume 64-bit atomic support from __WORDSIZE, which matches how glibc defined it before the broken change. Also rename __HAVE_64B_ATOMICS to USE_64B_ATOMICS to better describe the flag's meaning. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-14 14:05:20 -03:00
Carlos O'Donell	5bdf3c9092	x86: Increase allowable TSX abort rate to 6%. In pre-commit CI on an E5-2698 v4 we sometimes see ~5% aborts. Set the trip point to 6%. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-11-14 08:18:36 -05:00
Xie jiamei	1707b23382	Set Prefer_No_AVX512 flag for hygon platform Benchmarks indicate evex can be more profitable on Hygon hardware than AVX512. So add Prefer_No_AVX512 to make it run with evex. Change-Id: Icc59492f71fde7a783a8bd315714ffd6f7ecaf29 Signed-off-by: Li jing <lijing@hygon.cn> Signed-off-by: Xie jiamei <xiejiamei@hygon.cn>	2025-11-11 10:47:26 +08:00
Adhemerval Zanella	427c25278d	x86: Adapt "%v" usage on clang to emit VEX encoding clang does not support the %v to select the AVX encoding, nor the '%d' asm constraint, and for AVX build it requires all 3 arguments. This patch adds a new internal header, math-inline-asm.h, that adds functions to abstract the inline asm differences required between gcc and clang. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-11-10 08:58:06 -03:00
Adhemerval Zanella	d25db12c2a	x86: math: Use of __libgcc_cmp_return__ iff compiler supports it clang does not support '__attribute__ ((mode (__libgcc_cmp_return__)))', so use a more closely related type instead of the default 'int'.	2025-11-10 08:57:59 -03:00
Wilco Dijkstra	324c088a18	nptl: Remove ATOMIC_EXCHANGE_USES_CAS usage The only usage was for pthread_spin_lock, introduced by `12d2dd7060`, as a way to optimize the code for certain architectures. Now that atomic builtins are used by default, let the compiler use the best code sequence for the atomic exchange. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Wilco Dijkstra	53807741fb	Define __HAVE_64B_ATOMICS from compiler support Now that atomic builtins are used by default, we can rely on the compiler to define when to use 64-bit atomic operations. It allows the use of 64-bit atomic operations on some 32-bit ABIs where they were not previously enabled due to missing pre-processor handling: hppa, mips64n32, s390, and sparcv9. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	95a0ad1ea1	atomic: Consolidate atomic_write_barrier implementation All ABIs, except alpha and sparc, define it to atomic_full_barrier/__sync_synchronize, which can be mapped to __atomic_thread_fence (__ATOMIC_RELEASE). For alpha, it uses a 'wmb' which does not map to any of C11 barriers. For sparc it uses a stronger 'membar #LoadStore | #StoreStore', where the release barrier maps to just 'membar #StoreLoad'. The patch keeps the sparc definition. For PowerPC, it allows the use of lwsync for additional chips (since _ARCH_PWR4 does not cover all chips that support it). Tested on aarch64-linux-gnu. Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	304b22d7f9	atomic: Consolidate atomic_read_barrier implementation All ABIs, except alpha, powerpc, and x86_64, define it to atomic_full_barrier/__sync_synchronize, which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST) in most cases, with the exception of aarch64 (where the acquire fence is generated as 'dmb ishld' instead of 'dmb ish'). For s390x, it defaults to a memory barrier where __sync_synchronize emits a 'bcr 15,0' (which the manual describes as pipeline synchronization). For PowerPC, it allows the use of lwsync for additional chips (since _ARCH_PWR4 does not cover all chips that support it). Tested on aarch64-linux-gnu, where the acquire produces a different instruction than the current code. Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	70ee250fb8	atomic: Consolidate atomic_full_barrier implementation All ABIs save for sparcv9 and s390 define it to __sync_synchronize, which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST). For Sparc, it uses a stricter #StoreStore|#LoadStore|#StoreLoad|#LoadLoad instead of the #StoreLoad generated by __sync_synchronize. For s390x, it defaults to a memory barrier where __sync_synchronize emits a 'bcr 15,0' (which the manual describes as pipeline synchronization). The barrier is used only in one place (pthread_mutex_setprioceiling), and using a stricter barrier for s390 is ok performance-wise. Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	fd27081d8e	x86: Remove unused atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Collin Funk	3fe3f62833	Cleanup some recently added whitespace. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-30 18:56:58 -07:00
Adhemerval Zanella	970364dac0	Annotate switch fall-through clang defaults to warning for missing fall-through and it does not support all the comment-like annotations that gcc does. Use the C23 [[fallthrough]] annotation instead. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:54:01 -03:00
Adhemerval Zanella	36b4c553e6	Replace count_leading_zeros with stdc_leading_zeros Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:53:55 -03:00
litenglong	00d406e77b	x86: Disable AVX Fast Unaligned Load on Hygon 1/2/3 - Performance testing revealed significant memcpy performance degradation when bit_arch_AVX_Fast_Unaligned_Load is enabled on Hygon 3. - Hygon confirmed AVX performance issues in certain memory functions. - Glibc benchmarks show SSE outperforms AVX for memcpy/memmove/memset/strcmp/strcpy/strlen and so on. - Hardware differences primarily in floating-point operations don't justify AVX usage for memory operations. Reviewed-by: gaoxiang <gaoxiang@kylinos.cn> Signed-off-by: litenglong <litenglong@kylinos.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-27 05:16:30 +08:00
Sunil K Pandey	a114e29ddd	x86: Detect Intel Nova Lake Processor Detect Intel Nova Lake Processor and tune it similar to Intel Panther Lake. https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-07 20:50:24 -07:00
Sunil K Pandey	f8dd52901b	x86: Detect Intel Wildcat Lake Processor Detect Intel Wildcat Lake Processor and tune it similar to Intel Panther Lake. https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-07 16:41:06 -07:00
Uros Bizjak	a9a8b106bb	x86: Restore "&" GCC asm memory operand workaround to installed fpu-control.h fpu_control.h is an installed header so a wider range of compiler versions (including ones older than GCC 9) are relevant with it than are relevant for building glibc. Fixes commit `3014dec3ad` ('x86: Remove obsolete "&" GCC asm memory operand workaround') Signed-off-by: Uros Bizjak <ubizjak@gmail.com>	2025-09-24 08:04:41 +02:00
Uros Bizjak	ff8be6152b	x86: Use "%v" to emit VEX encoded instructions for AVX targets Legacy encodings of SSE instructions incur AVX-SSE domain transition penalties on some Intel microarchitectures (e.g. Haswell, Broadwell). Using the VEX forms avoids these penalties and keeps all instructions in the VEX decode domain. Use "%v" sequence to emit the "v" prefix for opcodes when compiling with -mavx. No functional changes intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-09-22 17:33:25 +02:00
Uros Bizjak	3014dec3ad	x86: Remove obsolete "&" GCC asm memory operand workaround GCC now accepts plain variable names as valid lvalues for "m" constraints, automatically spilling locals to memory if necessary. The long-standing "&" pattern was originally used as a defensive workaround for older compiler versions that rejected operands such as: asm ("incl %0" : "+m"(x)); with errors like "memory input is not directly addressable". Modern compilers (GCC >= 9) reliably generate correct code without the workaround, and the resulting assembly is identical. No functional changes intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-09-22 17:33:25 +02:00
H.J. Lu	1fa5773eb1	x86: Don't use asm statement for trunc/truncf Compiler inlines trunc and truncf with SSE4.1. But older versions of GCC doesn't inline them with -Os: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121861 Don't use asm statement for trunc and truncf if compiler can inline them with -Os. It removes one register move with GCC 16: __modff_sse41: __modff_sse41: .LFB23: .LFB23: .cfi_startproc .cfi_startproc endbr64 endbr64 subq $24, %rsp subq $24, %rsp .cfi_def_cfa_offset 32 .cfi_def_cfa_offset 32 movq %fs:40, %rax movq %fs:40, %rax movq %rax, 8(%rsp) movq %rax, 8(%rsp) xorl %eax, %eax xorl %eax, %eax movd %xmm0, %eax movd %xmm0, %eax addl %eax, %eax addl %eax, %eax cmpl $-16777216, %eax cmpl $-16777216, %eax je .L7 je .L7 > movaps %xmm0, %xmm3 movaps %xmm0, %xmm4 movaps %xmm0, %xmm4 movss .LC0(%rip), %xmm2 \| movss .LC0(%rip), %xmm1 movaps %xmm2, %xmm3 \| movaps %xmm1, %xmm2 andps %xmm0, %xmm2 \| roundss $11, %xmm3, %xmm3 roundss $11, %xmm0, %xmm1 \| subss %xmm3, %xmm4 subss %xmm1, %xmm4 \| andps %xmm0, %xmm1 andnps %xmm4, %xmm3 \| andnps %xmm4, %xmm2 orps %xmm3, %xmm2 \| orps %xmm2, %xmm1 .L3: .L3: movss %xmm1, (%rdi) \| movss %xmm3, (%rdi) movq 8(%rsp), %rax movq 8(%rsp), %rax subq %fs:40, %rax subq %fs:40, %rax jne .L8 jne .L8 movaps %xmm2, %xmm0 \| movaps %xmm1, %xmm0 addq $24, %rsp addq $24, %rsp .cfi_remember_state .cfi_remember_state .cfi_def_cfa_offset 8 .cfi_def_cfa_offset 8 ret ret Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Uros Bizjak <ubizjak@gmail.com>	2025-09-17 04:30:11 -07:00
Adhemerval Zanella	b5d88fa6c3	math: Fix x86_64 build for -Os (BZ 33367) The compiler might not inline the trunc function call for USE_TRUNC_BUILTIN [1]. This patch adds an optimized __trunc/__truncf for x86 used on modf ifunc variant to avoid the trunc libcall. Checked on x86_64, x86_64-v2, x86_64-v3, and x86_64-v4. Used -O2 and -Os options. Performed a full make check on x86_64 with both optimizations. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121861 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-09-11 06:23:33 -07:00
Uros Bizjak	4be94f6a9c	x86: Remove x86 version of thread_pointer.h The x86 version of thread_pointer.h is the same as the generic one. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-09-10 05:30:07 -07:00
Uros Bizjak	e5222ceb73	x86: Remove stale __GNUC_PREREQ (11, 1) test from __thread_pointer() GCC 12 is currently the minimum supported compiler version. Remove no longer needed __GNUC_PREREQ (11, 1) test from __thread_pointer(). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-09-10 05:30:07 -07:00
Uros Bizjak	b8253693b7	x86: Define atomic_compare_and_exchange_{val, bool}_acq using __atomic_compare_exchange_n No functional changes. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-09-09 07:58:52 -07:00
Uros Bizjak	935ee691bc	x86: Define atomic_exchange_acq using __atomic_exchange_n The resulting libc.so is identical on both x86_64 and i386 targets compared to unpatched builds: $ sha1sum libc-x86_64-old.so libc-x86_64-new.so 74eca1b87f2ecc9757a984c089a582b7615d93e7 libc-x86_64-old.so 74eca1b87f2ecc9757a984c089a582b7615d93e7 libc-x86_64-new.so $ sha1sum libc-i386-old.so libc-i386-new.so 882bbab8324f79f4fbc85224c4c914fc6822ece7 libc-i386-old.so 882bbab8324f79f4fbc85224c4c914fc6822ece7 libc-i386-new.so Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-09-09 07:51:41 -07:00
Uros Bizjak	e6b5ad1b1d	x86: Define atomic_full_barrier using __sync_synchronize For x86_64 targets, __sync_synchronize emits a full 64-bit 'LOCK ORQ $0x0,(%rsp)' instead of 'LOCK ORL $0x0,(%rsp)'. No functional changes. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-09-09 07:44:41 -07:00
Uros Bizjak	4eef002328	x86: Remove catomic_* locking primitives Remove obsolete catomic_* locking primitives which don't map to standard compiler builtins. There are still a couple of places in the tree that uses them (malloc/arena.c and malloc/malloc.c). x86 didn't define __arch_c_compare_and_exchange_bool_* primitives so fallback code used __arch_c_compare_and_exchange_val_* primitives instead. This resulted in unoptimal code for catomic_compare_and_exchange_bool_acq where superfluous CMP was emitted after CMPXCHG, e.g. in arena_get2: 775b8: 48 8d 4a 01 lea 0x1(%rdx),%rcx 775bc: 48 89 d0 mov %rdx,%rax 775bf: 64 83 3c 25 18 00 00 cmpl $0x0,%fs:0x18 775c6: 00 00 775c8: 74 01 je 775cb <arena_get2+0x35b> 775ca: f0 48 0f b1 0d 75 3d lock cmpxchg %rcx,0x163d75(%rip) # 1db348 <narenas> 775d1: 16 00 775d3: 48 39 c2 cmp %rax,%rdx 775d6: 74 7f je 77657 <arena_get2+0x3e7> that now becomes: 775b8: 48 8d 4a 01 lea 0x1(%rdx),%rcx 775bc: 48 89 d0 mov %rdx,%rax 775bf: f0 48 0f b1 0d 80 3d lock cmpxchg %rcx,0x163d80(%rip) # 1db348 <narenas> 775c6: 16 00 775c8: 74 7f je 77649 <arena_get2+0x3d9> OTOH, catomic_decrement does not fallback to atomic_fetch_add (, -1) builtin but to the cmpxchg loop, so the generated code in arena_get2 regresses a bit, from using LOCK DECQ insn: 77829: 64 83 3c 25 18 00 00 cmpl $0x0,%fs:0x18 77830: 00 00 77832: 74 01 je 77835 <arena_get2+0x5c5> 77834: f0 48 ff 0d 0c 3b 16 lock decq 0x163b0c(%rip) # 1db348 <narenas> 7783b: 00 to a cmpxchg loop: 7783d: 48 8b 0d 04 3b 16 00 mov 0x163b04(%rip),%rcx # 1db348 <narenas> 77844: 48 8d 71 ff lea -0x1(%rcx),%rsi 77848: 48 89 c8 mov %rcx,%rax 7784b: f0 48 0f b1 35 f4 3a lock cmpxchg %rsi,0x163af4(%rip) # 1db348 <narenas> 77852: 16 00 77854: 0f 84 c9 fa ff ff je 77323 <arena_get2+0xb3> 7785a: eb e1 jmp 7783d <arena_get2+0x5cd> Defining catomic_exchange_and_add using __atomic_fetch_add solves the above issue and generates optimal: 77809: f0 48 83 2d 36 3b 16 lock subq $0x1,0x163b36(%rip) # 1db348 <narenas> 77810: 00 
01 Depending on the target processor, the compiler may emit either 'LOCK ADD/SUB $1, m' or 'INC/DEC $1, m' instruction, due to partial flag register stall issue. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-09-09 07:36:02 -07:00
Uros Bizjak	af5b01dc26	x86: Remove unused atomics Remove unused atomics from <sysdeps/x86/atomic-machine.h>. The resulting libc.so is identical on both x86_64 and i386 targets compared to unpatched builds: $ sha1sum libc-x86_64-old.so libc-x86_64-new.so b89aaa2b71efd435104ebe6f4cd0f2ef89fcac90 libc-x86_64-old.so b89aaa2b71efd435104ebe6f4cd0f2ef89fcac90 libc-x86_64-new.so $ sha1sum libc-i386-old.so libc-i386-new.so aa70f2d64da2f0f516634b116014cfe7af3e5b1a libc-i386-old.so aa70f2d64da2f0f516634b116014cfe7af3e5b1a libc-i386-new.so Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-09-09 07:29:57 -07:00
H.J. Lu	5c522d7a58	x86: Include <bits/stdlib-bsearch.h> in dl-cacheinfo.h On x86-64, when glibc is configured with --enable-stack-protector=all and compiled with -Os, ld.so crashes very early: (gdb) r --direct Starting program: /export/build/gnu/tools-build/glibc-gitlab/build-x86_64-linux/string/test-memswap --direct Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7f41b0a in bsearch (__key=__key@entry=0x7fffffffda28, __base=__base@entry=0x7ffff7fca140 <intel_02_known>, __nmemb=__nmemb@entry=68, __size=__size@entry=8, __compar=__compar@entry=0x7ffff7f3b691 <intel_02_known_compare>) at ../bits/stdlib-bsearch.h:22 22 { (gdb) disass Dump of assembler code for function bsearch: 0x00007ffff7f41af0 <+0>: push %r15 0x00007ffff7f41af2 <+2>: mov %rcx,%r15 0x00007ffff7f41af5 <+5>: push %r14 0x00007ffff7f41af7 <+7>: push %r13 0x00007ffff7f41af9 <+9>: mov %rsi,%r13 0x00007ffff7f41afc <+12>: push %r12 0x00007ffff7f41afe <+14>: mov %rdi,%r12 0x00007ffff7f41b01 <+17>: push %rbp 0x00007ffff7f41b02 <+18>: mov %rdx,%rbp 0x00007ffff7f41b05 <+21>: push %rbx 0x00007ffff7f41b06 <+22>: sub $0x18,%rsp => 0x00007ffff7f41b0a <+26>: mov %fs:0x28,%r14 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We can't use stack protector at this point. 
0x00007ffff7f41b13 <+35>: mov %r14,0x8(%rsp) 0x00007ffff7f41b18 <+40>: mov %r8,%r14 0x00007ffff7f41b1b <+43>: test %rbp,%rbp 0x00007ffff7f41b1e <+46>: je 0x7ffff7f41b48 <bsearch+88> 0x00007ffff7f41b20 <+48>: mov %rbp,%rbx 0x00007ffff7f41b23 <+51>: mov %r12,%rdi 0x00007ffff7f41b26 <+54>: shr $1,%rbx 0x00007ffff7f41b29 <+57>: imul %r15,%rbx 0x00007ffff7f41b2d <+61>: add %r13,%rbx 0x00007ffff7f41b30 <+64>: mov %rbx,%rsi (gdb) bt #0 0x00007ffff7f41b0a in bsearch (__key=__key@entry=0x7fffffffda28, __base=__base@entry=0x7ffff7fca140 <intel_02_known>, __nmemb=__nmemb@entry=68, __size=__size@entry=8, __compar=__compar@entry=0x7ffff7f3b691 <intel_02_known_compare>) at ../bits/stdlib-bsearch.h:22 #1 0x00007ffff7f3c1be in intel_check_word (name=188, value=1979933440, has_level_2=has_level_2@entry=0x7fffffffda7f, no_level_2_or_3=no_level_2_or_3@entry=0x7fffffffda7e, cpu_features=<optimized out>) at ../sysdeps/x86/dl-cacheinfo.h:217 #2 0x00007ffff7f3c29f in handle_intel (name=name@entry=188, cpu_features=<optimized out>) at ../sysdeps/x86/dl-cacheinfo.h:279 #3 0x00007ffff7f3ccf9 in dl_init_cacheinfo (cpu_features=<optimized out>) at ../sysdeps/x86/dl-cacheinfo.h:852 #4 init_cpu_features (cpu_features=<optimized out>) at ../sysdeps/x86/cpu-features.c:1153 #5 0x00007ffff7f3d6f9 in __libc_start_main_impl (main=0x7ffff7f396dc <main>, argc=2, argv=0x7fffffffdbe8, init=<optimized out>, fini=<optimized out>, rtld_fini=0x0, stack_end=0x7fffffffdbd8) at ../csu/libc-start.c:269 #6 0x00007ffff7f39901 in _start () at ../sysdeps/x86_64/start.S:115 (gdb) The problem is that since __USE_EXTERN_INLINES isn't defined with -Os, the inline bsearch in <bits/stdlib-bsearch.h> isn't available and the external bsearch is compiled with stack protector. Include <bits/stdlib-bsearch.h> in dl-cacheinfo.h fixed BZ #33374. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2025-09-08 20:31:36 -07:00
Uros Bizjak	119d658ac2	x86: Use flag output operands for inline asm in atomic-machine.h Use the flag output constraints feature available in gcc 6+ ("=@cc<cond>") instead of explicitly setting a boolean variable with SETcc instruction. This approach decouples the instruction that sets the flags from the code that consumes them, allowing the compiler to create better code when working with flags users. Instead of e.g.: lock add %esi,(%rdi) sets %sil test %sil,%sil jne <...> the compiler now generates: lock add %esi,(%rdi) js <...> No functional changes intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2025-08-29 09:05:23 +02:00
Henrik Lindström	c49a32d7eb	x86/configure: Improve portability of isa level check wc -l pads the output with leading spaces on some systems, e.g. FreeBSD. This results in the check `test "$count" = 1` failing. Use -eq for integer comparison instead. Signed-off-by: Henrik Lindström <henrik@lxm.se> Reviewed-by: Arjun Shankar <arjun@redhat.com>	2025-08-27 17:14:15 +02:00
H.J. Lu	027505a07b	Don't pass -c to LIBC_TRY_TEST_CC_OPTION LIBC_TRY_TEST_CC_OPTION is defined with LIBC_TRY_CC_OPTION: dnl Test a compiler option or options with an empty input file. dnl LIBC_TRY_CC_OPTION([options], [action-if-true], [action-if-false]) AC_DEFUN([LIBC_TRY_CC_OPTION], [AS_IF([AC_TRY_COMMAND([${CC-cc} $1 -xc /dev/null -S -o /dev/null])], [$2], [$3])]) which passes -S to compiler. Unlike gcc, when -c is also passed to clang 20, we get configure:7838: clang -c -Werror -fsemantic-interposition -xc /dev/null -S -o /dev/null clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument] Don't pass -c to LIBC_TRY_TEST_CC_OPTION since -c isn't needed. This fixes BZ #33318. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2025-08-23 15:59:42 -07:00
H.J. Lu	dd4394b249	x86: Set have-protected-data to no if unsupported If the building compiler enables no direct external data access by default, access to protected data in shared libraries from executables must be compiled with no direct external data access. If the testing compiler doesn't support it, set have-protected-data to no to disable the tests which require no direct external data access. Add LIBC_TRY_CC_COMMAND to test a building compiler option or options with an input file. This fixes BZ #33286. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2025-08-22 17:55:32 -07:00
H.J. Lu	bd4628f3f1	i386: Also add GLIBC_ABI_GNU2_TLS version [BZ #33129 ] Since the GNU2 TLS run-time bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31372 affects both i386 and x86-64, also add GLIBC_ABI_GNU2_TLS version to i386 to indicate the working GNU2 TLS run-time. For x86-64, the additional GNU2 TLS run-time bug fix is needed for https://sourceware.org/bugzilla/show_bug.cgi?id=31501 Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2025-08-18 11:58:01 -07:00
H.J. Lu	aec8498873	x86-64: Properly compile ISA optimized modf and modff There are 3 variants of modf and modff: SSE2, SSE4.1 and AVX. s_modf.c and s_modff.c include the generic implementation compiled with the minimum x86 ISA level. The IFUNC selector is used only if the minimum ISA level is less than AVX. SSE4.1 variant is included only if the ISA level is less than SSE4.1. AVX variant is included only if the ISA level is less than AVX. AVX variant should be compiled with -mavx, not -msse2avx -DSSE2AVX which are used to encode SSE assembly sources with EVEX encoding. The routines that are shared between libc and libm should use different rules to avoid using the same MODULE_NAME, to avoid potential issues like BZ #33165 where __stack_chk_fail is not routed to the internal symbol. Tested with -march=x86-64, -march=x86-64-v2, -march=x86-64-v3 and -march=x86-64-v4. This fixes BZ #33165 and BZ #33173. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-07-18 10:22:19 -07:00
H.J. Lu	7130c2ae97	x86: Avoid vector/r16-r31 registers and memcpy/memset in mcount_internal Since mcount_internal is called from mcount/__fentry__ which preserve only RAX, RCX, RDX, RSI, RDI, R8 and R9, compile mcount.c with -fno-tree-loop-distribute-patterns -mgeneral-regs-only -mno-apxf to avoid vector/r16-r31 registers and memcpy/memset in mcount_internal. This fixes BZ #33134. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>	2025-07-09 05:33:05 +08:00
H.J. Lu	0ef7965e5b	x86: Update tst-gnu2-tls2 tests Update tst-gnu2-tls2 tests to set XMM0...XMM7 to all 1s in malloc to verify that XMM registers are preserved when _dl_tlsdesc_dynamic is called by clearing vectors with zeroed XMM registers before _dl_tlsdesc_dynamic and using these XMM registers to clear vectors after _dl_tlsdesc_dynamic. This improves the BZ #31372 test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2025-06-19 05:46:31 +08:00
H.J. Lu	848f0e46f0	i386: Update ___tls_get_addr to preserve vector registers Compiler generates the following instruction sequence for dynamic TLS access: leal tls_var@tlsgd(,%ebx,1), %eax call ___tls_get_addr@PLT CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS, AX, CX, and DX, are unchanged after CALL. But ___tls_get_addr is a normal function which doesn't preserve any vector registers. 1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal. 2. Change ___tls_get_addr to a wrapper function with implementations for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers. 3. dl-tlsdesc-dynamic.h has: _dl_tlsdesc_dynamic: /* Like all TLS resolvers, preserve call-clobbered registers. We need two scratch regs anyway. */ subl $32, %esp cfi_adjust_cfa_offset (32) It is wrong to use movl %ebx, -28(%esp) movl %esp, %ebx cfi_def_cfa_register(%ebx) ... mov %ebx, %esp cfi_def_cfa_register(%esp) movl -28(%esp), %ebx to preserve EBX on stack. Fix it with: movl %ebx, 28(%esp) movl %esp, %ebx cfi_def_cfa_register(%ebx) ... mov %ebx, %esp cfi_def_cfa_register(%esp) movl 28(%esp), %ebx 4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly. 5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with traditional TLS variant to verify the fix. 6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h. This fixes BZ #32996. Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-06-19 04:30:31 +08:00
H.J. Lu	0a027674a1	x86: Avoid GLRO(dl_x86_cpu_features) In init_cpu_features, replace GLRO(dl_x86_cpu_features) with cpu_features to avoid an extra load. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2025-06-09 13:03:13 +08:00
H.J. Lu	de14f1959e	x86: Detect Intel Diamond Rapids Detect Intel Diamond Rapids and tune it similar to Intel Granite Rapids. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2025-04-12 09:43:15 -07:00
Sunil K Pandey	9f0deff558	x86: Handle unknown Intel processor with default tuning Enable default tuning for unknown Intel processor. Tested on x86, no regression. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-04-11 17:05:22 -07:00
Sunil K Pandey	e53eb952b9	x86: Add ARL/PTL/CWF model detection support - Add ARROWLAKE model detection. - Add PANTHERLAKE model detection. - Add CLEARWATERFOREST model detection. Intel® Architecture Instruction Set Extensions Programming Reference https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. No regression, validated model detection on SDE. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-04-10 07:47:08 -07:00
Sunil K Pandey	70b6488551	x86: Optimize xstate size calculation Scan xstate IDs up to the maximum supported xstate ID. Remove the separate AMX xstate calculation. Instead, exclude the AMX space from the start of TILECFG to the end of TILEDATA in xsave_state_size. Completed validation on SKL/SKX/SPR/SDE and compared xsave state size with "ld.so --list-diagnostics" option, no regression. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2025-04-05 07:51:38 -07:00
Florian Weimer	c6e2895695	x86: Link tst-gnu2-tls2-x86-noxsave{,c,xsavec} with libpthread This fixes a test build failure on Hurd. Fixes commit `145097dff1` ("x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-03-31 21:33:18 +02:00
YLK	dbb2880e61	Fix typo in comment	2025-03-31 10:54:52 -03:00
Florian Weimer	145097dff1	x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810) Previously, the initialization code reused the xsave_state_full_size member of struct cpu_features for the TLSDESC state size. However, the tunable processing code assumes that this member has the original XSAVE (non-compact) state size, so that it can use its value if XSAVEC is disabled via tunable. This change uses a separate variable and not a struct member because the value is only needed in ld.so and the static libc, but not in libc.so. As a result, struct cpu_features layout does not change, helping a future backport of this change. Fixes commit `9b7091415a` ("x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-03-29 09:17:38 +01:00
Florian Weimer	59585ddaa2	x86: Skip XSAVE state size reset if ISA level requires XSAVE If we have to use XSAVE or XSAVEC trampolines, do not adjust the size information they need. Technically, it is an operator error to try to run with -XSAVE,-XSAVEC on such builds, but this change here disables some unnecessary code with higher ISA levels and simplifies testing. Related to commit `befe2d3c4d` ("x86-64: Don't use SSE resolvers for ISA level 3 or above"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-03-29 09:17:38 +01:00
Adhemerval Zanella	1d60b9dfda	Remove dl-procinfo.h powerpc was the only architecture with arch-specific hooks for LD_SHOW_AUXV, and with the information moved to ld diagnostics there is no need to keep the _dl_procinfo hook. Checked with a build for all affected ABIs. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2025-03-05 11:22:09 -03:00
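
Several of the atomics commits listed above (for example `935ee691bc`, `b8253693b7`, `4eef002328` and `324c088a18`) replace hand-written x86 inline asm with the compiler's `__atomic` builtins. The following is only a rough sketch of what those builtins look like in isolation, not code taken from glibc: the `demo_*` names and the lock-word convention are hypothetical and chosen for illustration, and only the `__atomic_*` builtins themselves are real GCC/clang interfaces.

```c
#include <stdbool.h>

/* Hypothetical lock word for illustration: 0 = unlocked, 1 = locked.  */
typedef int demo_spinlock_t;

/* Acquire the lock by atomically swapping in 1.  The previous value
   tells us whether the lock was free.  The compiler picks the
   instruction sequence for the exchange.  */
static inline void
demo_spin_lock (demo_spinlock_t *lock)
{
  while (__atomic_exchange_n (lock, 1, __ATOMIC_ACQUIRE) != 0)
    {
      /* Spin on plain loads until the lock looks free, then retry.  */
      while (__atomic_load_n (lock, __ATOMIC_RELAXED) != 0)
        ;
    }
}

/* Release the lock with a release store.  */
static inline void
demo_spin_unlock (demo_spinlock_t *lock)
{
  __atomic_store_n (lock, 0, __ATOMIC_RELEASE);
}

/* Return true if *mem was equal to EXPECTED and has been replaced by
   DESIRED (strong compare-and-exchange, acquire on success).  */
static inline bool
demo_cas_acq (long *mem, long expected, long desired)
{
  return __atomic_compare_exchange_n (mem, &expected, desired, false,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Atomically decrement *mem and return the value it held before.
   Expressed as a fetch-and-add so the compiler does not need a
   compare-and-exchange loop.  */
static inline long
demo_fetch_dec (long *mem)
{
  return __atomic_fetch_add (mem, -1, __ATOMIC_RELAXED);
}
```

A caller would pair `demo_spin_lock` with `demo_spin_unlock` around a critical section. The memory orders here are one plausible choice for the sketch and may differ from what glibc uses at any given call site.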


611 Commits