This patch addresses the actual cause of CVE-2025-5745.
The non-volatile vector registers are no longer used for the
32-byte load and compare operations.
Additionally, the earlier assembler workaround for the lxvp
instruction is replaced with the actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
(cherry picked from commit 2ea943f7d4)
This patch addresses the actual cause of CVE-2025-5702.
The non-volatile vector registers are no longer used for the
32-byte load and compare operations.
Additionally, the earlier assembler workaround for the lxvp
instruction is replaced with the actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
(cherry picked from commit 9a40b1cda5)
powf:
Update the scalar special-case function to make best use of the new
interface.
pow:
Make specialcase NOINLINE to prevent str/ldr leaking into the fast path.
Remove the dependency on sv_call2, as the new callback implementation
is not a performance gain.
Replace it with a vectorised specialcase, since the structure of the
scalar routine is fairly simple.
Throughput gain of about 5-10% on V1 for large values and 25% for
subnormal `x`.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit bb6519de1e)
Fix svld1rq being used with incorrect predicates (BZ #33642).
Next to no performance variation (tested on V1).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit e889160273)
Previously, the presence of special cases in one lane could affect the
results in other lanes due to an unconditional scalar fallback. The old
WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has
been removed from AOR, making it easier to spot and fix
this. No measured change in performance. This patch applies cleanly as
far back as 2.41; however, there are conflicts with 2.40, where sinh was
first introduced.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit e45af510bc)
Previously, the presence of special cases in one lane could affect the
results in other lanes due to an unconditional scalar fallback. The old
WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has
been removed from AOR, making it easier to spot and fix this. 4%
improvement in throughput with GCC 14 on Neoverse V1. This bug is
present as far back as 2.39 (where tan was first introduced).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 6c22823da5)
Instead of using SVE instructions to marshal special results into the
correct lane, just write the entire vector (and the predicate) to
memory, then use cheaper scalar operations.
Geomean speedup of 16% in special intervals on Neoverse with GCC 14.
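Below is a minimal sketch of this technique (illustrative only, not the
actual glibc code): the scalar fallback here is exp, and the predicate
is materialized as a 0/1 vector since ACLE offers no direct predicate
store.
  #include <arm_sve.h>
  #include <math.h>
  #include <stdint.h>

  /* Dump inputs, results and a lane mask to memory, patch the special
     lanes with scalar calls, then reload the whole vector.  */
  static svfloat64_t
  special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
  {
    double xs[32], ys[32];      /* 32 lanes covers up to 2048-bit SVE */
    uint64_t lanes[32];
    svbool_t all = svptrue_b64 ();
    svst1 (all, xs, x);
    svst1 (all, ys, y);
    svst1 (all, lanes, svdup_u64_z (special, 1));
    for (uint64_t i = 0; i < svcntd (); i++)
      if (lanes[i])
        ys[i] = exp (xs[i]);    /* cheap scalar fix-up per lane */
    return svld1 (all, ys);
  }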
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 5b82fb1882)
Use the correct include for the SIGCHLD macro: signal.h
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit a9c426bcca)
(cherry picked from commit 17c3eab387)
If GCC does not generate BTI markers by default, the main program
for the conform runtime tests will not have the BTI marker that
-Wl,-z,force-bti requires. Without -Wl,-z,force-bti, the link editor
will not tell the dynamic linker to enable BTI, and the missing BTI
marker is harmless.
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
This commit adds tests for the following use cases relevant to handling of
the SME state:
- fork() and vfork()
- clone() and clone3()
- signal handler
While most cases are trivial, the case of clone3() is more complicated since
the clone3() symbol is not public in Glibc.
To avoid having to check all the possible ways clone3() may be called
via other public functions (e.g. vfork() or pthread_create()), we put
together a test that links directly with clone3.o. The existing
functions that have calls to clone3() may not actually use it, in which
case the outcome of such tests would be unexpected. Having a direct call
to the clone3() symbol in the test allows us to check precisely what we
need to test: that the __arm_za_disable() function is indeed called and
has the desired effect.
Linking to clone3.o also requires linking to __arm_za_disable.o, which
in turn requires the hidden _dl_hwcap2 symbol, which the test has to
provide and initialise before use.
Co-authored-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit ecb0fc2f0f)
(cherry picked from commit 71874f167a)
This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).
The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().
When this change was originally proposed [2,3], it generated a long
discussion where several questions and concerns were raised. Here we
will address these concerns and explain why this change is useful and,
in fact, necessary.
In a nutshell, a C library that conforms to the AAPCS64 spec [1]
(pertinent to this change are mainly chapters 6.2 and 6.6) should have a
call to the __arm_za_disable() function in the clone() and clone3()
wrappers. The following explains in detail why this is the case.
When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we talk about the C library subroutines clone()
and clone3() rather than the syscalls with similar names. In the current
version of Glibc, clone() is public and clone3() is private, but it being
private is not pertinent to this discussion.
We will begin by stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come
from the kernel. Nor is it needed to satisfy a contract between the
kernel and userspace. This is why it is not for the kernel documentation
to describe this requirement. The requirement is instead needed to
satisfy a pure userspace scheme outlined in [1], and to make sure that
software that uses Glibc (or any other C library that handles SME states
correctly (see below)) conforms to [1] without unnecessarily having to
become SME-aware and thus lose portability.
To recap (see [1] (6.2)), the SME extension defines SME state, which is
part of the processor state. Part of this SME state is the ZA state,
which is necessary to manage the ZA storage register in the context of
the ZA lazy saving scheme [1] (6.6). This scheme exists because it would
be challenging to handle the ZA storage of SME in either a callee-saved
or a caller-saved manner.
There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA
bit and the TPIDR2_EL0 register (see [1] (6.6.3)):
- "off": PSTATE.ZA == 0
- "active": PSTATE.ZA == 1 TPIDR2_EL0 == null
- "dormant": PSTATE.ZA == 1 TPIDR2_EL0 != null
As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface
depending on the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.
In particular, clone() and clone3() (the C library functions) have the
ZA-private interface. This means that the permitted ZA-states on entry
are "off" and "dormant", and that the permitted states on return are
"off" or "dormant" (the latter if and only if the state was "dormant"
on entry).
This means that both functions in question should correctly handle both
the "off" and "dormant" ZA-states on entry. The conforming states on
return are "off" and "dormant" (if the inbound state was already
"dormant").
This change ensures that the ZA-state on return is always "off". Note
that, in the context of clone() and clone3(), "on return" means the
point where execution resumes after transferring from clone() or
clone3(). For the caller (we may refer to it as "parent") this is the
return address in the link register, where the RET instruction jumps.
For the "child", this is the branch target address.
So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory, we can, but we shouldn't, here is
why.
Every subroutine with a private-ZA interface, including clone() and clone3(),
must comply with the lazy saving scheme [1] (6.7.2). This puts additional
responsibility on a subroutine if ZA-state on return is "dormant" because
this state has special meaning. The "caller" (that is the place in code
where execution is transferred to, so this include both "parent" and "child")
may check the ZA-state and use it as per the spec of the "dormant" state that
is outlined in [1] (6.6.6 and 6.6.7).
Conforming to this would require more code inside of clone() and clone3()
which hardly is desirable.
For the return to "parent" this could be achieved in theory, but given
that neither clone() nor clone3() is supposed to be used in the middle
of an SME operation, it wouldn't be useful. For the "return" to "child"
this would be particularly difficult to achieve given the complexity of
these functions and their interfaces. Most importantly, it would be
illegal and somewhat meaningless to allow a "child" to start execution
in the "dormant" ZA-state, because the very essence of the "dormant"
state implies that there is a place to return to and that there is some
outer context that we are allowed to interact with.
To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when
the execution resumes after a call to clone() or clone3() is correct and
also the simplest way to conform to [1].
Can there be situations when we can avoid calling __arm_za_disable()?
Calling __arm_za_disable() implies a certain (sufficiently small)
overhead, so one might rightly ponder avoiding a call to this function
when we can afford not to make it. The most trivial such cases (e.g.
when the calling thread doesn't have access to SME or to the TPIDR2_EL0
register) are already handled by this function (see [1] (8.1.3 and
8.1.2)). Reasoning about other possible use cases would require making
the code inside clone() and clone3() more complicated, which would
defeat the point of trying to optimise away the call to
__arm_za_disable().
Why can't the kernel do this instead?
The handling of SME state by the kernel is described in [4]. In short,
the kernel must not impose a specific ZA-interface onto a userspace
function. Interaction with the kernel happens (among other things) via
system calls.
In Glibc many of the system calls (notably, including SYS_clone and
SYS_clone3) are used via wrappers, and the kernel has no control of them
and, moreover, it cannot dictate how these wrappers should behave because
it is simply outside of the kernel's remit.
However, in certain cases, the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in the 6.16 kernel [5]. This is not enough to ensure that code
that uses the clone() and clone3() functions conforms to [1] when it
runs on a system that provides SME, hence this change.
[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 27effb3d50)
(cherry picked from commit 256030b984)
A common sequence of instructions is used in several places
in assembly files, so define it in one place as an assembly
macro.
Note that PAC instructions are not included in the new macro
because they are redundant given how we call the arm_za_disable
function (the return address is not saved on the stack, so there is
no need to sign it).
(based on commits 6de12fc9ad
and c0f0db2d59)
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Add a test that checks that the ZA state is disabled after setjmp and
sigsetjmp.
Update the existing SME test that uses setjmp.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 251f932624)
Due to the nature of the ZA state, setjmp() should clear it in the
same manner as longjmp() already does.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit a7f6fd976c)
The ifunc selector for wmemset had a stray '!' in the
X86_ISA_CPU_FEATURES_ARCH_P(...) check:
  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                      AVX_Fast_Unaligned_Load, !))
This effectively negated the predicate and caused the AVX2/AVX512
paths to be skipped, making the dispatcher fall back to the SSE2
implementation even on CPUs where AVX2/AVX512 are available. The
regression leads to noticeable throughput loss for wmemset.
Remove the stray '!' so the AVX_Fast_Unaligned_Load capability is
tested as intended and the correct AVX2/EVEX variants are selected.
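After the fix, the check presumably reads as follows; the now-empty
final macro argument leaves the predicate un-negated:
  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                      AVX_Fast_Unaligned_Load, ))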
Impact:
- On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly
falls back to SSE2; perf now shows __wmemset_evex/avx2 variants.
Testing:
- benchtests/bench-wmemset shows improved bandwidth across sizes.
- perf confirms the selected symbol is no longer SSE2.
Signed-off-by: xiejiamei <xiejiamei@hygon.com>
Signed-off-by: Li jing <lijing@hygon.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d86b6cdd8)
The break statement in CHECK_MERGE is expected to exit the surrounding
while loop, not the do-while loop within the macro. Remove the
do-while loop from the macro. It is not needed to turn the macro
expansion into a single statement due to the way CHECK_MERGE is used
(and the statement expression would cover this anyway).
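A minimal self-contained illustration of the pitfall (CHECK_MERGE's real
body differs; all names here are made up):
  #include <stdio.h>

  /* `break` exits the innermost enclosing loop, so inside a do-while
     wrapper it leaves only the wrapper itself.  */
  #define CHECK_BROKEN(cond) do { if (cond) break; } while (0)
  #define CHECK_FIXED(cond)  { if (cond) break; }

  int
  main (void)
  {
    int i = 0;
    while (i < 10)
      {
        CHECK_BROKEN (i == 3);  /* no effect on the while loop */
        i++;
      }
    printf ("broken: %d\n", i); /* prints 10 */
    i = 0;
    while (i < 10)
      {
        CHECK_FIXED (i == 3);   /* exits the while loop as intended */
        i++;
      }
    printf ("fixed: %d\n", i);  /* prints 3 */
    return 0;
  }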
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
(cherry picked from commit 0fceed2545)
Fix a bug in the predicate logic introduced in the last change.
A slight performance improvement comes from relying on all-true
predicates during the conversion from single to double.
This fixes BZ #33299.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit aac077645a)
These variables are not exported, and libc.so TLS is initial-exec
anyway. Declare these variables as hidden and use the initial-exec
TLS model.
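The resulting pattern is roughly the following (variable name
illustrative):
  /* Hidden visibility keeps the variable out of the dynamic symbol
     table; the initial-exec TLS model yields the cheapest access.  */
  extern __thread int __libc_example_var
    __attribute__ ((visibility ("hidden"), tls_model ("initial-exec")));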
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit a894f04d87)
On i386, programs and shared libraries with __thread usage may fail
silently at run-time against glibc without the TLS run-time fix for:
https://sourceware.org/bugzilla/show_bug.cgi?id=32996
Add the GLIBC_ABI_GNU_TLS version to indicate that glibc has the working
GNU TLS run-time. The linker can add the GLIBC_ABI_GNU_TLS version to
binaries which depend on the working TLS run-time, so that such programs
and shared libraries will fail to load and run at run-time against
libc.so without the GLIBC_ABI_GNU_TLS version, instead of failing
silently at random.
This fixes BZ #33221.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit ed1b7a5a48)
The compiler generates the following instruction sequence for dynamic
TLS access:
	leal	tls_var@tlsgd(,%ebx,1), %eax
	call	___tls_get_addr@PLT
The CALL instruction is transparent to the compiler, which assumes that
all registers, except for EFLAGS, AX, CX, and DX, are unchanged after
CALL. But ___tls_get_addr is a normal function which doesn't preserve
any vector registers.
1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for
FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:
   _dl_tlsdesc_dynamic:
	/* Like all TLS resolvers, preserve call-clobbered registers.
	   We need two scratch regs anyway.  */
	subl	$32, %esp
	cfi_adjust_cfa_offset (32)
It is wrong to use
	movl	%ebx, -28(%esp)
	movl	%esp, %ebx
	cfi_def_cfa_register(%ebx)
	...
	mov	%ebx, %esp
	cfi_def_cfa_register(%esp)
	movl	-28(%esp), %ebx
to preserve EBX on the stack, since -28(%esp) lies below the stack
pointer and can be clobbered, e.g. by a signal handler (i386 has no
red zone). Fix it with:
	movl	%ebx, 28(%esp)
	movl	%esp, %ebx
	cfi_def_cfa_register(%ebx)
	...
	mov	%ebx, %esp
	cfi_def_cfa_register(%esp)
	movl	28(%esp), %ebx
4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with the
traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.
This fixes BZ #32996.
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 848f0e46f0)
This call must not complete initialization of all shared objects
in the global scope because the ELF constructor which makes the call
likely has not finished initialization. Calling more constructors
at this point would expose those to a partially constructed
dependency.
This completes the revert of commit 9897ced8e7
("elf: Run constructors on cyclic recursive dlopen (bug 31986)").
(cherry picked from commit d604f9c500)
This was found through code inspection. No application impact is
known.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 46d3198094)
When the linker -z mark-plt option is used to add DT_X86_64_PLT,
DT_X86_64_PLTSZ and DT_X86_64_PLTENT, the r_addend field of the
R_X86_64_JUMP_SLOT relocation stores the offset of the indirect
branch instruction. However, glibc versions without the commit:
	commit f8587a6189
	Author: H.J. Lu <hjl.tools@gmail.com>
	Date:   Fri May 20 19:21:48 2022 -0700
	    x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT
	    According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT
	    and R_X86_64_JUMP_SLOT. Since linkers always set their r_addends to 0, we
	    can ignore their r_addends.
	    Reviewed-by: Fangrui Song <maskray@google.com>
won't ignore the r_addend value in the R_X86_64_JUMP_SLOT relocation.
Such programs and shared libraries will fail at run-time randomly.
Add the GLIBC_ABI_DT_X86_64_PLT version to indicate that glibc is
compatible with DT_X86_64_PLT.
The linker can add the glibc GLIBC_ABI_DT_X86_64_PLT version dependency
whenever -z mark-plt is passed to the linker. The resulting programs and
shared libraries will fail to load at run-time against libc.so without
the GLIBC_ABI_DT_X86_64_PLT version, instead of failing randomly.
This fixes BZ #33212.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 399384e0c8)
Programs and shared libraries compiled with -mtls-dialect=gnu2 may fail
silently at run-time against glibc without the GNU2 TLS run-time fix
for:
https://sourceware.org/bugzilla/show_bug.cgi?id=31372
Add the GLIBC_ABI_GNU2_TLS version to indicate that glibc has the working
GNU2 TLS run-time. The linker can add the GLIBC_ABI_GNU2_TLS version to
binaries which depend on the working GNU2 TLS run-time:
https://sourceware.org/bugzilla/show_bug.cgi?id=33130
so that such programs and shared libraries will fail to load and run at
run-time against libc.so without the GLIBC_ABI_GNU2_TLS version, instead
of failing silently at random.
This fixes BZ #33129.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 9df8fa397d)
This ensures that the compiler will not inline it, so that
debuggers which do not use the Systemtap probes can reliably
set a breakpoint on it.
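As an illustration of the technique (the function below is hypothetical,
not the symbol this commit changes):
  /* Marking the hook noinline guarantees it survives as a distinct
     symbol with a stable address that a debugger can break on.  */
  __attribute__ ((noinline)) void
  debugger_hook (void)
  {
    /* An empty asm stops the compiler from discarding the "empty"
       function body and the calls to it.  */
    __asm__ __volatile__ ("" ::: "memory");
  }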
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 620f0730f3)
The changes in commit a93d9e03a3
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool. It brings its
own definition of _r_debug (rather than a declaration).
Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC. To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play. Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.
Only perform the copy relocation initialization if libc has been
loaded. Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol may fail.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit ea85e7d550)
It combines updating r_state with the debugger notification.
The second change to _dl_open introduces an additional debugger
notification for dlmopen, but debuggers are expected to ignore it.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 8329939a37)
It replaces the ns_debug member of the namespaces. Previously,
the base namespace had an unused ns_debug member.
This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed. (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 7278d11f3a)
Commit 10a66a8e42 ("Remove <libc-tsd.h>") removed the TLS initial-exec
(IE) model attribute from the __libc_tsd_CTYPE_* thread variable declarations
and definitions. Commit a894f04d87 ("Optimize __libc_tsd_* thread
variable access") restored it on declarations.
Restore the TLS initial-exec model attribute on __libc_tsd_CTYPE_* thread
variable definitions.
This resolves test tst-locale1 failure on s390 32-bit, when using a
GNU linker without the fix from GNU binutils commit aefebe82dc89
("IBM zSystems: Fix offset relative to static TLS").
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit e5363e6f46)
This ensures that the ctype data pointers in TLS are valid
in secondary namespaces even without initialization via
__ctype_init.
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit 2745db8dd3)
The existing initializers already contain explicit casts. Keep them
due to int/uint32_t mismatch.
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit e0c0f856f5)
Use __thread variables directly instead. The macros do not save any
typing. It seems unlikely that a future port will lack __thread
variable support.
Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names. Previously, <libc-tls.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.
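Sketch of the replacement, using a hypothetical variable (the real
__libc_tsd_* names are kept, as noted above):
  /* Before: accessor macros from the removed <libc-tsd.h>.
     After: direct access to a plain thread-local variable.  */
  static __thread void *__libc_tsd_EXAMPLE;

  void *
  get_example (void)
  {
    return __libc_tsd_EXAMPLE;
  }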
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit 10a66a8e42)
Improve codegen by packing coefficients.
4% and 2% improvement in throughput microbenchmark on Neoverse V1, for acosh
and atanh respectively.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 6849c5b791)
Rework the SVE FP64 hyperbolics to use the SVE FEXPA
instruction.
Also update the special case handling for large
inputs to be entirely vectorised.
Performance improvements on Neoverse V1:
cosh_sve: 19% for |x| < 709, 5x otherwise
sinh_sve: 24% for |x| < 709, 5.9x otherwise
tanh_sve: 12% for |x| < 19, 9x otherwise
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit dee22d2a81)
Improve the performance of the SVE exp routines by making better use
of the SVE FEXPA instruction.
Performance improvement on Neoverse V1:
exp2_sve: 21%
exp2f_sve: 24%
exp10f_sve: 23%
expm1_sve: 25%
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 1e3d1ddf97)
Improve the performance of the inverse trig functions by altering how
coefficients are loaded.
Performance improvement on Neoverse V1:
SVE acos 14%
AdvSIMD acos 6%
AdvSIMD asin 6%
SVE asin 5%
AdvSIMD asinf 2%
AdvSIMD atanf 22%
SVE atanf 20%
SVE atan 11%
AdvSIMD atan 5%
SVE atan2 7%
SVE atan2f 4%
AdvSIMD atan2f 3%
AdvSIMD atan2 2%
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 1e84509e00)
The polynomial order was unnecessarily high; reducing it unlocked
multiple optimizations.
Max error for new SVE expf is 0.88 +0.5ULP.
Max error for new SVE coshf is 2.56 +0.5ULP.
Performance improvement on Neoverse V1: expf (30%), coshf (26%).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit cf56eb28fa)
During early startup, memcpy or memset must not be called, since many
targets use ifuncs for them which won't have been initialized yet.
Security hardening may use -ftrivial-auto-var-init=zero, which inserts
calls to memset. Redirect memset to memset_generic by including
dl-symbol-redir-ifunc.h in cpu-features.c.
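The mechanism behind that header is an assembler-level symbol alias,
along these lines (simplified sketch; the exact alias target may differ
per target):
  /* Every memset call emitted in this translation unit, including
     compiler-generated ones from -ftrivial-auto-var-init=zero, binds
     directly to the non-ifunc generic implementation.  */
  asm ("memset = __memset_generic");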
This fixes BZ #33112.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 681a24ae4d)
The SYSCALL_CANCEL macro calls __syscall_cancel, which in turn
calls __internal_syscall_cancel with an 'int' return type instead of
the expected 'long int'. This causes issues with syscalls that return
values larger than INT_MAX, such as copy_file_range [1].
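A hypothetical reduction of the bug class (not the glibc code): a 64-bit
result funnelled through an 'int' return loses its high bits.
  #include <limits.h>
  #include <stdio.h>

  static int
  truncating_wrapper (long int syscall_result)
  {
    return syscall_result;      /* implicit conversion truncates */
  }

  int
  main (void)
  {
    long int r = (long int) INT_MAX + 1;  /* e.g. a large copy_file_range */
    printf ("%ld becomes %d\n", r, truncating_wrapper (r));
    return 0;
  }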
Checked on x86_64-linux-gnu.
[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79139
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 7107bebf19)
Detect whether ld.so is not contiguous and handle that case in
_dl_find_object.
Set l_find_object_processed even for initially loaded link maps;
otherwise, dlopen of an initially loaded object adds it to
_dlfo_loaded_mappings (where maps are expected to be contiguous),
in addition to _dlfo_nodelete_mappings.
Test elf/tst-link-map-contiguous-ldso iterates over the loader
image, reading every word to make sure memory is actually mapped.
It only does that if the l_contiguous flag is set for the link map.
Otherwise, it finds gaps with mmap and checks that _dl_find_object
does not return the ld.so mapping for them.
The test elf/tst-link-map-contiguous-main does the same thing for
the libc.so shared object. This only works if the kernel loaded
the main program because the glibc dynamic loader may fill
the gaps with PROT_NONE mappings in some cases, making it contiguous,
but accesses to individual words may still fault.
Test elf/tst-link-map-contiguous-libc is again slightly different
because the dynamic loader always fills the gaps with PROT_NONE
mappings, so a different form of probing has to be used.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 20681be149)
Remove the historic binutils reference from the comment and update the
description of how this data is used by applications.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 2cac9559e0)
This reduces code size and dependencies on ld.so internals from
libc.so.
Fixes commit f4c142bb9f
("arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 96429bcc91)
The __abort_fork_reset_child call (introduced in
d40ac01cbb) resets the lock after the
fork. This causes a DRD regression in valgrind
(https://bugs.kde.org/show_bug.cgi?id=503668), as it is effectively a
double initialization, despite being actually fine in this case. As
suggested in https://sourceware.org/bugzilla/show_bug.cgi?id=32994#c2,
we replace it here with a memcpy of another initialized lock instead,
which makes valgrind happy.
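A minimal sketch of the approach (lock type and names illustrative):
  /* Copying the bytes of a statically initialized, never-used lock has
     the same effect as re-initialization, without tripping DRD's
     double-init detection.  */
  static int lock_template = LLL_LOCK_INITIALIZER;
  memcpy (&child_lock, &lock_template, sizeof child_lock);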
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit d9a348d092)
The mistake is that open must use mode 0666 so that the umask applies,
and not 0777 (which is what mkdir requires).
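For illustration (file name hypothetical): the kernel clears the umask
bits from the requested mode, so a regular file should be created with
0666:
  /* Effective mode is (mode & ~umask); with the common umask of 022,
     0666 yields rw-r--r-- (0644) for the new file.  */
  int fd = open ("out.txt", O_WRONLY | O_CREAT | O_EXCL, 0666);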
Fixes commit 8ef3cff9d1
("iconv: Support in-place conversions (bug 10460, bug 32033)").
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit cdcf24ee14)