Commit Graph

2147 Commits

H.J. Lu 762bb01d4e int128: Check BITS_PER_MP_LIMB == 32 instead of __WORDSIZE == 32
commit 8cd6efca5b
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Nov 20 15:30:06 2025 -0300

    Add add_ssaaaa and sub_ssaaaa to gmp-arch.h

checks __WORDSIZE == 32 to decide whether int128 should be used, which
breaks x32, which has int128 but __WORDSIZE == 32.  Check
BITS_PER_MP_LIMB == 32 instead of __WORDSIZE == 32.  This fixes BZ #33677.
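
As an illustration of why the limb size is the right predicate, a minimal
sketch (the typedef name is made up; BITS_PER_MP_LIMB is the macro the fix
checks, provided by glibc's internal gmp.h):

  #include <stdint.h>

  /* Sketch: key the wide-integer path on the limb size, not the word size.
     x32 has __WORDSIZE == 32 but 64-bit limbs and __int128, so it must
     take the 128-bit branch.  */
  #if BITS_PER_MP_LIMB == 32
  typedef uint64_t double_limb_t;           /* two 32-bit limbs */
  #else
  typedef unsigned __int128 double_limb_t;  /* two 64-bit limbs */
  #endif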

Tested on x32, x86-64 and i686.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-12-04 07:46:20 +08:00
H.J. Lu 3dd2cbfa35 Use 64-bit atomic on sem_t with 8-byte alignment [BZ #33632]
commit 7fec8a5de6
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Nov 13 14:26:08 2025 -0300

    Revert __HAVE_64B_ATOMICS configure check

uses 64-bit atomic operations on sem_t if 64-bit atomics are supported,
but sem_t may be only 32-bit aligned on 32-bit architectures.

1. Add a macro, SEM_T_ALIGN, for sem_t alignment.
2. Add a macro, HAVE_UNALIGNED_64B_ATOMICS.  Define it if unaligned 64-bit
atomic operations are supported.
3. Add a macro, USE_64B_ATOMICS_ON_SEM_T.  Define it to 1 if 64-bit atomic
operations are supported and either SEM_T_ALIGN is at least 8 or
HAVE_UNALIGNED_64B_ATOMICS is defined.
4. Assert that the size and alignment of sem_t are not smaller than those
of the internal struct new_sem.
5. Check USE_64B_ATOMICS_ON_SEM_T, instead of USE_64B_ATOMICS, when using
64-bit atomic operations on sem_t.

This fixes BZ #33632.
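
A minimal sketch of the gating described above (macro names follow the
commit message; the placeholder struct new_sem and the SEM_T_ALIGN value
are assumptions for illustration, not glibc's code):

  #include <semaphore.h>
  #include <stdint.h>

  /* Stand-in for the internal NPTL type; the real struct new_sem differs.  */
  struct new_sem { uint64_t data; };

  /* Per-architecture constant: the ABI alignment of sem_t (assumed value).  */
  #define SEM_T_ALIGN 4

  /* 64-bit atomics on sem_t are safe only when sem_t is 8-byte aligned or
     the architecture tolerates unaligned 64-bit atomics.  USE_64B_ATOMICS
     comes from the atomics machinery.  */
  #if USE_64B_ATOMICS \
      && (SEM_T_ALIGN >= 8 || defined HAVE_UNALIGNED_64B_ATOMICS)
  # define USE_64B_ATOMICS_ON_SEM_T 1
  #else
  # define USE_64B_ATOMICS_ON_SEM_T 0
  #endif

  /* sem_t must be able to hold the internal representation.  */
  _Static_assert (sizeof (sem_t) >= sizeof (struct new_sem),
                  "sem_t too small for struct new_sem");
  _Static_assert (_Alignof (sem_t) >= _Alignof (struct new_sem),
                  "sem_t under-aligned for struct new_sem");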

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-12-02 06:50:49 +08:00
Adhemerval Zanella 5dab2a3195 stdlib: Remove longlong.h
The gmp-arch.h now provides all the required definitions.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-26 10:10:06 -03:00
Adhemerval Zanella 7a0471f149 Add umul_ppmm to gmp-arch.h
To enable “longlong.h” removal, umul_ppmm is moved to gmp-arch.h.
The generic implementation now uses a static inline, which provides
better type checking than the GNU extension to cast the asm constraint
(and it works better with clang).

Most architectures use the generic implementation, which is expanded
from a macro, except for alpha, arm, hppa, x86, m68k, mips, powerpc,
and sparc.  For 32-bit architectures the compiler generates good enough
code using uint64_t types, while for 64-bit architectures the patch
leverages the math_u128.h definitions, which use 128-bit integers when
available (all 64-bit architectures on gcc 15).
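
For illustration, a minimal sketch of a generic static-inline umul_ppmm for
32-bit limbs (the function name and exact shape are assumptions, not
glibc's code):

  #include <stdint.h>

  /* Split the 64-bit product of two 32-bit limbs into high and low limbs;
     the compiler lowers this to a single widening multiply.  */
  static inline void
  umul_ppmm_sketch (uint32_t *hi, uint32_t *lo, uint32_t u, uint32_t v)
  {
    uint64_t prod = (uint64_t) u * v;
    *hi = (uint32_t) (prod >> 32);
    *lo = (uint32_t) prod;
  }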

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-26 10:10:06 -03:00
Adhemerval Zanella 8cd6efca5b Add add_ssaaaa and sub_ssaaaa to gmp-arch.h
To enable “longlong.h” removal, add_ssaaaa and sub_ssaaaa are moved to
gmp-arch.h.  The generic implementation now uses a static inline.  This
provides better type checking than the GNU extension, which casts the
asm constraint; and it also works better with clang.

Most architectures use the generic implementation, with the exception of
arc, arm, hppa, x86, m68k, powerpc, and sparc.  For 32-bit architectures
the compiler generates good enough code using uint64_t types, while for
64-bit architectures the patch leverages the math_u128.h definitions,
which use 128-bit integers when available (all 64-bit architectures on
gcc 15).
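
A sketch of the corresponding generic two-limb add for 32-bit limbs
(illustrative only; the function name is made up and glibc's version is
generated per limb size, with a matching subtraction):

  #include <stdint.h>

  /* (*sh, *sl) = (ah, al) + (bh, bl); the 64-bit intermediate carries the
     low-limb overflow into the high limb, wrapping modulo 2^64.  */
  static inline void
  add_ssaaaa_sketch (uint32_t *sh, uint32_t *sl,
                     uint32_t ah, uint32_t al, uint32_t bh, uint32_t bl)
  {
    uint64_t sum = (((uint64_t) ah << 32) | al) + (((uint64_t) bh << 32) | bl);
    *sh = (uint32_t) (sum >> 32);
    *sl = (uint32_t) sum;
  }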

The strongly typed implementation required some changes.  I adjusted
_FP_W_TYPE, _FP_WS_TYPE, and _FP_I_TYPE to use the same type as
mp_limb_t on aarch64, powerpc64le, x86_64, and riscv64.  This basically
means using “long” instead of “long long.”

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-26 10:10:02 -03:00
Adhemerval Zanella 476e962af7 Add gmp-arch and udiv_qrnnd
To enable “longlong.h” removal, the udiv_qrnnd is moved to a gmp-arch.h
file.  It allows each architecture to implement its own arch-specific
optimizations.  The generic implementation now uses a static inline,
which provides better type checking than the GNU extension to cast the
asm constraint (and it works better with clang).

Most architectures use the generic implementation, which is expanded
from a macro, except for alpha, x86, m68k, sh, and sparc.  I kept the
alpha version, which uses out-of-line implementations, and the x86 one,
where there is no easy way to use the div{q} instruction from C code.
For the rest, the compiler generates good enough code.

hppa also provides arch-specific implementations, but they are not
wired into “longlong.h” and thus never used.
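
A sketch of the generic udiv_qrnnd for 32-bit limbs (illustrative; the
function name is made up and glibc's macro-expanded version differs):
divide a two-limb numerator by a one-limb divisor, assuming the high limb
is less than the divisor so the quotient fits in one limb.

  #include <stdint.h>

  /* Returns (n1:n0) / d and stores (n1:n0) % d in *r; precondition n1 < d.  */
  static inline uint32_t
  udiv_qrnnd_sketch (uint32_t *r, uint32_t n1, uint32_t n0, uint32_t d)
  {
    uint64_t n = ((uint64_t) n1 << 32) | n0;
    *r = (uint32_t) (n % d);
    return (uint32_t) (n / d);
  }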

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-25 14:52:15 -03:00
Samuel Thibault 0f7b73f2ed htl: Fix conditions for thread list variables
Whether to use _dl_stack_used/_dl_stack_user etc. or
_dl_pthread_num_threads etc. is really an nptl vs htl question rather
than a question of pthread being in libc.
2025-11-22 21:55:02 +01:00
Adhemerval Zanella 8d26bed1eb Enable --enable-fortify-source with clang
clang generates internal calls to some _chk symbols, so add internal
aliases for them, and stub some with rtld-stubbed-symbols to avoid
ld.so linker issues.

Reviewed-by: Sam James <sam@gentoo.org>
2025-11-21 13:13:11 -03:00
Stefan Liebler b9579342c6 Remove support for lock elision.
The support for lock elision was already deprecated with glibc 2.42:
commit 77438db8cf
"Mark support for lock elision as deprecated."
See also discussions:
https://sourceware.org/pipermail/libc-alpha/2025-July/168492.html

This patch removes the architecture-specific support for lock elision
for x86, powerpc and s390 by removing the elision-conf.h, elision-conf.c,
elision-lock.c, elision-timed.c, elision-unlock.c, elide.h, and htm.h/hle.h
files.  The generic versions of those files are also removed.

The architecture specific structures are adjusted and the elision fields are
marked as unused.  See struct_mutex.h files.
Furthermore, in struct_rwlock.h the leftover __rwelision field was also
removed.  It was originally removed with commit 0377a7fde6
"nptl: Remove rwlock elision definitions"
and by chance reintroduced with commit 7df8af43ad
"nptl: Add struct_rwlock.h"

The common code (e.g. the pthread_mutex files) is changed back to its
state before lock elision was introduced with the x86 support:
- commit 1cdbe57948
"Add the low level infrastructure for pthreads lock elision with TSX"
- commit b023e4ca99
"Add new internal mutex type flags for elision."
- commit 68cc29355f
"Add minimal test suite changes for elision enabled kernels"
- commit e8c659d74e
"Add elision to pthread_mutex_{try,timed,un}lock"
- commit 49186d21ef
"Disable elision for any pthread_mutexattr_settype call"
- commit 1717da59ae
"Add a configure option to enable lock elision and disable by default"

Elision is removed also from the tunables, the initialization part, the
pretty-printers and the manual.

Some extra handling in the testsuite is removed as well as the full tst-mutex10
testcase, which tested a race while enabling lock elision.

I've also searched the code for "elision", "elide", "transaction" and e.g.
cleaned some comments.

I've run the testsuite on x86_64 and s390x and run the build-many-glibcs.py
script.
Thanks to Sachin Monga, this patch is also tested on powerpc.

A NEWS entry also mentions the removal.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-18 14:21:13 +01:00
Adhemerval Zanella 13cfd77bf5 math: Don't redirect inlined builtin math functions
When we want to inline builtin math functions like truncf, then given

  extern float truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
  extern float __truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));

  float (truncf) (float) asm ("__truncf");

the compiler may redirect truncf calls to __truncf instead of inlining
them (clang does this, for instance).  USE_TRUNCF_BUILTIN is 1 when
truncf should be inlined, and in that case we don't want the truncf
redirection:

  1. For each math function which may be inlined, we define

  #if USE_TRUNCF_BUILTIN
  # define NO_truncf_BUILTIN inline_truncf
  #else
  # define NO_truncf_BUILTIN truncf
  #endif

in <math-use-builtins.h>.

  2. Include <math-use-builtins.h> in include/math.h.

  3. Change MATH_REDIRECT to

   #define MATH_REDIRECT(FUNC, PREFIX, ARGS)		\
    float (NO_ ## FUNC ## f ## _BUILTIN) (ARGS (float))	\
      asm (PREFIX #FUNC "f");

With this change, if USE_TRUNCF_BUILTIN is 0, we get

  float (truncf) (float) asm ("__truncf");

and truncf will be redirected to __truncf.

And for USE_TRUNCF_BUILTIN 1, we get:

  float (inline_truncf) (float) asm ("__truncf");

In both cases either truncf will be inlined or the internal alias
(__truncf) will be called.

This is not required for all math-use-builtins symbols, only the ones
declared in math.h.  It also allows removing all the explicit
math-use-builtins inclusions, since the header is now implicitly
included by math.h.

For MIPS, some math-use-builtins headers include sysdep.h, which in
turn pulls in a lot of extra headers that do not allow the ldbl-128
code to override alias definitions (math.h would end up including some
stdlib.h definitions).  The math-use-builtins header only requires
__mips_isa_rev, so move that definition to sgidefs.h.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-11-17 11:17:07 -03:00
Florian Weimer c6f151839b Reference COPYING.LIB in <sframe.h> copyright header
Commit 3360913c37 ("elf: Add SFrame
stack tracing") added this file with an inconsistent copyright header.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-11-17 11:15:13 +01:00
Adhemerval Zanella 7fec8a5de6 Revert __HAVE_64B_ATOMICS configure check
Commit 53807741fb added a configure check
for 64-bit atomic operations that were not previously enabled on some
32-bit ABIs.

However, the NPTL semaphore code casts a sem_t to a new_sem and issues
a 64-bit atomic operation for __HAVE_64B_ATOMICS.  Since sem_t has
32-bit alignment on 32-bit architectures, this prevents the use of
64-bit atomics even if the ABI supports them.

Assume 64-bit atomic support based on __WORDSIZE, which matches how
glibc defined it before the broken change.  Also rename
__HAVE_64B_ATOMICS to USE_64B_ATOMICS to better reflect the flag's
meaning.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-14 14:05:20 -03:00
Adhemerval Zanella 3078358ac6 math: Remove the SVID error handling from tgammaf
It improves latency by about 1.5% and throughput by about 2-4%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 10:19:37 -03:00
Wilco Dijkstra 324c088a18 nptl: Remove ATOMIC_EXCHANGE_USES_CAS usage
The only usage was for pthread_spin_lock, introduced by 12d2dd7060,
as a way to optimize the code for certain architectures. Now that atomic
builtins are used by default, let the compiler use the best code sequence
for the atomic exchange.
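
As a rough illustration of the idea (not glibc's pthread_spin_lock and a
made-up function name), an acquire spin lock written directly against the
C11 atomics, leaving the exchange sequence to the compiler:

  #include <stdatomic.h>

  static void
  spin_lock_sketch (atomic_int *lock)
  {
    /* The compiler picks the best exchange sequence for the target; no
       ATOMIC_EXCHANGE_USES_CAS-style special-casing is needed.  */
    while (atomic_exchange_explicit (lock, 1, memory_order_acquire) != 0)
      ;  /* busy-wait until the previous owner stores 0 */
  }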

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 95a0ad1ea1 atomic: Consolidate atomic_write_barrier implementation
All ABIs, except alpha and sparc, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_RELEASE).
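
The consolidated generic definition is essentially the following (spelling
assumed; the real header wraps it differently):

  /* Generic release barrier; alpha and sparc keep their own definitions.  */
  #define atomic_write_barrier() __atomic_thread_fence (__ATOMIC_RELEASE)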

For alpha, it uses a 'wmb' which does not map to any of C11
barriers.

For sparc it uses a stronger 'membar #LoadStore | #StoreStore',
where the release barrier maps to just 'membar #StoreLoad'.  The
patch keeps the sparc definition.

For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).

Tested on aarch64-linux-gnu.

Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 304b22d7f9 atomic: Consolidate atomic_read_barrier implementation
All ABIs, except alpha, powerpc, and x86_64, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_SEQ_CST) in most cases, with the
exception of aarch64 (where the acquire fence is generated as
'dmb ishld' instead of 'dmb ish').

For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline
synchronization).

For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).

Tested on aarch64-linux-gnu, where the acquire barrier produces a
different instruction than the current code.

Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 70ee250fb8 atomic: Consolidate atomic_full_barrier implementation
All ABIs save for sparcv9 and s390 define it to __sync_synchronize,
which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST).

For Sparc, it uses a stricter #StoreStore|#LoadStore|#StoreLoad|#LoadLoad
instead of the #StoreLoad generated by __sync_synchronize.

For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline synchronization).

The barrier is used only in one place (pthread_mutex_setprioceiling),
and using a stricter barrier for s390 is ok performance-wise.

Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Frédéric Bérat 332f8e62af tls: Add debug logging for TLS and TCB management
Introduce the `DL_DEBUG_TLS` debug mask to enable detailed logging for
Thread-Local Storage (TLS) and Thread Control Block (TCB) management.

This change integrates a new `tls` option into the `LD_DEBUG`
environment variable, allowing developers to trace:
- TCB allocation, deallocation, and reuse events in `dl-tls.c`,
  `nptl/allocatestack.c`, and `nptl/nptl-stack.c`.
- Thread startup events, including the TID and TCB address, in
  `nptl/pthread_create.c`.

A new test, `tst-dl-debug-tid`, has been added to validate the
functionality of this new debug logging, ensuring that relevant messages
are correctly generated for both main and worker threads.

This enhances the debugging capabilities for diagnosing issues related
to TLS allocation and thread lifecycle within the dynamic linker.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-11-03 10:47:28 +01:00
Wilco Dijkstra 35807cc5cd math: Add builtin support for (l)lround(f)
Add builtin support for (l)lround(f) via the math-use-builtins
header mechanism.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-17 17:03:54 +00:00
Adhemerval Zanella 63ba1a1509 math: Add fetestexcept internal alias
To avoid linknamespace issues on old standards.  It is required if the
fallback fma implementation is used and if/when it is also used
internally by other implementations.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Cupertino Miranda 3b2b88ccee elf: early conversion of elf p_flags to mprotect flags
This patch replaces the _dl_stack_flags global variable with
_dl_stack_prot_flags.
The advantage is that the conversion from p_flags to the final mprotect
flags occurs when p_flags is loaded.  It avoids repeated spurious
conversions of _dl_stack_flags, for example in allocate_thread_stack.

This modification was suggested in:
  https://sourceware.org/pipermail/libc-alpha/2025-March/165537.html

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-08-27 10:45:45 -03:00
Claudiu Zissulescu 072b5a9922 sframe: Add support for SFRAME_F_FDE_FUNC_START_PCREL flag
SFrame V2 has a new erratum which introduces the
SFRAME_F_FDE_FUNC_START_PCREL flag.  This flag indicates the encoding
of the SFrame FDE function start address field as follows:

- if set, sfde_func_start_address field contains the offset in bytes
to the start PC of the associated function from the field itself.

- if unset, sfde_func_start_address field contains the offset in bytes
to the start PC of the associated function from the start of the
SFrame section.

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: Sam James <sam@gentoo.org>
2025-07-24 15:51:58 -03:00
Adhemerval Zanella 20528165bd Disable SFrame support by default
And add extra checks to enable it only for binutils 2.45 and if the
architecture explicitly enables it.  When SFrame is disabled, the
related code for backtrace() and _dl_find_object() is also not enabled,
so SFrame backtracing is not used even if the binary has an SFrame
segment.

This patch also adds some other related fixes:

  * Fixed an issue with AC_CHECK_PROG_VER, where the READELF_SFRAME
    usage prevented specifying a different readelf through READELF
    environment variable at configure time.

  * Add an extra arch-specific internal definition,
    libc_cv_support_sframe, to disable --enable-sframe on architectures
    that have binutils but not glibc support (s390x).

  * Renamed the tests without the .sframe segment and move the
    tst-backtrace1 from pthread to debug.

  * Use the built compiler strip to remove the .sframe segment,
    instead of the system one (which might not support SFrame).

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Sam James <sam@gentoo.org>
2025-07-24 15:51:58 -03:00
Claudiu Zissulescu 3360913c37 elf: Add SFrame stack tracing
This patch adds the necessary bits to enable stack tracing using
SFrame.  In the case the new SFrame stack tracing procedure doesn't
find SFrame related info, the stack tracing falls back on default
Dwarf implementation.

The new SFrame stack tracing procedure is added to the debug/backtrace.c
file, and the support functions are added in the sysdeps folder, namely
sframe.h, read-sframe.c and read-sframe.h.

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-07-14 10:56:37 +01:00
Florian Weimer ea85e7d550 elf: Restore support for _r_debug interpositions and copy relocations
The changes in commit a93d9e03a3
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool.  It brings its
own definition of _r_debug (rather than a declaration).

Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC.  To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play.  Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.

Only perform the copy relocation initialization if libc has been
loaded.  Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol may fail.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-05 20:15:12 +02:00
Florian Weimer 8329939a37 elf: Introduce _dl_debug_change_state
It combines updating r_state with the debugger notification.

The second change to _dl_open introduces an additional debugger
notification for dlmopen, but debuggers are expected to ignore it.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-05 20:15:12 +02:00
Florian Weimer 7278d11f3a elf: Introduce separate _r_debug_array variable
It replaces the ns_debug member of the namespaces.  Previously,
the base namespace had an unused ns_debug member.

This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed.  (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-05 20:15:12 +02:00
Florian Weimer 27cc947dce generic: Add missing parameter name to __getrandom_early_init
This is required after commit 03da41d47d
("Turn on -Wmissing-parameter-name by default if available").

Reviewed-by: Sam James <sam@gentoo.org>
2025-05-28 10:00:41 +02:00
Florian Weimer 10a66a8e42 Remove <libc-tsd.h>
Use __thread variables directly instead.  The macros do not save any
typing.  It seems unlikely that a future port will lack __thread
variable support.

Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names.  Previously, <libc-tls.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.

Reviewed-by: Frédéric Bérat <fberat@redhat.com>
2025-05-16 19:53:09 +02:00
Stefan Liebler 0fc76d8762 S390: Use cfi_val_offset instead of cfi_escape.
Due to raising the minimum binutils version to >= 2.28, the
cfi_escape previously used for cfi_val_offset can now be omitted.

Checked with "objdump -WF" / "objdump -Wf" that the previous
cfi_escape and the new cfi_val_offset are equal.
2025-05-14 10:35:55 +02:00
Adhemerval Zanella 0c34259423 nptl: Fix pthread_getattr_np when modules with execstack are allowed (BZ 32897)
The BZ 32653 fix (12a497c716) kept the
stack pointer zeroing from make_main_stack_executable in
_dl_make_stack_executable.  However, previously 'stack_endp' pointed
to a temporary variable created before the call to
_dl_map_object_from_fd, while now __libc_stack_end is used
directly.

Since pthread_getattr_np relies on correct __libc_stack_end, if
_dl_make_stack_executable is called (for instance, when
glibc.rtld.execstack=2 is set) __libc_stack_end will be set to zero,
and the call will always fail.

Zeroing __libc_stack_end was used as a hardening mitigation, but since
52a01100ad it is used solely in the
pthread_getattr_np code, so there is no point in zeroing it anymore.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Sam James <sam@gentoo.org>
2025-04-28 10:13:46 -03:00
Adhemerval Zanella 12a497c716 elf: Extend glibc.rtld.execstack tunable to force executable stack (BZ 32653)
From the bug report [1], multiple programs still need to dlopen shared
libraries that either lack PT_GNU_STACK or have the executable bit set.
Although in some cases the cause seems to be hand-crafted assembly
source without the required .note.GNU-stack marking (so the static linker
is forced to mark the stack executable if the ABI requires it), in other
cases the library appears to use trampolines [2].

Unfortunately, READ_IMPLIES_EXEC is not an option since on some ABIs
(x86_64), the kernel clears the bit, making it unsupported.  To avoid
reinstating the broken code that changes stack permission on dlopen
(0ca8785a28), this patch extends the glibc.rtld.execstack tunable to
allow an option to force an executable stack at the program startup.

The tunable is a security issue because it defeats the PT_GNU_STACK
hardening.  It has the slight advantage of making it explicit by the
caller, and, as for other tunables, this is disabled for setuid binaries.
A tunable also allows us to eventually remove it, but from previous
experiences, it would require some time.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=32653
[2] https://github.com/conda-forge/ctng-compiler-activation-feedstock/issues/143
Reviewed-by: Sam James <sam@gentoo.org>
2025-04-08 16:19:49 -03:00
Joseph Myers 75ad83f564 Implement C23 pown
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the pown functions, which are like pow but with an
integer exponent.  That exponent has type long long int in C23; it was
intmax_t in TS 18661-4, and as with other interfaces changed after
their initial appearance in the TS, I don't think we need to support
the original version of the interface.  The test inputs are based on
the subset of test inputs for pow that use integer exponents that fit
in long long.

As the first such template implementation that saves and restores the
rounding mode internally (to avoid possible issues with directed
rounding and intermediate overflows or underflows in the wrong
rounding mode), support also needed to be added for using
SET_RESTORE_ROUND* in such template function implementations.  This
required math-type-macros-float128.h to include <fenv_private.h>, so
it can tell whether SET_RESTORE_ROUNDF128 is defined.  In turn, the
include order with <fenv_private.h> included before <math_private.h>
broke loongarch builds, showing up that
sysdeps/loongarch/math_private.h is really a fenv_private.h file
(maybe implemented internally before the consistent split of those
headers in 2018?) and needed to be renamed to fenv_private.h to avoid
errors with duplicate macro definitions if <math_private.h> is
included after <fenv_private.h>.

The underlying implementation uses __ieee754_pow functions (called
more than once in some cases, where the exponent does not fit in the
floating type).  I expect a custom implementation for a given format,
that only handles integer exponents but handles larger exponents
directly, could be faster and more accurate in some cases.

I encourage searching for worst cases for ulps error for these
implementations (necessarily non-exhaustively, given the size of the
input space).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-03-27 10:44:44 +00:00
Adhemerval Zanella ed6a68bac7 debug: Improve '%n' fortify detection (BZ 30932)
Commit 7bb8045ec0 made the '%n' fortify check ignore EMFILE errors
while trying to open /proc/self/maps, and this added a security
issue since EMFILE can be attacker-controlled, thus making the check
ineffective in some cases.

The EMFILE failure handling is reinstated, but with a different error
message.  Also, to reduce false positives of the hardening in cases
where no new files can be opened, _dl_readonly_area now uses
_dl_find_object to check if the memory area is within a writable ELF
segment.  The procfs method is still used as a fallback.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2025-03-21 15:46:48 -03:00
Adhemerval Zanella 1894e219dc Remove eloop-threshold.h
On both Linux and Hurd __eloop_threshold() is always a constant
(40 and 32 respectively), so there is no need to always call
__sysconf (_SC_SYMLOOP_MAX) in the Linux case (!SYMLOOP_MAX).  To avoid
a name clash with gnulib, the new file is named min-eloop-threshold.h.
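
A sketch of the resulting constant (values taken from the commit message;
the macro name is an assumption for illustration):

  /* Maximum number of symlink traversals before ELOOP is reported.  */
  #ifdef __linux__
  # define MIN_ELOOP_THRESHOLD 40
  #else   /* Hurd */
  # define MIN_ELOOP_THRESHOLD 32
  #endif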

Checked on x86_64-linux-gnu and with a build for x86_64-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-21 15:46:48 -03:00
Adhemerval Zanella 9b646f5dc9 elf: Canonicalize $ORIGIN in an explicit ld.so invocation [BZ 25263]
When an executable is invoked directly, we calculate $ORIGIN by calling
readlink on /proc/self/exe, which the Linux kernel resolves to the
target of any symlinks.  However, if an executable is run through ld.so,
we cannot use /proc/self/exe and instead use the path given as an
argument.  This leads to a different calculation of $ORIGIN, which is
most notable in that it causes ldd to behave differently (e.g., by not
finding a library) from directly running the program.

To make the behavior consistent, take advantage of the fact that the
kernel also resolves /proc/self/fd/ symlinks to the target of any
symlinks in the same manner, so once we have opened the main executable
in order to load it, replace the user-provided path with the result of
calling readlink("/proc/self/fd/N").

(On non-Linux platforms this resolution does not happen and so no
behavior change is needed.)
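
A sketch of the resolution step described above (a hypothetical helper,
not glibc's internal __fd_to_filename-based code; assumes len > 0):

  #include <stdio.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* Return the kernel-resolved path of an already-open file descriptor,
     following symlinks the same way /proc/self/exe does.  */
  static ssize_t
  fd_canonical_path (int fd, char *buf, size_t len)
  {
    char link[32];
    snprintf (link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink (link, buf, len - 1);
    if (n >= 0)
      buf[n] = '\0';
    return n;
  }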

__fd_to_filename requires _fitoa_word and _itoa_word, which for
32-bit targets pull in a lot of definitions from _itoa.c (due to
_ITOA_NEEDED being defined).  To simplify the build, move the required
function to a new file, _fitoa_word.c.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Co-authored-by: Geoffrey Thomas <geofft@ldpreload.com>
Reviewed-by: Geoffrey Thomas <geofft@ldpreload.com>
Tested-by: Geoffrey Thomas <geofft@ldpreload.com>
2025-03-13 16:50:16 -03:00
Adhemerval Zanella 3e8814903c math: Refactor how to use libm-test-ulps
The current approach tracks math maximum supported errors by explicitly
setting them per function and architecture. On newer implementations or
new compiler versions, the file is updated with newer values if it
shows higher results. The idea is to track the maximum known error, to
update the manual with the obtained values.

Constantly updating libm-test-ulps adds little value: it is usually a
mechanical change done by the maintainer, for past releases it is
usually not tracked whether a ulp change resulted from a compiler
regression, and the math tests already have a maximum ulp error that
triggers a regression.

It was shown by a recent update after the new acosf [1] implementation
that is correctly rounded, where the libm-test-ulps was indeed from a
compiler issue.

This patch removes all arch-specific libm-test-ulps, adds system generic
libm-test-ulps where applicable, and changes its semantics. The generic
files now track specific implementation constraints, like if it is
expected to be correctly rounded, or if the system-specific has
different error expectations.

Now multiple libm-test-ulps can be defined, and system-specific
overrides generic implementation.  This is for the case where
arch-specific implementation might show worse precision than generic
implementation, for instance, the cbrtf on i686.

Regressions are only reported if the implementation shows larger errors
than 9 ulps (13 for IBM long double) unless it is overridden by
libm-test-ulps and the maximum error is not printed at the end of tests.
The regen-ulps rule is also removed since it does not make sense to
update the libm-test-ulps automatically.

The manual error table is also removed; Paul Zimmermann and others have
been tracking libm precision with a more comprehensive analysis for some
releases, so link to their work instead.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=9cc9f8e11e8fb8f54f1e84d9f024917634a78201
2025-03-12 13:40:07 -03:00
Adhemerval Zanella 1d60b9dfda Remove dl-procinfo.h
powerpc was the only architecture with arch-specific hooks for
LD_SHOW_AUXV, and with the information moved to ld diagnostics there
is no need to keep the _dl_procinfo hook.

Checked with a build for all affected ABIs.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-03-05 11:22:09 -03:00
Wilco Dijkstra e5893e6349 Remove unused dl-procinfo.h
Remove unused _dl_hwcap_string defines.  As a result many dl-procinfo.h headers
can be removed.  This also removes target specific _dl_procinfo implementations
which only printed HWCAP strings using dl_hwcap_string.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-28 16:55:18 +00:00
Florian Weimer 749310c61b elf: Add l_soname accessor function for DT_SONAME values
It's not necessary to introduce temporaries because the compiler
is able to evaluate l_soname just once in constracts like:

  l_soname (l) != NULL && strcmp (l_soname (l), LIBC_SO) != 0
2025-02-02 20:10:09 +01:00
Florian Weimer aa1bf89039 elf: Split _dl_lookup_map, _dl_map_new_object from _dl_map_object
So that they can eventually be called separately from dlopen.
2025-02-02 20:10:08 +01:00
Petr Malat 4c43173eba ld.so: Decorate BSS mappings
Decorate BSS mappings with [anon: glibc: .bss <file>], for example
[anon: glibc: .bss /lib/libc.so.6]. The string ".bss" is already used
by bionic so use the same, but add the filename as well. If the name
would be longer than what the kernel allows, drop the directory part
of the path.
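
The decoration itself relies on the kernel's anonymous-VMA naming
interface; a minimal sketch (the helper name and buffer size are
assumptions, and PR_SET_VMA_ANON_NAME requires Linux 5.17+):

  #include <stdio.h>
  #include <sys/prctl.h>

  /* Name an anonymous mapping so it shows up as
     "[anon: glibc: .bss <file>]" in /proc/<pid>/maps.  */
  static void
  name_bss_mapping (void *start, size_t len, const char *file)
  {
    char name[80];
    snprintf (name, sizeof name, "glibc: .bss %s", file);
    (void) prctl (PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                  (unsigned long) start, len, name);
  }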

Refactor glibc.mem.decorate_maps check to a separate function and use
it to avoid assembling a name, which would not be used later.

Signed-off-by: Petr Malat <oss@malat.biz>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-01-30 10:16:37 -03:00
Michael Jeanson 0e411c5d30 Add generic 'extra TLS'
Add the logic to append an 'extra TLS' block in the TLS block allocator
with a generic stub implementation. The duplicated code in
'csu/libc-tls.c' and 'elf/dl-tls.c' is to handle both statically linked
applications and the ELF dynamic loader.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-10 20:19:28 +00:00
Florian Weimer d1da011118 elf: Always define TLS_TP_OFFSET
This will be needed to compute __rseq_offset outside of the TLS
relocation machinery.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 19:30:44 +01:00
Florian Weimer 7a3e2e877a Move <thread_pointer.h> to kernel-independent sysdeps directories
Hurd is expected to use the same thread ABI as Linux.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 19:30:16 +01:00
Florian Weimer ceae7e2770 elf: Introduce generic <dl-tls.h>
On arc, the definition of TLS_DTV_UNALLOCATED now comes from
<dl-dtv.h>.

For x86-64 x32, a separate version is needed because unsigned long int
is 32 bits on this target.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-01-02 13:45:27 +01:00
Paul Eggert 2642002380 Update copyright dates with scripts/update-copyrights 2025-01-01 11:22:09 -08:00
Adhemerval Zanella 0ca8785a28 elf: Do not change stack permission on dlopen/dlmopen
If some shared library loaded with dlopen/dlmopen requires an executable
stack, either implicitly because of a missing GNU_STACK ELF header
(where the ABI default flags imply the executable bit) or explicitly
because of the executable bit in GNU_STACK, the loader will try to set
both the main thread stack and all thread stacks (from the pthread cache)
as executable.

Besides the issue where any __nptl_change_stack_perm failure does not
undo the previous executable transition (meaning that if the library
fails to load, there can be thread stacks with executable stacks), this
behavior was used on a CVE [1] as a vector for RCE.

This patch changes that: if a shared library requires an executable
stack and the current stack is not executable, dlopen fails.  The
change applies only to dynamically loaded modules; if the program
or any dependency requires an executable stack, the loader will still
change the main thread stack before program execution and any thread
created with the default stack configuration.

[1] https://www.qualys.com/2023/07/19/cve-2023-38408/rce-openssh-forwarded-ssh-agent.txt

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-12-31 09:04:20 -03:00
Adhemerval Zanella a2b0ff98a0 include/sys/cdefs.h: Add __attribute_optimization_barrier__
Add __attribute_optimization_barrier__ to disable inlining and cloning on a
function.  For Clang, expand it to

__attribute__ ((optnone))

Otherwise, expand it to

__attribute__ ((noinline, noclone))
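
A sketch of how the macro can be spelled (the __clang__ conditional and
exact placement in <sys/cdefs.h> are assumptions):

  /* Prevent the compiler from inlining or cloning the annotated function.  */
  #ifdef __clang__
  # define __attribute_optimization_barrier__ __attribute__ ((optnone))
  #else
  # define __attribute_optimization_barrier__ __attribute__ ((noinline, noclone))
  #endif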

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-23 06:28:55 +08:00
Florian Weimer ef5823d955 elf: Move _dl_rtld_map, _dl_rtld_audit_state out of GL
This avoids immediate GLIBC_PRIVATE ABI issues if the size of
struct link_map or struct auditstate changes.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-20 15:52:57 +01:00