Commit Graph

38755 Commits

Author SHA1 Message Date
Sachin Monga a94467ce05 ppc64le: Power 10 rawmemchr clobbers v20 (bug #33091)
Replace non-volatile(v20) by volatile(v17)
since v20 is not restored

Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
(cherry picked from commit b59799f14f)
2025-11-27 11:34:19 -05:00
H.J. Lu 4e50046821
x86-64: Add GLIBC_ABI_DT_X86_64_PLT [BZ #33212]
When the linker -z mark-plt option is used to add DT_X86_64_PLT,
DT_X86_64_PLTSZ and DT_X86_64_PLTENT, the r_addend field of the
R_X86_64_JUMP_SLOT relocation stores the offset of the indirect
branch instruction.  However, glibc versions without the commit:

commit f8587a6189
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri May 20 19:21:48 2022 -0700

    x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT

    According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT
    and R_X86_64_JUMP_SLOT.  Since linkers always set their r_addends to 0, we
    can ignore their r_addends.

    Reviewed-by: Fangrui Song <maskray@google.com>

won't ignore the r_addend value in the R_X86_64_JUMP_SLOT relocation.
Such programs and shared libraries will fail at run-time randomly.

Add GLIBC_ABI_DT_X86_64_PLT version to indicate that glibc is compatible
with DT_X86_64_PLT.

The linker can add the glibc GLIBC_ABI_DT_X86_64_PLT version dependency
whenever -z mark-plt is passed to the linker.  The resulting programs and
shared libraries will fail to load at run-time against libc.so without the
GLIBC_ABI_DT_X86_64_PLT version, instead of fail randomly.

This fixes BZ #33212.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 399384e0c8)
2025-08-18 01:05:46 +01:00
Florian Weimer c97735cfde elf: Handle ld.so with LOAD segment gaps in _dl_find_object (bug 31943)
Detect if ld.so not contiguous and handle that case in _dl_find_object.
Set l_find_object_processed even for initially loaded link maps,
otherwise dlopen of an initially loaded object adds it to
_dlfo_loaded_mappings (where maps are expected to be contiguous),
in addition to _dlfo_nodelete_mappings.

Test elf/tst-link-map-contiguous-ldso iterates over the loader
image, reading every word to make sure memory is actually mapped.
It only does that if the l_contiguous flag is set for the link map.
Otherwise, it finds gaps with mmap and checks that _dl_find_object
does not return the ld.so mapping for them.

The test elf/tst-link-map-contiguous-main does the same thing for
the libc.so shared object.  This only works if the kernel loaded
the main program because the glibc dynamic loader may fill
the gaps with PROT_NONE mappings in some cases, making it contiguous,
but accesses to individual words may still fault.

Test elf/tst-link-map-contiguous-libc is again slightly different
because the dynamic loader always fills the gaps with PROT_NONE
mappings, so a different form of probing has to be used.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 20681be149)
2025-08-14 10:22:30 +02:00
Florian Weimer 96cc65a28a elf: Extract rtld_setup_phdr function from dl_main
Remove historic binutils reference from comment and update
how this data is used by applications.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 2cac9559e0)
2025-08-14 10:22:00 +02:00
Florian Weimer e3f04f64fa elf: Do not add a copy of _dl_find_object to libc.so
This reduces code size and dependencies on ld.so internals from
libc.so.

Fixes commit f4c142bb9f
("arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 96429bcc91)
2025-08-14 10:20:33 +02:00
Adhemerval Zanella bfae8bf49c arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)
Instead of __dl_iterate_phdr. On ARM dlfo_eh_frame/dlfo_eh_count
maps to PT_ARM_EXIDX vaddr start / length.

On a Neoverse N1 machine with 160 cores, the following program:

  $ cat test.c
  #include <stdlib.h>
  #include <pthread.h>
  #include <assert.h>

  enum {
    niter = 1024,
    ntimes = 128,
  };

  static void *
  tf (void *arg)
  {
    int a = (int) arg;

    for (int i = 0; i < niter; i++)
      {
        void *p[ntimes];
        for (int j = 0; j < ntimes; j++)
  	p[j] = malloc (a * 128);
        for (int j = 0; j < ntimes; j++)
  	free (p[j]);
      }

    return NULL;
  }

  int main (int argc, char *argv[])
  {
    enum { nthreads = 16 };
    pthread_t t[nthreads];

    for (int i = 0; i < nthreads; i ++)
      assert (pthread_create (&t[i], NULL, tf, (void *) i) == 0);

    for (int i = 0; i < nthreads; i++)
      {
        void *r;
        assert (pthread_join (t[i], &r) == 0);
        assert (r == NULL);
      }

    return 0;
  }
  $ arm-linux-gnueabihf-gcc -fsanitize=address test.c -o test

Improves from ~15s to 0.5s.

Checked on arm-linux-gnueabihf.

(cherry picked from commit f4c142bb9f)
2025-08-14 10:20:33 +02:00
Florian Weimer a66bc3941f posix: Fix double-free after allocation failure in regcomp (bug 33185)
If a memory allocation failure occurs during bracket expression
parsing in regcomp, a double-free error may result.

Reported-by: Anastasia Belova <abelova@astralinux.ru>
Co-authored-by: Paul Eggert <eggert@cs.ucla.edu>
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 7ea06e9940)
2025-07-24 10:55:54 +02:00
Florian Weimer 8040100201 Fix error reporting (false negatives) in SGID tests
And simplify the interface of support_capture_subprogram_self_sgid.

Use the existing framework for temporary directories (now with
mode 0700) and directory/file deletion.  Handle all execution
errors within support_capture_subprogram_self_sgid.  In particular,
this includes test failures because the invoked program did not
exit with exit status zero.  Existing tests that expect exit
status 42 are adjusted to use zero instead.

In addition, fix callers not to call exit (0) with test failures
pending (which may mask them, especially when running with --direct).

Fixes commit 35fc356fa3
("elf: Fix subprocess status handling for tst-dlopen-sgid (bug 32987)").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 3a3fb2ed83)
2025-06-20 11:56:25 +02:00
Florian Weimer c6ec750be5 support: Pick group in support_capture_subprogram_self_sgid if UID == 0
When running as root, it is likely that we can run under any group.
Pick a harmless group from /etc/group in this case.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 2f769cec44)
2025-06-20 11:56:25 +02:00
Siddhesh Poyarekar c9e44b6467 support: Don't fail on fchown when spawning sgid processes
In some cases (e.g. when podman creates user containers), the only other
group assigned to the executing user is nobody and fchown fails with it
because the group is not mapped.  Do not fail the test in this case,
instead exit as unsupported.

Reported-by: Frédéric Bérat <fberat@redhat.com>
Tested-by: Frédéric Bérat <fberat@redhat.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 6286cca2cb)
2025-06-20 11:56:25 +02:00
Adhemerval Zanella 621c65ccf1 elf: Ignore LD_LIBRARY_PATH and debug env var for setuid for static
It mimics the ld.so behavior.

Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

(cherry picked from commit 5451fa962c)

Changes:
	Keep EXTRA_UNSECURE_ENVVARS support.
	Use __rawmemchr instead of strchr.
	Keep LD_PROFILE_OUTPUT support.
	Make tunables support optional via HAVE_TUNABLES.
2025-06-03 14:09:17 +02:00
Florian Weimer c7ff2bc297 Revert "elf: Ignore LD_LIBRARY_PATH and debug env var for setuid for static"
This reverts commit bff3b0f16c.

Reason for revert: Accidentally drops EXTRA_UNSECURE_ENVVARS support.
2025-06-03 10:55:18 +02:00
Florian Weimer 8624f6431b elf: Fix subprocess status handling for tst-dlopen-sgid (bug 32987)
This should really move into support_capture_subprogram_self_sgid.

Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 35fc356fa3)
2025-05-21 09:05:16 +02:00
Florian Weimer ed10034f00 elf: Test case for bug 32976 (CVE-2025-4802)
Check that LD_LIBRARY_PATH is ignored for AT_SECURE statically
linked binaries, using support_capture_subprogram_self_sgid.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit d8f7a79335)
2025-05-20 21:08:16 +02:00
Florian Weimer 08aea7712d support: Add support_record_failure_barrier
This can be used to stop execution after a TEST_COMPARE_BLOB
failure, for example.

(cherry picked from commit d0b8aa6de4)
2025-05-20 21:08:16 +02:00
Florian Weimer 901e24b128 support: Use const char * argument in support_capture_subprogram_self_sgid
The function does not modify the passed-in string, so make this clear
via the prototype.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f0c09fe616)
2025-05-20 21:08:16 +02:00
Adhemerval Zanella bff3b0f16c elf: Ignore LD_LIBRARY_PATH and debug env var for setuid for static
It mimics the ld.so behavior.

Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

(cherry picked from commit 5451fa962c)

Changes:

	git/elf/dl-support.c
	  (missing commit 55f41ef8de
	   ("elf: Remove LD_PROFILE for static binaries"),
	   missing removal of tunables support)
2025-05-20 21:08:16 +02:00
Wilco Dijkstra d2febe7c40 math: Improve layout of exp/exp10 data
GCC aligns global data to 16 bytes if their size is >= 16 bytes.  This patch
changes the exp_data struct slightly so that the fields are better aligned
and without gaps.  As a result on targets that support them, more load-pair
instructions are used in exp.

The exp benchmark improves 2.5%, "144bits" by 7.2%, "768bits" by 12.7% on
Neoverse V2.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 5afaf99edb)
(cherry picked from commit 5a08d049dc)
2025-02-28 15:13:31 +00:00
Wilco Dijkstra 20b5d5ce26 AArch64: Use prefer_sve_ifuncs for SVE memset
Use prefer_sve_ifuncs for SVE memset just like memcpy.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 0f044be1da)
2025-02-28 15:13:31 +00:00
Wilco Dijkstra 9569a67a58 AArch64: Add SVE memset
Add SVE memset based on the generic memset with predicated load for sizes < 16.
Unaligned memsets of 128-1024 are improved by ~20% on average by using aligned
stores for the last 64 bytes.  Performance of random memset benchmark improves
by ~2% on Neoverse V1.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 163b1bbb76)
2025-02-28 15:13:30 +00:00
Wilco Dijkstra 59f67e1b82 math: Improve layout of expf data
GCC aligns global data to 16 bytes if their size is >= 16 bytes.  This patch
changes the exp2f_data struct slightly so that the fields are better aligned.
As a result on targets that support them, load-pair instructions accessing
poly_scaled and invln2_scaled are now 16-byte aligned.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 44fa9c1080)
2025-02-28 15:12:39 +00:00
Wilco Dijkstra 904c58e47b AArch64: Remove zva_128 from memset
Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]).  Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1, remove this special
case too.

[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html

Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit a08d9a52f9)
2025-02-28 15:12:39 +00:00
Wilco Dijkstra 8042d17638 AArch64: Optimize memset
Improve small memsets by avoiding branches and use overlapping stores.
Use DC ZVA for copies over 128 bytes.  Remove unnecessary code for ZVA sizes
other than 64 and 128.  Performance of random memset benchmark improves by 24%
on Neoverse N1.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit cec3aef324)
2025-02-28 15:12:37 +00:00
Wilco Dijkstra be451d6053 AArch64: Improve generic strlen
Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates.  Change final
size computation to avoid increasing latency.  On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3dc426b642)
2025-02-28 15:12:18 +00:00
Siddhesh Poyarekar 8b3d09dc0d assert: Add test for CVE-2025-0395
Use the __progname symbol to override the program name to induce the
failure that CVE-2025-0395 describes.

This is related to BZ #32582

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit cdb9ba8419)
2025-02-13 13:42:54 -05:00
Carlos O'Donell 29d9b1e59e assert: Reformat Makefile.
Reflow all long lines adding comment terminators.
Sort all reflowed text using scripts/sort-makefile-lines.py.

No code generation changes observed in binary artifacts.
No regressions on x86_64 and i686.

(cherry picked from commit ebd928224a)

 # Conflicts:
 #	assert/Makefile
 (Missing __libc_assert_fail)
2025-02-13 13:42:26 -05:00
H.J. Lu 549d831579 stdlib: Test using setenv with updated environ [BZ #32588]
Add a test for setenv with updated environ.  Verify that BZ #32588 is
fixed.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 8ab34497de)
2025-01-25 11:31:04 +08:00
Florian Weimer 8b5d4be762 Fix underallocation of abort_msg_s struct (CVE-2025-0395)
Include the space needed to store the length of the message itself, in
addition to the message string.  This resolves BZ #32582.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 68ee0f704c)

Conflict in sysdeps/posix/libc_fatal.c due to missing cleanup after
backtrace removal.
2025-01-22 18:31:24 +01:00
Florian Weimer 525e5f13de stdlib: Simplify buffer management in canonicalize
Move the buffer management from realpath_stk to __realpath.  This
allows returning directly after allocation errors.

Always make a copy of the result buffer using strdup even if it is
already heap-allocated.  (Heap-allocated buffers are somewhat rare.)
This avoids GCC warnings at certain optimization levels.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ef0700004b)
2025-01-20 15:19:48 +01:00
Siddhesh Poyarekar 5eae275400 realpath: Bring back GNU extension on ENOENT and EACCES [BZ #28996]
The GNU extension for realpath states that if the path resolution fails
with ENOENT or EACCES and the resolved buffer is non-NULL, it will
contain part of the path that failed resolution.

commit 949ad78a18 broke this when it
omitted the copy on failure.  Bring it back partially to continue
supporting this GNU extension.

Resolves: BZ #28996

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
(cherry picked from commit b416555431)
2025-01-20 15:19:26 +01:00
Siddhesh Poyarekar 8a82a76a42 realpath: Do not copy result on failure (BZ #28815)
On failure, the contents of the resolved buffer passed in by the caller
to realpath are undefined.  Do not copy any partial resolution to the
buffer and also do not test resolved contents in test-canon.c.

Resolves: BZ #28815

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 949ad78a18)
2025-01-20 15:18:06 +01:00
Stafford Horne e369114462 misc: Add support for Linux uio.h RWF_NOAPPEND flag
In Linux 6.9 a new flag is added to allow for Per-io operations to
disable append mode even if a file was opened with the flag O_APPEND.
This is done with the new RWF_NOAPPEND flag.

This caused two test failures as these tests expected the flag 0x00000020
to be unused.  Adding the flag definition now fixes these tests on Linux
6.9 (v6.9-rc1).

  FAIL: misc/tst-preadvwritev2
  FAIL: misc/tst-preadvwritev64v2

This patch adds the flag, adjusts the test and adds details to
documentation.

Link: https://lore.kernel.org/all/20200831153207.GO3265@brightrain.aerifal.cx/
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 3db9d208dd)
2025-01-12 14:20:53 -08:00
Yu Chien Peter Lin 3f1ab0ed66 nptl: Convert tst-setuid2 to test-driver
Use <support/test-driver.c> and replace pthread calls to its xpthread
equivalents.

Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 365b3af67e)
2025-01-11 14:52:42 -08:00
Yu Chien Peter Lin 76adee6e0f support: Add xpthread_cond_signal wrapper
Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 3bea50ccbc)
2025-01-11 14:52:32 -08:00
Florian Weimer c3beedeb70 elf: Support recursive use of dynamic TLS in interposed malloc
It turns out that quite a few applications use bundled mallocs that
have been built to use global-dynamic TLS (instead of the recommended
initial-exec TLS).  The previous workaround from
commit afe42e935b ("elf: Avoid some
free (NULL) calls in _dl_update_slotinfo") does not fix all
encountered cases unfortunatelly.

This change avoids the TLS generation update for recursive use
of TLS from a malloc that was called during a TLS update.  This
is possible because an interposed malloc has a fixed module ID and
TLS slot.  (It cannot be unloaded.)  If an initially-loaded module ID
is encountered in __tls_get_addr and the dynamic linker is already
in the middle of a TLS update, use the outdated DTV, thus avoiding
another call into malloc.  It's still necessary to update the
DTV to the most recent generation, to get out of the slow path,
which is why the check for recursion is needed.

The bookkeeping is done using a global counter instead of per-thread
flag because TLS access in the dynamic linker is tricky.

All this will go away once the dynamic linker stops using malloc
for TLS, likely as part of a change that pre-allocates all TLS
during pthread_create/dlopen.

Fixes commit d2123d6827 ("elf: Fix slow
tls access after dlopen [BZ #19924]").

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 018f0fc3b8)
2025-01-11 09:30:26 -08:00
Florian Weimer f48d763ab8 elf: Avoid some free (NULL) calls in _dl_update_slotinfo
This has been confirmed to work around some interposed mallocs.  Here
is a discussion of the impact test ust/libc-wrapper/test_libc-wrapper
in lttng-tools:

  New TLS usage in libgcc_s.so.1, compatibility impact
  <https://inbox.sourceware.org/libc-alpha/8734v1ieke.fsf@oldenburg.str.redhat.com/>

Reportedly, this patch also papers over a similar issue when tcmalloc
2.9.1 is not compiled with -ftls-model=initial-exec.  Of course the
goal really should be to compile mallocs with the initial-exec TLS
model, but this commit appears to be a useful interim workaround.

Fixes commit d2123d6827 ("elf: Fix slow
tls access after dlopen [BZ #19924]").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit afe42e935b)
2025-01-11 09:30:04 -08:00
H.J. Lu 710057676d sysdeps/x86/Makefile: Split and sort tests
Put each test on a separate line and sort tests.

(cherry picked from commit 7e03e0de7e)
2025-01-10 12:10:28 -08:00
Noah Goldstein a4207d4e83 x86: Only align destination to 1x VEC_SIZE in memset 4x loop
Current code aligns to 2x VEC_SIZE. Aligning to 2x has no affect on
performance other than potentially resulting in an additional
iteration of the loop.
1x maintains aligned stores (the only reason to align in this case)
and doesn't incur any unnecessary loop iterations.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>

(cherry picked from commit 9469261cf1)
2025-01-10 12:10:19 -08:00
Szabolcs Nagy 889f99c149 elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter and if
the current dtv is older then _dl_update_slotinfo updates dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.

It may be possible to detect up-to-date dtv faster.  But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.

This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once.  The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.

This patch uses acquire/release synchronization when accessing the
generation counter.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.

I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit d2123d6827)
2025-01-10 12:10:09 -08:00
H.J. Lu 543efedcb3 x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643]
The old Intel software developer manual specified that the low byte of
EAX of CPUID leaf 2 returned 1 which indicated the number of rounds of
CPUDID leaf 2 was needed to retrieve the complete cache information. The
newer Intel manual has been changed to that it should always return 1
and be ignored.  If the lower byte isn't 1, CPUID leaf 2 can't be used.
In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead.  If
CPUID leaf 4 doesn't contain the cache information, cache information
isn't available at all.  This addresses BZ #30643.

(cherry picked from commit 1493622f4f)
2025-01-10 12:09:58 -08:00
H.J. Lu 41a3e51233 x86_64: Add log1p with FMA
On Skylake, it changes log1p bench performance by:

        Before       After     Improvement
max     63.349       58.347       8%
min     4.448        5.651        -30%
mean    12.0674      10.336       14%

The minimum code path is

 if (hx < 0x3FDA827A)                          /* x < 0.41422  */
    {
      if (__glibc_unlikely (ax >= 0x3ff00000))           /* x <= -1.0 */
        {
	   ...
        }
      if (__glibc_unlikely (ax < 0x3e200000))           /* |x| < 2**-29 */
        {
          math_force_eval (two54 + x);          /* raise inexact */
          if (ax < 0x3c900000)                  /* |x| < 2**-54 */
            {
	      ...
            }
          else
            return x - x * x * 0.5;

FMA and non-FMA code sequences look similar.  Non-FMA version is slightly
faster.  Since log1p is called by asinh and atanh, it improves asinh
performance by:

        Before       After     Improvement
max     75.645       63.135       16%
min     10.074       10.071       0%
mean    15.9483      14.9089      6%

and improves atanh performance by:

        Before       After     Improvement
max     91.768       75.081       18%
min     15.548       13.883       10%
mean    18.3713      16.8011      8%

(cherry picked from commit a8ecb126d4)
2025-01-10 12:09:49 -08:00
H.J. Lu 0d1c70aa4c x86_64: Add expm1 with FMA
On Skylake, it improves expm1 bench performance by:

        Before       After     Improvement
max     70.204       68.054       3%
min     20.709       16.2         22%
mean    22.1221      16.7367      24%

NB: Add

extern long double __expm1l (long double);
extern long double __expm1f128 (long double);

for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.

(cherry picked from commit 1b214630ce)
2025-01-10 12:09:38 -08:00
H.J. Lu 516180d399 x86_64: Add log2 with FMA
On Skylake, it improves log2 bench performance by:

        Before       After     Improvement
max     208.779      63.827       69%
min     9.977        6.55         34%
mean    10.366       6.8191       34%

(cherry picked from commit f6b10ed8e9)
2025-01-10 12:09:24 -08:00
H.J. Lu 30384b91ad x86_64: Sort fpu/multiarch/Makefile
Sort Makefile variables using scripts/sort-makefile-lines.py.

No code generation changes observed in libm.  No regressions on x86_64.

(cherry picked from commit 881546979d)
2025-01-10 12:04:52 -08:00
Florian Weimer d626c31ce5 x86: Avoid integer truncation with large cache sizes (bug 32470)
Some hypervisors report 1 TiB L3 cache size.  This results
in some variables incorrectly getting zeroed, causing crashes
in memcpy/memmove because invariants are violated.

(cherry picked from commit 61c3450db9)
2024-12-17 19:14:43 +01:00
Michael Jeanson 7ea35e28b4 nptl: initialize cpu_id_start prior to rseq registration
When adding explicit initialization of rseq fields prior to
registration, I glossed over the fact that 'cpu_id_start' is also
documented as initialized by user-space.

While current kernels don't validate the content of this field on
registration, future ones could.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
(cherry picked from commit d9f40387d3)
2024-12-06 16:11:31 +00:00
Michael Jeanson 47d70ca8d9 nptl: initialize rseq area prior to registration
Per the rseq syscall documentation, 3 fields are required to be
initialized by userspace prior to registration, they are 'cpu_id',
'rseq_cs' and 'flags'. Since we have no guarantee that 'struct pthread'
is cleared on all architectures, explicitly set those 3 fields prior to
registration.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 97f60abd25)
2024-12-06 16:11:31 +00:00
Siddhesh Poyarekar 37214df5f1 libio: Attempt wide backup free only for non-legacy code
_wide_data and _mode are not available in legacy code, so do not attempt
to free the wide backup buffer in legacy code.

Resolves: BZ #32137 and BZ #27821

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit ae4d44b1d5)
2024-09-11 09:27:30 +02:00
Maciej W. Rozycki 09fb06d3d6 nptl: Use <support/check.h> facilities in tst-setuid3
Remove local FAIL macro in favor to FAIL_EXIT1 from <support/check.h>,
which provides equivalent reporting, with the name of the file and the
line number within of the failure site additionally included.  Remove
FAIL_ERR altogether and include ": %m" explicitly with the format string
supplied to FAIL_EXIT1 as there seems little value to have a separate
macro just for this.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8c98195af6)
2024-08-30 15:29:12 -04:00
Maciej W. Rozycki 507983797e posix: Use <support/check.h> facilities in tst-truncate and tst-truncate64
Remove local FAIL macro in favor to FAIL_RET from <support/check.h>,
which provides equivalent reporting, with the name of the file of the
failure site additionally included, for the tst-truncate-common core
shared between the tst-truncate and tst-truncate64 tests.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit fe47595504)
2024-08-30 15:29:09 -04:00