Commit Graph

1950 Commits

Author SHA1 Message Date
Florian Weimer 145097dff1 x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)
Previously, the initialization code reused the xsave_state_full_size
member of struct cpu_features for the TLSDESC state size.  However,
the tunable processing code assumes that this member has the
original XSAVE (non-compact) state size, so that it can use its
value if XSAVEC is disabled via tunable.

This change uses a separate variable and not a struct member because
the value is only needed in ld.so and the static libc, but not in
libc.so.  As a result, struct cpu_features layout does not change,
helping a future backport of this change.

Fixes commit 9b7091415a ("x86-64: Update _dl_tlsdesc_dynamic to preserve
AMX registers").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-29 09:17:38 +01:00
Sunil K Pandey c7c4a5906f x86_64: Add atanh with FMA
On SPR, it improves atanh bench performance by:

			Before		After		Improvement
reciprocal-throughput	15.1715		14.8628		2%
latency			57.1941		56.1883		2%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-13 14:30:47 -07:00
Sunil K Pandey dded0d20f6 x86_64: Add sinh with FMA
On SPR, it improves sinh bench performance by:

			Before		After		Improvement
reciprocal-throughput	14.2017		11.815		17%
latency			36.4917		35.2114		4%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-13 10:55:25 -07:00
Sunil K Pandey c6352111c7 x86_64: Add tanh with FMA
On Skylake, it improves tanh bench performance by:

	Before 		After 		Improvement
max	110.89		95.826		14%
min	20.966		20.157		4%
mean	30.9601		29.8431		4%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-13 06:20:32 -07:00
Adhemerval Zanella 3e8814903c math: Refactor how to use libm-test-ulps
The current approach tracks math maximum supported errors by explicitly
setting them per function and architecture. On newer implementations or
new compiler versions, the file is updated with newer values if it
shows higher results. The idea is to track the maximum known error, to
update the manual with the obtained values.

Constantly updating libm-test-ulps shows little value: it is usually a
mechanical change done by the maintainer; for past releases, whether
the ulp change resulted from a compiler regression is usually ignored;
and the math tests already have a maximum ulp error that triggers a
regression.

This was shown by a recent update after the new, correctly rounded
acosf implementation [1], where the libm-test-ulps change was in fact
caused by a compiler issue.

This patch removes all arch-specific libm-test-ulps files, adds
system-generic libm-test-ulps files where applicable, and changes their
semantics. The generic files now track specific implementation
constraints, such as whether a function is expected to be correctly
rounded, or whether the system-specific implementation has different
error expectations.

Multiple libm-test-ulps files can now be defined, and a system-specific
file overrides the generic one.  This is for the case where an
arch-specific implementation might show worse precision than the generic
implementation, for instance, cbrtf on i686.

Regressions are only reported if the implementation shows errors larger
than 9 ulps (13 for IBM long double), unless this limit is overridden by
libm-test-ulps, and the maximum error is not printed at the end of tests.
The regen-ulps rule is also removed since it does not make sense to
update the libm-test-ulps automatically.
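
A minimal sketch of the reporting rule described above, assuming a
per-function override is represented as a plain integer.  The names
DEFAULT_MAX_ULP, IBM_LDBL_MAX_ULP and is_regression are hypothetical and
are not taken from the glibc test harness; only the limits 9 and 13 come
from the commit message.

```c
#include <stdbool.h>

/* Default limits quoted in the commit message.  */
#define DEFAULT_MAX_ULP 9
#define IBM_LDBL_MAX_ULP 13

/* Hypothetical helper.  OVERRIDE_ULP < 0 means libm-test-ulps has no
   entry for the function, so the default limit applies.  */
bool
is_regression (double observed_ulp, bool ibm_long_double, int override_ulp)
{
  int limit = ibm_long_double ? IBM_LDBL_MAX_ULP : DEFAULT_MAX_ULP;
  if (override_ulp >= 0)
    limit = override_ulp;
  /* Only errors strictly larger than the limit are reported.  */
  return observed_ulp > limit;
}
```

Under this sketch, an override of 0 would model a function that is
expected to be correctly rounded: any nonzero error is then reported.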

The manual error table is also removed; Paul Zimmermann and others have
been tracking libm precision with a more comprehensive analysis for some
releases, so link to his work instead.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=9cc9f8e11e8fb8f54f1e84d9f024917634a78201
2025-03-12 13:40:07 -03:00
Joseph Myers 77261698b4 Implement C23 rsqrt
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the rsqrt functions (1/sqrt(x)).  The test inputs are
taken from those for sqrt.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-03-07 19:15:26 +00:00
Adhemerval Zanella 8f170dc819 math: Use tanpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic tanpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                      master        patched   improvement
x86_64                      85.1683        47.7990        43.88%
x86_64v2                    76.8219        41.4679        46.02%
x86_64v3                    73.7775        37.7734        48.80%
aarch64 (Neoverse)          35.4514        18.0742        49.02%
power8                      22.7604        10.1054        55.60%
power10                     22.1358         9.9553        55.03%

reciprocal-throughput        master        patched   improvement
x86_64                      41.0174        19.4718        52.53%
x86_64v2                    34.8565        11.3761        67.36%
x86_64v3                    34.0325         9.6989        71.50%
aarch64 (Neoverse)          25.4349         9.2017        63.82%
power8                      13.8626         3.8486        72.24%
power10                     11.7933         3.6420        69.12%
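
The improvement column in these benchmark tables appears to be the
relative reduction of the patched figure against master.  A minimal
sketch of that arithmetic follows; the function name improvement_percent
is illustrative and not part of the glibc benchtests.

```c
/* Relative improvement in percent: positive when PATCHED is smaller
   (faster) than MASTER, negative when it is larger (slower).  */
double
improvement_percent (double master, double patched)
{
  return (master - patched) / master * 100.0;
}
```

For the x86_64 latency row, (85.1683 - 47.7990) / 85.1683 is about
43.88%, which matches the table.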

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella de2fca9fe2 math: Use sinpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic sinpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                      master        patched   improvement
x86_64                      47.5710        38.4455        19.18%
x86_64v2                    46.8828        40.7563        13.07%
x86_64v3                    44.0034        34.1497        22.39%
aarch64 (Neoverse)          19.2493        14.1968        26.25%
power8                      23.5312        16.3854        30.37%
power10                     22.6485        10.2888        54.57%

reciprocal-throughput        master        patched   improvement
x86_64                      21.8858        11.6717        46.67%
x86_64v2                    22.0620        11.9853        45.67%
x86_64v3                    21.5653        11.3291        47.47%
aarch64 (Neoverse)          13.0615         6.5499        49.85%
power8                      16.2030         6.9580        57.06%
power10                     12.8911         4.2858        66.75%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella be85208b9f math: Use cospif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic cospif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                    master        patched   improvement
x86_64                    47.4679        38.4157        19.07%
x86_64v2                  46.9686        38.3329        18.39%
x86_64v3                  43.8929        31.8510        27.43%
aarch64 (Neoverse)        18.8867        13.2089        30.06%
power8                    22.9435         7.8023        65.99%
power10                   15.4472        7.77505        49.67%

reciprocal-throughput      master        patched   improvement
x86_64                    20.9518        11.4991        45.12%
x86_64v2                  19.8699        10.5921        46.69%
x86_64v3                  19.3475         9.3998        51.42%
aarch64 (Neoverse)        12.5767         6.2158        50.58%
power8                    15.0566         3.2654        78.31%
power10                    9.2866         3.1147        66.46%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella 95a01ea955 math: Use atanpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic atanpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                     master        patched   improvement
x86_64                     66.3296        52.7558        20.46%
x86_64v2                   66.0429        51.4007        22.17%
x86_64v3                   60.6294        48.7876        19.53%
aarch64 (Neoverse)         24.3163        20.9110        14.00%
power8                     16.5766        13.3620        19.39%
power10                    16.5115        13.4072        18.80%

reciprocal-throughput       master        patched   improvement
x86_64                     30.8599        16.0866        47.87%
x86_64v2                   29.2286        15.4688        47.08%
x86_64v3                   23.0960        12.8510        44.36%
aarch64 (Neoverse)         15.4619        10.6752        30.96%
power8                      7.9200         5.2483        33.73%
power10                     6.8539         4.6262        32.50%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella 1cd9ccd8c0 math: Use atan2pif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic atan2pif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master        patched   improvement
x86_64                 79.4006        70.8726        10.74%
x86_64v2               77.5136        69.1424        10.80%
x86_64v3               71.8050        68.1637         5.07%
aarch64 (Neoverse)     27.8363        24.7700        11.02%
power8                 39.3893        17.2929        56.10%
power10                19.7200        16.8187        14.71%

reciprocal-throughput   master        patched   improvement
x86_64                 38.3457        30.9471        19.29%
x86_64v2               37.4023        30.3112        18.96%
x86_64v3               33.0713        24.4891        25.95%
aarch64 (Neoverse)     19.3683        15.3259        20.87%
power8                 19.5507        8.27165        57.69%
power10                9.05331        7.63775        15.64%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella ae679a0aca math: Use asinpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic asinpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master        patched   improvement
x86_64                 46.4996        41.6126        10.51%
x86_64v2               46.7551        38.8235        16.96%
x86_64v3               42.6235        33.7603        20.79%
aarch64 (Neoverse)     17.4161        14.3604        17.55%
power8                 10.7347         9.0193        15.98%
power10                10.6420         9.0362        15.09%

reciprocal-throughput   master        patched   improvement
x86_64                 24.7208        16.5544        33.03%
x86_64v2               24.2177        14.8938        38.50%
x86_64v3               20.5617        10.5452        48.71%
aarch64 (Neoverse)     13.4827        7.17613        46.78%
power8                 6.46134        3.56089        44.89%
power10                5.79007        3.49544        39.63%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella edb2a8f0ae math: Use acospif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic acospif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                  master        patched   improvement
x86_64                  54.8281        42.9070        21.74%
x86_64v2                54.1717        42.7497        21.08%
x86_64v3                49.3552        34.1512        30.81%
aarch64 (Neoverse)      17.9395        14.3733        19.88%
power8                  20.3110         8.8609        56.37%
power10                 11.3113        8.84067        21.84%

reciprocal-throughput    master        patched   improvement
x86_64                  21.2301        14.4803        31.79%
x86_64v2                20.6858        13.9506        32.56%
x86_64v3                16.1944        11.3377        29.99%
aarch64 (Neoverse)      11.4474        7.13282        37.69%
power8                  10.6916        3.57547        66.56%
power10                 4.64269        3.54145        23.72%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
H.J. Lu 0b6ad02b33 x86-64: Cast __rseq_offset to long long int [BZ #32543]
commit 494d65129e
Author: Michael Jeanson <mjeanson@efficios.com>
Date:   Thu Aug 1 10:35:34 2024 -0400

    nptl: Introduce <rseq-access.h> for RSEQ_* accessors

added things like

       asm volatile ("movl %%fs:%P1(%q2),%0"                                  \
                     : "=r" (__value)                                         \
                     : "i" (offsetof (struct rseq_area, member)),             \
                       "r" (__rseq_offset));				      \

But this doesn't work for x32 when __rseq_offset is negative since the
address is computed as

FS + 32-bit to 64-bit zero extension of __rseq_offset
+ offsetof (struct rseq_area, member)

Cast __rseq_offset to long long int

                       "r" ((long long int) __rseq_offset));		      \

to sign-extend 32-bit __rseq_offset to 64-bit.  This is a no-op for x86-64
since x86-64 __rseq_offset is 64-bit.  This fixes BZ #32543.
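
The difference between the two extensions can be sketched in plain C.
This is only an illustration of the arithmetic described above, not the
glibc accessor code; the function names zero_extend and sign_extend are
made up for the example.

```c
#include <stdint.h>

/* Zero extension: the upper 32 bits are cleared, so a negative
   32-bit offset turns into a large positive 64-bit value.  */
uint64_t
zero_extend (int32_t offset)
{
  return (uint64_t) (uint32_t) offset;
}

/* Sign extension: the effect of casting to long long int first, so a
   negative offset stays negative in 64-bit address arithmetic.  */
uint64_t
sign_extend (int32_t offset)
{
  return (uint64_t) (long long int) offset;
}
```

For an offset of -64, zero extension gives 0xFFFFFFC0 while sign
extension gives 0xFFFFFFFFFFFFFFC0; only the latter, added to a 64-bit
base, addresses memory below that base.  For non-negative offsets the
two results are identical.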

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-12 07:08:27 +08:00
Michael Jeanson 494d65129e nptl: Introduce <rseq-access.h> for RSEQ_* accessors
In preparation to move the rseq area to the 'extra TLS' block, we need
accessors based on the thread pointer and the rseq offset. The ONCE
variant of the accessors ensures single-copy atomicity for loads and
stores which is required for all fields once the registration is active.
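
One common way to obtain such "once" semantics is to access the field
through a volatile-qualified lvalue, so the compiler emits exactly one
load or store and does not split, merge, or repeat it.  The sketch below
assumes that idiom; the macro names READ_ONCE_SKETCH and
WRITE_ONCE_SKETCH and the struct demo_area are hypothetical and are not
the RSEQ_* accessors added by this commit.  In practice the idiom is
single-copy atomic only for naturally aligned accesses up to the machine
word size on mainstream targets.

```c
#include <stdint.h>

/* Hypothetical stand-in for an rseq-like area; not the kernel ABI
   layout.  */
struct demo_area
{
  uint32_t cpu_id;
};

/* Force exactly one access to X by going through a volatile lvalue.
   __typeof__ is a GNU extension supported by GCC and Clang.  */
#define READ_ONCE_SKETCH(x) (*(volatile __typeof__ (x) *) &(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__ (x) *) &(x) = (v))

/* Store VALUE into the area and read it back, each with one access.  */
uint32_t
demo_roundtrip (struct demo_area *area, uint32_t value)
{
  WRITE_ONCE_SKETCH (area->cpu_id, value);
  return READ_ONCE_SKETCH (area->cpu_id);
}
```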

A separate header is required to allow including <atomic.h> which
results in an include loop when added to <tcb-access.h>.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-10 20:20:17 +00:00
Florian Weimer d1da011118 elf: Always define TLS_TP_OFFSET
This will be needed to compute __rseq_offset outside of the TLS
relocation machinery.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 19:30:44 +01:00
Adhemerval Zanella 9cc9f8e11e math: Fix acosf when building with gcc <= 11
GCC <= 11 wrongly assumes that rounding is to nearest and constant-folds
an expression that should be evaluated at run time, since the result is
not exact [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57245
2025-01-09 12:53:58 -03:00
Florian Weimer a257f201dd Revert "x86_64: Remove unused padding from tcbhead_t"
This reverts commit 30d3fd7f4f.

The padding is required by Chromium's MaybeUpdateGlibcTidCache
in sandbox/linux/services/namespace_sandbox.cc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-01-07 09:17:01 +01:00
Paul Zimmermann e5ca265a9c new inputs with large errors for [a]cospi, [a]sinpi, [a]tanpi, atan2pi
These inputs were generated with the programs from
https://gitlab.inria.fr/zimmerma/math_accuracy,
with rounding to nearest:

* for univariate binary32 functions by exhaustive search
* for other functions with the "threshold" parameter up to 10^6
2025-01-02 18:26:36 +01:00
Florian Weimer ceae7e2770 elf: Introduce generic <dl-tls.h>
On arc, the definition of TLS_DTV_UNALLOCATED now comes from
<dl-dtv.h>.

For x86-64 x32, a separate version is needed because unsigned long int
is 32 bits on this target.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-01-02 13:45:27 +01:00
Paul Eggert ad16577ae1 Update copyright in generated files by running "make"
2025-01-01 11:22:09 -08:00
Paul Eggert 2642002380 Update copyright dates with scripts/update-copyrights
2025-01-01 11:22:09 -08:00
Florian Weimer 5e249192ca elf: Remove the GET_ADDR_ARGS and related macros from the TLS code
This was used to manage an IA-64 ABI divergence and is no longer
needed after the IA-64 removal.

(It should be possible to encode all the required information in
one machine word, so the pointer indirection is really unnecessary.
Technically, none of this is part of the ABI, so perhaps it's
possible to do this retroactively.  See bug 27404.)

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-27 07:29:56 +01:00
Adhemerval Zanella a2b0ff98a0 include/sys/cdefs.h: Add __attribute_optimization_barrier__
Add __attribute_optimization_barrier__ to disable inlining and cloning on a
function.  For Clang, expand it to

__attribute__ ((optnone))

Otherwise, expand it to

__attribute__ ((noinline, noclone))
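
The selection described above can be sketched as a conditional macro.
This is an illustration rather than the text of <sys/cdefs.h>: the macro
carries a _SKETCH suffix to avoid clashing with the real definition,
barrier_add is a made-up function, and the GCC attribute that disables
cloning is spelled noclone.

```c
/* Pick the compiler-specific attribute that keeps a function from
   being inlined or cloned.  */
#ifdef __clang__
# define ATTRIBUTE_OPTIMIZATION_BARRIER_SKETCH __attribute__ ((optnone))
#else
# define ATTRIBUTE_OPTIMIZATION_BARRIER_SKETCH \
  __attribute__ ((noinline, noclone))
#endif

/* Hypothetical function that must remain a real, out-of-line call.  */
ATTRIBUTE_OPTIMIZATION_BARRIER_SKETCH int
barrier_add (int a, int b)
{
  return a + b;
}
```

Such a barrier is typically useful in tests, where the compiler must not
fold a call away or specialize it for constant arguments.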

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-23 06:28:55 +08:00
H.J. Lu 03feea74dc elf: Compile test modules with -fsemantic-interposition
The compiler may default to -fno-semantic-interposition, but some elf test
modules must be compiled with -fsemantic-interposition to function properly.
Add a TEST_CC check for -fsemantic-interposition and use it on elf test
modules.  This fixed

FAIL: elf/tst-dlclose-lazy
FAIL: elf/tst-pie1
FAIL: elf/tst-plt-rewrite1
FAIL: elf/unload4

when Clang 19 is used to test glibc.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 13:15:43 +08:00
H.J. Lu 9151ecbb5e x86-64: Disable libmvec ABI test for Clang
Unlike GCC, libmvec support in Clang is hard-coded.  Clang doesn't use
macros defined in <bits/libm-simd-decl-stubs.h> to support new libmvec
functions added to glibc and can't vectorize all test loops to test
libmvec ABI:

https://github.com/llvm/llvm-project/issues/120868

Disable the libmvec ABI test for Clang.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 12:51:56 +08:00
H.J. Lu 88499d61bd Check if -mamx-tile works for testing
Since -mamx-tile is used only for testing, use LIBC_TRY_TEST_CC_COMMAND,
instead of LIBC_TRY_CC_AND_TEST_CC_COMMAND to check it and don't check
__builtin_ia32_ldtilecfg for Clang.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 06:07:17 +08:00
Adhemerval Zanella b3a7a15d99 cet: Drop '#pragma GCC target' in tst-cet-legacy-10a[-static].c
After

commit 215447f5cb
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Dec 17 06:18:55 2024 +0800

    cet: Pass -mshstk to compiler for tst-cet-legacy-10a[-static].c

we can remove '#pragma GCC target' in tst-cet-legacy-10a[-static].c.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
2024-12-21 06:16:58 +08:00
H.J. Lu 40bf25b754 Fix elf: Introduce is_rtld_link_map [BZ #32488]
Also use is_rtld_link_map in dl-cet.c.  This fixes BZ #32488.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-21 04:36:18 +08:00
Florian Weimer 6fba7d6578 x86_64: Regenerate ulps
As seen with an AMD 7950X CPU, on a glibc built with GCC 11.5.
2024-12-20 07:22:02 +01:00
Florian Weimer 30d3fd7f4f x86_64: Remove unused padding from tcbhead_t
This padding is difficult to use for preserving the internal
GLIBC_PRIVATE ABI.  The comment is misleading.  Current Address
Sanitizer uses heuristics to determine struct pthread size.
It does not depend on its precise layout.  It merely scans for
pointers allocated using malloc.

Due to the removal of the padding, the assert for its start
is no longer required.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-12-19 21:21:30 +01:00
Adhemerval Zanella 0e0be3ed80 math: Use tanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic tanhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      51.5273        41.0951        20.25%
x86_64v2                    47.7021        39.1526        17.92%
x86_64v3                    45.0373        34.2737        23.90%
i686                       133.9970        83.8596        37.42%
aarch64 (Neoverse)          21.5439        14.7961        31.32%
power10                     13.3301         8.4406        36.68%

reciprocal-throughput        master        patched   improvement
x86_64                      24.9493        12.8547        48.48%
x86_64v2                    20.7051        12.7761        38.29%
x86_64v3                    19.2492        11.0851        42.41%
i686                        78.6498        29.8211        62.08%
aarch64 (Neoverse)          11.6026        7.11487        38.68%
power10                      6.3328         2.8746        54.61%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 1751c0519a math: Use sinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic sinhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.6819        49.1489         6.71%
x86_64v2                    49.1162        42.9447        12.57%
x86_64v3                    46.9732        39.9157        15.02%
i686                       141.1470       129.6410         8.15%
aarch64 (Neoverse)          20.8539        17.1288        17.86%
power10                     14.5258        9.1906         36.73%

reciprocal-throughput        master        patched   improvement
x86_64                      27.5553        23.9395        13.12%
x86_64v2                    21.6423        20.3219         6.10%
x86_64v3                    21.4842        16.0224        25.42%
i686                        87.9709        86.1626         2.06%
aarch64 (Neoverse)          15.1919        12.2744        19.20%
power10                      7.2188         5.2611        27.12%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 9583836785 math: Use coshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode),
although it shows worse performance than the current one.  The current
implementation's performance comes mainly from the internal usage of
the optimized expf implementation, and it shows a maximum error of 2 ULPs
for FE_TONEAREST and 3 for other rounding modes.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      40.6995        49.0737       -20.58%
x86_64v2                    40.5841        44.3604        -9.30%
x86_64v3                    39.3879        39.7502        -0.92%
i686                       112.3380       129.8570       -15.59%
aarch64 (Neoverse)          18.6914        17.0946         8.54%
power10                     11.1343        9.3245         16.25%

reciprocal-throughput        master        patched   improvement
x86_64                      18.6471        24.1077       -29.28%
x86_64v2                    17.7501        20.2946       -14.34%
x86_64v3                    17.8262        17.1877         3.58%
i686                        64.1454        86.5645       -34.95%
aarch64 (Neoverse)          9.77226        12.2314       -25.16%
power10                      4.0200        5.3316        -32.63%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 7cfd8b5698 math: Use atanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      59.4930        45.8568        22.92%
x86_64v2                    59.5705        45.5804        23.48%
x86_64v3                    53.1838        37.7155        29.08%
i686                        169.354       133.5940        21.12%
aarch64 (Neoverse)          26.0781        16.9829        34.88%
power10                     15.6591        10.7623        31.27%

reciprocal-throughput        master        patched   improvement
x86_64                      23.5903        18.5766        21.25%
x86_64v2                    22.6489        18.2683        19.34%
x86_64v3                    19.0401        13.9474        26.75%
i686                        97.6034       107.3260        -9.96%
aarch64 (Neoverse)          15.3664        9.57846        37.67%
power10                      6.8877        4.6242         32.86%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 6f9bacf36b math: Use atan2f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atan2f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      68.1175        69.2014        -1.59%
x86_64v2                    66.9884        66.0081         1.46%
x86_64v3                    57.7034        61.6407        -6.82%
i686                       189.8690        152.7560       19.55%
aarch64 (Neoverse)          32.6151        24.5382        24.76%
power10                     21.7282        17.1896        20.89%

reciprocal-throughput        master        patched   improvement
x86_64                      34.5202        31.6155         8.41%
x86_64v2                    32.6379        30.3372         7.05%
x86_64v3                    34.3677        23.6455        31.20%
i686                       157.7290        75.8308        51.92%
aarch64 (Neoverse)          27.7788        16.2671        41.44%
power10                     15.5715         8.1588        47.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella a357d6273f math: Use atanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      56.8265        53.6842         5.53%
x86_64v2                    54.8177        53.6842         2.07%
x86_64v3                    46.2915        48.7034        -5.21%
i686                       158.3760        108.9560       31.20%
aarch64 (Neoverse)           21.687        20.5893         5.06%
power10                     13.1903        13.5012        -2.36%

reciprocal-throughput        master        patched   improvement
x86_64                      16.6787        16.7601        -0.49%
x86_64v2                    16.6983        16.7601        -0.37%
x86_64v3                    16.2268        12.1391        25.19%
i686                       138.6840        36.0640        74.00%
aarch64 (Neoverse)          11.8012        10.3565        12.24%
power10                      5.3212         4.2894        19.39%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella ed608a40e2 math: Use asinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      64.5128        56.9717        11.69%
x86_64v2                    63.3065        57.2666         9.54%
x86_64v3                    62.8719        51.4170        18.22%
i686                       189.1630        137.635        27.24%
aarch64 (Neoverse)          25.3551        20.5757        18.85%
power10                     17.9712        13.3302        25.82%

reciprocal-throughput        master        patched   improvement
x86_64                      20.0844        15.4731        22.96%
x86_64v2                    19.2919        15.4000        20.17%
x86_64v3                    18.7226        11.9009        36.44%
i686                       103.7670        80.2681        22.65%
aarch64 (Neoverse)          12.5005        8.68969        30.49%
power10                      7.2220        5.03617        30.27%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 5fb4b566ef math: Use asinf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      42.8237        35.2460        17.70%
x86_64v2                    43.3711        35.9406        17.13%
x86_64v3                    35.0335        30.5744        12.73%
i686                       213.8780        104.4710       51.15%
aarch64 (Neoverse)          17.2937        13.6025        21.34%
power10                     12.0227        7.4241         38.25%

reciprocal-throughput        master        patched   improvement
x86_64                      13.6770        15.5231       -13.50%
x86_64v2                    13.8722        16.0446       -15.66%
x86_64v3                    13.6211        13.2753         2.54%
i686                       186.7670        45.4388        75.67%
aarch64 (Neoverse)          9.96089        9.39285         5.70%
power10                      4.9862        3.7819         24.15%
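
The improvement column in these tables appears to be the relative change
from master to patched, (master - patched) / master, so a negative value
marks a regression.  A minimal C sketch of that arithmetic follows; the
name improvement_percent is hypothetical and is not part of the glibc
benchtests.

```c
/* Hypothetical helper, not part of glibc: relative improvement of
   PATCHED over MASTER, in percent.  Lower timings are better, so a
   slower patched result yields a negative value.  */
double
improvement_percent (double master, double patched)
{
  return (master - patched) / master * 100.0;
}
```

Applied to the x86_64 rows of the asinf tables above, this gives about
17.70 for latency (42.8237 to 35.2460) and about -13.50 for
reciprocal-throughput (13.6770 to 15.5231).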

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 673e6fe110 math: Use acoshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acoshf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      61.2471        58.7742         4.04%
x86_64-v2                   62.6519        59.0523         5.75%
x86_64-v3                   58.7408        50.1393        14.64%
aarch64                     24.8580        21.3317        14.19%
power10                     17.0469        13.1345        22.95%

reciprocal-throughput        master        patched   improvement
x86_64                      16.1618        15.1864         6.04%
x86_64-v2                   15.7729        14.7563         6.45%
x86_64-v3                   14.1669        11.9568        15.60%
aarch64                      10.911        9.5486         12.49%
power10                     6.38196        5.06734        20.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 66fa7ad437 math: Use acosf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acosf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.5098        36.6312        30.24%
x86_64v2                    53.0217        37.3091        29.63%
x86_64v3                    42.8501        32.3977        24.39%
i686                       207.3960       109.4000        47.25%
aarch64                     21.3694        13.7871        35.48%
power10                     14.5542         7.2891        49.92%

reciprocal-throughput        master        patched   improvement
x86_64                      14.1487        15.9508       -12.74%
x86_64v2                    14.3293        16.1899       -12.98%
x86_64v3                    13.6563        12.6161         7.62%
i686                       158.4060        45.7354        71.13%
aarch64                     12.5515        9.19233        26.76%
power10                      5.7868         3.3487        42.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
H.J. Lu 0cc88d2327 Silence Clang #include_next error
Use "#include <...>" to silence Clang #include_next error:

In file included from ../sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c:19:
../sysdeps/x86_64/fpu/test-double-vlen4.h:19:2: error: #include_next in file found relative to primary source file or found by absolute path; will search from start of include path [-Werror,-Winclude-next-absolute-path]
   19 | #include_next <test-double-vlen4.h>
      |  ^
1 error generated.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-18 01:22:48 +08:00
H.J. Lu 215447f5cb cet: Pass -mshstk to compiler for tst-cet-legacy-10a[-static].c
Pass -mshstk to compiler to silence Clang:

In file included from ../sysdeps/x86_64/tst-cet-legacy-10a.c:2:
../sysdeps/x86_64/tst-cet-legacy-10.c:29:7: error: always_inline function '_get_ssp' requires target feature 'shstk', but would be inlined into function 'do_test' that is compiled without support for 'shstk'
   29 |   if (_get_ssp () != 0)
      |       ^

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-18 01:20:16 +08:00
Joseph Myers 3374de9038 Implement C23 atan2pi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the atan2pi functions (atan2(y,x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-12 20:57:44 +00:00
Joseph Myers ffe79c446c Implement C23 atanpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the atanpi functions (atan(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-11 21:51:49 +00:00
H.J. Lu b79f257533 Add TEST_CC and TEST_CXX support
Support testing glibc build with a different C compiler or a different
C++ compiler with

$ ../glibc-VERSION/configure TEST_CC="gcc-6.4.1" TEST_CXX="g++-6.4.1"

1. Add LIBC_TRY_CC_AND_TEST_CC_OPTION, LIBC_TRY_CC_AND_TEST_CC_COMMAND
and LIBC_TRY_CC_AND_TEST_LINK to test both CC and TEST_CC.
2. Add check and xcheck targets to Makefile.in and override build compiler
options with ones from TEST_CC and TEST_CXX.

Tested on Fedora 41/x86-64:

1. Building with GCC 14.2.1 and testing with GCC 6.4.1 and GCC 11.2.1.
2. Building with GCC 15 and testing with GCC 6.4.1.

Supporting GCC versions older than GCC 6.2 may require changes to the test
sources.  Other targets may need to update configure.ac under sysdeps and
modify Makefile.in to override target build compiler options.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-11 18:31:00 +08:00
Joseph Myers f962932206 Implement C23 asinpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the asinpi functions (asin(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-10 20:42:20 +00:00
Joseph Myers 28d102d15c Implement C23 acospi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the acospi functions (acos(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-09 23:01:29 +00:00
Joseph Myers f9e90e4b4c Implement C23 tanpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the tanpi functions (tan(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-05 21:42:10 +00:00
H.J. Lu 0003605a54 x86-64: Update libm-test-ulps
Update x86-64 libm-test-ulps to fix

FAIL: math/test-float64x-cospi
FAIL: math/test-float64x-exp2m1
FAIL: math/test-float64x-sinpi
FAIL: math/test-ldouble-cospi
FAIL: math/test-ldouble-exp2m1
FAIL: math/test-ldouble-sinpi

when building glibc with GCC 7.4.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-05 20:08:36 +08:00