Commit Graph

1621 Commits

Joseph Myers e535fb910c Define C23 header version macros
C23 defines library macros __STDC_VERSION_<header>_H__ to indicate
that a header has support for new / changed features from C23.  Now
that all the required library features are implemented in glibc,
define these macros.  I'm not sure this is sufficiently much of a
user-visible feature to be worth a mention in NEWS.

Tested for x86_64.

There are various optional C23 features we don't yet have, of which I
might look at the Annex H ones (floating-point encoding conversion
functions and _Float16 functions) next.

* Optional time bases TIME_MONOTONIC, TIME_ACTIVE, TIME_THREAD_ACTIVE.
  See
  <https://sourceware.org/pipermail/libc-alpha/2023-June/149264.html>
  - we need to review / update that patch.  (I think patch 2/2,
  inventing new names for all the nonstandard CLOCK_* supported by the
  Linux kernel, is rather more dubious.)

* Updating conform/ tests for C23.

* Defining the rounding mode macro FE_TONEARESTFROMZERO for RISC-V (as
  far as I know, the only architecture supported by glibc that has
  hardware support for this rounding mode for binary floating point)
  and supporting it throughout glibc and its tests (especially the
  string/numeric conversions in both directions that explicitly handle
  each possible rounding mode, and various tests that do likewise).

* Annex H floating-point encoding conversion functions.  (It's not
  entirely clear which are optional even given support for Annex H;
  there's some wording applied inconsistently about only being
  required when non-arithmetic interchange formats are supported; see
  the comments I raised on the WG14 reflector on 23 Oct 2025.)

* _Float16 functions (and other header and testcase support for this
  type).

* Decimal floating-point support.

* Fully supporting __int128 and unsigned __int128 as integer types
  wider than intmax_t, as permitted by C23.  Would need doing in
  coordination with GCC, see GCC bug 113887 for more discussion of
  what's involved.
2025-11-27 19:32:49 +00:00
Adhemerval Zanella a61f7fd59d math: Sync atanh from CORE-MATH
The CORE-MATH commit dc9465e7 fixes some issues:

Failure: Test: atanh_towardzero (0x8.3f79103b3c64p-4)
Result:
 is:          5.7018661316561103e-01   0x1.23ef7ff0539c6p-1
 should be:   5.7018661316561092e-01   0x1.23ef7ff0539c5p-1
 difference:  1.1102230246251565e-16   0x1.0000000000000p-53
 ulp       :  1.0000
 max.ulp   :  0.0000
Failure: Test: atanh_towardzero (0x8.3f7d95aabaf7p-4)
Result:
 is:          5.7019248543911060e-01   0x1.23f044fac5997p-1
 should be:   5.7019248543911049e-01   0x1.23f044fac5996p-1
 difference:  1.1102230246251565e-16   0x1.0000000000000p-53
 ulp       :  1.0000
 max.ulp   :  0.0000
Failure: Test: atanh_towardzero (0x8.3f805380d6728p-4)
Result:
 is:          5.7019604623795527e-01   0x1.23f0bc75cd113p-1
 should be:   5.7019604623795516e-01   0x1.23f0bc75cd112p-1
 difference:  1.1102230246251565e-16   0x1.0000000000000p-53
 ulp       :  1.0000
 max.ulp   :  0.0000
Maximal error of `atanh_towardzero'
 is      : 1 ulp
 accepted: 0 ulp

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu,
and i686-linux-gnu.
2025-11-26 14:10:07 -03:00
Adhemerval Zanella 25de0771ec configure: Only use -fno-fp-int-builtin-inexact if compiler supports it
Checked on x86_64-linux-gnu.

Reviewed-by: Sam James <sam@gentoo.org>
2025-11-21 13:13:10 -03:00
Adhemerval Zanella 92186652d8 math: Sync atanh from CORE-MATH
The CORE-MATH commit 703d7487 fixes some issues for RNDZ:

Failure: Test: atanh_towardzero (0x5.96200b978b69cp-4)
Result:
 is:          3.6447730550366463e-01   0x1.753989ed16faap-2
 should be:   3.6447730550366458e-01   0x1.753989ed16fa9p-2
 difference:  5.5511151231257827e-17   0x1.0000000000000p-54
 ulp       :  1.0000
 max.ulp   :  0.0000
Maximal error of `atanh_towardzero'
 is      : 1 ulp
 accepted: 0 ulp

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu,
and i686-linux-gnu.
2025-11-19 15:21:44 -03:00
Adhemerval Zanella 4567204feb math: Sync acosh from CORE-MATH
The CORE-MATH commit 6736002f fixes some issues for RNDZ:

Failure: Test: acosh_towardzero (0x1.08000c1e79fp+0)
Result:
 is:          2.4935636091994373e-01   0x1.feae8c399b18cp-3
 should be:   2.4935636091994370e-01   0x1.feae8c399b18bp-3
 difference:  2.7755575615628913e-17   0x1.0000000000000p-55
 ulp       :  1.0000
 max.ulp   :  0.0000
Failure: Test: acosh_towardzero (0x1.080016353964ep+0)
Result:
 is:          2.4935874767710369e-01   0x1.feafcc91f518ep-3
 should be:   2.4935874767710367e-01   0x1.feafcc91f518dp-3
 difference:  2.7755575615628913e-17   0x1.0000000000000p-55
 ulp       :  1.0000
 max.ulp   :  0.0000
Maximal error of `acosh_towardzero'
 is      : 1 ulp
 accepted: 0 ulp

This only happens when the ISA supports fma, such as x86_64-v3, aarch64,
or powerpc.

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu,
and i686-linux-gnu.
2025-11-19 12:58:56 -03:00
Adhemerval Zanella 13cfd77bf5 math: Don't redirect inlined builtin math functions
When we want to inline builtin math functions, like truncf, given

  extern float truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
  extern float __truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));

  float (truncf) (float) asm ("__truncf");

the compiler may redirect truncf calls to __truncf instead of inlining
them (clang does this, for instance).  USE_TRUNCF_BUILTIN is 1 to
indicate that truncf should be inlined.  In this case, we don't want
the truncf redirection:

  1. For each math function which may be inlined, we define

  #if USE_TRUNCF_BUILTIN
  # define NO_truncf_BUILTIN inline_truncf
  #else
  # define NO_truncf_BUILTIN truncf
  #endif

in <math-use-builtins.h>.

  2. Include <math-use-builtins.h> in include/math.h.

  3. Change MATH_REDIRECT to

   #define MATH_REDIRECT(FUNC, PREFIX, ARGS)		\
    float (NO_ ## FUNC ## f ## _BUILTIN) (ARGS (float))	\
      asm (PREFIX #FUNC "f");

With this change, if USE_TRUNCF_BUILTIN is 0, we get

  float (truncf) (float) asm ("__truncf");

and truncf will be redirected to __truncf.

And for USE_TRUNCF_BUILTIN 1, we get:

  float (inline_truncf) (float) asm ("__truncf");

In both cases either truncf will be inlined or the internal alias
(__truncf) will be called.

This is not required for all math-use-builtins symbols, only the ones
declared in math.h.  It also allows removing the explicit
math-use-builtins inclusions, since the header is now implicitly
included by math.h.

For MIPS, some math-use-builtins headers include sysdep.h, and this
in turn includes a lot of extra headers that do not allow ldbl-128
code to override alias definitions (math.h would include some
stdlib.h definitions).  The math-use-builtins headers only require
__mips_isa_rev, so move its definition to sgidefs.h.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-11-17 11:17:07 -03:00
Joseph Myers 1f79bc4838 Change fromfp functions to return floating types following C23 (bug 28327)
As discussed in bug 28327, C23 changed the fromfp functions to return
floating types instead of intmax_t / uintmax_t.  (Although the
motivation in N2548 was reducing the use of intmax_t in library
interfaces, the new version does have the advantage of being able to
specify arbitrary integer widths for e.g. assigning the result to a
_BitInt, as well as being able to indicate an error case in-band with
a NaN return.)

As with other such changes from interfaces introduced in TS 18661,
implement the new types as a replacement for the old ones, with the
old functions remaining as compat symbols but not supported as an API.
The test generator used for many of the tests is updated to handle
both versions of the functions.

Tested for x86_64 and x86, and with build-many-glibcs.py.

Also tested tgmath tests for x86_64 with GCC 7 to make sure that the
modified case for older compilers in <tgmath.h> does work.

Also tested for powerpc64le to cover the ldbl-128ibm implementation
and the other things that are handled differently for that
configuration.  The new tests fail for ibm128, but all the failures
relate to incorrect signs of zero results and turn out to arise from
bugs in the underlying roundl, ceill, truncl and floorl
implementations that I've reported in bug 33623, rather than
indicating any bug in the actual new implementation of the functions
for that format.  So given fixes for those functions (which shouldn't
be hard, and of course should add to the tests for those functions
rather than relying only on indirect testing via fromfp), the fromfp
tests should start passing for ibm128 as well.
2025-11-13 00:04:21 +00:00
Adhemerval Zanella b983c854e6 math: Sync acosh from CORE-MATH
The CORE-MATH commit c9abdf80 fixes the handling of some cases for RNDZ.

Checked on x86_64-linux-gnu.
2025-11-10 08:58:14 -03:00
Adhemerval Zanella 3078358ac6 math: Remove the SVID error handling from tgammaf
It improves latency by about 1.5% and throughput by about 2-4%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 10:19:37 -03:00
Adhemerval Zanella de0e623434 math: Remove the SVID error handling from lgammaf/lgammaf_r
It improves latency and throughput by about 2%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 09:27:07 -03:00
Adhemerval Zanella 7ec8eb5676 math: Remove the SVID error handling from atan2f
It improves latency by about 3-6% and throughput by about 5-12%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 07:15:52 -03:00
Joseph Myers 26e4810210 Rename fromfp files in preparation for changing types for C23
As discussed in bug 28327, the fromfp functions changed type in C23
(compared to the version in TS 18661-1); they now return the same type
as the floating-point argument, instead of intmax_t / uintmax_t.

As with other such incompatible changes compared to the initial TS
18661 versions of interfaces (the types of totalorder functions, in
particular), it seems appropriate to support only the new version as
an API, not the old one (although many programs written for the old
API might in fact work with the new one as well).  Thus, the existing
implementations should become compat symbols.  They are sufficiently
different from how I'd expect to implement the new version that using
separate implementations in separate files is more convenient than
trying to share code, and directly sharing testcases would be
problematic as well.

Rename the existing fromfp implementation and test files to names
reflecting how they're intended to become compat symbols, so freeing
up the existing filenames for a subsequent implementation of the C23
versions of these functions (which is the point at which the existing
implementations would actually become compat symbols).

gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I
think it will make sense to adapt the test generator to be able to
generate most tests for both versions of the functions (with extra
test inputs added that are only of interest with the C23 version).
The ldbl-opt/nldbl-* files are also not renamed; since those are for a
static only library, no compat versions are needed, and they'll just
have their contents changed when the C23 version is implemented.

Tested for x86_64, and with build-many-glibcs.py.
2025-11-04 23:41:35 +00:00
Joseph Myers 26d11a0944 Add C23 long_double_t, _FloatN_t
C23 Annex H adds <math.h> typedefs long_double_t and _FloatN_t
(originally introduced in TS 18661-3), analogous to float_t and
double_t.  Add these typedefs to glibc.  (There are no _FloatNx_t
typedefs.)

C23 also slightly changes the rules for how such typedef names should
be defined, compared to the definition in TS 18661-3.  In both cases,
<TYPE>_t corresponds to the evaluation format for <TYPE>, as specified
by FLT_EVAL_METHOD (for which <math.h> uses glibc's internal
__GLIBC_FLT_EVAL_METHOD).  Specifically, each FLT_EVAL_METHOD value
corresponds to some type U (for example, 64 corresponds to U =
_Float64), and for types with exactly the same set of values as U, TS
18661-3 says expressions with those types are to be evaluated to the
range and precision of type U (so <TYPE>_t is defined to U), whereas
C23 only does that for types whose values are a strict subset of those
of type U (so <TYPE>_t is defined to <TYPE>).

As with other cases where semantics changed between TS 18661 and C23,
this patch only implements the newer version of the semantics
(including adjusting existing definitions of float_t and double_t as
needed).  The new semantics are contradictory between the main
standard and Annex H for the case of FLT_EVAL_METHOD == 2 and the
choice of double_t when double and long double have the same values
(the main standard says it's defined as long double in that case,
whereas Annex H would define it as double), which I've raised on the
WG14 reflector (but I think setting FLT_EVAL_METHOD == 2 when double
and long double have the same values is a fairly theoretical
combination of features); for now glibc follows the value in the main
standard in that case.

Note that I think all existing GCC targets supported by glibc only use
values -1, 0, 1, 2 or 16 for FLT_EVAL_METHOD (so most of the header
code is somewhat theoretical, though potentially relevant with other
compilers since the choice of FLT_EVAL_METHOD is only an API choice,
not an ABI one; it can vary with compiler options, and these typedefs
should not be used in ABIs).  The testcase (expanded to cover the new
typedefs) is really just repeating the same logic in a second place
(so all it really tests is that __GLIBC_FLT_EVAL_METHOD is consistent
with FLT_EVAL_METHOD).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-11-04 17:12:00 +00:00
Adhemerval Zanella 0dfc849eff math: Remove the SVID error handling wrapper from sqrt
i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.

The PowerPC optimization for PPC 601/603 (30 years old) is removed.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella f27a146409 math: Remove the SVID error handling from sinhf
It improves latency by about 3-10% and throughput by about 5-15%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 0e1a1178ee math: Remove the SVID error handling from remainder
The optimized i386 version is faster than the generic one, and
gcc implements it through the builtin. This optimization enables
us to migrate the implementation to a C version.  The performance
on a Zen3 chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin
(different than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput           input    master   NO-SVID  improvement
x86_64                     subnormals   18.8522   16.2506       13.80%
x86_64                         normal  421.8260  403.9270        4.24%
x86_64                 close-exponent   21.0579   18.7642       10.89%
i686                       subnormals   21.3443   21.4229       -0.37%
i686                           normal  525.8380   538.807       -2.47%
i686                   close-exponent   21.6589   21.7983       -0.64%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella c4c6c79d70 math: Remove the SVID error handling from remainderf
The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin.  This optimization enables us to
migrate the implementation to a C version.  The performance on a Zen3
chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin (different
than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput          input   master  NO-SVID  improvement
x86_64                    subnormals  17.5349  15.6125       10.96%
x86_64                        normal  53.8134  52.5754        2.30%
x86_64                close-exponent  20.0211  18.6656        6.77%
i686                      subnormals  21.8105  20.1856        7.45%
i686                          normal  73.1945  71.2199        2.70%
i686                  close-exponent  22.2141   20.331        8.48%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Wilco Dijkstra 1136c036a3 math: Remove xfail from pow test [BZ #33563]
Remove xfail from pow testcase since pow and powf have been fixed.
Also check float128 maximum value.  See BZ #33563.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 19:13:53 +00:00
Adhemerval Zanella ee946212fe math: Remove the SVID error handling wrapper from yn/jn
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:35 -03:00
Adhemerval Zanella 8d4815e6d7 math: Remove the SVID error handling wrapper from y1/j1
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:33 -03:00
Adhemerval Zanella b050cb53b0 math: Remove the SVID error handling wrapper from y0/j0
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:31 -03:00
Adhemerval Zanella 03eeeba705 math: Remove the SVID error handling from coshf
It improves latency by about 3-10% and throughput by about 5-15%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:28 -03:00
Adhemerval Zanella 555c39c0fc math: Remove the SVID error handling from atanhf
It improves latency by about 1-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:26 -03:00
Adhemerval Zanella 8facb464b4 math: Remove the SVID error handling from acoshf
It improves latency by about 3-7% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:24 -03:00
Adhemerval Zanella f92aba68bc math: Remove the SVID error handling from asinf
It improves latency by about 2% and throughput by about 5%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:22 -03:00
Adhemerval Zanella 9f8dea5b5d math: Remove the SVID error handling from acosf
It improves latency by about 2-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:20 -03:00
Adhemerval Zanella 0b484d7b77 math: Remove the SVID error handling from log10f
It improves latency by about 3-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:17 -03:00
Adhemerval Zanella e4d812c980 math: Consolidate erf/erfc definitions
The common code definitions are consolidated in s_erf_common.h
and s_erf_common.c.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:46:01 -03:00
Adhemerval Zanella fc419290f9 math: Consolidate internal erf/erfc tables
The shared internal data definitions are consolidated in
s_erf_data.c and the erfc only one are moved to s_erfc_data.c.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella acaad9ab06 math: Use erfc from CORE-MATH
The current implementation precision shows the following accuracy, on
three ranges ([-DBL_MAX,-5], [-5,5], [5,DBL_MAX]) with 10e9 uniform
randomly generated numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -5]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-5, 5]
 * FE_TONEAREST
     0:       8069309665  80.69%
     1:       1882910247  18.83%
     2:         47485296   0.47%
     3:           293749   0.00%
     4:             1043   0.00%
 * FE_UPWARD
     0:       5540301026  55.40%
     1:       2026739127  20.27%
     2:       1774882486  17.75%
     3:        567324466   5.67%
     4:         86913847   0.87%
     5:          3820789   0.04%
     6:            18259   0.00%
 * FE_DOWNWARD
     0:       5520969586  55.21%
     1:       2057293099  20.57%
     2:       1778334818  17.78%
     3:        557521494   5.58%
     4:         82473927   0.82%
     5:          3393276   0.03%
     6:            13800   0.00%
 * FE_TOWARDZERO
     0:       6220287175  62.20%
     1:       2323846149  23.24%
     2:       1251999920  12.52%
     3:        190748245   1.91%
     4:         12996232   0.13%
     5:           122279   0.00%

* Range [5, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                      49.0980       267.0660      -443.94%
x86_64v2                    49.3220       257.6310      -422.34%
x86_64v3                    42.9539        84.9571       -97.79%
aarch64                     28.7266        52.9096       -84.18%
power10                     14.1673        25.1273       -77.36%

Latency                      master        patched   improvement
x86_64                      95.6640       269.7060      -181.93%
x86_64v2                    95.8296       260.4860      -171.82%
x86_64v3                    91.1658       112.7150       -23.64%
aarch64                     37.0745        58.6791       -58.27%
power10                     23.3197        31.5737       -35.39%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella 72a48e45bd math: Use erf from CORE-MATH
The current implementation precision shows the following accuracy, on
three ranges ([-DBL_MAX, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with
10e9 uniform randomly generated numbers for each range (first column
is the accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -4.2]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-4.2, 4.2]
 * FE_TONEAREST
     0:       9764404513  97.64%
     1:        235595487   2.36%
 * FE_UPWARD
     0:       9468013928  94.68%
     1:        531986072   5.32%
 * FE_DOWNWARD
     0:       9493787693  94.94%
     1:        506212307   5.06%
 * FE_TOWARDZERO
     0:       9585271351  95.85%
     1:        414728649   4.15%

* Range [4.2, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master       patched   improvement
x86_64                      38.2754       78.0311      -103.87%
x86_64v2                    38.3325       75.7555       -97.63%
x86_64v3                    34.6604       28.3182        18.30%
aarch64                     23.1499       21.4307         7.43%
power10                     12.3051       9.3766         23.80%

Latency                      master       patched   improvement
x86_64                      84.3062      121.3580       -43.95%
x86_64v2                    84.1817      117.4250       -39.49%
x86_64v3                    81.0933       70.6458        12.88%
aarch64                      35.012       29.5012        15.74%
power10                     21.7205       18.4589        15.02%

For x86_64/x86_64-v2, most of the performance hit comes from calling
fma through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella 1cae0550e8 math: Use tgamma from CORE-MATH
The current implementation precision shows the following accuracy, on
one range ([-20,20]) with 10e9 uniform randomly generated numbers
(first column is the accuracy in ULP, with '0' being
correctly rounded, second is the number of samples with the
corresponding precision):

* Range [-20,20]
 * FE_TONEAREST
     0:       4504877808  45.05%
     1:       4402224940  44.02%
     2:        947652295   9.48%
     3:        131076831   1.31%
     4:         13222216   0.13%
     5:           910045   0.01%
     6:            35253   0.00%
     7:              606   0.00%
     8:                6   0.00%
 * FE_UPWARD
     0:       3477307921  34.77%
     1:       4838637866  48.39%
     2:       1413942684  14.14%
     3:        240762564   2.41%
     4:         27113094   0.27%
     5:          2130934   0.02%
     6:           102599   0.00%
     7:             2324   0.00%
     8:               14   0.00%
 * FE_DOWNWARD
     0:       3923545410  39.24%
     1:       4745067290  47.45%
     2:       1137899814  11.38%
     3:        171596912   1.72%
     4:         20013805   0.20%
     5:          1773899   0.02%
     6:            99911   0.00%
     7:             2928   0.00%
     8:               31   0.00%
 * FE_TOWARDZERO
     0:       3697160741  36.97%
     1:       4731951491  47.32%
     2:       1303092738  13.03%
     3:        231969191   2.32%
     4:         32344517   0.32%
     5:          3283092   0.03%
     6:           193010   0.00%
     7:             5175   0.00%
     8:               45   0.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                     237.7960       175.4090        26.24%
x86_64v2                   232.9320       163.4460        29.83%
x86_64v3                   193.0680        89.7721        53.50%
aarch64                    113.6340        56.7350        50.07%
power10                     92.0617        26.6137        71.09%

Latency                      master        patched   improvement
x86_64                     266.7190       208.0130        22.01%
x86_64v2                   263.6070       200.0280        24.12%
x86_64v3                   214.0260       146.5180        31.54%
aarch64                    114.4760        58.5235        48.88%
power10                     84.3718        35.7473        57.63%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella d67d2f4688 math: Use lgamma from CORE-MATH
The current implementation precision shows the following accuracy, on
two ranges ([-20, 20] and [20, 0x5.d53649e2d4674p+1012]) with 10e9
uniform randomly generated numbers for each range (first column is
the accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-20, 20]
 * FE_TONEAREST
     0:       6701254075  67.01%
     1:       3230897408  32.31%
     2:         63986940   0.64%
     3:          3605417   0.04%
     4:           233189   0.00%
     5:            20973   0.00%
     6:             1869   0.00%
     7:              125   0.00%
     8:                4   0.00%
 * FE_UPWARD
     0:       4207428861  42.07%
     1:       5001137116  50.01%
     2:        740542213   7.41%
     3:         49116304   0.49%
     4:          1715617   0.02%
     5:            54464   0.00%
     6:             4956   0.00%
     7:              451   0.00%
     8:               16   0.00%
     9:                2   0.00%
 * FE_DOWNWARD
     0:       4155925193  41.56%
     1:       4989821364  49.90%
     2:        770312796   7.70%
     3:         72014726   0.72%
     4:         11040522   0.11%
     5:           872811   0.01%
     6:            12480   0.00%
     7:              106   0.00%
     8:                2   0.00%
 * FE_TOWARDZERO
     0:       4225861532  42.26%
     1:       5027051105  50.27%
     2:        706443411   7.06%
     3:         39877908   0.40%
     4:           713109   0.01%
     5:            47513   0.00%
     6:             4961   0.00%
     7:              438   0.00%
     8:               23   0.00%

* Range [20, 0x5.d53649e2d4674p+1012]
 * FE_TONEAREST
     0:       7262241995  72.62%
     1:       2737758005  27.38%
 * FE_UPWARD
     0:       4690392401  46.90%
     1:       5143728216  51.44%
     2:        165879383   1.66%
 * FE_DOWNWARD
     0:       4690333331  46.90%
     1:       5143794937  51.44%
     2:        165871732   1.66%
 * FE_TOWARDZERO
     0:       4690343071  46.90%
     1:       5143786761  51.44%
     2:        165870168   1.66%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                     112.9740       135.8640       -20.26%
x86_64v2                   111.8910       131.7590       -17.76%
x86_64v3                   108.2800        68.0935        37.11%
aarch64                     61.3759        49.2403        19.77%
power10                     42.4483        24.1943        43.00%

Latency                      master        patched   improvement
x86_64                     144.0090       167.9750       -16.64%
x86_64v2                   139.2690       167.1900       -20.05%
x86_64v3                   130.1320        96.9347        25.51%
aarch64                     66.8538        53.2747        20.31%
power10                     49.5076        29.6917        40.03%

For x86_64/x86_64-v2, most of the performance hit comes from calling
fma through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella 140e802cb3 math: Move atanh internal data to separate file
The internal data definitions are moved to s_atanh_data.c.
It helps on ABIs that build the implementation multiple times for
ifunc optimizations, like x86_64.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella cb8d1575b6 math: Consolidate acosh and asinh internal table
The shared internal data definitions are consolidated in
s_asincosh_data.c.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Paul Zimmermann 48fde7b026 various fixes detected with -Wdouble-promotion
Changes with respect to v1:
- added a comment in e_j1f.c to explain why the use of float is enough
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-22 12:35:40 +02:00
Siddhesh Poyarekar 1b657c53c2 Simplify powl computation for small integral y [BZ #33411]
The powl implementation for x86_64 ends up multiplying X once more than
necessary and then throwing away that result.  This results in an
overflow flag being set in cases where there is no overflow.

Simplify the relevant portion by special casing the -3 to 3 range and
simply multiplying repetitively.
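The special case described above can be sketched as repeated multiplication for integral exponents in the -3 to 3 range, which never produces a result that is computed and then thrown away. The function name is hypothetical, not the glibc internal one:

```c
/* Hedged sketch: x^y for small integral |y| <= 3 by repeated
   multiplication, avoiding the extra discarded multiply that could
   raise a spurious overflow flag.  */
static long double
pow_small_int (long double x, int y)
{
  long double r = 1.0L;
  int n = y < 0 ? -y : y;
  while (n-- > 0)
    r *= x;                     /* at most three multiplies */
  return y < 0 ? 1.0L / r : r;
}
```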

Resolves: BZ #33411
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-21 14:00:10 -04:00
Adhemerval Zanella 0e4ca88bd2 math: Fix compare sort function on compoundn
Use the fabs variant matching the template type, instead of the double
one.  It fixes a build issue with clang:

./s_compoundn_template.c:64:14: error: absolute value function 'fabs' given an argument of type 'const long double' but has parameter of type 'double' which may cause truncation of value [-Werror,-Wabsolute-value]
   64 |   FLOAT pd = fabs (*(const FLOAT *) p);
      |              ^
./s_compoundn_template.c:64:14: note: use function 'fabsl' instead
   64 |   FLOAT pd = fabs (*(const FLOAT *) p);
      |              ^~~~
      |              fabsl
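One way to express the fix is a type-generic selection: `_Generic` picks the fabs variant matching the template's FLOAT type and only then applies it, so no instantiation truncates through double. The macro name here is illustrative (glibc's float templates have their own machinery):

```c
#include <math.h>

/* Hedged sketch: select the matching fabs variant by type, then apply
   it to the argument.  */
#define FABS_TG(x)                      \
  _Generic ((x),                        \
            float: fabsf,               \
            double: fabs,               \
            long double: fabsl) (x)

typedef long double FLOAT;              /* as in the long double build */

static FLOAT
abs_key (const void *p)
{
  FLOAT pd = FABS_TG (*(const FLOAT *) p);
  return pd;
}
```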

Reviewed-by: Collin Funk <collin.funk1@gmail.com>
2025-10-21 09:27:05 -03:00
Adhemerval Zanella b9b28ce35f math: Suppress more aliases builtin type conflicts
Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:26:02 -03:00
Adhemerval Zanella 39bf95c1ba math: Suppress clang -Wabsolute-value warning on math_check_force_underflow
clang warns:

  ../sysdeps/x86/fpu/powl_helper.c:233:3: error: absolute value function
  '__builtin_fabsf' given an argument of type 'typeof (res)' (aka 'long
  double') but has parameter of type 'float' which may cause truncation of
  value [-Werror,-Wabsolute-value]
    math_check_force_underflow (res);
    ^
  ./math-underflow.h:45:11: note: expanded from macro
  'math_check_force_underflow'
        if (fabs_tg (force_underflow_tmp)                         \
            ^
  ./math-underflow.h:27:20: note: expanded from macro 'fabs_tg'
  #define fabs_tg(x) __MATH_TG ((x), (__typeof (x)) __builtin_fabs, (x))
                     ^
  ../math/math.h:899:16: note: expanded from macro '__MATH_TG'
                 float: FUNC ## f ARGS,           \
                        ^
  <scratch space>:73:1: note: expanded from here
  __builtin_fabsf
  ^

This is due to the use of _Generic from __MATH_TG.
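The warning can be reproduced with a minimal sketch of the expansion: `_Generic` type-checks every association, so the float branch applies `__builtin_fabsf` to a long double argument even though that branch can never be selected, and clang's -Wabsolute-value fires on it. The macro below only imitates the fabs_tg/__MATH_TG shape shown in the diagnostic:

```c
/* Hedged repro sketch: all _Generic branches must be valid
   expressions, so clang diagnoses the unselected float branch.  */
#define fabs_tg(x)                                      \
  _Generic ((x),                                        \
            float: __builtin_fabsf ((x)),               \
            double: __builtin_fabs ((x)),               \
            long double: __builtin_fabsl ((x)))

long double
force_pos (long double v)
{
  return fabs_tg (v);   /* clang: -Wabsolute-value on the float branch */
}
```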

Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:24:21 -03:00
Adhemerval Zanella 850d93f514 math: Use binary search on lgammaf slow path
And remove some unused entries of the fallback table.
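The lookup change can be sketched as a binary search over the table's interval boundaries, replacing a linear scan. The bounds array and function name below are illustrative, not glibc's actual lgammaf table:

```c
#include <stddef.h>

/* Hedged sketch: find i such that bounds[i] <= x < bounds[i+1],
   assuming x lies in [bounds[0], bounds[n-1]).  O(log n) instead of
   the O(n) linear scan.  */
static size_t
find_interval (const float *bounds, size_t n, float x)
{
  size_t lo = 0, hi = n - 1;
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (x < bounds[mid])
        hi = mid;
      else
        lo = mid;
    }
  return lo;
}
```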

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:08 -03:00
Adhemerval Zanella ae49afe74d math: Optimize fma call on log2p1f
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:00 -03:00
Adhemerval Zanella 82a4f50b4e math: Optimize fma call on asinpif
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:11:56 -03:00
Adhemerval Zanella 1c459af1ee math: Update auto-libm-test-out-log2p1
Commit 0797283910 did not update the log2p1 output with the newer values.
2025-10-14 08:46:06 -03:00
Luna Lamb 653e6c4fff AArch64: Implement AdvSIMD and SVE log10p1(f) routines
Vector variants of the new C23 log10p1 routines.

Note: Benchmark inputs for log10p1(f) are identical to log1p(f)

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:45:59 +00:00
Luna Lamb db42732474 AArch64: Implement AdvSIMD and SVE log2p1(f) routines
Vector variants of the new C23 log2p1 routines.

Note: Benchmark inputs for log2p1(f) are identical to log1p(f).

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:44:09 +00:00
Adhemerval Zanella 63ba1a1509 math: Add fetestexcept internal alias
To avoid linknamespace issues on old standards.  It is required if the
fallback fma implementation is used, and if/when it is also used
internally by other implementations.
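An internal alias can be sketched on ELF targets as a reserved `__`-prefixed symbol that internal callers use, with the public name defined as a plain alias of it; a user program defining its own conforming symbol then cannot interpose on the library's internal uses. Names here are illustrative, and glibc has its own alias macros for this:

```c
/* Hedged sketch of the internal-alias pattern (GCC/ELF specific).  */
static int
do_test_except (int excepts)
{
  return excepts & 0x3f;        /* placeholder body */
}

/* Internal (reserved-namespace) entry point used by other parts of
   the library.  */
int
__my_fetestexcept (int excepts)
{
  return do_test_except (excepts);
}

/* Public name is a plain alias of the internal one.  */
int my_fetestexcept (int)
  __attribute__ ((alias ("__my_fetestexcept")));
```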
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Adhemerval Zanella 2eb8836de7 math: Add feclearexcept internal alias
To avoid linknamespace issues on old standards.  It is required if the
fallback fma implementation is used, and if/when it is also used
internally by other implementations.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Hasaan Khan 8ced7815fb AArch64: Implement exp2m1 and exp10m1 routines
Vector variants of the new C23 exp2m1 & exp10m1 routines.

Note: Benchmark inputs for exp2m1 & exp10m1 are identical to exp2 & exp10
respectively; this also includes the floating-point variations.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-02 16:50:24 +00:00
Adhemerval Zanella 6ab36c4e6d math: Update auto-libm-tests-in with ldbl-128ibm compoundn/pown failures
It fixes commit ce488f7c16, which updated
the out files without following the gen-auto-libm-tests.c instructions.

Checked on powerpc64le-linux-gnu.

Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-07-28 13:58:54 -03:00