C23 defines library macros __STDC_VERSION_<header>_H__ to indicate
that a header has support for new / changed features from C23. Now
that all the required library features are implemented in glibc,
define these macros. I'm not sure this is sufficiently much of a
user-visible feature to be worth a mention in NEWS.
Tested for x86_64.
There are various optional C23 features we don't yet have, of which I
might look at the Annex H ones (floating-point encoding conversion
functions and _Float16 functions) next.
* Optional time bases TIME_MONOTONIC, TIME_ACTIVE, TIME_THREAD_ACTIVE.
See
<https://sourceware.org/pipermail/libc-alpha/2023-June/149264.html>
- we need to review / update that patch. (I think patch 2/2,
inventing new names for all the nonstandard CLOCK_* supported by the
Linux kernel, is rather more dubious.)
* Updating conform/ tests for C23.
* Defining the rounding mode macro FE_TONEARESTFROMZERO for RISC-V (as
far as I know, the only architecture supported by glibc that has
hardware support for this rounding mode for binary floating point)
and supporting it throughout glibc and its tests (especially the
string/numeric conversions in both directions that explicitly handle
each possible rounding mode, and various tests that do likewise).
* Annex H floating-point encoding conversion functions. (It's not
entirely clear which are optional even given support for Annex H;
there's some wording applied inconsistently about only being
required when non-arithmetic interchange formats are supported; see
the comments I raised on the WG14 reflector on 23 Oct 2025.)
* _Float16 functions (and other header and testcase support for this
type).
* Decimal floating-point support.
* Fully supporting __int128 and unsigned __int128 as integer types
wider than intmax_t, as permitted by C23. Would need doing in
coordination with GCC, see GCC bug 113887 for more discussion of
what's involved.
The CORE-MATH commit 6736002f fixes some issues for RNDZ:
Failure: Test: acosh_towardzero (0x1.08000c1e79fp+0)
Result:
is: 2.4935636091994373e-01 0x1.feae8c399b18cp-3
should be: 2.4935636091994370e-01 0x1.feae8c399b18bp-3
difference: 2.7755575615628913e-17 0x1.0000000000000p-55
ulp : 1.0000
max.ulp : 0.0000
Failure: Test: acosh_towardzero (0x1.080016353964ep+0)
Result:
is: 2.4935874767710369e-01 0x1.feafcc91f518ep-3
should be: 2.4935874767710367e-01 0x1.feafcc91f518dp-3
difference: 2.7755575615628913e-17 0x1.0000000000000p-55
ulp : 1.0000
max.ulp : 0.0000
Maximal error of `acosh_towardzero'
is : 1 ulp
accepted: 0 ulp
This only happens when the ISA supports fma, such as x86_64-v3, aarch64,
or powerpc.
Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu,
and i686-linux-gnu.
When we want to inline builtin math functions, like truncf, for
extern float truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
extern float __truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
float (truncf) (float) asm ("__truncf");
compiler may redirect truncf calls to __truncf, instead of inlining it
(for instance, clang). The USE_TRUNCF_BUILTIN is 1 to indicate that
truncf should be inlined. In this case, we don't want the truncf
redirection:
1. For each math function which may be inlined, we define
#if USE_TRUNCF_BUILTIN
# define NO_truncf_BUILTIN inline_truncf
#else
# define NO_truncf_BUILTIN truncf
#endif
in <math-use-builtins.h>.
2. Include <math-use-builtins.h> in include/math.h.
3. Change MATH_REDIRECT to
#define MATH_REDIRECT(FUNC, PREFIX, ARGS) \
float (NO_ ## FUNC ## f ## _BUILTIN) (ARGS (float)) \
asm (PREFIX #FUNC "f");
With this change If USE_TRUNCF_BUILTIN is 0, we get
float (truncf) (float) asm ("__truncf");
truncf will be redirected to __truncf.
And for USE_TRUNCF_BUILTIN 1, we get:
float (inline_truncf) (float) asm ("__truncf");
In both cases either truncf will be inlined or the internal alias
(__truncf) will be called.
It is not required for all math-use-builtin symbol, only the one
defined in math.h. It also allows to remove all the math-use-builtin
inclusion, since it is now implicitly included by math.h.
For MIPS, some math-use-builtin headers include sysdep.h and this
in turn includes a lot of extra headers that do not allow ldbl-128
code to override alias definition (math.h will include
some stdlib.h definition). The math-use-builtin only requires
the __mips_isa_rev, so move the defintion to sgidefs.h.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
As discussed in bug 28327, C23 changed the fromfp functions to return
floating types instead of intmax_t / uintmax_t. (Although the
motivation in N2548 was reducing the use of intmax_t in library
interfaces, the new version does have the advantage of being able to
specify arbitrary integer widths for e.g. assigning the result to a
_BitInt, as well as being able to indicate an error case in-band with
a NaN return.)
As with other such changes from interfaces introduced in TS 18661,
implement the new types as a replacement for the old ones, with the
old functions remaining as compat symbols but not supported as an API.
The test generator used for many of the tests is updated to handle
both versions of the functions.
Tested for x86_64 and x86, and with build-many-glibcs.py.
Also tested tgmath tests for x86_64 with GCC 7 to make sure that the
modified case for older compilers in <tgmath.h> does work.
Also tested for powerpc64le to cover the ldbl-128ibm implementation
and the other things that are handled differently for that
configuration. The new tests fail for ibm128, but all the failures
relate to incorrect signs of zero results and turn out to arise from
bugs in the underlying roundl, ceill, truncl and floorl
implementations that I've reported in bug 33623, rather than
indicating any bug in the actual new implementation of the functions
for that format. So given fixes for those functions (which shouldn't
be hard, and of course should add to the tests for those functions
rather than relying only on indirect testing via fromfp), the fromfp
tests should start passing for ibm128 as well.
It improves latency for about 1.5% and throughput for about 2-4%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 3-6% and throughput for about 5-12%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
As discussed in bug 28327, the fromfp functions changed type in C23
(compared to the version in TS 18661-1); they now return the same type
as the floating-point argument, instead of intmax_t / uintmax_t.
As with other such incompatible changes compared to the initial TS
18661 versions of interfaces (the types of totalorder functions, in
particular), it seems appropriate to support only the new version as
an API, not the old one (although many programs written for the old
API might in fact work wtih the new one as well). Thus, the existing
implementations should become compat symbols. They are sufficiently
different from how I'd expect to implement the new version that using
separate implementations in separate files is more convenient than
trying to share code, and directly sharing testcases would be
problematic as well.
Rename the existing fromfp implementation and test files to names
reflecting how they're intended to become compat symbols, so freeing
up the existing filenames for a subsequent implementation of the C23
versions of these functions (which is the point at which the existing
implementations would actually become compat symbols).
gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I
think it will make sense to adapt the test generator to be able to
generate most tests for both versions of the functions (with extra
test inputs added that are only of interest with the C23 version).
The ldbl-opt/nldbl-* files are also not renamed; since those are for a
static only library, no compat versions are needed, and they'll just
have their contents changed when the C23 version is implemented.
Tested for x86_64, and with build-many-glibcs.py.
C23 Annex H adds <math.h> typedefs long_double_t and _FloatN_t
(originally introduced in TS 18661-3), analogous to float_t and
double_t. Add these typedefs to glibc. (There are no _FloatNx_t
typedefs.)
C23 also slightly changes the rules for how such typedef names should
be defined, compared to the definition in TS 18661-3. In both cases,
<TYPE>_t corresponds to the evaluation format for <TYPE>, as specified
by FLT_EVAL_METHOD (for which <math.h> uses glibc's internal
__GLIBC_FLT_EVAL_METHOD). Specifically, each FLT_EVAL_METHOD value
corresponds to some type U (for example, 64 corresponds to U =
_Float64), and for types with exactly the same set of values as U, TS
18661-3 says expressions with those types are to be evaluated to the
range and precision of type U (so <TYPE>_t is defined to U), whereas
C23 only does that for types whose values are a strict subset of those
of type U (so <TYPE>_t is defined to <TYPE>).
As with other cases where semantics changed between TS 18661 and C23,
this patch only implements the newer version of the semantics
(including adjusting existing definitions of float_t and double_t as
needed). The new semantics are contradictory between the main
standard and Annex H for the case of FLT_EVAL_METHOD == 2 and the
choice of double_t when double and long double have the same values
(the main standard says it's defined as long double in that case,
whereas Annex H would define it as double), which I've raised on the
WG14 reflector (but I think setting FLT_EVAL_METHOD == 2 when double
and long double have the same values is a fairly theoretical
combination of features); for now glibc follows the value in the main
standard in that case.
Note that I think all existing GCC targets supported by glibc only use
values -1, 0, 1, 2 or 16 for FLT_EVAL_METHOD (so most of the header
code is somewhat theoretical, though potentially relevant with other
compilers since the choice of FLT_EVAL_METHOD is only an API choice,
not an ABI one; it can vary with compiler options, and these typedefs
should not be used in ABIs). The testcase (expanded to cover the new
typedefs) is really just repeating the same logic in a second place
(so all it really tests is that __GLIBC_FLT_EVAL_METHOD is consistent
with FLT_EVAL_METHOD).
Tested for x86_64 and x86, and with build-many-glibcs.py.
i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.
The PowerPC optimization for PPC 601/603 (30 years old) is removed.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 3-10% and throughput for about 5-15%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and
gcc implements it through the builtin. This optimization enables
us to migrate the implementation to a C version. The performance
on a Zen3 chip is similar to the SVID one.
The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin
(different than i386).
Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):
reciprocal-throughput input master NO-SVID improvement
x86_64 subnormals 18.8522 16.2506 13.80%
x86_64 normal 421.8260 403.9270 4.24%
x86_64 close-exponent 21.0579 18.7642 10.89%
i686 subnormals 21.3443 21.4229 -0.37%
i686 normal 525.8380 538.807 -2.47%
i686 close-exponent 21.6589 21.7983 -0.64%
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin. This optimization enables us to
migrate the implementation to a C version. The performance on a Zen3
chip is similar to the SVID one.
The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin (different
than i386).
Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):
reciprocal-throughput input master NO-SVID improvement
x86_64 subnormals 17.5349 15.6125 10.96%
x86_64 normal 53.8134 52.5754 2.30%
x86_64 close-exponent 20.0211 18.6656 6.77%
i686 subnormals 21.8105 20.1856 7.45%
i686 normal 73.1945 71.2199 2.70%
i686 close-exponent 22.2141 20.331 8.48%
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Remove xfail from pow testcase since pow and powf have been fixed.
Also check float128 maximum value. See BZ #33563.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It improves latency for about 3-10% and throughput for about 5-15%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 1-10% and throughput for about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 3-7% and throughput for about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 2% and throughput for about 5%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 2-10% and throughput for about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 3-10% and throughput for about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The common code definitions are consolidated in s_erf_common.h
and s_erf_common.c.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The shared internal data definitions are consolidated in
s_erf_data.c and the erfc only one are moved to s_erfc_data.c.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The current implementation precision shows the following accuracy, on
three rangeis ([-DBL_MIN, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with
10e9 uniform randomly generated numbers for each range (first column
is the accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):
* Range [-DBL_MIN, -4.2]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
* Range [-4.2, 4.2]
* FE_TONEAREST
0: 9764404513 97.64%
1: 235595487 2.36%
* FE_UPWARD
0: 9468013928 94.68%
1: 531986072 5.32%
* FE_DOWNWARD
0: 9493787693 94.94%
1: 506212307 5.06%
* FE_TOWARDZERO
0: 9585271351 95.85%
1: 414728649 4.15%
* Range [4.2, DBL_MAX]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:
reciprocal-throughput master patched improvement
x86_64 38.2754 78.0311 -103.87%
x86_64v2 38.3325 75.7555 -97.63%
x86_64v3 34.6604 28.3182 18.30%
aarch64 23.1499 21.4307 7.43%
power10 12.3051 9.3766 23.80%
Latency master patched improvement
x86_64 84.3062 121.3580 -43.95%
x86_64v2 84.1817 117.4250 -39.49%
x86_64v3 81.0933 70.6458 12.88%
aarch64 35.012 29.5012 15.74%
power10 21.7205 18.4589 15.02%
For x86_64/x86_64-v2, most performance hit came from the fma call
through the ifunc mechanism.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The internal data definitions are moved to s_atanh_data.c.
It helps on ABIs that build the implementation multiple times for
ifunc optimizations, like x86_64.
Reviewed-by: DJ Delorie <dj@redhat.com>
Changes with respect to v1:
- added comment in e_j1f.c to explain the use of float is enough
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The powl implementation for x86_64 ends up multiplying X once more than
necessary and then throwing away that result. This results in an
overflow flag being set in cases where there is no overflow.
Simplify the relevant portion by special casing the -3 to 3 range and
simply multiplying repetitively.
Resolves: BZ #33411
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
To use the fabs function to the used type, instead of the double
variant. it fixes a build issue with clang:
./s_compoundn_template.c:64:14: error: absolute value function 'fabs' given an argument of type 'const long double' but has parameter of type 'double' which may cause truncation of value [-Werror,-Wabsolute-value]
64 | FLOAT pd = fabs (*(const FLOAT *) p);
| ^
./s_compoundn_template.c:64:14: note: use function 'fabsl' instead
64 | FLOAT pd = fabs (*(const FLOAT *) p);
| ^~~~
| fabsl
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
clang warns:
../sysdeps/x86/fpu/powl_helper.c:233:3: error: absolute value function
'__builtin_fabsf' given an argument of type 'typeof (res)' (aka 'long
double') but has parameter of type 'float' which may cause truncation of
value [-Werror,-Wabsolute-value]
math_check_force_underflow (res);
^
./math-underflow.h:45:11: note: expanded from macro
'math_check_force_underflow'
if (fabs_tg (force_underflow_tmp) \
^
./math-underflow.h:27:20: note: expanded from macro 'fabs_tg'
#define fabs_tg(x) __MATH_TG ((x), (__typeof (x)) __builtin_fabs, (x))
^
../math/math.h:899:16: note: expanded from macro '__MATH_TG'
float: FUNC ## f ARGS, \
^
<scratch space>:73:1: note: expanded from here
__builtin_fabsf
^
Due the use of _Generic from TG_MATH.
Reviewed-by: Sam James <sam@gentoo.org>
And remove some unused entries of the fallback table.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Vector variants of the new C23 log10p1 routines.
Note: Benchmark inputs for log10p1(f) are identical to log1p(f)
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Vector variants of the new C23 log2p1 routines.
Note: Benchmark inputs for log2p1(f) are identical to log1p(f).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To avoid linknamespace issues on old standards. It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
To avoid linknamespace issues on old standards. It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
Vector variants of the new C23 exp2m1 & exp10m1 routines.
Note: Benchmark inputs for exp2m1 & exp10m1 are identical to exp2 & exp10
respectively, this also includes the floating point variations.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It fixes ce488f7c16 which updated
the out files without using gen-auto-libm-tests.c instructions.
Checked on powerpc64le-linux-gnu.
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>