Commit Graph

1621 Commits

Joseph Myers e535fb910c Define C23 header version macros
C23 defines library macros __STDC_VERSION_<header>_H__ to indicate
that a header has support for new / changed features from C23.  Now
that all the required library features are implemented in glibc,
define these macros.  I'm not sure this is sufficiently much of a
user-visible feature to be worth a mention in NEWS.

Tested for x86_64.

There are various optional C23 features we don't yet have, of which I
might look at the Annex H ones (floating-point encoding conversion
functions and _Float16 functions) next.

* Optional time bases TIME_MONOTONIC, TIME_ACTIVE, TIME_THREAD_ACTIVE.
  See
  <https://sourceware.org/pipermail/libc-alpha/2023-June/149264.html>
  - we need to review / update that patch.  (I think patch 2/2,
  inventing new names for all the nonstandard CLOCK_* supported by the
  Linux kernel, is rather more dubious.)

* Updating conform/ tests for C23.

* Defining the rounding mode macro FE_TONEARESTFROMZERO for RISC-V (as
  far as I know, the only architecture supported by glibc that has
  hardware support for this rounding mode for binary floating point)
  and supporting it throughout glibc and its tests (especially the
  string/numeric conversions in both directions that explicitly handle
  each possible rounding mode, and various tests that do likewise).

* Annex H floating-point encoding conversion functions.  (It's not
  entirely clear which are optional even given support for Annex H;
  there's some wording applied inconsistently about only being
  required when non-arithmetic interchange formats are supported; see
  the comments I raised on the WG14 reflector on 23 Oct 2025.)

* _Float16 functions (and other header and testcase support for this
  type).

* Decimal floating-point support.

* Fully supporting __int128 and unsigned __int128 as integer types
  wider than intmax_t, as permitted by C23.  Would need doing in
  coordination with GCC, see GCC bug 113887 for more discussion of
  what's involved.
2025-11-27 19:32:49 +00:00
Adhemerval Zanella a61f7fd59d math: Sync atanh from CORE-MATH
The CORE-MATH commit dc9465e7 fixes some issues:

Failure: Test: atanh_towardzero (0x8.3f79103b3c64p-4)
Result:
 is:          5.7018661316561103e-01   0x1.23ef7ff0539c6p-1
 should be:   5.7018661316561092e-01   0x1.23ef7ff0539c5p-1
 difference:  1.1102230246251565e-16   0x1.0000000000000p-53
 ulp       :  1.0000
 max.ulp   :  0.0000
Failure: Test: atanh_towardzero (0x8.3f7d95aabaf7p-4)
Result:
 is:          5.7019248543911060e-01   0x1.23f044fac5997p-1
 should be:   5.7019248543911049e-01   0x1.23f044fac5996p-1
 difference:  1.1102230246251565e-16   0x1.0000000000000p-53
 ulp       :  1.0000
 max.ulp   :  0.0000
Failure: Test: atanh_towardzero (0x8.3f805380d6728p-4)
Result:
 is:          5.7019604623795527e-01   0x1.23f0bc75cd113p-1
 should be:   5.7019604623795516e-01   0x1.23f0bc75cd112p-1
 difference:  1.1102230246251565e-16   0x1.0000000000000p-53
 ulp       :  1.0000
 max.ulp   :  0.0000
Maximal error of `atanh_towardzero'
 is      : 1 ulp
 accepted: 0 ulp

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu,
and i686-linux-gnu.
2025-11-26 14:10:07 -03:00
Adhemerval Zanella 25de0771ec configure: Only use -fno-fp-int-builtin-inexact if compiler supports it
Checked on x86_64-linux-gnu.

Reviewed-by: Sam James <sam@gentoo.org>
2025-11-21 13:13:10 -03:00
Adhemerval Zanella 92186652d8 math: Sync atanh from CORE-MATH
The CORE-MATH commit 703d7487 fixes some issues for RNDZ:

Failure: Test: atanh_towardzero (0x5.96200b978b69cp-4)
Result:
 is:          3.6447730550366463e-01   0x1.753989ed16faap-2
 should be:   3.6447730550366458e-01   0x1.753989ed16fa9p-2
 difference:  5.5511151231257827e-17   0x1.0000000000000p-54
 ulp       :  1.0000
 max.ulp   :  0.0000
Maximal error of `atanh_towardzero'
 is      : 1 ulp
 accepted: 0 ulp

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu,
and i686-linux-gnu.
2025-11-19 15:21:44 -03:00
Adhemerval Zanella 4567204feb math: Sync acosh from CORE-MATH
The CORE-MATH commit 6736002f fixes some issues for RNDZ:

Failure: Test: acosh_towardzero (0x1.08000c1e79fp+0)
Result:
 is:          2.4935636091994373e-01   0x1.feae8c399b18cp-3
 should be:   2.4935636091994370e-01   0x1.feae8c399b18bp-3
 difference:  2.7755575615628913e-17   0x1.0000000000000p-55
 ulp       :  1.0000
 max.ulp   :  0.0000
Failure: Test: acosh_towardzero (0x1.080016353964ep+0)
Result:
 is:          2.4935874767710369e-01   0x1.feafcc91f518ep-3
 should be:   2.4935874767710367e-01   0x1.feafcc91f518dp-3
 difference:  2.7755575615628913e-17   0x1.0000000000000p-55
 ulp       :  1.0000
 max.ulp   :  0.0000
Maximal error of `acosh_towardzero'
 is      : 1 ulp
 accepted: 0 ulp

This only happens when the ISA supports fma, such as x86_64-v3, aarch64,
or powerpc.

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu,
and i686-linux-gnu.
2025-11-19 12:58:56 -03:00
Adhemerval Zanella 13cfd77bf5 math: Don't redirect inlined builtin math functions
When we want to inline builtin math functions, like truncf, given

  extern float truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
  extern float __truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));

  float (truncf) (float) asm ("__truncf");

the compiler may redirect truncf calls to __truncf instead of inlining
them (clang does this, for instance).  USE_TRUNCF_BUILTIN is 1 to
indicate that truncf should be inlined.  In this case, we don't want
the truncf redirection:

  1. For each math function which may be inlined, we define

  #if USE_TRUNCF_BUILTIN
  # define NO_truncf_BUILTIN inline_truncf
  #else
  # define NO_truncf_BUILTIN truncf
  #endif

in <math-use-builtins.h>.

  2. Include <math-use-builtins.h> in include/math.h.

  3. Change MATH_REDIRECT to

   #define MATH_REDIRECT(FUNC, PREFIX, ARGS)		\
    float (NO_ ## FUNC ## f ## _BUILTIN) (ARGS (float))	\
      asm (PREFIX #FUNC "f");

With this change, if USE_TRUNCF_BUILTIN is 0, we get

  float (truncf) (float) asm ("__truncf");

and truncf will be redirected to __truncf.

And for USE_TRUNCF_BUILTIN 1, we get:

  float (inline_truncf) (float) asm ("__truncf");

In both cases either truncf will be inlined or the internal alias
(__truncf) will be called.

This is not required for all math-use-builtins symbols, only the ones
declared in math.h.  It also allows removing the explicit
math-use-builtins inclusions, since the header is now implicitly
included by math.h.

For MIPS, some math-use-builtins headers include sysdep.h, and this
in turn includes a lot of extra headers that do not allow ldbl-128
code to override alias definitions (math.h would include some
stdlib.h definitions).  The math-use-builtins headers only require
__mips_isa_rev, so move its definition to sgidefs.h.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-11-17 11:17:07 -03:00
Joseph Myers 1f79bc4838 Change fromfp functions to return floating types following C23 (bug 28327)
As discussed in bug 28327, C23 changed the fromfp functions to return
floating types instead of intmax_t / uintmax_t.  (Although the
motivation in N2548 was reducing the use of intmax_t in library
interfaces, the new version does have the advantage of being able to
specify arbitrary integer widths for e.g. assigning the result to a
_BitInt, as well as being able to indicate an error case in-band with
a NaN return.)

As with other such changes from interfaces introduced in TS 18661,
implement the new types as a replacement for the old ones, with the
old functions remaining as compat symbols but not supported as an API.
The test generator used for many of the tests is updated to handle
both versions of the functions.

Tested for x86_64 and x86, and with build-many-glibcs.py.

Also tested tgmath tests for x86_64 with GCC 7 to make sure that the
modified case for older compilers in <tgmath.h> does work.

Also tested for powerpc64le to cover the ldbl-128ibm implementation
and the other things that are handled differently for that
configuration.  The new tests fail for ibm128, but all the failures
relate to incorrect signs of zero results and turn out to arise from
bugs in the underlying roundl, ceill, truncl and floorl
implementations that I've reported in bug 33623, rather than
indicating any bug in the actual new implementation of the functions
for that format.  So given fixes for those functions (which shouldn't
be hard, and of course should add to the tests for those functions
rather than relying only on indirect testing via fromfp), the fromfp
tests should start passing for ibm128 as well.
2025-11-13 00:04:21 +00:00
Adhemerval Zanella b983c854e6 math: Sync acosh from CORE-MATH
The CORE-MATH commit c9abdf80 fixes the handling of some cases for RNDZ.

Checked on x86_64-linux-gnu.
2025-11-10 08:58:14 -03:00
Adhemerval Zanella 3078358ac6 math: Remove the SVID error handling from tgammaf
It improves latency by about 1.5% and throughput by about 2-4%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 10:19:37 -03:00
Adhemerval Zanella de0e623434 math: Remove the SVID error handling from lgammaf/lgammaf_r
It improves latency and throughput by about 2%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 09:27:07 -03:00
Adhemerval Zanella 7ec8eb5676 math: Remove the SVID error handling from atan2f
It improves latency by about 3-6% and throughput by about 5-12%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 07:15:52 -03:00
Joseph Myers 26e4810210 Rename fromfp files in preparation for changing types for C23
As discussed in bug 28327, the fromfp functions changed type in C23
(compared to the version in TS 18661-1); they now return the same type
as the floating-point argument, instead of intmax_t / uintmax_t.

As with other such incompatible changes compared to the initial TS
18661 versions of interfaces (the types of totalorder functions, in
particular), it seems appropriate to support only the new version as
an API, not the old one (although many programs written for the old
API might in fact work with the new one as well).  Thus, the existing
implementations should become compat symbols.  They are sufficiently
different from how I'd expect to implement the new version that using
separate implementations in separate files is more convenient than
trying to share code, and directly sharing testcases would be
problematic as well.

Rename the existing fromfp implementation and test files to names
reflecting how they're intended to become compat symbols, so freeing
up the existing filenames for a subsequent implementation of the C23
versions of these functions (which is the point at which the existing
implementations would actually become compat symbols).

gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I
think it will make sense to adapt the test generator to be able to
generate most tests for both versions of the functions (with extra
test inputs added that are only of interest with the C23 version).
The ldbl-opt/nldbl-* files are also not renamed; since those are for a
static only library, no compat versions are needed, and they'll just
have their contents changed when the C23 version is implemented.

Tested for x86_64, and with build-many-glibcs.py.
2025-11-04 23:41:35 +00:00
Joseph Myers 26d11a0944 Add C23 long_double_t, _FloatN_t
C23 Annex H adds <math.h> typedefs long_double_t and _FloatN_t
(originally introduced in TS 18661-3), analogous to float_t and
double_t.  Add these typedefs to glibc.  (There are no _FloatNx_t
typedefs.)

C23 also slightly changes the rules for how such typedef names should
be defined, compared to the definition in TS 18661-3.  In both cases,
<TYPE>_t corresponds to the evaluation format for <TYPE>, as specified
by FLT_EVAL_METHOD (for which <math.h> uses glibc's internal
__GLIBC_FLT_EVAL_METHOD).  Specifically, each FLT_EVAL_METHOD value
corresponds to some type U (for example, 64 corresponds to U =
_Float64), and for types with exactly the same set of values as U, TS
18661-3 says expressions with those types are to be evaluated to the
range and precision of type U (so <TYPE>_t is defined to U), whereas
C23 only does that for types whose values are a strict subset of those
of type U (so <TYPE>_t is defined to <TYPE>).

As with other cases where semantics changed between TS 18661 and C23,
this patch only implements the newer version of the semantics
(including adjusting existing definitions of float_t and double_t as
needed).  The new semantics are contradictory between the main
standard and Annex H for the case of FLT_EVAL_METHOD == 2 and the
choice of double_t when double and long double have the same values
(the main standard says it's defined as long double in that case,
whereas Annex H would define it as double), which I've raised on the
WG14 reflector (but I think setting FLT_EVAL_METHOD == 2 when double
and long double have the same values is a fairly theoretical
combination of features); for now glibc follows the value in the main
standard in that case.

Note that I think all existing GCC targets supported by glibc only use
values -1, 0, 1, 2 or 16 for FLT_EVAL_METHOD (so most of the header
code is somewhat theoretical, though potentially relevant with other
compilers since the choice of FLT_EVAL_METHOD is only an API choice,
not an ABI one; it can vary with compiler options, and these typedefs
should not be used in ABIs).  The testcase (expanded to cover the new
typedefs) is really just repeating the same logic in a second place
(so all it really tests is that __GLIBC_FLT_EVAL_METHOD is consistent
with FLT_EVAL_METHOD).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-11-04 17:12:00 +00:00
Adhemerval Zanella 0dfc849eff math: Remove the SVID error handling wrapper from sqrt
i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.

The PowerPC optimization for PPC 601/603 (30 years old) is removed.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella f27a146409 math: Remove the SVID error handling from sinhf
It improves latency by about 3-10% and throughput by about 5-15%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella 0e1a1178ee math: Remove the SVID error handling from remainder
The optimized i386 version is faster than the generic one, and
gcc implements it through the builtin. This optimization enables
us to migrate the implementation to a C version.  The performance
on a Zen3 chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin
(different than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput           input    master   NO-SVID  improvement
x86_64                     subnormals   18.8522   16.2506       13.80%
x86_64                         normal  421.8260  403.9270        4.24%
x86_64                 close-exponent   21.0579   18.7642       10.89%
i686                       subnormals   21.3443   21.4229       -0.37%
i686                           normal  525.8380   538.807       -2.47%
i686                   close-exponent   21.6589   21.7983       -0.64%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella c4c6c79d70 math: Remove the SVID error handling from remainderf
The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin.  This optimization enables us to
migrate the implementation to a C version.  The performance on a Zen3
chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin (different
than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput          input   master  NO-SVID  improvement
x86_64                    subnormals  17.5349  15.6125       10.96%
x86_64                        normal  53.8134  52.5754        2.30%
x86_64                close-exponent  20.0211  18.6656        6.77%
i686                      subnormals  21.8105  20.1856        7.45%
i686                          normal  73.1945  71.2199        2.70%
i686                  close-exponent  22.2141   20.331        8.48%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Wilco Dijkstra 1136c036a3 math: Remove xfail from pow test [BZ #33563]
Remove xfail from pow testcase since pow and powf have been fixed.
Also check float128 maximum value.  See BZ #33563.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 19:13:53 +00:00
Adhemerval Zanella ee946212fe math: Remove the SVID error handling wrapper from yn/jn
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:35 -03:00
Adhemerval Zanella 8d4815e6d7 math: Remove the SVID error handling wrapper from y1/j1
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:33 -03:00
Adhemerval Zanella b050cb53b0 math: Remove the SVID error handling wrapper from y0/j0
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:31 -03:00
Adhemerval Zanella 03eeeba705 math: Remove the SVID error handling from coshf
It improves latency by about 3-10% and throughput by about 5-15%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:28 -03:00
Adhemerval Zanella 555c39c0fc math: Remove the SVID error handling from atanhf
It improves latency by about 1-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:26 -03:00
Adhemerval Zanella 8facb464b4 math: Remove the SVID error handling from acoshf
It improves latency by about 3-7% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:24 -03:00
Adhemerval Zanella f92aba68bc math: Remove the SVID error handling from asinf
It improves latency by about 2% and throughput by about 5%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:22 -03:00
Adhemerval Zanella 9f8dea5b5d math: Remove the SVID error handling from acosf
It improves latency by about 2-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:20 -03:00
Adhemerval Zanella 0b484d7b77 math: Remove the SVID error handling from log10f
It improves latency by about 3-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:17 -03:00
Adhemerval Zanella e4d812c980 math: Consolidate erf/erfc definitions
The common code definitions are consolidated in s_erf_common.h
and s_erf_common.c.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:46:01 -03:00
Adhemerval Zanella fc419290f9 math: Consolidate internal erf/erfc tables
The shared internal data definitions are consolidated in
s_erf_data.c and the erfc only one are moved to s_erfc_data.c.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella acaad9ab06 math: Use erfc from CORE-MATH
The current implementation precision shows the following accuracy, on
three ranges ([-DBL_MAX,-5], [-5,5], [5,DBL_MAX]) with 10e9 uniform
randomly generated numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -5]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-5, 5]
 * FE_TONEAREST
     0:       8069309665  80.69%
     1:       1882910247  18.83%
     2:         47485296   0.47%
     3:           293749   0.00%
     4:             1043   0.00%
 * FE_UPWARD
     0:       5540301026  55.40%
     1:       2026739127  20.27%
     2:       1774882486  17.75%
     3:        567324466   5.67%
     4:         86913847   0.87%
     5:          3820789   0.04%
     6:            18259   0.00%
 * FE_DOWNWARD
     0:       5520969586  55.21%
     1:       2057293099  20.57%
     2:       1778334818  17.78%
     3:        557521494   5.58%
     4:         82473927   0.82%
     5:          3393276   0.03%
     6:            13800   0.00%
 * FE_TOWARDZERO
     0:       6220287175  62.20%
     1:       2323846149  23.24%
     2:       1251999920  12.52%
     3:        190748245   1.91%
     4:         12996232   0.13%
     5:           122279   0.00%

* Range [5, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                      49.0980       267.0660      -443.94%
x86_64v2                    49.3220       257.6310      -422.34%
x86_64v3                    42.9539        84.9571       -97.79%
aarch64                     28.7266        52.9096       -84.18%
power10                     14.1673        25.1273       -77.36%

Latency                      master        patched   improvement
x86_64                      95.6640       269.7060      -181.93%
x86_64v2                    95.8296       260.4860      -171.82%
x86_64v3                    91.1658       112.7150       -23.64%
aarch64                     37.0745        58.6791       -58.27%
power10                     23.3197        31.5737       -35.39%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella 72a48e45bd math: Use erf from CORE-MATH
The current implementation precision shows the following accuracy, on
three ranges ([-DBL_MAX, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with
10e9 uniform randomly generated numbers for each range (first column
is the accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -4.2]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-4.2, 4.2]
 * FE_TONEAREST
     0:       9764404513  97.64%
     1:        235595487   2.36%
 * FE_UPWARD
     0:       9468013928  94.68%
     1:        531986072   5.32%
 * FE_DOWNWARD
     0:       9493787693  94.94%
     1:        506212307   5.06%
 * FE_TOWARDZERO
     0:       9585271351  95.85%
     1:        414728649   4.15%

* Range [4.2, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master       patched   improvement
x86_64                      38.2754       78.0311      -103.87%
x86_64v2                    38.3325       75.7555       -97.63%
x86_64v3                    34.6604       28.3182        18.30%
aarch64                     23.1499       21.4307         7.43%
power10                     12.3051       9.3766         23.80%

Latency                      master       patched   improvement
x86_64                      84.3062      121.3580       -43.95%
x86_64v2                    84.1817      117.4250       -39.49%
x86_64v3                    81.0933       70.6458        12.88%
aarch64                      35.012       29.5012        15.74%
power10                     21.7205       18.4589        15.02%

For x86_64/x86_64-v2, most of the performance hit comes from calling
fma through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella 1cae0550e8 math: Use tgamma from CORE-MATH
The current implementation precision shows the following accuracy, on
one range ([-20,20]) with 10e9 uniform randomly generated numbers
(first column is the accuracy in ULP, with '0' being
correctly rounded, second is the number of samples with the
corresponding precision):

* Range [-20,20]
 * FE_TONEAREST
     0:       4504877808  45.05%
     1:       4402224940  44.02%
     2:        947652295   9.48%
     3:        131076831   1.31%
     4:         13222216   0.13%
     5:           910045   0.01%
     6:            35253   0.00%
     7:              606   0.00%
     8:                6   0.00%
 * FE_UPWARD
     0:       3477307921  34.77%
     1:       4838637866  48.39%
     2:       1413942684  14.14%
     3:        240762564   2.41%
     4:         27113094   0.27%
     5:          2130934   0.02%
     6:           102599   0.00%
     7:             2324   0.00%
     8:               14   0.00%
 * FE_DOWNWARD
     0:       3923545410  39.24%
     1:       4745067290  47.45%
     2:       1137899814  11.38%
     3:        171596912   1.72%
     4:         20013805   0.20%
     5:          1773899   0.02%
     6:            99911   0.00%
     7:             2928   0.00%
     8:               31   0.00%
 * FE_TOWARDZERO
     0:       3697160741  36.97%
     1:       4731951491  47.32%
     2:       1303092738  13.03%
     3:        231969191   2.32%
     4:         32344517   0.32%
     5:          3283092   0.03%
     6:           193010   0.00%
     7:             5175   0.00%
     8:               45   0.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                     237.7960       175.4090        26.24%
x86_64v2                   232.9320       163.4460        29.83%
x86_64v3                   193.0680        89.7721        53.50%
aarch64                    113.6340        56.7350        50.07%
power10                     92.0617        26.6137        71.09%

Latency                      master        patched   improvement
x86_64                     266.7190       208.0130        22.01%
x86_64v2                   263.6070       200.0280        24.12%
x86_64v3                   214.0260       146.5180        31.54%
aarch64                    114.4760        58.5235        48.88%
power10                     84.3718        35.7473        57.63%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella d67d2f4688 math: Use lgamma from CORE-MATH
The current implementation precision shows the following accuracy, on
two ranges ([-20, 20] and [20, 0x5.d53649e2d4674p+1012]) with 10e9
uniform randomly generated numbers for each range (first column is
the accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-20, 20]
 * FE_TONEAREST
     0:       6701254075  67.01%
     1:       3230897408  32.31%
     2:         63986940   0.64%
     3:          3605417   0.04%
     4:           233189   0.00%
     5:            20973   0.00%
     6:             1869   0.00%
     7:              125   0.00%
     8:                4   0.00%
 * FE_UPWARD
     0:       4207428861  42.07%
     1:       5001137116  50.01%
     2:        740542213   7.41%
     3:         49116304   0.49%
     4:          1715617   0.02%
     5:            54464   0.00%
     6:             4956   0.00%
     7:              451   0.00%
     8:               16   0.00%
     9:                2   0.00%
 * FE_DOWNWARD
     0:       4155925193  41.56%
     1:       4989821364  49.90%
     2:        770312796   7.70%
     3:         72014726   0.72%
     4:         11040522   0.11%
     5:           872811   0.01%
     6:            12480   0.00%
     7:              106   0.00%
     8:                2   0.00%
 * FE_TOWARDZERO
     0:       4225861532  42.26%
     1:       5027051105  50.27%
     2:        706443411   7.06%
     3:         39877908   0.40%
     4:           713109   0.01%
     5:            47513   0.00%
     6:             4961   0.00%
     7:              438   0.00%
     8:               23   0.00%

* Range [20, 0x5.d53649e2d4674p+1012]
 * FE_TONEAREST
     0:       7262241995  72.62%
     1:       2737758005  27.38%
 * FE_UPWARD
     0:       4690392401  46.90%
     1:       5143728216  51.44%
     2:        165879383   1.66%
 * FE_DOWNWARD
     0:       4690333331  46.90%
     1:       5143794937  51.44%
     2:        165871732   1.66%
 * FE_TOWARDZERO
     0:       4690343071  46.90%
     1:       5143786761  51.44%
     2:        165870168   1.66%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                     112.9740       135.8640       -20.26%
x86_64v2                   111.8910       131.7590       -17.76%
x86_64v3                   108.2800        68.0935        37.11%
aarch64                     61.3759        49.2403        19.77%
power10                     42.4483        24.1943        43.00%

Latency                      master        patched   improvement
x86_64                     144.0090       167.9750       -16.64%
x86_64v2                   139.2690       167.1900       -20.05%
x86_64v3                   130.1320        96.9347        25.51%
aarch64                     66.8538        53.2747        20.31%
power10                     49.5076        29.6917        40.03%

For x86_64/x86_64-v2, most of the performance hit comes from calling
fma through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella 140e802cb3 math: Move atanh internal data to separate file
The internal data definitions are moved to s_atanh_data.c.
It helps on ABIs that build the implementation multiple times for
ifunc optimizations, like x86_64.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella cb8d1575b6 math: Consolidate acosh and asinh internal table
The shared internal data definitions are consolidated in
s_asincosh_data.c.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Paul Zimmermann 48fde7b026 various fixes detected with -Wdouble-promotion
Changes with respect to v1:
- added a comment in e_j1f.c to explain why the use of float is enough
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-22 12:35:40 +02:00
Siddhesh Poyarekar 1b657c53c2 Simplify powl computation for small integral y [BZ #33411]
The powl implementation for x86_64 ends up multiplying X once more than
necessary and then throwing away that result.  This results in an
overflow flag being set in cases where there is no overflow.

Simplify the relevant portion by special casing the -3 to 3 range and
simply multiplying repetitively.
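The special case described above can be sketched as repeated multiplication for integral exponents in the -3 to 3 range, which never produces a result that is computed and then thrown away. The function name is hypothetical, not the glibc internal one:

```c
/* Hedged sketch: x^y for small integral |y| <= 3 by repeated
   multiplication, avoiding the extra discarded multiply that could
   raise a spurious overflow flag.  */
static long double
pow_small_int (long double x, int y)
{
  long double r = 1.0L;
  int n = y < 0 ? -y : y;
  while (n-- > 0)
    r *= x;                     /* at most three multiplies */
  return y < 0 ? 1.0L / r : r;
}
```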

Resolves: BZ #33411
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-21 14:00:10 -04:00
Adhemerval Zanella 0e4ca88bd2 math: Fix compare sort function on compoundn
Use the fabs variant matching the template type, instead of the double
one.  It fixes a build issue with clang:

./s_compoundn_template.c:64:14: error: absolute value function 'fabs' given an argument of type 'const long double' but has parameter of type 'double' which may cause truncation of value [-Werror,-Wabsolute-value]
   64 |   FLOAT pd = fabs (*(const FLOAT *) p);
      |              ^
./s_compoundn_template.c:64:14: note: use function 'fabsl' instead
   64 |   FLOAT pd = fabs (*(const FLOAT *) p);
      |              ^~~~
      |              fabsl
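One way to express the fix is a type-generic selection: `_Generic` picks the fabs variant matching the template's FLOAT type and only then applies it, so no instantiation truncates through double. The macro name here is illustrative (glibc's float templates have their own machinery):

```c
#include <math.h>

/* Hedged sketch: select the matching fabs variant by type, then apply
   it to the argument.  */
#define FABS_TG(x)                      \
  _Generic ((x),                        \
            float: fabsf,               \
            double: fabs,               \
            long double: fabsl) (x)

typedef long double FLOAT;              /* as in the long double build */

static FLOAT
abs_key (const void *p)
{
  FLOAT pd = FABS_TG (*(const FLOAT *) p);
  return pd;
}
```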

Reviewed-by: Collin Funk <collin.funk1@gmail.com>
2025-10-21 09:27:05 -03:00
Adhemerval Zanella b9b28ce35f math: Suppress more aliases builtin type conflicts
Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:26:02 -03:00
Adhemerval Zanella 39bf95c1ba math: Suppress clang -Wabsolute-value warning on math_check_force_underflow
clang warns:

  ../sysdeps/x86/fpu/powl_helper.c:233:3: error: absolute value function
  '__builtin_fabsf' given an argument of type 'typeof (res)' (aka 'long
  double') but has parameter of type 'float' which may cause truncation of
  value [-Werror,-Wabsolute-value]
    math_check_force_underflow (res);
    ^
  ./math-underflow.h:45:11: note: expanded from macro
  'math_check_force_underflow'
        if (fabs_tg (force_underflow_tmp)                         \
            ^
  ./math-underflow.h:27:20: note: expanded from macro 'fabs_tg'
  #define fabs_tg(x) __MATH_TG ((x), (__typeof (x)) __builtin_fabs, (x))
                     ^
  ../math/math.h:899:16: note: expanded from macro '__MATH_TG'
                 float: FUNC ## f ARGS,           \
                        ^
  <scratch space>:73:1: note: expanded from here
  __builtin_fabsf
  ^

This is due to the use of _Generic from __MATH_TG.
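The warning can be reproduced with a minimal sketch of the expansion: `_Generic` type-checks every association, so the float branch applies `__builtin_fabsf` to a long double argument even though that branch can never be selected, and clang's -Wabsolute-value fires on it. The macro below only imitates the fabs_tg/__MATH_TG shape shown in the diagnostic:

```c
/* Hedged repro sketch: all _Generic branches must be valid
   expressions, so clang diagnoses the unselected float branch.  */
#define fabs_tg(x)                                      \
  _Generic ((x),                                        \
            float: __builtin_fabsf ((x)),               \
            double: __builtin_fabs ((x)),               \
            long double: __builtin_fabsl ((x)))

long double
force_pos (long double v)
{
  return fabs_tg (v);   /* clang: -Wabsolute-value on the float branch */
}
```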

Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:24:21 -03:00
Adhemerval Zanella 850d93f514 math: Use binary search on lgammaf slow path
And remove some unused entries of the fallback table.
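The lookup change can be sketched as a binary search over the table's interval boundaries, replacing a linear scan. The bounds array and function name below are illustrative, not glibc's actual lgammaf table:

```c
#include <stddef.h>

/* Hedged sketch: find i such that bounds[i] <= x < bounds[i+1],
   assuming x lies in [bounds[0], bounds[n-1]).  O(log n) instead of
   the O(n) linear scan.  */
static size_t
find_interval (const float *bounds, size_t n, float x)
{
  size_t lo = 0, hi = n - 1;
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (x < bounds[mid])
        hi = mid;
      else
        lo = mid;
    }
  return lo;
}
```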

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:08 -03:00
Adhemerval Zanella ae49afe74d math: Optimize fma call on log2p1f
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:00 -03:00
Adhemerval Zanella 82a4f50b4e math: Optimize fma call on asinpif
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:11:56 -03:00
Adhemerval Zanella 1c459af1ee math: Update auto-libm-test-out-log2p1
Commit 0797283910 did not update the log2p1 output with the newer values.
2025-10-14 08:46:06 -03:00
Luna Lamb 653e6c4fff AArch64: Implement AdvSIMD and SVE log10p1(f) routines
Vector variants of the new C23 log10p1 routines.

Note: Benchmark inputs for log10p1(f) are identical to log1p(f)

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:45:59 +00:00
Luna Lamb db42732474 AArch64: Implement AdvSIMD and SVE log2p1(f) routines
Vector variants of the new C23 log2p1 routines.

Note: Benchmark inputs for log2p1(f) are identical to log1p(f).

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:44:09 +00:00
Adhemerval Zanella 63ba1a1509 math: Add fetestexcept internal alias
To avoid linknamespace issues on old standards.  It is required if the
fallback fma implementation is used, and if/when it is also used
internally by other implementations.
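An internal alias can be sketched on ELF targets as a reserved `__`-prefixed symbol that internal callers use, with the public name defined as a plain alias of it; a user program defining its own conforming symbol then cannot interpose on the library's internal uses. Names here are illustrative, and glibc has its own alias macros for this:

```c
/* Hedged sketch of the internal-alias pattern (GCC/ELF specific).  */
static int
do_test_except (int excepts)
{
  return excepts & 0x3f;        /* placeholder body */
}

/* Internal (reserved-namespace) entry point used by other parts of
   the library.  */
int
__my_fetestexcept (int excepts)
{
  return do_test_except (excepts);
}

/* Public name is a plain alias of the internal one.  */
int my_fetestexcept (int)
  __attribute__ ((alias ("__my_fetestexcept")));
```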
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Adhemerval Zanella 2eb8836de7 math: Add feclearexcept internal alias
To avoid linknamespace issues on old standards.  It is required if the
fallback fma implementation is used, and if/when it is also used
internally by other implementations.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Hasaan Khan 8ced7815fb AArch64: Implement exp2m1 and exp10m1 routines
Vector variants of the new C23 exp2m1 & exp10m1 routines.

Note: Benchmark inputs for exp2m1 & exp10m1 are identical to exp2 & exp10
respectively; this also includes the floating-point variations.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-02 16:50:24 +00:00
Adhemerval Zanella 6ab36c4e6d math: Update auto-libm-tests-in with ldbl-128ibm compoundn/pown failures
It fixes commit ce488f7c16, which updated
the out files without following the gen-auto-libm-tests.c instructions.

Checked on powerpc64le-linux-gnu.

Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-07-28 13:58:54 -03:00