Work around the clang limitation with respect to inline functions and
attribute definitions, where it does not allow adding a new attribute if
a function is already defined:
clang on x86_64 fails to build s_fabsf128.c with:
../sysdeps/ieee754/float128/../ldbl-128/s_fabsl.c:32:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
32 | libm_alias_ldouble (__fabs, fabs)
| ^
../sysdeps/generic/libm-alias-ldouble.h:63:38: note: expanded from macro 'libm_alias_ldouble'
63 | #define libm_alias_ldouble(from, to) libm_alias_ldouble_r (from, to, )
| ^
../sysdeps/ieee754/float128/float128_private.h:133:43: note: expanded from macro 'libm_alias_ldouble_r'
133 | #define libm_alias_ldouble_r(from, to, r) libm_alias_float128_r (from, to, r)
| ^
../sysdeps/ieee754/float128/s_fabsf128.c:5:3: note: expanded from macro 'libm_alias_float128_r'
5 | static_weak_alias (from ## f128 ## r, to ## f128 ## r); \
| ^
./../include/libc-symbols.h:166:46: note: expanded from macro 'static_weak_alias'
166 | # define static_weak_alias(name, aliasname) weak_alias (name, aliasname)
| ^
./../include/libc-symbols.h:154:38: note: expanded from macro 'weak_alias'
154 | # define weak_alias(name, aliasname) _weak_alias (name, aliasname)
| ^
./../include/libc-symbols.h:156:52: note: expanded from macro '_weak_alias'
156 | extern __typeof (name) aliasname __attribute__ ((weak, alias (#name))) \
| ^
../include/math.h:134:1: note: previous definition is here
134 | fabsf128 (_Float128 x)
If the compiler does not support __USE_EXTERN_INLINES, we need to route
the fabsf128 call to an internal symbol.
When we want to inline builtin math functions, like truncf, then given
extern float truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
extern float __truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
float (truncf) (float) asm ("__truncf");
the compiler (clang, for instance) may redirect truncf calls to __truncf
instead of inlining them. USE_TRUNCF_BUILTIN is set to 1 to indicate that
truncf should be inlined; in this case, we don't want the truncf
redirection:
1. For each math function which may be inlined, we define
#if USE_TRUNCF_BUILTIN
# define NO_truncf_BUILTIN inline_truncf
#else
# define NO_truncf_BUILTIN truncf
#endif
in <math-use-builtins.h>.
2. Include <math-use-builtins.h> in include/math.h.
3. Change MATH_REDIRECT to
#define MATH_REDIRECT(FUNC, PREFIX, ARGS) \
float (NO_ ## FUNC ## f ## _BUILTIN) (ARGS (float)) \
asm (PREFIX #FUNC "f");
With this change, if USE_TRUNCF_BUILTIN is 0, we get
float (truncf) (float) asm ("__truncf");
truncf will be redirected to __truncf.
And for USE_TRUNCF_BUILTIN 1, we get:
float (inline_truncf) (float) asm ("__truncf");
In both cases either truncf will be inlined or the internal alias
(__truncf) will be called.
This is not required for every math-use-builtins symbol, only the ones
defined in math.h. It also allows removing all the explicit
math-use-builtins inclusions, since the header is now implicitly included
by math.h.
For MIPS, some math-use-builtins headers include sysdep.h, which in turn
pulls in a lot of extra headers that do not allow the ldbl-128 code to
override the alias definitions (math.h will include some stdlib.h
definitions). The math-use-builtins header only requires __mips_isa_rev,
so move that definition to sgidefs.h.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add fast path optimization for frexpl (80-bit x87 extended precision) using
a single unsigned comparison to identify normal floating-point numbers and
return immediately via arithmetic on the exponent field.
The implementation uses arithmetic operations (se - ex) to adjust the
exponent directly, which is simpler than bit masking. For subnormals,
the traditional multiply-based normalization is retained as it handles the
split word format more reliably.
The zero/infinity/NaN check groups these special cases together for better
branch prediction.
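A minimal, self-contained sketch of this fast path (the function name,
the union layout assuming the little-endian x87 format with a 64-bit
mantissa word followed by a 16-bit sign/exponent word, and other details
are illustrative only; the committed code uses the usual glibc ldbl-96
accessor macros):

#include <stdint.h>

long double
frexpl_sketch (long double x, int *eptr)
{
  union { long double ld; struct { uint64_t mant; uint16_t se; } w; } u = { x };
  unsigned int se = u.w.se;
  unsigned int ex = se & 0x7fff;

  /* One unsigned comparison accepts every normal number (biased exponent
     1..0x7ffe) and rejects zero, subnormals, infinities and NaNs.  */
  if (ex - 1 < 0x7ffe)
    {
      *eptr = (int) ex - 0x3ffe;     /* Unbiased exponent plus one.  */
      u.w.se = se - ex + 0x3ffe;     /* Keep the sign, force |x| into [0.5, 1).  */
      return u.ld;
    }

  *eptr = 0;
  if (ex == 0x7fff || u.w.mant == 0) /* Infinity, NaN or +/-0.  */
    return x;

  /* Subnormal: keep the multiply-based normalization.  */
  u.ld *= 0x1p64L;
  se = u.w.se;
  ex = se & 0x7fff;
  *eptr = (int) ex - 0x3ffe - 64;
  u.w.se = se - ex + 0x3ffe;
  return u.ld;
}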
Benchmark results on Intel Core i9-13900H (13th Gen):
Baseline: 25.543 ns/op
Optimized: 25.531 ns/op
Speedup: 1.00x (neutral)
Zero: 17.774 ns/op
Denormal: 23.900 ns/op
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
As discussed in bug 28327, C23 changed the fromfp functions to return
floating types instead of intmax_t / uintmax_t. (Although the
motivation in N2548 was reducing the use of intmax_t in library
interfaces, the new version does have the advantage of being able to
specify arbitrary integer widths for e.g. assigning the result to a
_BitInt, as well as being able to indicate an error case in-band with
a NaN return.)
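A hedged illustration of the new semantics (prototype per C23,
double fromfp (double, int, unsigned int), with the FP_INT_* rounding
macros from <math.h>; this is not taken from the glibc sources or tests):

#include <math.h>
#include <stdio.h>

int
main (void)
{
  /* The rounded value comes back in the floating type, so it can be
     assigned to an integer of any width, including a _BitInt.  */
  double r = fromfp (3.25, FP_INT_TONEAREST, 8);

  /* A value that does not fit the requested width is reported in-band
     as a NaN rather than an arbitrary integer.  */
  double bad = fromfp (1000.0, FP_INT_TONEAREST, 8);

  printf ("%g %d\n", r, isnan (bad));
  return 0;
}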
As with other such changes from interfaces introduced in TS 18661,
implement the new types as a replacement for the old ones, with the
old functions remaining as compat symbols but not supported as an API.
The test generator used for many of the tests is updated to handle
both versions of the functions.
Tested for x86_64 and x86, and with build-many-glibcs.py.
Also tested tgmath tests for x86_64 with GCC 7 to make sure that the
modified case for older compilers in <tgmath.h> does work.
Also tested for powerpc64le to cover the ldbl-128ibm implementation
and the other things that are handled differently for that
configuration. The new tests fail for ibm128, but all the failures
relate to incorrect signs of zero results and turn out to arise from
bugs in the underlying roundl, ceill, truncl and floorl
implementations that I've reported in bug 33623, rather than
indicating any bug in the actual new implementation of the functions
for that format. So given fixes for those functions (which shouldn't
be hard, and of course should add to the tests for those functions
rather than relying only on indirect testing via fromfp), the fromfp
tests should start passing for ibm128 as well.
Remove uses of float_t and double_t. This is not useful on modern machines,
and does not help given GCC defaults to -fexcess-precision=fast.
One use of double_t remains to allow forcing the precision to double
on targets where FLT_EVAL_METHOD=2. This fixes BZ #33563 on
i486-pc-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove ldbl-128/s_fma.c - it makes no sense to use emulated float128
operations to emulate FMA. Benchmarking shows dbl-64/s_fma.c is about
twice as fast. Remove redundant dbl-64/s_fma.c includes in targets
that were trying to work around this issue.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add fast path optimization for frexpl (128-bit IEEE quad precision) using
a single unsigned comparison to identify normal floating-point numbers and
return immediately via arithmetic on the exponent field.
The implementation uses arithmetic operations hx = hx - (ex << 48)
to adjust the exponent in place, which is simpler and more efficient than
bit masking. For subnormals, the traditional multiply-based normalization
is retained for reliability with the split 64-bit word format.
The zero/infinity/NaN check groups these special cases together for better
branch prediction.
This optimization provides the same algorithmic improvements as the other
frexp variants while maintaining correctness for all edge cases.
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Add fast path optimization for frexp using a single unsigned comparison
to identify normal floating-point numbers and return immediately via
arithmetic on the bit representation.
The implementation uses asuint64()/asdouble() from math_config.h and arithmetic
operations to adjust the exponent, which generates better code than bit masking
on ARM and RISC-V architectures. For subnormals, stdc_leading_zeros provides
faster normalization than the traditional multiply approach.
The zero/infinity/NaN check is simplified to (int64_t)(ix << 1) <= 0, which
is more efficient than separate comparisons.
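A self-contained sketch of the approach (stand-in asuint64/asdouble
helpers replace the math_config.h ones; the function name and exact
details are illustrative and may differ from the committed code):

#include <stdbit.h>   /* C23 stdc_leading_zeros.  */
#include <stdint.h>
#include <string.h>

static inline uint64_t asuint64 (double x)
{ uint64_t u; memcpy (&u, &x, sizeof u); return u; }
static inline double asdouble (uint64_t u)
{ double x; memcpy (&x, &u, sizeof x); return x; }

double
frexp_sketch (double x, int *eptr)
{
  uint64_t ix = asuint64 (x);
  int ex = (ix >> 52) & 0x7ff;

  /* One unsigned comparison accepts every normal number (biased exponent
     1..0x7fe); the exponent field is then rewritten arithmetically.  */
  if ((unsigned int) (ex - 1) < 0x7fe)
    {
      *eptr = ex - 0x3fe;
      return asdouble (ix - ((uint64_t) (ex - 0x3fe) << 52));
    }

  *eptr = 0;
  /* Zero, infinity and NaN in one test: shifting out the sign bit leaves
     0 for +/-0 and a negative value when the exponent is all ones.  */
  if ((int64_t) (ix << 1) <= 0)
    return x;

  /* Subnormal: locate the leading mantissa bit with stdc_leading_zeros
     instead of scaling by a power of two.  */
  uint64_t m = ix & 0x000fffffffffffffULL;
  int lz = stdc_leading_zeros (m << 12);   /* Zeros above the mantissa MSB.  */
  *eptr = -1022 - lz;
  return asdouble ((ix & 0x8000000000000000ULL) | (0x3feULL << 52)
                   | ((m << (lz + 1)) & 0x000fffffffffffffULL));
}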
Benchmark results on Intel Core i9-13900H (13th Gen):
Baseline: 6.778 ns/op
Optimized: 4.007 ns/op
Speedup: 1.69x (40.9% faster)
Zero: 3.580 ns/op (fast path)
Denormal: 6.096 ns/op (slower, rare case)
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add fast path optimization for frexpf using a single unsigned comparison
to identify normal floating-point numbers and return immediately via
arithmetic on the bit representation.
The implementation uses asuint()/asfloat() from math_config.h and arithmetic
operations to adjust the exponent, which generates better code than bit masking
on ARM and RISC-V architectures. For subnormals, stdc_leading_zeros provides
faster normalization than the traditional multiply approach.
The zero/infinity/NaN check is simplified to (int32_t)(hx << 1) <= 0, which
is more efficient than separate comparisons.
Benchmark results on Intel Core i9-13900H (13th Gen):
Baseline: 5.858 ns/op
Optimized: 4.003 ns/op
Speedup: 1.46x (31.7% faster)
Zero: 3.580 ns/op (fast path)
Denormal: 5.597 ns/op (slower, rare case)
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It improves latency by about 1.5% and throughput by about 2-4%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-6% and throughput by about 5-12%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
As discussed in bug 28327, the fromfp functions changed type in C23
(compared to the version in TS 18661-1); they now return the same type
as the floating-point argument, instead of intmax_t / uintmax_t.
As with other such incompatible changes compared to the initial TS
18661 versions of interfaces (the types of totalorder functions, in
particular), it seems appropriate to support only the new version as
an API, not the old one (although many programs written for the old
API might in fact work with the new one as well). Thus, the existing
implementations should become compat symbols. They are sufficiently
different from how I'd expect to implement the new version that using
separate implementations in separate files is more convenient than
trying to share code, and directly sharing testcases would be
problematic as well.
Rename the existing fromfp implementation and test files to names
reflecting how they're intended to become compat symbols, so freeing
up the existing filenames for a subsequent implementation of the C23
versions of these functions (which is the point at which the existing
implementations would actually become compat symbols).
gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I
think it will make sense to adapt the test generator to be able to
generate most tests for both versions of the functions (with extra
test inputs added that are only of interest with the C23 version).
The ldbl-opt/nldbl-* files are also not renamed; since those are for a
static only library, no compat versions are needed, and they'll just
have their contents changed when the C23 version is implemented.
Tested for x86_64, and with build-many-glibcs.py.
i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.
The PowerPC optimization for PPC 601/603 (30 years old) is removed.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-10% and throughput by about 5-15%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and
gcc implements it through the builtin. This optimization enables
us to migrate the implementation to a C version. The performance
on a Zen3 chip is similar to the SVID one.
The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin
(different than i386).
Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):
reciprocal-throughput input master NO-SVID improvement
x86_64 subnormals 18.8522 16.2506 13.80%
x86_64 normal 421.8260 403.9270 4.24%
x86_64 close-exponent 21.0579 18.7642 10.89%
i686 subnormals 21.3443 21.4229 -0.37%
i686 normal 525.8380 538.807 -2.47%
i686 close-exponent 21.6589 21.7983 -0.64%
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin. This optimization enables us to
migrate the implementation to a C version. The performance on a Zen3
chip is similar to the SVID one.
The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin (different
than i386).
Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):
reciprocal-throughput input master NO-SVID improvement
x86_64 subnormals 17.5349 15.6125 10.96%
x86_64 normal 53.8134 52.5754 2.30%
x86_64 close-exponent 20.0211 18.6656 6.77%
i686 subnormals 21.8105 20.1856 7.45%
i686 normal 73.1945 71.2199 2.70%
i686 close-exponent 22.2141 20.331 8.48%
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Fix pow (DBL_MAX, 1.0) to return DBL_MAX when rounding upwards without FMA.
This fixes BZ #33563.
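A minimal, hypothetical reproducer for the fixed case (not the glibc
testsuite code; pow (x, 1.0) should return x exactly in every rounding
mode):

#include <fenv.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

int
main (void)
{
  fesetround (FE_UPWARD);
  double r = pow (DBL_MAX, 1.0);
  printf ("pow (DBL_MAX, 1.0) == DBL_MAX: %d\n", r == DBL_MAX);
  return r == DBL_MAX ? 0 : 1;
}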
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Fix powf (0x1.fffffep+127, 1.0f) to return 0x1.fffffep+127 when
rounding upwards. Clean up the special-case code - performance
improves by ~1.2%. This fixes BZ #33563.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It improves latency by about 3-10% and throughput by about 5-15%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 1-10% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-7% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 2% and throughput by about 5%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 2-10% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-10% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
clang defaults to warning about missing fall-through annotations and does
not support all of the comment-style annotations that gcc does. Use the
C23 [[fallthrough]] attribute instead.
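A short, hypothetical example of the change: a comment-style annotation
such as /* Fall through.  */ before the next case label becomes the
standard attribute, which both gcc and clang understand in C23 mode:

static int
classify (int c)
{
  int score = 0;
  switch (c)
    {
    case 0:
      score += 1;
      [[fallthrough]];   /* Replaces the fall-through comment.  */
    case 1:
      score += 2;
      break;
    default:
      score = -1;
    }
  return score;
}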
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
For lgamma and tgamma, muldd, mulddd, and polydd are renamed to
muldd2, mulddd2, and polydd2, respectively.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The common code definitions are consolidated in s_erf_common.h
and s_erf_common.c.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The shared internal data definitions are consolidated in
s_erf_data.c and the erfc-only ones are moved to s_erfc_data.c.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The current implementation's precision shows the following accuracy on
three ranges ([-DBL_MIN, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with
10e9 uniformly generated random numbers for each range (first column
is the accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):
* Range [-DBL_MIN, -4.2]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
* Range [-4.2, 4.2]
* FE_TONEAREST
0: 9764404513 97.64%
1: 235595487 2.36%
* FE_UPWARD
0: 9468013928 94.68%
1: 531986072 5.32%
* FE_DOWNWARD
0: 9493787693 94.94%
1: 506212307 5.06%
* FE_TOWARDZERO
0: 9585271351 95.85%
1: 414728649 4.15%
* Range [4.2, DBL_MAX]
* FE_TONEAREST
0: 10000000000 100.00%
* FE_UPWARD
0: 10000000000 100.00%
* FE_DOWNWARD
0: 10000000000 100.00%
* FE_TOWARDZERO
0: 10000000000 100.00%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:
reciprocal-throughput master patched improvement
x86_64 38.2754 78.0311 -103.87%
x86_64v2 38.3325 75.7555 -97.63%
x86_64v3 34.6604 28.3182 18.30%
aarch64 23.1499 21.4307 7.43%
power10 12.3051 9.3766 23.80%
Latency master patched improvement
x86_64 84.3062 121.3580 -43.95%
x86_64v2 84.1817 117.4250 -39.49%
x86_64v3 81.0933 70.6458 12.88%
aarch64 35.012 29.5012 15.74%
power10 21.7205 18.4589 15.02%
For x86_64/x86_64-v2, most of the performance hit comes from the fma call
going through the ifunc mechanism.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The internal data definitions are moved to s_atanh_data.c.
It helps on ABIs that build the implementation multiple times for
ifunc optimizations, like x86_64.
Reviewed-by: DJ Delorie <dj@redhat.com>
The current implementation's precision shows the following accuracy on
one range ([-1, 1]) with 10e9 uniformly generated random numbers for
the range (first column is the accuracy in ULP, with '0' being
correctly rounded, second is the number of samples with the
corresponding precision):
* Range [-1, 1]
* FE_TONEAREST
0: 8180011860 81.80%
1: 1819865257 18.20%
2: 122883 0.00%
* FE_UPWARD
0: 3903695744 39.04%
1: 4992324465 49.92%
2: 1096319340 10.96%
3: 7660451 0.08%
* FE_DOWNWARD
0: 3904555484 39.05%
1: 4991970864 49.92%
2: 1095447471 10.95%
3: 8026181 0.08%
* FE_TOWARDZERO
0: 7070209165 70.70%
1: 2908447434 29.08%
2: 21343401 0.21%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:
reciprocal-throughput master patched improvement
x86_64 26.4969 22.4625 15.23%
x86_64v2 26.0792 22.9822 11.88%
x86_64v3 25.6357 22.2147 13.34%
aarch64 20.2295 19.7001 2.62%
power10 10.0986 9.3846 7.07%
Latency master patched improvement
x86_64 80.2311 59.9745 25.25%
x86_64v2 79.7010 61.4066 22.95%
x86_64v3 78.2679 58.5804 25.15%
aarch64 34.3959 28.1523 18.15%
power10 23.2417 18.2694 21.39%
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Changes with respect to v1:
- added a comment in e_j1f.c to explain why using float is enough
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Clang issues:
../sysdeps/ieee754/dbl-64/s_llround.c:83:30: error: incompatible
redeclaration of library function 'lround'
[-Werror,-Wincompatible-library-redeclaration]
libm_alias_double (__lround, lround)
^
../sysdeps/ieee754/dbl-64/s_llround.c:83:30: note: 'lround' is a builtin
with type 'long (double)'
Reviewed-by: Sam James <sam@gentoo.org>
clang issues:
../sysdeps/ieee754/dbl-64/e_lgamma_r.c:234:29: error: absolute value function 'fabsf'
given an argument of type 'double' but has parameter of type 'float' which may cause \
truncation of value [-Werror,-Wabsolute-value]
It should not matter because the value is 0.0, but using fabs is
simpler than adding a warning suppression.
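An illustrative example (hypothetical function, not the e_lgamma_r.c
code) of the pattern clang warns about and the fix:

#include <math.h>

static double
abs_of_zero (double zero)
{
  /* Was: fabsf (zero) - a float function applied to a double argument
     triggers -Wabsolute-value under clang.  */
  return fabs (zero);
}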
Reviewed-by: Sam James <sam@gentoo.org>
And remove some unused entries of the fallback table.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>