glibc

Commit Graph

Author	SHA1	Message	Date
Yao Zihong	09a94c86ca	riscv: memcpy_noalignment: Reorder to store via a3, then bump a3 Rewrite the copy micro-step from: REG_L a4, 0(a5) addi a3, a3, SZREG addi a5, a5, SZREG REG_S a4, -SZREG(a3) to: REG_L a4, 0(a5) addi a5, a5, SZREG REG_S a4, 0(a3) addi a3, a3, SZREG Semantics are unchanged: both read (a5_old), write (a3_old), and then increment a3/a5 by SZREG. memcpy assumes non-overlapping regions, so the reordering preserves correctness. No functional change. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:49:21 -05:00
Yao Zihong	0698fd462a	riscv: memcpy_noalignment: Fold SZREG/BLOCK_SIZE alignment to single andi Simplify the alignment steps for SZREG and BLOCK_SIZE multiples. The previous three-instruction sequences addi a7, a2, -SZREG andi a7, a7, -SZREG addi a7, a7, SZREG and addi a7, a2, -BLOCK_SIZE andi a7, a7, -BLOCK_SIZE addi a7, a7, BLOCK_SIZE are equivalent to a single andi a7, a2, -SZREG andi a7, a2, -BLOCK_SIZE because SZREG and BLOCK_SIZE are powers of two in this context, making the surrounding addi steps cancel out. Folding to one instruction reduces code size with identical semantics. No functional change. sysdeps/riscv/multiarch/memcpy_noalignment.S: Remove redundant addi around alignment; keep a single andi for SZREG/BLOCK_SIZE rounding. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:47:24 -05:00
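The folding argument in the commit above can be checked in C. This is a minimal sketch, not the glibc code: the function names are hypothetical, and the values 8 and 64 used in the usage note are stand-ins for SZREG and BLOCK_SIZE. It relies only on unsigned wraparound arithmetic and on the size being a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Three-instruction form from the old code (names are illustrative).  */
static uintptr_t
round_down_three_step (uintptr_t n, uintptr_t size)
{
  assert (size != 0 && (size & (size - 1)) == 0); /* power of two only */
  uintptr_t t = n - size;   /* addi a7, a2, -SIZE (may wrap around) */
  t &= -size;               /* andi a7, a7, -SIZE */
  return t + size;          /* addi a7, a7, SIZE */
}

/* Folded form from the new code: a single mask.  */
static uintptr_t
round_down_folded (uintptr_t n, uintptr_t size)
{
  assert (size != 0 && (size & (size - 1)) == 0); /* power of two only */
  return n & -size;         /* andi a7, a2, -SIZE */
}

/* Returns 1 if both forms agree for every n in [0, limit).  */
static int
forms_agree (uintptr_t size, uintptr_t limit)
{
  for (uintptr_t n = 0; n < limit; n++)
    if (round_down_three_step (n, size) != round_down_folded (n, size))
      return 0;
  return 1;
}
```

Calling `forms_agree (8, 4096)` or `forms_agree (64, 4096)` exercises both sequences over a small range, including values below the size where the first subtraction wraps.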
Yao Zihong	444d81284e	riscv: memcpy_noalignment: Make register allocation Zca-friendly Tidy the temporary register allocation to favor registers eligible for compressed encodings when Zca/Zcb are enabled. This keeps the ABI and clobber set unchanged and does not alter control flow or memory access behavior. No functional change. sysdeps/riscv/multiarch/memcpy_noalignment.S: Reassign temps to improve compressed encoding opportunities. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:44:58 -05:00
Adhemerval Zanella	ee946212fe	math: Remove the SVID error handling wrapper from yn/jn Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:35 -03:00
Adhemerval Zanella	8d4815e6d7	math: Remove the SVID error handling wrapper from y1/j1 Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:33 -03:00
Adhemerval Zanella	b050cb53b0	math: Remove the SVID error handling wrapper from y0/j0 Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:31 -03:00
Adhemerval Zanella	03eeeba705	math: Remove the SVID error handling from coshf It improves latency by about 3-10% and throughput by about 5-15%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:28 -03:00
Adhemerval Zanella	555c39c0fc	math: Remove the SVID error handling from atanhf It improves latency by about 1-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:26 -03:00
Adhemerval Zanella	8facb464b4	math: Remove the SVID error handling from acoshf It improves latency by about 3-7% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:24 -03:00
Adhemerval Zanella	f92aba68bc	math: Remove the SVID error handling from asinf It improves latency by about 2% and throughput by about 5%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:22 -03:00
Adhemerval Zanella	9f8dea5b5d	math: Remove the SVID error handling from acosf It improves latency by about 2-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:20 -03:00
Adhemerval Zanella	0b484d7b77	math: Remove the SVID error handling from log10f It improves latency by about 3-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:17 -03:00
Adhemerval Zanella	6deadd4eb6	m68k: Remove SVID error handling on fmod The m68k provided an optimized version through __m81_u(fmod) (mathimpl.h), and gcc does not implement it through a builtin (different than i386). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:15 -03:00
Adhemerval Zanella	b19904cfb2	m68k: Avoid include e_fmod.c on fmod/remainder implementation And open-code each implementation. It simplifies SVID error handling removal. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:12 -03:00
Adhemerval Zanella	ade9f30ce2	m68k: Remove the SVID error handling from fmodf The m68k provided an optimized version through __m81_u(fmodf) (mathimpl.h), and gcc does not implement it through a builtin (different than i386). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:10 -03:00
Adhemerval Zanella	1dd2163e51	i386: Remove the SVID error handling from fmodf The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. It allows us to move the implementation to a C one. The performance on a Zen3 chip is slightly better: reciprocal-throughput input master no-SVID improvement i686 subnormals 22.4741 20.1571 10.31% i686 normal 74.1631 70.3606 5.13% i686 close-exponent 22.5625 20.2435 10.28% Tested on i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:07 -03:00
Adhemerval Zanella	bfee89dc8a	i386: Remove the SVID error handling from fmod The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. It allows us to move the implementation to a C one. The performance on a Zen3 chip is similar to the SVID one. Tested on i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:40:41 -03:00
Jiamei Xie	4d86b6cdd8	x86: fix wmemset ifunc stray '!' (bug 33542) The ifunc selector for wmemset had a stray '!' in the X86_ISA_CPU_FEATURES_ARCH_P(...) check: if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, !)) This effectively negated the predicate and caused the AVX2/AVX512 paths to be skipped, making the dispatcher fall back to the SSE2 implementation even on CPUs where AVX2/AVX512 are available. The regression leads to noticeable throughput loss for wmemset. Remove the stray '!' so the AVX_Fast_Unaligned_Load capability is tested as intended and the correct AVX2/EVEX variants are selected. Impact: - On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly falls back to SSE2; perf now shows __wmemset_evex/avx2 variants. Testing: - benchtests/bench-wmemset shows improved bandwidth across sizes. - perf confirms the selected symbol is no longer SSE2. Signed-off-by: xiejiamei <xiejiamei@hygon.com> Signed-off-by: Li jing <lijing@hygon.cn> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-29 12:54:14 -03:00
Jiayuan Chen	1177d2f26c	Updates struct tcp_zerocopy_receive from 5.11 to netinet/tcp.h. This patch updates struct tcp_zerocopy_receive to contain fields including copybuf_address, copybuf_len, and others. Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-29 12:54:12 -03:00
Adhemerval Zanella	8711c29bb7	aarch64: Fix tst-ifunc-arg-4 on clang-18 It issues: ../sysdeps/aarch64/tst-ifunc-arg-4.c:39:1: error: unused function 'resolver' [-Werror,-Wunused-function] 39 \| resolver (uint64_t arg0, const uint64_t arg1[]) \| ^~~~~~~~ 1 error generated. clang-19 and onwards do not trigger the warning. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:54:10 -03:00
Adhemerval Zanella	d49d917b90	Enable --no-undefined-version by default Recent lld versions default to --no-undefined-version, which triggers errors when building multiple libraries. For ld.so on x86_64 it fails with: ld.lld: error: version script assignment of 'GLIBC_2.4' to symbol '__stack_chk_guard' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_PRIVATE' to symbol '__nptl_set_robust_list_avail' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_PRIVATE' to symbol '__pointer_chk_guard' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_PRIVATE' to symbol '_dl_starting_up' failed: symbol not defined While for libc.so: ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_clearerr' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fgetc' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fileno' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_freopen' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fscanf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fseek' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_peekc_unlocked' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_stderr_' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_stdin_' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_stdout_' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_pclose' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_perror' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_rewind' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_scanf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_setbuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_setlinebuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_wdefault_setbuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_wfile_setbuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '__ctype32_tolower' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '__ctype32_toupper' failed: symbol not defined ld.lld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors) The version script is created with multiple missing symbols to simplify the build for multiple ABIs, each of which may have different symbols. For instance, __stack_chk_guard is defined by default. This avoids requiring each ABI to add this symbol to its version script, depending on the stack protector ABI it uses. The libc.so warnings do show unused symbols being defined (like _IO_clearerr), which might trigger potential errors depending on how symbols are exported. However, since we already have ABI checks for missing and extra symbols, the linker's extra checks are not really necessary. The --no-undefined-version is the default for ld.bfd. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:54:06 -03:00
Adhemerval Zanella	1ab6a62e68	Suppress unused command arguments warning with clang clang 20 issues a warning for the unused '-c' argument used to create errlist-data-aux-shared.S, errlist-data-aux.S, siglist-aux-shared.S, and siglist-aux.S. Filter out the '-c' from the $(compile-command.c). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:54:03 -03:00
Adhemerval Zanella	970364dac0	Annotate switch fall-through Clang defaults to warning about missing fall-through annotations, and it does not support all the comment-like annotations that gcc does. Use the C23 [[fallthrough]] attribute instead. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:54:01 -03:00
Adhemerval Zanella	543ddd628f	argp: Move attribute_hidden to argp-fmtstream.h The internal header redefines some internal argp functions with attribute_hidden, which triggers a clang warning about mismatched attributes. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:54:00 -03:00
Adhemerval Zanella	110ec4954e	argp: Expand argp_usage, _option_is_short, and _option_is_end The argp code uses macro redefinitions to avoid duplicating static inline implementations for argp_usage, _option_is_short, and _option_is_end. However, this causes build issues with clang, as some function prototypes are redefined to add the hidden attribute with libc_hidden_proto. To avoid extensive changes to internal headers, just expand the function implementations and avoid the macro redefine tricks. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:53:57 -03:00
Adhemerval Zanella	36b4c553e6	Replace count_leading_zeros with stdc_leading_zeros Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:53:55 -03:00
Adhemerval Zanella	f91abbde02	malloc: Remove unused tcache_set_inactive clang warns that this function is not used. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:53:53 -03:00
Adhemerval Zanella	602fdf5d69	include: Sync gnulib intprops It syncs with gnulib commit 1790ef25d81983d1d25a77d452c0080345df459b. The main change is to proper support clang by using builtins. It fixes a sprof build issue, where previous version uses the generic code path when building with clang: sprof.c:682:8: error: result of comparison of constant 288230376151711743 with expression of type 'Elf64_Half' (aka 'unsigned short') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 682 \| if (INT_MULTIPLY_WRAPV (ehdr2.e_shnum, sizeof (ElfW(Shdr)), &size)) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:415:34: note: expanded from macro 'INT_MULTIPLY_WRAPV' 415 \| _GL_INT_OP_WRAPV (a, b, r, *, _GL_INT_MULTIPLY_RANGE_OVERFLOW) \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:504:45: note: expanded from macro '_GL_INT_OP_WRAPV' 504 \| : _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow)) \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ ../include/intprops.h:511:41: note: expanded from macro '_GL_INT_OP_WRAPV_LONGISH' 511 \| : _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 512 \| unsigned long int, 0, ULONG_MAX)) \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:533:4: note: expanded from macro '_GL_INT_OP_CALC' 533 \| (overflow (a, b, tmin, tmax) \ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:608:22: note: expanded from macro '_GL_INT_MULTIPLY_RANGE_OVERFLOW' 608 \| : (tmax) / (b) < (a))) \| ~~~~~~~~~~~~ ^ ~~~ 1 error generated. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:53:50 -03:00
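For readers unfamiliar with builtin-based overflow checks of the kind the intprops sync refers to, here is a minimal sketch. It uses `__builtin_mul_overflow`, which GCC and clang both provide; the helper name and its parameters are hypothetical and only loosely mirror the `e_shnum * sizeof (ElfW(Shdr))` computation from the sprof diagnostic. It is not the gnulib macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores count * elem_size in *total and returns
   true if the product does not fit in size_t.  The builtin handles
   mixed operand types, so no range comparison against a narrow type
   such as unsigned short is needed.  */
static bool
table_size_overflows (unsigned short count, size_t elem_size, size_t *total)
{
  assert (total != NULL);
  return __builtin_mul_overflow (count, elem_size, total);
}
```

A caller would test the return value first and use `*total` only when it is false.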
Adhemerval Zanella	5ee722d3ac	i386: Build s_erf_common.c with -fexcess-precision=standard It is required to provide correctly rounded results. Checked on i686-linux-gnu.	2025-10-29 10:17:34 -03:00
H.J. Lu	14243c9db6	Build programs in $(others-noinstall) like tests Programs in $(others-noinstall) are internal to glibc build and they aren't installed. They should be treated like programs in $(others), but linked like tests so that --enable-hardcoded-path-in-tests also applies to them. Also replace run-via-rtld-prefix with test-via-rtld-prefix when running container tests. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-29 12:04:40 +08:00
Osama Abdelkader	96073e9f34	Fix incorrect setrlimit return value checks in tests The setrlimit(2) function returns 0 on success and -1 on error, but several test files were incorrectly checking for a return value of 1 to detect errors. This means the error checks would never trigger, causing tests to continue silently even when setrlimit() failed. This commit fixes the error checks in five files to correctly test for -1, matching both the documented behavior and the pattern used correctly in other parts of the codebase. Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-28 18:51:51 -07:00
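The corrected check pattern described in the commit above looks roughly like this. This is a minimal sketch with hypothetical helper names, using RLIMIT_CORE as an arbitrary example resource; it is not taken from the affected test files.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: sets the soft core-file limit.
   Returns 0 on success and -1 on failure.  */
static int
set_soft_core_limit (rlim_t soft)
{
  struct rlimit rl;
  if (getrlimit (RLIMIT_CORE, &rl) == -1)
    return -1;
  rl.rlim_cur = soft;
  /* setrlimit returns 0 on success and -1 on error, so a comparison
     against 1 would never detect a failure.  */
  if (setrlimit (RLIMIT_CORE, &rl) == -1)
    return -1;
  return 0;
}

/* Hypothetical helper: confirms the soft limit now equals EXPECTED.  */
static void
check_soft_core_limit (rlim_t expected)
{
  struct rlimit rl;
  int rc = getrlimit (RLIMIT_CORE, &rl);
  assert (rc == 0);
  assert (rl.rlim_cur == expected);
}
```

Lowering a soft limit to zero is normally permitted for unprivileged processes, so `set_soft_core_limit (0)` followed by `check_soft_core_limit (0)` is a reasonable smoke test.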
Joseph Myers	096fcdc0a5	Rename uimaxabs to umaxabs (bug 33325) The C2y function uimaxabs has been renamed to umaxabs. Implement this change in glibc, keeping a compat symbol under the old name, copying the test to test the new name and changing the old test to test the compat symbol. Jakub has done the corresponding change to the built-in function in GCC. Tested for x86_64 and x86.	2025-10-28 12:15:02 +00:00
Adhemerval Zanella	013f5167b9	math: Consolidate CORE-MATH double-double routines For lgamma and tgamma the muldd, mulddd, and polydd are renamed to muldd2, mulddd2, and polydd2 respectively. Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:46:04 -03:00
Adhemerval Zanella	e4d812c980	math: Consolidate erf/erfc definitions The common code definitions are consolidated in s_erf_common.h and s_erf_common.c. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:46:01 -03:00
Adhemerval Zanella	fc419290f9	math: Consolidate internal erf/erfc tables The shared internal data definitions are consolidated in s_erf_data.c and the erfc only one are moved to s_erfc_data.c. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	acaad9ab06	math: Use erfc from CORE-MATH The current implementation precision shows the following accuracy, on three ranges ([-DBL_MAX,-5], [-5,5], [5,DBL_MAX]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-DBL_MAX, -5] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% * Range [-5, 5] * FE_TONEAREST 0: 8069309665 80.69% 1: 1882910247 18.83% 2: 47485296 0.47% 3: 293749 0.00% 4: 1043 0.00% * FE_UPWARD 0: 5540301026 55.40% 1: 2026739127 20.27% 2: 1774882486 17.75% 3: 567324466 5.67% 4: 86913847 0.87% 5: 3820789 0.04% 6: 18259 0.00% * FE_DOWNWARD 0: 5520969586 55.21% 1: 2057293099 20.57% 2: 1778334818 17.78% 3: 557521494 5.58% 4: 82473927 0.82% 5: 3393276 0.03% 6: 13800 0.00% * FE_TOWARDZERO 0: 6220287175 62.20% 1: 2323846149 23.24% 2: 1251999920 12.52% 3: 190748245 1.91% 4: 12996232 0.13% 5: 122279 0.00% * Range [5, DBL_MAX] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 49.0980 267.0660 -443.94% x86_64v2 49.3220 257.6310 -422.34% x86_64v3 42.9539 84.9571 -97.79% aarch64 28.7266 52.9096 -84.18% power10 14.1673 25.1273 -77.36% Latency master patched improvement x86_64 95.6640 269.7060 -181.93% x86_64v2 95.8296 260.4860 -171.82% x86_64v3 91.1658 112.7150 -23.64% aarch64 37.0745 58.6791 -58.27% power10 23.3197 31.5737 -35.39% Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	72a48e45bd	math: Use erf from CORE-MATH The current implementation precision shows the following accuracy, on three ranges ([-DBL_MIN, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-DBL_MIN, -4.2] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% * Range [-4.2, 4.2] * FE_TONEAREST 0: 9764404513 97.64% 1: 235595487 2.36% * FE_UPWARD 0: 9468013928 94.68% 1: 531986072 5.32% * FE_DOWNWARD 0: 9493787693 94.94% 1: 506212307 5.06% * FE_TOWARDZERO 0: 9585271351 95.85% 1: 414728649 4.15% * Range [4.2, DBL_MAX] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 38.2754 78.0311 -103.87% x86_64v2 38.3325 75.7555 -97.63% x86_64v3 34.6604 28.3182 18.30% aarch64 23.1499 21.4307 7.43% power10 12.3051 9.3766 23.80% Latency master patched improvement x86_64 84.3062 121.3580 -43.95% x86_64v2 84.1817 117.4250 -39.49% x86_64v3 81.0933 70.6458 12.88% aarch64 35.012 29.5012 15.74% power10 21.7205 18.4589 15.02% For x86_64/x86_64-v2, most performance hit came from the fma call through the ifunc mechanism. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	1cae0550e8	math: Use tgamma from CORE-MATH The current implementation precision shows the following accuracy, on one range ([-20,20]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-20,20] * FE_TONEAREST 0: 4504877808 45.05% 1: 4402224940 44.02% 2: 947652295 9.48% 3: 131076831 1.31% 4: 13222216 0.13% 5: 910045 0.01% 6: 35253 0.00% 7: 606 0.00% 8: 6 0.00% * FE_UPWARD 0: 3477307921 34.77% 1: 4838637866 48.39% 2: 1413942684 14.14% 3: 240762564 2.41% 4: 27113094 0.27% 5: 2130934 0.02% 6: 102599 0.00% 7: 2324 0.00% 8: 14 0.00% * FE_DOWNWARD 0: 3923545410 39.24% 1: 4745067290 47.45% 2: 1137899814 11.38% 3: 171596912 1.72% 4: 20013805 0.20% 5: 1773899 0.02% 6: 99911 0.00% 7: 2928 0.00% 8: 31 0.00% * FE_TOWARDZERO 0: 3697160741 36.97% 1: 4731951491 47.32% 2: 1303092738 13.03% 3: 231969191 2.32% 4: 32344517 0.32% 5: 3283092 0.03% 6: 193010 0.00% 7: 5175 0.00% 8: 45 0.00% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 237.7960 175.4090 26.24% x86_64v2 232.9320 163.4460 29.83% x86_64v3 193.0680 89.7721 53.50% aarch64 113.6340 56.7350 50.07% power10 92.0617 26.6137 71.09% Latency master patched improvement x86_64 266.7190 208.0130 22.01% x86_64v2 263.6070 200.0280 24.12% x86_64v3 214.0260 146.5180 31.54% aarch64 114.4760 58.5235 48.88% power10 84.3718 35.7473 57.63% Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	d67d2f4688	math: Use lgamma from CORE-MATH The current implementation precision shows the following accuracy, on two ranges ([-20, 20] and [20, 0x5.d53649e2d4674p+1012]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-20, 20] * FE_TONEAREST 0: 6701254075 67.01% 1: 3230897408 32.31% 2: 63986940 0.64% 3: 3605417 0.04% 4: 233189 0.00% 5: 20973 0.00% 6: 1869 0.00% 7: 125 0.00% 8: 4 0.00% * FE_UPWARD 0: 4207428861 42.07% 1: 5001137116 50.01% 2: 740542213 7.41% 3: 49116304 0.49% 4: 1715617 0.02% 5: 54464 0.00% 6: 4956 0.00% 7: 451 0.00% 8: 16 0.00% 9: 2 0.00% * FE_DOWNWARD 0: 4155925193 41.56% 1: 4989821364 49.90% 2: 770312796 7.70% 3: 72014726 0.72% 4: 11040522 0.11% 5: 872811 0.01% 6: 12480 0.00% 7: 106 0.00% 8: 2 0.00% * FE_TOWARDZERO 0: 4225861532 42.26% 1: 5027051105 50.27% 2: 706443411 7.06% 3: 39877908 0.40% 4: 713109 0.01% 5: 47513 0.00% 6: 4961 0.00% 7: 438 0.00% 8: 23 0.00% * Range [20, 0x5.d53649e2d4674p+1012] * FE_TONEAREST 0: 7262241995 72.62% 1: 2737758005 27.38% * FE_UPWARD 0: 4690392401 46.90% 1: 5143728216 51.44% 2: 165879383 1.66% * FE_DOWNWARD 0: 4690333331 46.90% 1: 5143794937 51.44% 2: 165871732 1.66% * FE_TOWARDZERO 0: 4690343071 46.90% 1: 5143786761 51.44% 2: 165870168 1.66% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 112.9740 135.8640 -20.26% x86_64v2 111.8910 131.7590 -17.76% x86_64v3 108.2800 68.0935 37.11% aarch64 61.3759 49.2403 19.77% power10 42.4483 24.1943 43.00% Latency master patched improvement x86_64 144.0090 167.9750 -16.64% x86_64v2 139.2690 167.1900 -20.05% x86_64v3 130.1320 96.9347 25.51% aarch64 66.8538 53.2747 20.31% power10 49.5076 29.6917 40.03% For x86_64/x86_64-v2, most performance hit came from the fma call through the ifunc mechanism. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	140e802cb3	math: Move atanh internal data to separate file The internal data definitions are moved to s_atanh_data.c. It helps on ABIs that build the implementation multiple times for ifunc optimizations, like x86_64. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	cb8d1575b6	math: Consolidate acosh and asinh internal table The shared internal data definitions are consolidated in s_asincosh_data.c. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	79b70fc09f	math: Use atanh from CORE-MATH The current implementation precision shows the following accuracy, on one range ([-1,1]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-1, 1] * FE_TONEAREST 0: 8180011860 81.80% 1: 1819865257 18.20% 2: 122883 0.00% * FE_UPWARD 0: 3903695744 39.04% 1: 4992324465 49.92% 2: 1096319340 10.96% 3: 7660451 0.08% * FE_DOWNWARD 0: 3904555484 39.05% 1: 4991970864 49.92% 2: 1095447471 10.95% 3: 8026181 0.08% * FE_TOWARDZERO 0: 7070209165 70.70% 1: 2908447434 29.08% 2: 21343401 0.21% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 26.4969 22.4625 15.23% x86_64v2 26.0792 22.9822 11.88% x86_64v3 25.6357 22.2147 13.34% aarch64 20.2295 19.7001 2.62% power10 10.0986 9.3846 7.07% Latency master patched improvement x86_64 80.2311 59.9745 25.25% x86_64v2 79.7010 61.4066 22.95% x86_64v3 78.2679 58.5804 25.15% aarch64 34.3959 28.1523 18.15% power10 23.2417 18.2694 21.39% Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	30e66b085c	math: Use asinh from CORE-MATH The current implementation precision shows the following accuracy, on three different ranges ([-DBL_MAX, -10], [-10,10], and [10, DBL_MAX)) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-DBL_MAX, -10] * FE_TONEAREST 0: 5164019099 51.64% 1: 4835980901 48.36% * FE_UPWARD 1: 4836053540 48.36% 2: 5163946460 51.64% * FE_DOWNWARD 1: 5163926134 51.64% 2: 4836073866 48.36% * FE_TOWARDZERO 0: 5163937001 51.64% 1: 4836062999 48.36% * Range [-10, 10) * FE_TONEAREST 0: 8679029381 86.79% 1: 1320934581 13.21% 2: 36038 0.00% * FE_UPWARD 0: 3965704277 39.66% 1: 4993616710 49.94% 2: 1039680225 10.40% 3: 998788 0.01% * FE_DOWNWARD 0: 3965806523 39.66% 1: 4993534438 49.94% 2: 1039601726 10.40% 3: 1057313 0.01% * FE_TOWARDZERO 0: 7734210130 77.34% 1: 2261868439 22.62% 2: 3921431 0.04% * Range [10, DBL_MAX) * FE_TONEAREST 0: 5163973212 51.64% 1: 4836026788 48.36% * FE_UPWARD 0: 4835991071 48.36% 1: 5164008929 51.64% * FE_DOWNWARD 0: 5163983594 51.64% 1: 4836016406 48.36% * FE_TOWARDZERO 0: 5163993394 51.64% 1: 4836006606 48.36% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 26.5178 45.3754 -71.11% x86_64v2 26.3167 44.7870 -70.18% x86_64v3 25.9109 25.4887 1.63% aarch64 18.0555 17.3374 3.98% power10 19.8535 20.5586 -3.55% Latency master patched improvement x86_64 82.6755 91.2026 -10.31% x86_64v2 82.4581 90.7152 -10.01% x86_64v3 80.7000 71.9454 10.85% aarch64 32.8320 28.8565 12.11% power10 44.5309 37.0096 16.89% For x86_64/x86_64-v2, most performance hit came from the fma call through the ifunc mechanism. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	d1509f2ce3	math: Use acosh from CORE-MATH The current implementation precision shows the following accuracy, on two different ranges ([1,21) and [21, DBL_MAX)) with 10e9 uniform randomly generated numbers (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [1,21] * FE_TONEAREST 0: 8931139411 89.31% 1: 1068697545 10.69% 2: 163044 0.00% * FE_UPWARD 0: 7936620731 79.37% 1: 2062594522 20.63% 2: 783977 0.01% 3: 770 0.00% * FE_DOWNWARD 0: 7936459794 79.36% 1: 2062734117 20.63% 2: 805312 0.01% 3: 777 0.00% * FE_TOWARDZERO 0: 7910345595 79.10% 1: 2088584522 20.89% 2: 1069106 0.01% 3: 777 0.00% * Range [21, DBL_MAX) * FE_TONEAREST 0: 5163888431 51.64% 1: 4836111569 48.36% * FE_UPWARD 0: 4835951885 48.36% 1: 5164048115 51.64% * FE_DOWNWARD 0: 5164048432 51.64% 1: 4835951568 48.36% * FE_TOWARDZERO 0: 5164058042 51.64% 1: 4835941958 48.36% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 20.9131 47.2187 -125.79% x86_64v2 20.8823 41.1042 -96.84% x86_64v3 19.0282 25.8045 -35.61% aarch64 14.7419 18.1535 -23.14% power10 8.98341 11.0423 -22.92% Latency master patched improvement x86_64 75.5494 89.5979 -18.60% x86_64v2 74.4443 87.6292 -17.71% x86_64v3 71.8558 70.7086 1.60% aarch64 30.3361 29.2709 3.51% power10 20.9263 19.2482 8.02% For x86_64/x86_64-v2, most performance hit came from the fma call through the ifunc mechanism. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Collin Funk	3d20d746c3	Linux: fix tst-copy_file_range-large test on 32-bit platforms. Since SSIZE_MAX is less than UINT_MAX on 32-bit platforms we must AND the expression with SSIZE_MAX. Tested on x86_64 and x86. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-26 19:39:49 -07:00
litenglong	00d406e77b	x86: Disable AVX Fast Unaligned Load on Hygon 1/2/3 - Performance testing revealed significant memcpy performance degradation when bit_arch_AVX_Fast_Unaligned_Load is enabled on Hygon 3. - Hygon confirmed AVX performance issues in certain memory functions. - Glibc benchmarks show SSE outperforms AVX for memcpy/memmove/memset/strcmp/strcpy/strlen and so on. - Hardware differences primarily in floating-point operations don't justify AVX usage for memory operations. Reviewed-by: gaoxiang <gaoxiang@kylinos.cn> Signed-off-by: litenglong <litenglong@kylinos.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-27 05:16:30 +08:00
Sachin Monga	b59799f14f	ppc64le: Power 10 rawmemchr clobbers v20 (bug #33091) Replace the non-volatile register v20 with the volatile register v17, since v20 is not restored. Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-26 12:19:53 -05:00
Dev Jain	b2b4b46a52	malloc: fix large tcache code to check for exact size match The tcache is used for allocation only if an exact match is found. In the large tcache code added in commit `cbfd798810`, we currently extract a chunk of size greater than or equal to the size we need, but don't check strict equality. This patch fixes that behaviour. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-24 16:55:02 +00:00
Adhemerval Zanella	48e040d568	Fix configure from `ab22e5ec37` The "-Wno-unused-command-line-argument" was incorrectly added.	2025-10-22 17:23:12 -03:00
Adhemerval Zanella	6e862a07f7	misc: Fix clang -Wstring-plus-int warnings on syslog clang issues: syslog.c:193:9: error: adding 'int' to a string does not append to the string [-Werror,-Wstring-plus-int] 193 \| SYSLOG_HEADER (pri, timestamp, &msgoff, pid)); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ syslog.c:180:7: note: expanded from macro 'SYSLOG_HEADER' 180 \| "[" + (pid == 0), pid, "]" + (pid == 0) Use array indexes instead of string addition (it is simpler than adding a warning suppression).	2025-10-22 16:35:39 -03:00
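The array-index idiom mentioned in the syslog commit can be sketched as follows. This is an illustration only: `format_tag` and its format string are hypothetical and are not the actual SYSLOG_HEADER macro. Indexing a one-character string literal with 0 or 1 yields either the character itself or the terminating NUL, so taking the address gives either the bracket or an empty string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "TAG[PID]:" into BUF, or just "TAG:"
   when PID is 0.  Returns the snprintf result.  */
static int
format_tag (char *buf, size_t len, const char *tag, int pid)
{
  assert (buf != NULL && len > 0);
  /* &"["[pid == 0] is "[" when pid != 0 and "" when pid == 0.
     "%.0d" prints no characters for the value 0.  */
  return snprintf (buf, len, "%s%s%.0d%s:", tag,
                   &"["[pid == 0], pid, &"]"[pid == 0]);
}
```

With a 32-byte buffer, `format_tag (buf, sizeof buf, "app", 42)` produces `app[42]:` and a PID of 0 produces `app:`.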

43003 Commits
