glibc

Commit Graph

Author	SHA1	Message	Date
Adhemerval Zanella	ed608a40e2	math: Use asinhf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows slightly better performance than the generic asinhf. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1): Latency master patched improvement x86_64 64.5128 56.9717 11.69% x86_64v2 63.3065 57.2666 9.54% x86_64v3 62.8719 51.4170 18.22% i686 189.1630 137.635 27.24% aarch64 (Neoverse) 25.3551 20.5757 18.85% power10 17.9712 13.3302 25.82% reciprocal-throughput master patched improvement x86_64 20.0844 15.4731 22.96% x86_64v2 19.2919 15.4000 20.17% x86_64v3 18.7226 11.9009 36.44% i686 103.7670 80.2681 22.65% aarch64 (Neoverse) 12.5005 8.68969 30.49% power10 7.2220 5.03617 30.27% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-12-18 17:24:43 -03:00
Adhemerval Zanella	5fb4b566ef	math: Use asinf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows slightly better performance than the generic asinf. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1): Latency master patched improvement x86_64 42.8237 35.2460 17.70% x86_64v2 43.3711 35.9406 17.13% x86_64v3 35.0335 30.5744 12.73% i686 213.8780 104.4710 51.15% aarch64 (Neoverse) 17.2937 13.6025 21.34% power10 12.0227 7.4241 38.25% reciprocal-throughput master patched improvement x86_64 13.6770 15.5231 -13.50% x86_64v2 13.8722 16.0446 -15.66% x86_64v3 13.6211 13.2753 2.54% i686 186.7670 45.4388 75.67% aarch64 (Neoverse) 9.96089 9.39285 5.70% power10 4.9862 3.7819 24.15% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-12-18 17:24:43 -03:00
Adhemerval Zanella	673e6fe110	math: Use acoshf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows slightly better performance than the generic acoshf. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1): Latency master patched improvement x86_64 61.2471 58.7742 4.04% x86_64-v2 62.6519 59.0523 5.75% x86_64-v3 58.7408 50.1393 14.64% aarch64 24.8580 21.3317 14.19% power10 17.0469 13.1345 22.95% reciprocal-throughput master patched improvement x86_64 16.1618 15.1864 6.04% x86_64-v2 15.7729 14.7563 6.45% x86_64-v3 14.1669 11.9568 15.60% aarch64 10.911 9.5486 12.49% power10 6.38196 5.06734 20.60% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-12-18 17:24:43 -03:00
Adhemerval Zanella	66fa7ad437	math: Use acosf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows slightly better performance than the generic acosf. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1): Latency master patched improvement x86_64 52.5098 36.6312 30.24% x86_64v2 53.0217 37.3091 29.63% x86_64v3 42.8501 32.3977 24.39% i686 207.3960 109.4000 47.25% aarch64 21.3694 13.7871 35.48% power10 14.5542 7.2891 49.92% reciprocal-throughput master patched improvement x86_64 14.1487 15.9508 -12.74% x86_64v2 14.3293 16.1899 -12.98% x86_64v3 13.6563 12.6161 7.62% i686 158.4060 45.7354 71.13% aarch64 12.5515 9.19233 26.76% power10 5.7868 3.3487 42.13% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-12-18 17:24:43 -03:00
Michael Jeanson	eb8fa66d4e	nptl: Add <thread_pointer.h> for sparc This will be required by the rseq extensible ABI implementation on all Linux architectures exposing the '__rseq_size' and '__rseq_offset' symbols to set the initial value of the 'cpu_id' field which can be used by applications to test if rseq is available and registered. As long as the symbols are exposed it is valid for an application to perform this test even if rseq is not yet implemented in libc for this architecture. Compile tested with build-many-glibcs.py but I don't have access to any hardware to run the tests. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-12-18 19:38:58 +00:00
Adhemerval Zanella	849c73fe2b	powerpc: Update libm-test-ulps Regen to add new functions acospi, asinpi, atan2pi, atanpi, and tanpi.	2024-12-18 15:43:09 -03:00
Adhemerval Zanella	2872876d43	arm: Update libm-test-ulps Regen to add new functions acospi, asinpi, atan2pi, atanpi, cospi, sinpi, and tanpi.	2024-12-18 14:20:41 -03:00
Adhemerval Zanella	5a4c99163c	i386: Update libm-test-ulps Regen to add new functions acospi, asinpi, atan2pi, atanpi, cospi, sinpi, and tanpi.	2024-12-18 14:20:41 -03:00
Joseph Myers	e0a0fd64b5	Update syscall lists for Linux 6.12 Linux 6.12 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 6.12. Tested with build-many-glibcs.py.	2024-12-18 15:12:36 +00:00
H.J. Lu	a194871b13	sys/platform/x86.h: Do not depend on _Bool definition in C++ mode Clang does not define _Bool for -std=c++98: /usr/include/bits/platform/features.h:31:19: error: unknown type name '_Bool' 31 | static __inline__ _Bool | ^ Change _Bool to bool to silence clang++ error. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-12-18 02:32:27 +08:00
H.J. Lu	54fe008ba6	ldbl-96: Set -1 to "int sign_exponent:16" ieee_long_double_shape_type has typedef union { long double value; struct { ... int sign_exponent:16; ... } parts; } ieee_long_double_shape_type; Clang issues an error: ../sysdeps/ieee754/ldbl-96/test-totalorderl-ldbl-96.c:49:2: error: implicit truncation from 'int' to bit-field changes value from 65535 to -1 [-Werror,-Wbitfield-constant-conversion] 49 | SET_LDOUBLE_WORDS (ldnx, 0xffff, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 50 | tests[i] >> 32, tests[i] & 0xffffffffULL); | Use -1, instead of 0xffff, to silence Clang. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-18 01:54:26 +08:00
H.J. Lu	d4ee46b0cd	tst-clone3[-internal].c: Add _Atomic to silence Clang Add _Atomic to futex_wait argument and ctid in tst-clone3[-internal].c to silence Clang error: ../sysdeps/unix/sysv/linux/tst-clone3-internal.c:93:3: error: address argument to atomic operation must be a pointer to _Atomic type ('pid_t *' (aka 'int *') invalid) 93 | wait_tid (&ctid, CTID_INIT_VAL); | ^ ~~~~~ ../sysdeps/unix/sysv/linux/tst-clone3-internal.c:51:21: note: expanded from macro 'wait_tid' 51 | while ((__tid = atomic_load_explicit (ctid_ptr, \ | ^ ~~~~~~~~ /usr/bin/../lib/clang/19/include/stdatomic.h:145:30: note: expanded from macro 'atomic_load_explicit' 145 | #define atomic_load_explicit __c11_atomic_load | ^ Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-18 01:54:26 +08:00
Florian Weimer	61c3450db9	x86: Avoid integer truncation with large cache sizes (bug 32470) Some hypervisors report 1 TiB L3 cache size. This results in some variables incorrectly getting zeroed, causing crashes in memcpy/memmove because invariants are violated.	2024-12-17 18:49:50 +01:00
H.J. Lu	0cc88d2327	Silence Clang #include_next error Use "#include <...>" to silence Clang #include_next error: In file included from ../sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c:19: ../sysdeps/x86_64/fpu/test-double-vlen4.h:19:2: error: #include_next in file found relative to primary source file or found by absolute path; will search from start of include path [-Werror,-Winclude-next-absolute-path] 19 | #include_next <test-double-vlen4.h> | ^ 1 error generated. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-18 01:22:48 +08:00
H.J. Lu	215447f5cb	cet: Pass -mshstk to compiler for tst-cet-legacy-10a[-static].c Pass -mshstk to compiler to silence Clang: In file included from ../sysdeps/x86_64/tst-cet-legacy-10a.c:2: ../sysdeps/x86_64/tst-cet-legacy-10.c:29:7: error: always_inline function '_get_ssp' requires target feature 'shstk', but would be inlined into function 'do_test' that is compiled without support for 'shstk' 29 | if (_get_ssp () != 0) | ^ Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-18 01:20:16 +08:00
Joana Cruz	cff9648d0b	AArch64: Improve codegen of AdvSIMD expf family Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs. Also use intrinsics instead of native operations. expf: 3% improvement in throughput microbenchmark on Neoverse V1, exp2f: 5%, exp10f: 13%, coshf: 14%. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2024-12-17 15:28:22 +00:00
Joana Cruz	6914774b9d	AArch64: Improve codegen of AdvSIMD atan(2)(f) Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs. 8% improvement in throughput microbenchmark on Neoverse V1. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2024-12-17 15:28:22 +00:00
Joana Cruz	d6e034f5b2	AArch64: Improve codegen of AdvSIMD logf function family Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs. 8% improvement in throughput microbenchmark on Neoverse V1 for log2 and log, and 2% for log10. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2024-12-17 15:25:58 +00:00
H.J. Lu	dd413a4d2f	Fix sysdeps/x86/fpu/Makefile: Split and sort tests Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-12-16 05:57:28 +08:00
H.J. Lu	57a44f27c4	sysdeps/x86/fpu/Makefile: Split and sort tests Split and sort tests in sysdeps/x86/fpu/Makefile. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-12-16 05:51:02 +08:00
H.J. Lu	07e3eb1774	Use empty initializer to silence GCC 4.9 or older Use empty initializer to silence GCC 4.9 or older: getaddrinfo.c: In function ‘gaih_inet’: getaddrinfo.c:1135:24: error: missing braces around initializer [-Werror=missing-braces] / sizeof (struct gaih_typeproto)] = {0}; ^ Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-16 04:06:30 +08:00
Florian Weimer	b933e5cef6	Linux: Check for 0 return value from vDSO getrandom probe As of Linux 6.13, there is no code in the vDSO that declines this initialization request with the special ~0UL state size. If the vDSO has the function, the call succeeds and returns 0. It's expected that the code would follow the “a negative value indicating an error” convention, as indicated in the __cvdso_getrandom_data function comment, so that INTERNAL_SYSCALL_ERROR_P on glibc's side would return true. This commit changes the code to check for zero to indicate success instead, which covers potential future non-zero success return values and error returns. Fixes commit `4f5704ea34` ("powerpc: Use correct procedure call standard for getrandom vDSO call (bug 32440)").	2024-12-15 17:05:25 +01:00
John David Anglin	6f5e1e4e98	hppa: Update libm-test-ulps Signed-off-by: John David Anglin <dave.anglin@bell.net>	2024-12-15 09:24:53 -05:00
H.J. Lu	20f8c5df56	Revert "Add braces in initializers for GCC 4.9 or older" This reverts commit `8aa2a9e033`, as not all targets need braces.	2024-12-15 18:49:52 +08:00
Stafford Horne	afac8b1311	or1k: Update libm-test-ulps Regen to add new functions acospi, asinpi, atan2pi and atanpi.	2024-12-15 00:42:27 +00:00
gfleury	2716bd6b12	htl: move pthread_sigmask into libc. Message-ID: <20241212220612.782313-3-gfleury@disroot.org>	2024-12-14 23:13:14 +01:00
gfleury	79cb83c7f9	htl: move __pthread_sigstate into libc. Message-ID: <20241212220612.782313-2-gfleury@disroot.org>	2024-12-14 23:12:01 +01:00
gfleury	dca0807a4d	htl: move __pthread_sigstate_destroy into libc. Message-ID: <20241212220612.782313-1-gfleury@disroot.org>	2024-12-14 23:11:45 +01:00
H.J. Lu	335ba9b6c1	Return EXIT_UNSUPPORTED if __builtin_add_overflow unavailable Since GCC 4.9 doesn't have __builtin_add_overflow: In file included from tst-stringtable.c:180:0: stringtable.c: In function ‘stringtable_finalize’: stringtable.c:185:7: error: implicit declaration of function ‘__builtin_add_overflow’ [-Werror=implicit-function-declaration] else if (__builtin_add_overflow (previous->offset, ^ return EXIT_UNSUPPORTED for GCC 4.9 or older. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-15 05:24:19 +08:00
H.J. Lu	8aa2a9e033	Add braces in initializers for GCC 4.9 or older Add braces to silence GCC 4.9 or older: getaddrinfo.c: In function ‘gaih_inet’: getaddrinfo.c:1135:24: error: missing braces around initializer [-Werror=missing-braces] / sizeof (struct gaih_typeproto)] = {0}; ^ Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-14 19:26:45 +08:00
Wilco Dijkstra	ca7d48a80f	AArch64: Update libm-test-ulps Update ulps for acospi, asinpi, atanpi, atan2pi.	2024-12-13 17:14:58 +00:00
Stefan Liebler	97b74cbbb0	s390: Simplify elf_machine_{load_address, dynamic} [BZ #31799] If an executable is static PIE and has a non-zero load address (compare to elf/tst-pie-address-static), it segfaults as elf_machine_load_address() returns 0x0 and elf_machine_dynamic() returns the run-time instead of link-time address of _DYNAMIC. Now rely on __ehdr_start and _DYNAMIC as also done on other architectures. Checked back to old arch-levels that this approach works fine: - 31bit: -march=g5 - 64bit: -march=z900 Note that there is no static-PIE support on 31bit, but this approach also cleans it up. Furthermore this cleanup in glibc does not change anything regarding the first GOT-element as the s390 ABI (https://github.com/IBM/s390x-abi) explicitly defines: The doubleword at _GLOBAL_OFFSET_TABLE_[0] is set by the linkage editor to hold the address of the dynamic structure, referenced with the symbol _DYNAMIC. This allows a program, such as the dynamic linker, to find its own dynamic structure without having yet processed its relocation entries. This is especially important for the dynamic linker, because it must initialize itself without relying on other programs to relocate its memory image.	2024-12-13 09:44:38 +01:00
Stafford Horne	e4e49583d9	or1k: Update libm-test-ulps Pick up new functions cospi, "Imaginary part of csin", exp10m1, exp2m1, log10p1, log2p1, sinpi and tanpi.	2024-12-13 07:20:32 +00:00
Michael Jeanson	f2acd75b0e	nptl: Add <thread_pointer.h> for or1k This will be required by the rseq extensible ABI implementation on all Linux architectures exposing the '__rseq_size' and '__rseq_offset' symbols to set the initial value of the 'cpu_id' field which can be used by applications to test if rseq is available and registered. As long as the symbols are exposed it is valid for an application to perform this test even if rseq is not yet implemented in libc for this architecture. Compile tested with build-many-glibcs.py but I don't have access to any hardware to run the tests. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Stafford Horne <shorne@gmail.com>	2024-12-13 07:20:32 +00:00
Joseph Myers	3374de9038	Implement C23 atan2pi C23 adds various <math.h> function families originally defined in TS 18661-4. Add the atan2pi functions (atan2(y,x)/pi). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-12-12 20:57:44 +00:00
Joseph Myers	ffe79c446c	Implement C23 atanpi C23 adds various <math.h> function families originally defined in TS 18661-4. Add the atanpi functions (atan(x)/pi). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-12-11 21:51:49 +00:00
Peter Bergner	aec85b2557	powerpc64: Fix dl-trampoline.S big-endian / non-ROP build failure Fix a big-endian / non-ROP build failure caused by commit `4d9a4c02` when building dl-trampoline.S. Reported-by: Joseph Myers <josmyers@redhat.com>	2024-12-11 23:15:13 +03:00
Florian Weimer	4f5704ea34	powerpc: Use correct procedure call standard for getrandom vDSO call (bug 32440) A plain indirect function call does not work on POWER because success and failure are signaled through a flag register, and not via the usual Linux negative return value convention. This has potential security impact, in two ways: the return value could be out of bounds (EAGAIN is 11 on powerpc64le), and no random bytes have been written despite the non-error return value. Fixes commit `461cab1de7` ("linux: Add support for getrandom vDSO"). Reported-by: Ján Stanček <jstancek@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-12-11 17:49:04 +01:00
H.J. Lu	b79f257533	Add TEST_CC and TEST_CXX support Support testing glibc build with a different C compiler or a different C++ compiler with $ ../glibc-VERSION/configure TEST_CC="gcc-6.4.1" TEST_CXX="g++-6.4.1" 1. Add LIBC_TRY_CC_AND_TEST_CC_OPTION, LIBC_TRY_CC_AND_TEST_CC_COMMAND and LIBC_TRY_CC_AND_TEST_LINK to test both CC and TEST_CC. 2. Add check and xcheck targets to Makefile.in and override build compiler options with ones from TEST_CC and TEST_CXX. Tested on Fedora 41/x86-64: 1. Building with GCC 14.2.1 and testing with GCC 6.4.1 and GCC 11.2.1. 2. Building with GCC 15 and testing with GCC 6.4.1. Support for GCC versions older than GCC 6.2 may need to change the test sources. Other targets may need to update configure.ac under sysdeps and modify Makefile.in to override target build compiler options. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-12-11 18:31:00 +08:00
Peter Bergner	4d9a4c02f9	powerpc64le: ROP changes for the dl-trampoline functions Add ROP protection for the _dl_runtime_resolve and _dl_profile_resolve functions.	2024-12-10 23:25:56 -05:00
Joseph Myers	f962932206	Implement C23 asinpi C23 adds various <math.h> function families originally defined in TS 18661-4. Add the asinpi functions (asin(x)/pi). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-12-10 20:42:20 +00:00
Joseph Myers	28d102d15c	Implement C23 acospi C23 adds various <math.h> function families originally defined in TS 18661-4. Add the acospi functions (acos(x)/pi). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-12-09 23:01:29 +00:00
Sachin Monga	be13e46764	powerpc64le: ROP changes for the *context and setjmp functions Add ROP protection for the getcontext, setcontext, makecontext, swapcontext and __sigsetjmp_symbol functions. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-12-09 16:49:54 -05:00
Michael Jeanson	9e08698e4c	nptl: Add <thread_pointer.h> for m68k This will be required by the rseq extensible ABI implementation on all Linux architectures exposing the '__rseq_size' and '__rseq_offset' symbols to set the initial value of the 'cpu_id' field which can be used by applications to test if rseq is available and registered. As long as the symbols are exposed it is valid for an application to perform this test even if rseq is not yet implemented in libc for this architecture. Compile tested with build-many-glibcs.py but I don't have access to any hardware to run the tests. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Arjun Shankar <arjun@redhat.com>	2024-12-09 20:24:26 +00:00
Michael Jeanson	8dd1588794	nptl: Add <thread_pointer.h> for RISC-V This will be required by the rseq extensible ABI implementation on all Linux architectures exposing the '__rseq_size' and '__rseq_offset' symbols to set the initial value of the 'cpu_id' field which can be used by applications to test if rseq is available and registered. As long as the symbols are exposed it is valid for an application to perform this test even if rseq is not yet implemented in libc for this architecture. Both code paths tested on a Visionfive 2 with Debian sid. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-12-09 13:26:55 -05:00
Michael Jeanson	d3b3a12258	nptl: add RSEQ_SIG for RISC-V Enable RSEQ for RISC-V, support was added in Linux 5.18. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-12-09 13:26:55 -05:00
Pierre Blanchard	13a7ef5999	AArch64: Improve codegen in users of ADVSIMD expm1 helper Add inline helper for expm1 and rearrange operations so MOV is not necessary in reduction or around the special-case handler. Reduce memory access by using more indexed MLAs in polynomial. Speedup on Neoverse V1 for expm1 (19%), sinh (8.5%), and tanh (7.5%).	2024-12-09 16:20:34 +00:00
Pierre Blanchard	ca0c0d0f26	AArch64: Improve codegen in users of ADVSIMD log1p helper Add inline helper for log1p and rearrange operations so MOV is not necessary in reduction or around the special-case handler. Reduce memory access by using more indexed MLAs in polynomial. Speedup on Neoverse V1 for log1p (3.5%), acosh (7.5%) and atanh (10%).	2024-12-09 16:20:34 +00:00
Pierre Blanchard	8eb5ad2ebc	AArch64: Improve codegen in AdvSIMD logs Remove spurious ADRP and a few MOVs. Reduce memory access by using more indexed MLAs in polynomial. Align notation so that algorithms are easier to compare. Speedup on Neoverse V1 for log10 (8%), log (8.5%), and log2 (10%). Update error threshold in AdvSIMD log (now matches SVE log).	2024-12-09 16:20:34 +00:00
Pierre Blanchard	569cfaaf49	AArch64: Improve codegen in AdvSIMD pow Remove spurious ADRP. Improve memory access by shuffling constants and using more indexed MLAs. A few more optimisations with no impact on accuracy: force fmas contraction, and switch from shift-aided rint to rint instruction. Between 1 and 5% throughput improvement on Neoverse V1 depending on benchmark.	2024-12-09 16:20:34 +00:00

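Commit 61c3450db9 above reports that a 1 TiB L3 cache size caused some variables to be zeroed by integer truncation. The sketch below only illustrates that arithmetic: 1 TiB is 2^40 bytes, so its low 32 bits are all zero. The function names are hypothetical, the 32-bit width is an assumption chosen for illustration, and the saturating variant is one possible mitigation rather than the fix applied in glibc.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits, as a narrowing conversion would.  */
uint32_t sketch_truncate_u32 (uint64_t size)
{
  return (uint32_t) size;
}

/* Saturate at the largest representable value instead of wrapping,
   so a huge size never turns into zero.  */
uint32_t sketch_saturate_u32 (uint64_t size)
{
  return size > UINT32_MAX ? UINT32_MAX : (uint32_t) size;
}

/* Usage sketch with the 1 TiB value mentioned in the commit message.  */
void sketch_cache_size_demo (void)
{
  const uint64_t one_tib = UINT64_C (1) << 40;
  assert (sketch_truncate_u32 (one_tib) == 0);
  assert (sketch_saturate_u32 (one_tib) == UINT32_MAX);
}
```

A size that silently becomes zero is worse than one that is merely too small, because later code that assumes a non-zero size can have its invariants violated.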

16539 Commits