glibc

Commit Graph

Author	SHA1	Message	Date
Adhemerval Zanella	6b7067460f	Handle clang -Wignored-attributes on weak aliases Clang issues a warning for double alias redirection, indicating that the original symbol is used even if a weak definition attempts to override it. For instance, in the construction: int __internal_impl (...) {} weak_alias (__internal_impl, external_impl); #if SOMETHING weak_alias (external_impl, another_external_impl) #endif Clang warns that another_external_impl always resolves to __internal_impl, even if external_impl is a weak reference. Using the internal symbol for both aliases resolves this warning. This issue also occurs with certain libc_hidden_def usage: int __internal_impl (...) {} weak_alias (__internal_impl, __internal_alias) libc_hidden_weak (__internal_alias) In this case, using a strong_alias is sufficient to avoid the warning (since the alias is internal, there is no need to use a weak alias). However, for the constructions like: int __internal_impl (...) {} weak_alias (__internal_impl, __internal_alias) libc_hidden_def (__internal_alias) weak_alias (__internal_impl, external_alias) libc_hidden_def (external_alias) Clang warns that the internal external_alias will always resolve to __GI___internal_impl, even if a weak definition of __GI_internal_impl is overridden. For this case, a new macro named static_weak_alias is used to create a strong alias for SHARED, or a weak_alias otherwise. With these changes, there is no need to check and enable the -Wno-ignored-attributes suppression when using clang. Checked with a build on affected ABIs, and a full check on aarch64, armhf, i686, and x86_64. Reviewed-by: Sam James <sam@gentoo.org>	2025-12-09 08:58:10 -03:00
Florian Weimer	2677916d1c	build-many-glibcs.py: Include URL in download exception Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-12-09 01:23:05 +01:00
H.J. Lu	6afabde23e	x32: Implement prctl in assembly Since the variadic prctl function takes at most 5 integer arguments which are passed in the same integer registers on x32 as the function with 5 integer arguments, we can use assembly for prctl. Since upper 32-bits in the last 4 arguments of prctl must be cleared to match the x32 prctl syscall interface where the last 4 arguments are unsigned 64 bit longs, implement prctl in assembly to clear upper 32-bits in the last 4 arguments and add a test to verify it. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2025-12-09 06:41:55 +08:00
Florian Weimer	f56a71097f	build-many-glibcs.py: Switch Git URLs to https:// Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>	2025-12-08 11:52:27 +01:00
Collin Funk	866fa41ef8	libio: null terminate the buffer upon initial allocation in getdelim Commit `33eff78c8b` caused issues in nbdkit which had code similar to this to get the last line of the file: while (getline (&line, &len, fp) != -1) ; /* Process LINE. */ After that commit, line[0] would be equal to '\0' instead of containing the last line of the file like before that commit. A recent POSIX issue clarified that the behavior before and after that commit are allowed, since the contents of LINE are unspecified after -1 is returned [1]. However, some programs rely on the previous behavior. This patch null terminates the buffer upon getdelim/getline's initial allocation. This is compatible with previous glibc versions, while also protecting the caller from reading uninitialized memory if the file is empty, as long as getline/getdelim does the initial allocation. [1] https://www.austingroupbugs.net/bug_view_page.php?bug_id=1953 Suggested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2025-12-05 20:09:36 -08:00
James Chesterman	e2b00d59eb	aarch64: Implement AdvSIMD and SVE rsqrt(f) routines Vector variants of the new C23 rsqrt routines for both AdvSIMD and SVE, as well as in both single and double precision. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-05 15:05:54 -03:00
James Chesterman	09d85861f1	benchtests: Add benchtests for rsqrt Add benchtests for double precision vector rsqrt routine. They are identical to those found in log2. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-05 11:15:44 -03:00
James Chesterman	bd0a3526cc	benchtests: Add benchtests for rsqrtf Add benchtests for vector single precision rsqrtf. They are identical to those found in log2f. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-05 11:15:42 -03:00
Adhemerval Zanella	eb03df5404	i386: Fix fmod/fmodf/remainder/remainderf for gcc-12 The __builtin_fmod{f} and __builtin_remainder{f} were added in gcc 13, and the minimum supported gcc is 12. This patch adds a configure test to check whether the compiler enables inlining for fmod/remainder, and uses inline assembly if not. Checked on i686-linux-gnu with gcc-12. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-12-04 13:12:50 -03:00
Wilco Dijkstra	83dd79dffb	nptl: Check alignment of pthread structs Report assertion failure if the alignment of external pthread structs is lower than the internal version. This triggers on type mismatches like in BZ #33632. Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>	2025-12-04 15:45:15 +00:00
James Chesterman	f9bb6bcff6	aarch64: Optimise AdvSIMD atanhf Optimise AdvSIMD atanhf by vectorising the special case. There are asymptotes at x = -1 and x = 1. So return inf for these. Values for which |x| > 1, return NaN. R.Throughput difference on V2 with GCC@15: 58-60% improvement in special cases. No regression in fast pass.	2025-12-04 10:54:49 -03:00
James Chesterman	0e734b2b0c	aarch64: Optimise AdvSIMD asinhf Optimise AdvSIMD asinhf by vectorising the special case. For values greater than 0x1p64, scale the input down first. This is because the output will overflow with inputs greater than or equal to this value as there is a squaring operation in the algorithm. To scale, do: 2asinh(sqrt[(x-1)/2]) Because: 2asinh(x) = +-acosh(2x^2 + 1) Apply opposite operations in opposite order for x, and you get: asinh(x) = 2acosh(sqrt[(x-1)/2]). Found that using asinh instead of acosh also very closely approximates asinh(x) for a high input x. R.Throughput difference on V2 with GCC@15: 25-58% improvement in special cases. 4% regression in fast pass.	2025-12-04 10:54:49 -03:00
James Chesterman	0e80864c07	aarch64: Optimise AdvSIMD acoshf Optimise AdvSIMD acoshf by vectorising the special case. For values greater than 0x1p64, scale the input down first. This is because the output will overflow with inputs greater than or equal to this value as there is a squaring operation in the algorithm. To scale, do: 2acosh(sqrt[(x+1)/2]) Because: acosh(x) = 1/2acosh(2x^2 - 1) for x>=1. Apply opposite operations in opposite order for x, and you get: acosh(x) = 2acosh(sqrt[(x+1)/2]). R.Throughput difference on V2 with GCC@15: 30-49% improvement in special cases. 2% regression in fast pass.	2025-12-04 10:54:49 -03:00
Yury Khrustalev	6f869f54fb	aarch64: Add tests for glibc.cpu.aarch64_bti behaviour Check that the new tunable changes behaviour correctly: * When BTI is enforced, any unmarked binary that is loaded results in an error: either an abort or dlopen error when this binary is loaded via dlopen. * When BTI is not enforced, it is OK to load an unmarked binary. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-04 12:44:45 +00:00
Yury Khrustalev	dba95d2887	aarch64: Support enforcing BTI on dependencies Add glibc.cpu.aarch64_bti tunable with 2 values: - permissive (default) - enforced and use this tunable to enforce BTI marking on dependencies when the enforced option is selected. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Tested-by: Jeremy Linton <jeremy.linton@arm.com>	2025-12-04 12:44:42 +00:00
Yury Khrustalev	59bac0d5d2	aarch64: Add configure checks for BTI support We add configure checks for 3 things: - Compiler (both CC and TEST_CC) supports -mbranch-protection=bti. - Linker supports -z force-bti. - The toolchain supplies object files and target libraries with the BTI marking. All three must be true in order for the tests to be valid, so we check all flags and set the makefile variable accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-04 12:44:39 +00:00
Yury Khrustalev	ccb5083553	aarch64: fix makefile formatting	2025-12-04 12:40:47 +00:00
James Chesterman	e3c40c8db0	aarch64: Optimise AdvSIMD log10 Optimise AdvSIMD log10 by vectorising the special case. For subnormal input values, use the same scaling technique as described in the single precision equivalent. Then check for inf, nan and x<=0.	2025-12-04 08:35:25 -03:00
James Chesterman	59c706b418	aarch64: Optimise AdvSIMD log2 Optimise AdvSIMD log2 by vectorising the special case. For subnormal input values, use the same scaling technique as described in the single precision equivalent. Then check for inf, nan and x<=0.	2025-12-04 08:35:25 -03:00
James Chesterman	82d3a8a738	aarch64: Optimise AdvSIMD log Optimise AdvSIMD log by vectorising the special case. For subnormal input values, use the same scaling technique as described in the single precision equivalent. Then check for inf, nan and x<=0.	2025-12-04 08:35:25 -03:00
James Chesterman	015a13e780	aarch64: Optimise AdvSIMD log1p Optimise AdvSIMD log1p by vectorising the special case. The special cases are for when the input is: Less than or equal to -1 +/- INFINITY +/- NaN	2025-12-04 08:35:25 -03:00
James Chesterman	57215df30e	aarch64: Optimise AdvSIMD log10f Optimise AdvSIMD log10f by vectorising the special case. Use scaling technique on subnormal values, then check for inf and nan values. The scaling technique will sqrt the input then multiply the output by 2 because: log(sqrt(x)) = 1/2(log(x)), so log(x) = 2log(sqrt(x)) Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-04 08:31:19 -03:00
James Chesterman	fe83660a7e	aarch64: Optimise AdvSIMD log2f Optimise AdvSIMD log2f by vectorising the special case. Use scaling technique on subnormal values, then check for inf and nan values. The scaling technique used will sqrt the input then multiply the output by 2 because: log(sqrt(x)) = 1/2 log(x), so log(x) = 2log(sqrt(x)) Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-04 08:31:15 -03:00
James Chesterman	ab8138303c	aarch64: Optimise AdvSIMD logf Optimise AdvSIMD logf by vectorising the special case. Use scaling technique on subnormal values, then check for inf and nan values. The scaling technique used will sqrt the input then multiply the output by 2 because: log(sqrt(x)) = 1/2 log(x), so log(x) = 2log(sqrt(x)) Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-04 08:31:08 -03:00
James Chesterman	f42c135157	aarch64: Optimise AdvSIMD log1pf Optimise AdvSIMD log1pf by vectorising the special case and by reducing the range of values passed to the special case. Previously, high values such as 0x1.1p127 were treated as special cases, but now the special cases are for when the input is: Less than or equal to -1 +/- INFINITY +/- NaN Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-04 08:31:02 -03:00
H.J. Lu	762bb01d4e	int128: Check BITS_PER_MP_LIMB == 32 instead of __WORDSIZE == 32 commit `8cd6efca5b` Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Nov 20 15:30:06 2025 -0300 Add add_ssaaaa and sub_ssaaaa to gmp-arch.h checks __WORDSIZE == 32 to decide if int128 should be used, which breaks x32 which has int128 and __WORDSIZE == 32. Check BITS_PER_MP_LIMB == 32, instead of __WORDSIZE == 32. This fixes BZ #33677. Tested on x32, x86-64 and i686. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-12-04 07:46:20 +08:00
Adhemerval Zanella	f28a11e43f	time: Add TIME_MONOTONIC, TIME_ACTIVE, and TIME_THREAD_ACTIVE The TIME_MONOTONIC maps to POSIX's CLOCK_MONOTONIC, TIME_ACTIVE to CLOCK_PROCESS_CPUTIME_ID, and TIME_THREAD_ACTIVE to CLOCK_THREAD_CPUTIME_ID. No Linux-specific timers are added as extensions. Co-authored-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>	2025-12-03 11:03:58 -03:00
Joseph Myers	56d0e2cca1	Use Linux 6.18 in build-many-glibcs.py Tested with build-many-glibcs.py (host-libraries, compilers and glibcs builds).	2025-12-02 16:34:07 +00:00
Yury Khrustalev	11d3cfb570	misc: fix some typos Reviewed-by: Florian Weimer <fweimer@redhat.com>	2025-12-02 12:25:36 +00:00
H.J. Lu	3dd2cbfa35	Use 64-bit atomic on sem_t with 8-byte alignment [BZ #33632] commit `7fec8a5de6` Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Nov 13 14:26:08 2025 -0300 Revert __HAVE_64B_ATOMICS configure check uses 64-bit atomic operations on sem_t if 64-bit atomics are supported. But sem_t may be aligned to 32-bit on 32-bit architectures. 1. Add a macro, SEM_T_ALIGN, for sem_t alignment. 2. Add a macro, HAVE_UNALIGNED_64B_ATOMICS. Define it if unaligned 64-bit atomic operations are supported. 3. Add a macro, USE_64B_ATOMICS_ON_SEM_T. Define to 1 if 64-bit atomic operations are supported and SEM_T_ALIGN is at least 8-byte aligned or HAVE_UNALIGNED_64B_ATOMICS is defined. 4. Assert that size and alignment of sem_t are not lower than those of the internal struct new_sem. 5. Check USE_64B_ATOMICS_ON_SEM_T, instead of USE_64B_ATOMICS, when using 64-bit atomic operations on sem_t. This fixes BZ #33632. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-12-02 06:50:49 +08:00
Yury Khrustalev	d605dea0a4	scripts: Support custom Git URLs in build-many-glibcs.py Use environment variables to provide mirror URLs to checkout sources from Git. Each component has a corresponding env var that will be used if it's present: <component>_GIT_MIRROR. Note that '<component>' should be upper case, e.g. GLIBC. Co-authored-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-12-01 16:22:02 +00:00
Yury Khrustalev	af5ce3ec8f	scripts: Support custom FTP mirror URL in build-many-glibcs.py Allow to use custom mirror URLs to download tarballs from a mirror of ftp.gnu.org using the FTP_GNU_ORG_MIRROR env variable (default value is 'https://ftp.gnu.org'). Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-12-01 16:21:54 +00:00
Kacper Piwiński	82f4758410	strops: use strlen instead of strchr for string length For wide strings the equivalent function __wcslen is used. This change makes it more symmetrical. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2025-12-01 16:42:54 +01:00
Yury Khrustalev	20092f2ef6	nptl: tests: Fix test-wrapper use in tst-dl-debug-tid.sh Test wrapper script was used twice: once to run the test command and a second time within the test command, which seems unnecessary and results in false errors when running this test. Fixes `332f8e62af` Reviewed-by: Frédéric Bérat <fberat@redhat.com>	2025-12-01 14:39:39 +00:00
Osama Abdelkader	57ce2d8243	Fix allocation_index increment in malloc_internal The allocation_index was being incremented before checking if mmap() succeeds. If mmap() fails, allocation_index would still be incremented, creating a gap in the allocations tracking array and making allocation_index inconsistent with the actual number of successful allocations. This fix moves the allocation_index increment to after the mmap() success check, ensuring it only increments when an allocation actually succeeds. This maintains proper tracking for leak detection and prevents gaps in the allocations array. Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2025-12-01 13:35:36 +01:00
Adhemerval Zanella	f9e61cd446	NEWS: Add new generic fma/fmaf note Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-11-28 09:29:35 -03:00
Florian Weimer	e98bd0c54d	iconvdata: Fix invalid pointer arithmetic in ANSI_X3.110 module The expression inptr + 1 can technically be invalid: if inptr == inend, inptr may point one element past the end of an array. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-11-28 13:20:39 +01:00
Joseph Myers	e535fb910c	Define C23 header version macros C23 defines library macros __STDC_VERSION_<header>_H__ to indicate that a header has support for new / changed features from C23. Now that all the required library features are implemented in glibc, define these macros. I'm not sure this is sufficiently much of a user-visible feature to be worth a mention in NEWS. Tested for x86_64. There are various optional C23 features we don't yet have, of which I might look at the Annex H ones (floating-point encoding conversion functions and _Float16 functions) next. * Optional time bases TIME_MONOTONIC, TIME_ACTIVE, TIME_THREAD_ACTIVE. See <https://sourceware.org/pipermail/libc-alpha/2023-June/149264.html> - we need to review / update that patch. (I think patch 2/2, inventing new names for all the nonstandard CLOCK_* supported by the Linux kernel, is rather more dubious.) * Updating conform/ tests for C23. * Defining the rounding mode macro FE_TONEARESTFROMZERO for RISC-V (as far as I know, the only architecture supported by glibc that has hardware support for this rounding mode for binary floating point) and supporting it throughout glibc and its tests (especially the string/numeric conversions in both directions that explicitly handle each possible rounding mode, and various tests that do likewise). * Annex H floating-point encoding conversion functions. (It's not entirely clear which are optional even given support for Annex H; there's some wording applied inconsistently about only being required when non-arithmetic interchange formats are supported; see the comments I raised on the WG14 reflector on 23 Oct 2025.) * _Float16 functions (and other header and testcase support for this type). * Decimal floating-point support. * Fully supporting __int128 and unsigned __int128 as integer types wider than intmax_t, as permitted by C23. Would need doing in coordination with GCC, see GCC bug 113887 for more discussion of what's involved.	2025-11-27 19:32:49 +00:00
Adhemerval Zanella	8a0152b61b	math: New generic fmaf implementation The current implementation relies on setting the rounding mode for different calculations (FE_TOWARDZERO) to obtain correctly rounded results. For most CPUs, this adds significant performance overhead because it requires executing a typically slow instruction (to get/set the floating-point status), necessitates flushing the pipeline, and breaks some compiler assumptions/optimizations. The original implementation adds tests to handle underflow in corner cases, whereas this implementation uses a different strategy that checks both the mantissa and the result to determine whether the result is not subject to double rounding. I tested this implementation on various targets (x86_64, i686, arm, aarch64, powerpc), including some by manually disabling the compiler instructions. Performance-wise, it shows large improvements: reciprocal-throughput master patched improvement x86_64 [1] 58.09 7.96 7.33x i686 [1] 279.41 16.97 16.46x aarch64 [2] 26.09 4.10 6.35x armhf [2] 30.25 4.20 7.18x powerpc [3] 9.46 1.46 6.45x latency master patched improvement x86_64 64.50 14.25 4.53x i686 304.39 61.04 4.99x aarch64 27.71 5.74 4.82x armhf 33.46 7.34 4.55x powerpc 10.96 2.65 4.13x Checked on x86_64-linux-gnu and i686-linux-gnu with --disable-multi-arch, and on arm-linux-gnueabihf. [1] gcc 15.2.1, Zen3 [2] gcc 15.2.1, Neoverse N1 [3] gcc 15.2.1, POWER10 Signed-off-by: Szabolcs Nagy <nsz@gcc.gnu.org> Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-27 14:52:25 -03:00
Florian Weimer	15de570246	Linux: Ignore PIDFD_GET_INFO in tst-pidfd-consts The constant is expected to change between kernel releases. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-11-27 14:34:58 +01:00
Adhemerval Zanella	a61f7fd59d	math: Sync atanh from CORE-MATH The CORE-MATH commit dc9465e7 fixes some issues: Failure: Test: atanh_towardzero (0x8.3f79103b3c64p-4) Result: is: 5.7018661316561103e-01 0x1.23ef7ff0539c6p-1 should be: 5.7018661316561092e-01 0x1.23ef7ff0539c5p-1 difference: 1.1102230246251565e-16 0x1.0000000000000p-53 ulp : 1.0000 max.ulp : 0.0000 Failure: Test: atanh_towardzero (0x8.3f7d95aabaf7p-4) Result: is: 5.7019248543911060e-01 0x1.23f044fac5997p-1 should be: 5.7019248543911049e-01 0x1.23f044fac5996p-1 difference: 1.1102230246251565e-16 0x1.0000000000000p-53 ulp : 1.0000 max.ulp : 0.0000 Failure: Test: atanh_towardzero (0x8.3f805380d6728p-4) Result: is: 5.7019604623795527e-01 0x1.23f0bc75cd113p-1 should be: 5.7019604623795516e-01 0x1.23f0bc75cd112p-1 difference: 1.1102230246251565e-16 0x1.0000000000000p-53 ulp : 1.0000 max.ulp : 0.0000 Maximal error of `atanh_towardzero' is : 1 ulp accepted: 0 ulp Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu, and i686-linux-gnu.	2025-11-26 14:10:07 -03:00
Yury Khrustalev	bc4bc1650b	aarch64: make GCS configure checks aarch64-only We only need to enable GCS tests on AArch64 targets, however previously the configure checks for GCS support in compiler and linker were added for all targets which was not efficient. To enable tests for GCS we need 4 things to be true: - Compiler supports GCS branch protection. - Test compiler supports GCS branch protection. - Linker supports GCS marking of binaries. - The CRT objects provided by the toolchain have GCS marking. To check for the latter, we add new macro to aclocal.m4 that allows to grep output from readelf. We check all four and then put the result in one make variable to simplify checks in makefiles. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-26 13:50:15 +00:00
Adhemerval Zanella	bf211c3499	math: New generic fma implementation The current implementation relies on setting the rounding mode for different calculations (first to FE_TONEAREST and then to FE_TOWARDZERO) to obtain correctly rounded results. For most CPUs, this adds a significant performance overhead since it requires executing a typically slow instruction (to get/set the floating-point status), it necessitates flushing the pipeline, and breaks some compiler assumptions/optimizations. This patch introduces a new implementation originally written by Szabolcs for musl, which utilizes mostly integer arithmetic. Floating-point arithmetic is used to raise the expected exceptions, without the need for fenv.h operations. I added some changes compared to the original code: * Fixed some signaling NaN issues when the third argument is NaN. * Use math_uint128.h for the 64-bit multiplication operation. It allows the compiler to use 128-bit types where available, which enables some optimizations on certain targets (for instance, MIPS64). * Fixed an arm32 issue where the libgcc routine might not respect the rounding mode [1]. This can also be used on other targets to optimize the conversion from int64_t to double. * Use -fexcess-precision=standard on i686. I tested this implementation on various targets (x86_64, i686, arm, aarch64, powerpc), including some by manually disabling the compiler instructions. Performance-wise, it shows large improvements: reciprocal-throughput master patched improvement x86_64 [2] 289.4640 22.4396 12.90x i686 [2] 636.8660 169.3640 3.76x aarch64 [3] 46.0020 11.3281 4.06x armhf [3] 63.989 26.5056 2.41x powerpc [4] 23.9332 6.40205 3.74x latency master patched improvement x86_64 293.7360 38.1478 7.70x i686 658.4160 187.9940 3.50x aarch64 44.5166 14.7157 3.03x armhf 63.7678 28.4116 2.24x power10 23.8561 11.4250 2.09x Checked on x86_64-linux-gnu and i686-linux-gnu with --disable-multi-arch, and on arm-linux-gnueabihf. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 [2] gcc 15.2.1, Zen3 [3] gcc 15.2.1, Neoverse N1 [4] gcc 15.2.1, POWER10 Signed-off-by: Szabolcs Nagy <nsz@gcc.gnu.org> Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-26 10:10:06 -03:00
Adhemerval Zanella	5dab2a3195	stdlib: Remove longlong.h The gmp-arch.h now provides all the required definitions. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-26 10:10:06 -03:00
Adhemerval Zanella	7a0471f149	Add umul_ppmm to gmp-arch.h To enable “longlong.h” removal, the umul_ppmm is moved to gmp-arch.h. The generic implementation now uses a static inline, which provides better type checking than the GNU extension to cast the asm constraint (and it works better with clang). Most architectures use the generic implementation, which is expanded from a macro, except for alpha, arm, hppa, x86, m68k, mips, powerpc, and sparc. On 32-bit architectures the compiler generates good enough code using uint64_t types, whereas for 64-bit architectures the patch leverages the math_u128.h definitions that use 128-bit integers when available (all 64-bit architectures on gcc 15). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-26 10:10:06 -03:00
Adhemerval Zanella	8cd6efca5b	Add add_ssaaaa and sub_ssaaaa to gmp-arch.h To enable “longlong.h” removal, add_ssaaaa and sub_ssaaaa are moved to gmp-arch.h. The generic implementation now uses a static inline. This provides better type checking than the GNU extension, which casts the asm constraint; and it also works better with clang. Most architectures use the generic implementation, with the exception of arc, arm, hppa, x86, m68k, powerpc, and sparc. On 32-bit architectures the compiler generates good enough code using uint64_t types, whereas for 64-bit architectures the patch leverages the math_u128.h definitions that use 128-bit integers when available (all 64-bit architectures on gcc 15). The strongly typed implementation required some changes. I adjusted _FP_W_TYPE, _FP_WS_TYPE, and _FP_I_TYPE to use the same type as mp_limb_t on aarch64, powerpc64le, x86_64, and riscv64. This basically means using “long” instead of “long long.” Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-26 10:10:02 -03:00
Adhemerval Zanella	476e962af7	Add gmp-arch and udiv_qrnnd To enable “longlong.h” removal, the udiv_qrnnd is moved to a gmp-arch.h file. It allows each architecture to implement its own arch-specific optimizations. The generic implementation now uses a static inline, which provides better type checking than the GNU extension to cast the asm constraint (and it works better with clang). Most architectures use the generic implementation, which is expanded from a macro, except for alpha, x86, m68k, sh, and sparc. I kept alpha, which uses out-of-line implementations, and x86, where there is no easy way to use the div{q} instruction from C code. For the rest, the compiler generates good enough code. The hppa also provides arch-specific implementations, but they are not routed in “longlong.h” and thus never used. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-25 14:52:15 -03:00
Adhemerval Zanella	e45174fe8c	Add new math improvements to NEWS Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: DJ Delorie <dj@redhat.com>	2025-11-25 14:51:56 -03:00
Yury Khrustalev	6a29bbcf5a	scripts: Fix minor lint warnings in build-many-glibcs.py Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-11-25 13:56:56 +00:00
Arjun Shankar	244c404ae8	malloc: Add threaded variants of single-threaded malloc tests Single-threaded malloc tests exercise only the SINGLE_THREAD_P paths in the malloc implementation. This commit runs variants of these tests in a multi-threaded environment in order to exercise the alternate code paths in the same test scenarios, thus potentially improving coverage. $(test)-threaded-main and $(test)-threaded-worker variants are introduced for most single-threaded malloc tests (with a small number of exceptions). The -main variants run the base test in a main thread while the test environment has an alternate thread running, whereas the -worker variants run the test in an alternate thread while the main thread waits on it. The tests themselves are unmodified, and the change is accomplished by using -DTEST_IN_THREAD at compile time, which instructs support/ infrastructure to run the test while an alternate thread waits on it. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2025-11-24 16:47:52 +01:00
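
Note on 866fa41ef8 (getdelim): as that message says, POSIX leaves the contents of the line buffer unspecified once getline returns -1, so a portable caller may prefer to keep its own copy of the most recent line rather than read the buffer after the loop. The following is a minimal sketch under that assumption; read_last_line is a hypothetical helper name used only for illustration and is not part of glibc or nbdkit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return a malloc'd copy of the last line of FP, or NULL if FP is empty
   or an allocation fails.  The caller frees the result.
   Hypothetical helper, for illustration only.  */
char *
read_last_line (FILE *fp)
{
  char *line = NULL;   /* Buffer owned and resized by getline.  */
  size_t cap = 0;
  char *last = NULL;   /* Our own copy of the most recent line.  */
  ssize_t n;

  while ((n = getline (&line, &cap, fp)) != -1)
    {
      /* Copy while the buffer contents are still well defined.  */
      char *copy = malloc ((size_t) n + 1);
      if (copy == NULL)
        {
          free (last);
          last = NULL;
          break;
        }
      memcpy (copy, line, (size_t) n + 1);   /* Includes the terminator.  */
      free (last);
      last = copy;
    }

  /* LINE is never inspected after getline has returned -1.  */
  free (line);
  return last;
}
```

Copying on every iteration costs an extra allocation per line; a caller that only targets glibc with this fix could instead keep the original loop, at the price of depending on behaviour POSIX does not guarantee.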

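Note on e98bd0c54d (ANSI_X3.110): one common way to avoid forming an out-of-range pointer such as inptr + 1 when inptr == inend is to compare the remaining length instead of advancing the pointer first. The sketch below illustrates that general idea only; bytes_available is a hypothetical helper and is not taken from the iconvdata module, whose actual fix may look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if at least NEEDED bytes are available in [INPTR, INEND).
   Both pointers must point into (or one past the end of) the same array,
   with INPTR <= INEND.  Only the difference is computed, so no pointer
   beyond INEND is ever formed.  Hypothetical helper, for illustration.  */
bool
bytes_available (const unsigned char *inptr, const unsigned char *inend,
                 size_t needed)
{
  return (size_t) (inend - inptr) >= needed;
}
```

A conversion loop would call bytes_available (inptr, inend, 2) before reading a two-byte sequence, which stays valid even when inptr already equals inend.
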
43215 Commits
