The CORE-MATH implementation is correctly rounded (for any rounding mode),
although it shows worse performance than the current one. The current
implementation's performance comes mainly from its internal use of
the optimized expf implementation, and it shows a maximum error of 2 ULPs
for FE_TONEAREST and 3 ULPs for other rounding modes.
The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                  master    patched  improvement
x86_64                  40.6995    49.0737      -20.58%
x86_64v2                40.5841    44.3604       -9.30%
x86_64v3                39.3879    39.7502       -0.92%
i686                   112.3380   129.8570      -15.59%
aarch64 (Neoverse)      18.6914    17.0946        8.54%
power10                 11.1343     9.3245       16.25%

reciprocal-throughput    master    patched  improvement
x86_64                  18.6471    24.1077      -29.28%
x86_64v2                17.7501    20.2946      -14.34%
x86_64v3                17.8262    17.1877        3.58%
i686                    64.1454    86.5645      -34.95%
aarch64 (Neoverse)      9.77226    12.2314      -25.16%
power10                  4.0200     5.3316      -32.63%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
The predefined pi constants do not have the expected value for atan2
in non-default rounding modes. Instead, use the autogenerated values.
Reviewed-by: DJ Delorie <dj@redhat.com>
For some correctly rounded functions where an infinite input yields
a finite result (like atanf), comparing to a predefined constant does not
yield the expected result in all rounding modes.
The most straightforward way to handle it is to get the expected
result from mpfr, which handles all the rounding modes.
This will be required by the rseq extensible ABI implementation on all
Linux architectures exposing the '__rseq_size' and '__rseq_offset'
symbols to set the initial value of the 'cpu_id' field which can be used
by applications to test if rseq is available and registered. As long as
the symbols are exposed it is valid for an application to perform this
test even if rseq is not yet implemented in libc for this architecture.
Compile tested with build-many-glibcs.py but I don't have access to any
hardware to run the tests.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Linux 6.12 has no new syscalls. Update the version number in
syscall-names.list to reflect that it is still current for 6.12.
Tested with build-many-glibcs.py.
Hide memset/bzero from the compiler to silence Clang error:
./tester.c:1345:29: error: 'size' argument to memset is '0'; did you mean to transpose the last two arguments? [-Werror,-Wmemset-transposed-args]
1345 | (void) memset(one+2, 'y', 0);
| ^
./tester.c:1345:29: note: parenthesize the third argument to silence
./tester.c:1432:16: error: 'size' argument to bzero is '0' [-Werror,-Wsuspicious-bzero]
1432 | bzero(one+2, 0);
| ^
./tester.c:1432:16: note: parenthesize the second argument to silence
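One common way to hide such a call from the compiler's argument checks is to
route it through a volatile function pointer, so the call is no longer seen
as a direct call to memset. The sketch below illustrates that general
technique and is not taken from this patch; the names hidden_memset and
zero_size_memset_leaves_buffer_alone are hypothetical.

```c
#include <string.h>

/* Hypothetical name: a volatile pointer to memset.  Because the pointer is
   volatile, the compiler must load it at run time and cannot treat the call
   below as a direct memset call, so the zero-size check does not apply.  */
static void *(*volatile hidden_memset) (void *, int, size_t) = memset;

/* Calls memset with a zero size through the hidden pointer and reports
   whether the buffer was left untouched.  Returns 1 if it was.  */
int
zero_size_memset_leaves_buffer_alone (void)
{
  char one[8] = "abcdefg";
  (void) hidden_memset (one + 2, 'y', 0);
  return strcmp (one, "abcdefg") == 0;
}
```

The test still exercises the zero-length case at run time; only the
compile-time diagnostic is avoided.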
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Set have-test-clang to yes if clang is used to test glibc. Set
have-test-clangxx to yes if clang++ is used to test glibc.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Although _chk functions are exported in libc.so.6, their prototypes aren't
provided. Their builtin versions are supported by the compiler. Replace
__strcpy_chk with __builtin___strcpy_chk to silence Clang error:
./tst-gnuglob-skeleton.c:225:3: error: call to undeclared function '__strcpy_chk'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
225 | __strcpy_chk (dir->d.d_name, filesystem[dir->idx].name, NAME_MAX);
| ^
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The C standard requires that ungetc guarantees at least one pushback,
but the malloc call to allocate the pushback buffer could fail, thus
violating that requirement. Fix this by adding a single byte pushback
buffer in the FILE struct that the pushback can fall back to if malloc
fails.
The side-effect is that if the initial malloc fails and the 1-byte
fallback buffer is used, future resizing (if it succeeds) will be
2-bytes, 4-bytes and so on, which is suboptimal but it's after a malloc
failure, so maybe even desirable.
A future optimization here could be to have the pushback code use the
single byte buffer first and only fall back to malloc for subsequent
calls.
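The fallback behaviour described above can be sketched outside of libio as
follows. This is a simplified illustration, not the glibc code: struct
pushback, pushback_put, the alloc_fn hook, failing_alloc, and the initial
size of 128 bytes are all assumptions made for the example.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical stand-in for the pushback state kept in FILE.  */
struct pushback
{
  char *buf;          /* Current pushback buffer (heap or fallback).  */
  size_t cap;         /* Capacity of buf in bytes.  */
  size_t len;         /* Number of bytes currently pushed back.  */
  char fallback[1];   /* Single byte that is always available.  */
};

/* Allocator hook so that a failing malloc can be simulated.  */
typedef void *(*alloc_fn) (size_t);

/* Push C back.  Returns C on success, -1 if no room could be found.  */
int
pushback_put (struct pushback *pb, int c, alloc_fn alloc)
{
  if (pb->buf == NULL)
    {
      /* First pushback: try a heap buffer, else use the single byte.  */
      size_t want = 128;   /* Assumed initial size.  */
      char *p = alloc (want);
      if (p != NULL)
        {
          pb->buf = p;
          pb->cap = want;
        }
      else
        {
          pb->buf = pb->fallback;
          pb->cap = sizeof pb->fallback;
        }
    }
  else if (pb->len == pb->cap)
    {
      /* Buffer full: double it.  Starting from the fallback byte this
         gives 2 bytes, then 4, and so on.  */
      size_t want = pb->cap * 2;
      char *p = alloc (want);
      if (p == NULL)
        return -1;
      for (size_t i = 0; i < pb->len; i++)
        p[i] = pb->buf[i];
      if (pb->buf != pb->fallback)
        free (pb->buf);
      pb->buf = p;
      pb->cap = want;
    }
  pb->buf[pb->len++] = (char) c;
  return (unsigned char) c;
}

/* Allocator that always fails, for exercising the fallback path.  */
void *
failing_alloc (size_t n)
{
  (void) n;
  return NULL;
}
```

With failing_alloc the first pushback still succeeds, which is the one
pushback the C standard guarantees; a second one fails until an allocation
succeeds again.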
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Maciej W. Rozycki <macro@redhat.com>
Clang does not define _Bool for -std=c++98:
/usr/include/bits/platform/features.h:31:19: error: unknown type name '_Bool'
31 | static __inline__ _Bool
| ^
Change _Bool to bool to silence clang++ error.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
GCC 4.9 issues an error for copysign in initializer:
In file included from tst-printf-format-p-double.c:20:0:
tst-printf-format-skeleton-double.c:29:3: error: initializer element is not a constant expression [-Werror]
{ -HUGE_VAL, -DBL_MAX, -DBL_MIN, copysign (0, -1), -NAN, NAN, 0, DBL_MIN,
^
since it can't fold "copysign (0, -1)". Replace copysign (0,-1) with -0.0.
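A minimal sketch of why the replacement is equivalent: -0.0 is a constant
expression, so it is valid in a static initializer, and it carries the same
sign bit that copysign (0, -1) would produce. The names negative_zero and
negative_zero_has_sign are hypothetical.

```c
#include <math.h>

/* A file-scope initializer must be a constant expression.  -0.0 qualifies
   without relying on the compiler folding a call to copysign.  */
static const double negative_zero = -0.0;

/* Returns 1 if the constant is a zero with the sign bit set.  */
int
negative_zero_has_sign (void)
{
  return signbit (negative_zero) != 0 && negative_zero == 0.0;
}
```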
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Add explicit instantiation declaration of S<char>::i to silence Clang
error:
tst-unique3.cc:6:18: error: instantiation of variable 'S<char>::i' required here, but no definition is available [-Werror,-Wundefined-var-template]
6 | int t = S<char>::i;
| ^
./tst-unique3.h:5:14: note: forward declaration of template entity is here
5 | static int i;
| ^
tst-unique3.cc:6:18: note: add an explicit instantiation declaration to suppress this warning if 'S<char>::i' is explicitly instantiated in another translation unit
6 | int t = S<char>::i;
| ^
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
ieee_long_double_shape_type has
typedef union
{
long double value;
struct
{
...
int sign_exponent:16;
...
} parts;
} ieee_long_double_shape_type;
Clang issues an error:
../sysdeps/ieee754/ldbl-96/test-totalorderl-ldbl-96.c:49:2: error: implicit truncation from 'int' to bit-field changes value from 65535 to -1 [-Werror,-Wbitfield-constant-conversion]
49 | SET_LDOUBLE_WORDS (ldnx, 0xffff,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
50 | tests[i] >> 32, tests[i] & 0xffffffffULL);
|
Use -1, instead of 0xffff, to silence Clang.
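A small sketch of why -1 is a safe substitute: in a 16-bit signed bit-field,
-1 is in range and sets all 16 bits, which is the bit pattern that 0xffff
was meant to store. struct parts below is a simplified, hypothetical
stand-in for the real union, and it assumes plain int bit-fields are signed,
as they are with GCC and Clang.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for the 'parts' struct.  */
struct parts
{
  int sign_exponent : 16;   /* Signed 16-bit field (GCC/Clang).  */
  unsigned int pad : 16;
};

/* Stores -1 in the field and returns its low 16 bits as an unsigned value.  */
uint16_t
all_ones_sign_exponent (void)
{
  struct parts p = { 0 };
  p.sign_exponent = -1;     /* In range, so no truncation diagnostic.  */
  return (uint16_t) p.sign_exponent;
}
```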
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Add _Atomic to futex_wait argument and ctid in tst-clone3[-internal].c to
silence Clang error:
../sysdeps/unix/sysv/linux/tst-clone3-internal.c:93:3: error: address argument to atomic operation must be a pointer to _Atomic type ('pid_t *' (aka 'int *') invalid)
93 | wait_tid (&ctid, CTID_INIT_VAL);
| ^ ~~~~~
../sysdeps/unix/sysv/linux/tst-clone3-internal.c:51:21: note: expanded from macro 'wait_tid'
51 | while ((__tid = atomic_load_explicit (ctid_ptr, \
| ^ ~~~~~~~~
/usr/bin/../lib/clang/19/include/stdatomic.h:145:30: note: expanded from macro 'atomic_load_explicit'
145 | #define atomic_load_explicit __c11_atomic_load
| ^
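As an illustration of the requirement Clang enforces here, the sketch below
declares the object as _Atomic so that its address is a pointer to an
_Atomic type. It is a standalone example rather than the test's code; the
function name store_and_load_ctid is hypothetical.

```c
#include <stdatomic.h>
#include <sys/types.h>

/* Declaring the object _Atomic makes &ctid a pointer to an _Atomic type,
   which is what Clang's atomic_load_explicit requires.  */
static _Atomic pid_t ctid;

/* Stores VALUE and reads it back with explicit memory orders.  */
pid_t
store_and_load_ctid (pid_t value)
{
  atomic_store_explicit (&ctid, value, memory_order_release);
  return atomic_load_explicit (&ctid, memory_order_acquire);
}
```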
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Mark _exit_with_flush as noreturn to silence the Clang error on
tst-atexit-common.c:
In file included from tst-atexit.c:22:
../stdlib/tst-atexit-common.c:113:5: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
113 | case 0: /* Child. */
| ^
../stdlib/tst-atexit-common.c:113:5: note: insert 'break;' to avoid fall-through
113 | case 0: /* Child. */
| ^
| break;
1 error generated.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Some hypervisors report 1 TiB L3 cache size. This results
in some variables incorrectly getting zeroed, causing crashes
in memcpy/memmove because invariants are violated.
Explicitly cast 192 and 168 to char to silence Clang error:
tst-resolv-invalid-cname.c:313:17: error: implicit conversion from 'int' to 'char' changes value from 192 to -64 [-Werror,-Wconstant-conversion]
313 | addr[0] = 192;
| ~ ^~~
tst-resolv-invalid-cname.c:314:17: error: implicit conversion from 'int' to 'char' changes value from 168 to -88 [-Werror,-Wconstant-conversion]
314 | addr[1] = 168;
| ~ ^~~
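A minimal sketch of the explicit casts, assuming the usual wrap-around
conversion that GCC and Clang apply when char is signed. The function name
set_prefix is hypothetical.

```c
/* Fills the first two bytes of ADDR with 192 and 168.  The explicit casts
   state that the narrowing is intended; the stored bit patterns are the
   same as with the implicit conversion.  */
void
set_prefix (char *addr)
{
  addr[0] = (char) 192;
  addr[1] = (char) 168;
}
```

Reading the bytes back through unsigned char yields 192 and 168 again.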
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Use "#include <...>" to silence Clang #include_next error:
In file included from ../sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c:19:
../sysdeps/x86_64/fpu/test-double-vlen4.h:19:2: error: #include_next in file found relative to primary source file or found by absolute path; will search from start of include path [-Werror,-Winclude-next-absolute-path]
19 | #include_next <test-double-vlen4.h>
| ^
1 error generated.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Pass -mshstk to compiler to silence Clang:
In file included from ../sysdeps/x86_64/tst-cet-legacy-10a.c:2:
../sysdeps/x86_64/tst-cet-legacy-10.c:29:7: error: always_inline function '_get_ssp' requires target feature 'shstk', but would be inlined into function 'do_test' that is compiled without support for 'shstk'
29 | if (_get_ssp () != 0)
| ^
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
Also use intrinsics instead of native operations.
expf: 3% improvement in throughput microbenchmark on Neoverse V1, exp2f: 5%,
exp10f: 13%, coshf: 14%.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1 for log2 and log,
and 2% for log10.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Since -1 isn't a power of two, the compiler may reject it. Hide memalign
from Clang 19, which issues an error:
tst-memalign.c:86:31: error: requested alignment is not a power of 2 [-Werror,-Wnon-power-of-two-alignment]
86 | p = memalign (-1, pagesize);
| ^~
tst-memalign.c:86:31: error: requested alignment must be 4294967296 bytes or smaller; maximum alignment assumed [-Werror,-Wbuiltin-assume-aligned-alignment]
86 | p = memalign (-1, pagesize);
| ^~
Update tst-malloc-aux.h to hide all malloc functions and include it in
all malloc tests to prevent the compiler from optimizing out any malloc
functions.
Tested with Clang 19.1.5 and GCC 15 20241206 for BZ #32366.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
This was missed in a recent global change.
Fixes: 53fcdf5f74 (2024-11-25, "Silence most -Wzero-as-null-pointer-constant diagnostics")
Reported-by: "Maciej W. Rozycki" <macro@redhat.com>
Cc: Siddhesh Poyarekar <siddhesh@sourceware.org>
Cc: Bruno Haible <bruno@clisp.org>
Cc: Martin Uecker <uecker@tugraz.at>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Joseph Myers <josmyers@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Maciej W. Rozycki <macro@redhat.com>