clang generates internal calls for some _chk symbol, so add internal
aliases for them, and stub some with rtld-stubbed-symbols to avoid
ld.so linker issues.
The clang default to warning for missing fall-through and it does
not support all comment-like annotation that gcc does. Use a
proper attribute instead.
Clang issues:
../sysdeps/ieee754/dbl-64/s_llround.c:83:30: error: incompatible
redeclaration of library function 'lround'
[-Werror,-Wincompatible-library-redeclaration]
libm_alias_double (__lround, lround)
^
../sysdeps/ieee754/dbl-64/s_llround.c:83:30: note: 'lround' is a builtin
with type 'long (double)'
Reject invalid formatted scanf real input data the exponent part of
which is comprised of an exponent introducing character, optionally
followed by a sign, and with no actual digits following. Such data is a
prefix of, but not a matching input sequence and it is required by ISO C
to cause a matching failure.
Currently a matching success is instead incorrectly produced along with
the conversion result according to the input significand read and the
exponent of zero, with the significand and the exponent part wholly
consumed from input.
Correct an invalid `tstscanf.c' test accordingly that expects a matching
success for input data provided in the ISO C standard as an example for
a matching failure.
Enable input data that causes test failures without this fix in place.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
Reject invalid formatted scanf real input data that is comprised of a
hexadecimal prefix, optionally preceded by a sign, and with no actual
digits following owing to the field width restriction in effect. Such
data is a prefix of, but not a matching input sequence and it is
required by ISO C to cause a matching failure.
Currently a matching success is instead incorrectly produced along with
the conversion result of zero, with the prefix wholly consumed from
input. Where the end of input is marked by the end-of-file condition
rather than the field width restriction in effect a matching failure is
already correctly produced.
Enable input data that causes test failures without this fix in place.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
Add Makefile infrastructure, a format-specific test skeleton providing a
data comparison implementation that ignores bits of data representation
in memory that do not participate in holding floating-point data, and
`long double' real input data for targets using the Intel/Motorola
80-bit format.
Keep input data disabled and referring to BZ #12701 for entries that are
are currently incorrectly accepted as valid data, such as '0e', '0e+',
'0x', '0x8p', '0x0p-', etc.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the pown functions, which are like pow but with an
integer exponent. That exponent has type long long int in C23; it was
intmax_t in TS 18661-4, and as with other interfaces changed after
their initial appearance in the TS, I don't think we need to support
the original version of the interface. The test inputs are based on
the subset of test inputs for pow that use integer exponents that fit
in long long.
As the first such template implementation that saves and restores the
rounding mode internally (to avoid possible issues with directed
rounding and intermediate overflows or underflows in the wrong
rounding mode), support also needed to be added for using
SET_RESTORE_ROUND* in such template function implementations. This
required math-type-macros-float128.h to include <fenv_private.h>, so
it can tell whether SET_RESTORE_ROUNDF128 is defined. In turn, the
include order with <fenv_private.h> included before <math_private.h>
broke loongarch builds, showing up that
sysdeps/loongarch/math_private.h is really a fenv_private.h file
(maybe implemented internally before the consistent split of those
headers in 2018?) and needed to be renamed to fenv_private.h to avoid
errors with duplicate macro definitions if <math_private.h> is
included after <fenv_private.h>.
The underlying implementation uses __ieee754_pow functions (called
more than once in some cases, where the exponent does not fit in the
floating type). I expect a custom implementation for a given format,
that only handles integer exponents but handles larger exponents
directly, could be faster and more accurate in some cases.
I encourage searching for worst cases for ulps error for these
implementations (necessarily non-exhaustively, given the size of the
input space).
Tested for x86_64 and x86, and with build-many-glibcs.py.
Add Makefile infrastructure and IBM 128-bit 'long double' real input for
targets switching between the IEEE 754 binary128 and IBM 128-bit formats
with '-mabi=ieeelongdouble' and '-mabi=ibmlongdouble'. Reuse IEEE 754
binary128 input data but with modified output file names so as not to
clash with the names used for IBM 128-bit format tests made with common
rules for the 'long double' data type.
Keep input data disabled and referring to BZ #12701 for entries that are
are currently incorrectly accepted as valid data, such as '0e', '0e+',
'0x', '0x8p', '0x0p-', etc.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
Add Makefile infrastructure and 64-bit `long double' real input data for
targets switching between the IEEE 754 binary64 and IEEE 754 binary128
formats with `-mlong-double-64' and `-mlong-double-128'. Use modified
output file names for the IEEE 754 binary64 format so as not to clash
with the names used for IEEE 754 binary128 format tests made with common
rules for the 'long double' data type.
Keep input data disabled and referring to BZ #12701 for entries that are
are currently incorrectly accepted as valid data, such as '0e', '0e+',
'0x', '0x8p', '0x0p-', etc.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
Add Makefile infrastructure and `long double' real input data for
targets using the IEEE 754 binary128 format.
Keep input data disabled and referring to BZ #12701 for entries that are
are currently incorrectly accepted as valid data, such as '0e', '0e+',
'0x', '0x8p', '0x0p-', etc.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
Add Makefile infrastructure and `double' real input data for targets
using the IEEE 754 binary64 format.
Keep input data disabled and referring to BZ #12701 for entries that are
are currently incorrectly accepted as valid data, such as '0e', '0e+',
'0x', '0x8p', '0x0p-', etc.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
Add Makefile infrastructure and `float' real input data for targets
using the IEEE 754 binary32 format.
Keep input data disabled and referring to BZ #12701 for entries that are
are currently incorrectly accepted as valid data, such as '0e', '0e+',
'0x', '0x8p', '0x0p-', etc.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the powr functions, which are like pow, but with simpler
handling of special cases (based on exp(y*log(x)), so negative x and
0^0 are domain errors, powers of -0 are always +0 or +Inf never -0 or
-Inf, and 1^+-Inf and Inf^0 are also domain errors, while NaN^0 and
1^NaN are NaN). The test inputs are taken from those for pow, with
appropriate adjustments (including removing all tests that would be
domain errors from those in auto-libm-test-in and adding some more
such tests in libm-test-powr.inc).
The underlying implementation uses __ieee754_pow functions after
dealing with all special cases that need to be handled differently.
It might be a little faster (avoiding a wrapper and redundant checks
for special cases) to have an underlying implementation built
separately for both pow and powr with compile-time conditionals for
special-case handling, but I expect the benefit of that would be
limited given that both functions will end up needing to use the same
logic for computing pow outside of special cases.
My understanding is that powr(negative, qNaN) should raise "invalid":
that the rule on "invalid" for an argument outside the domain of the
function takes precedence over a quiet NaN argument producing a quiet
NaN result with no exceptions raised (for rootn it's explicit that the
0th root of qNaN raises "invalid"). I've raised this on the WG14
reflector to confirm the intent.
Tested for x86_64 and x86, and with build-many-glibcs.py.
On SPR, it improves atanh bench performance by:
Before After Improvement
reciprocal-throughput 15.1715 14.8628 2%
latency 57.1941 56.1883 2%
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
On SPR, it improves sinh bench performance by:
Before After Improvement
reciprocal-throughput 14.2017 11.815 17%
latency 36.4917 35.2114 4%
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
On Skylake, it improves tanh bench performance by:
Before After Improvement
max 110.89 95.826 14%
min 20.966 20.157 4%
mean 30.9601 29.8431 4%
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The current approach tracks math maximum supported errors by explicitly
setting them per function and architecture. On newer implementations or
new compiler versions, the file is updated with newer values if it
shows higher results. The idea is to track the maximum known error, to
update the manual with the obtained values.
The constant libm-test-ulps shows little value, where it is usually a
mechanical change done by the maintainer, for past releases it is
usually ignored whether the ulp change resulted from a compiler
regression, and the math tests already have a maximum ulp error that
triggers a regression.
It was shown by a recent update after the new acosf [1] implementation
that is correctly rounded, where the libm-test-ulps was indeed from a
compiler issue.
This patch removes all arch-specific libm-test-ulps, adds system generic
libm-test-ulps where applicable, and changes its semantics. The generic
files now track specific implementation constraints, like if it is
expected to be correctly rounded, or if the system-specific has
different error expectations.
Now multiple libm-test-ulps can be defined, and system-specific
overrides generic implementation. This is for the case where
arch-specific implementation might show worse precision than generic
implementation, for instance, the cbrtf on i686.
Regressions are only reported if the implementation shows larger errors
than 9 ulps (13 for IBM long double) unless it is overridden by
libm-test-ulps and the maximum error is not printed at the end of tests.
The regen-ulps rule is also removed since it does not make sense to
update the libm-test-ulps automatically.
The manual error table is also removed, Paul Zimmermann and others have
been tracking libm precision with a more comprehensive analysis for some
releases; so link to his work instead.
[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=9cc9f8e11e8fb8f54f1e84d9f024917634a78201
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the rsqrt functions (1/sqrt(x)). The test inputs are
taken from those for sqrt.
Tested for x86_64 and x86, and with build-many-glibcs.py.
Single-precision remainderf() and quad-precision remainderl()
implementation derived from Sun is affected by an issue when the result
is +-0. IEEE754 requires that if remainder(x, y) = 0, its sign shall be
that of x regardless of the rounding direction.
The implementation seems to have assumed that x - x = +0 in all
rounding modes, which is not the case. When rounding direction is
roundTowardNegative the sign of an exact zero sum (or difference) is −0.
Regression tests that triggered this erroneous behavior are added to
math/libm-test-remainder.inc.
Tested for cross riscv64 and powerpc.
Original fix by: Bruce Evans <bde@FreeBSD.org> in FreeBSD's
a2ddfa5ea726c56dbf825763ad371c261b89b7c7.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
A number of fma tests started to fail on hppa when gcc was changed to
use Ranger rather than EVRP. Eventually I found that the value of
a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
mode and not the original rounding mode:
if (TININESS_AFTER_ROUNDING)
{
w.d = a1 + u.d;
if (w.ieee.exponent == 109)
return w.d * 0x1p-108;
}
This caused the exponent value to be wrong and the wrong return path
to be used.
Here we add an optimization barrier after the rounding mode is reset
to ensure that the previous value of a1 + u.d is not reused.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
GCC aligns global data to 16 bytes if their size is >= 16 bytes. This patch
changes the exp_data struct slightly so that the fields are better aligned
and without gaps. As a result on targets that support them, more load-pair
instructions are used in exp. Exp10 is improved by moving invlog10_2N later
so that neglog10_2hiN and neglog10_2loN can be loaded using load-pair.
The exp benchmark improves 2.5%, "144bits" by 7.2%, "768bits" by 12.7% on
Neoverse V2. Exp10 improves by 1.5%.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The libm size improvement built with "--enable-stack-protector=strong
--enable-bind-now=yes --enable-fortify-source=2":
Before:
text data bss dec hex filename
585192 860 12 586064 8f150 aarch64-linux-gnu/math/libm.so
960775 1068 12 961855 ead3f x86_64-linux-gnu/math/libm.so
1189174 5544 368 1195086 123c4e powerpc64le-linux-gnu/math/libm.so
After:
text data bss dec hex filename
584952 860 12 585824 8f060 aarch64-linux-gnu/math/libm.so
960615 1068 12 961695 eac9f x86_64-linux-gnu/math/libm.so
1189078 5544 368 1194990 123bee powerpc64le-linux-gnu/math/libm.so
The are small code changes for x86_64 and powerpc64le, which do not
affect performance; but on aarch64 with gcc-14 I see a slight better
code generation due the usage of ldq for floating point constant loading.
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
The libm size improvement built with "--enable-stack-protector=strong
--enable-bind-now=yes --enable-fortify-source=2":
Before:
text data bss dec hex filename
587304 860 12 588176 8f990 aarch64-linux-gnu-master/math/libm.so
962855 1068 12 963935 eb55f x86_64-linux-gnu-master/math/libm.so
1191222 5544 368 1197134 12444e powerpc64le-linux-gnu-master/math/libm.so
After:
text data bss dec hex filename
585192 860 12 586064 8f150 aarch64-linux-gnu/math/libm.so
960775 1068 12 961855 ead3f x86_64-linux-gnu/math/libm.so
1189174 5544 368 1195086 123c4e powerpc64le-linux-gnu/math/libm.so
The are small code changes for x86_64 and powerpc64le, which do not
affect performance; but on aarch64 with gcc-14 I see a slight better
code generation due the usage of ldq for floating point constant loading.
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
The CORE-MATH implementation is correctly rounded (for any rounding mode),
although it should worse performance than current one. The current
implementation performance comes mainly from the internal usage of
the optimize expf implementation, and shows a maximum ULPs of 2 for
FE_TONEAREST and 3 for other rounding modes.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency master patched improvement
x86_64 40.6995 49.0737 -20.58%
x86_64v2 40.5841 44.3604 -9.30%
x86_64v3 39.3879 39.7502 -0.92%
i686 112.3380 129.8570 -15.59%
aarch64 (Neoverse) 18.6914 17.0946 8.54%
power10 11.1343 9.3245 16.25%
reciprocal-throughput master patched improvement
x86_64 18.6471 24.1077 -29.28%
x86_64v2 17.7501 20.2946 -14.34%
x86_64v3 17.8262 17.1877 3.58%
i686 64.1454 86.5645 -34.95%
aarch64 (Neoverse) 9.77226 12.2314 -25.16%
power10 4.0200 5.3316 -32.63%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>