C23 adds various <math.h> function families originally defined in TS
18661-4. Add the pown functions, which are like pow but with an
integer exponent. That exponent has type long long int in C23; it was
intmax_t in TS 18661-4, and as with other interfaces changed after
their initial appearance in the TS, I don't think we need to support
the original version of the interface. The test inputs are based on
the subset of test inputs for pow that use integer exponents that fit
in long long.
As the first such template implementation that saves and restores the
rounding mode internally (to avoid possible issues with directed
rounding and intermediate overflows or underflows in the wrong
rounding mode), support also needed to be added for using
SET_RESTORE_ROUND* in such template function implementations. This
required math-type-macros-float128.h to include <fenv_private.h>, so
it can tell whether SET_RESTORE_ROUNDF128 is defined. In turn, the
include order with <fenv_private.h> included before <math_private.h>
broke loongarch builds, showing up that
sysdeps/loongarch/math_private.h is really a fenv_private.h file
(maybe implemented internally before the consistent split of those
headers in 2018?) and needed to be renamed to fenv_private.h to avoid
errors with duplicate macro definitions if <math_private.h> is
included after <fenv_private.h>.
The underlying implementation uses __ieee754_pow functions (called
more than once in some cases, where the exponent does not fit in the
floating type). I expect a custom implementation for a given format,
that only handles integer exponents but handles larger exponents
directly, could be faster and more accurate in some cases.
I encourage searching for worst cases for ulps error for these
implementations (necessarily non-exhaustively, given the size of the
input space).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the powr functions, which are like pow, but with simpler
handling of special cases (based on exp(y*log(x)), so negative x and
0^0 are domain errors, powers of -0 are always +0 or +Inf never -0 or
-Inf, and 1^+-Inf and Inf^0 are also domain errors, while NaN^0 and
1^NaN are NaN). The test inputs are taken from those for pow, with
appropriate adjustments (including removing all tests that would be
domain errors from those in auto-libm-test-in and adding some more
such tests in libm-test-powr.inc).
The underlying implementation uses __ieee754_pow functions after
dealing with all special cases that need to be handled differently.
It might be a little faster (avoiding a wrapper and redundant checks
for special cases) to have an underlying implementation built
separately for both pow and powr with compile-time conditionals for
special-case handling, but I expect the benefit of that would be
limited given that both functions will end up needing to use the same
logic for computing pow outside of special cases.
My understanding is that powr(negative, qNaN) should raise "invalid":
that the rule on "invalid" for an argument outside the domain of the
function takes precedence over a quiet NaN argument producing a quiet
NaN result with no exceptions raised (for rootn it's explicit that the
0th root of qNaN raises "invalid"). I've raised this on the WG14
reflector to confirm the intent.
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the rsqrt functions (1/sqrt(x)). The test inputs are
taken from those for sqrt.
Tested for x86_64 and x86, and with build-many-glibcs.py.
More routines are to follow, some of which hit many failures in the
current testsuite due to wrong sign of zero (mathvec routines are not
required to get this right). Instead of disabling a large number of
tests, change the failure condition such that, for vector routines,
tests pass as long as computed == expected == 0.0, regardless of sign.
Affected tests (vector tests for expm1, log1p, sin, tan and tanh) all
still pass.
These inputs were generated with the programs from
https://gitlab.inria.fr/zimmerma/math_accuracy,
with rounding to nearest:
* for univariate binary32 functions by exhaustive search
* for other functions with the "threshold" parameter up to 10^6
The pi defined constants are not the expected value for atan2
on non-default rounding modes. Instead use the autogenerated value.
Reviewed-by: DJ Delorie <dj@redhat.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the atan2pi functions (atan2(y,x)/pi).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the atanpi functions (atan(x)/pi).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the asinpi functions (asin(x)/pi).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the acospi functions (acos(x)/pi).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the tanpi functions (tan(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the sinpi functions (sin(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the cospi functions (cos(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
These functions are exp10m1, exp2m1, log10p1, log2p1.
Also regenerated ulps on x86_64.
For each format, there are 4 values, one for each rounding mode.
(For the intel96 format, there are 8 values, 4 for Intel hardware,
and 4 for AMD hardware. However, regen-ulps was only run on Intel.
It should be run in a separate patch on a AMD x86_64.)
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).
As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future. Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.
exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode. Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.
The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.
The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the log10p1 functions (log10(1+x): like log1p, but for
base-10 logarithms).
This is directly analogous to the log2p1 implementation (except that
whereas log2p1 has a smaller underflow range than log1p, log10p1 has a
larger underflow range). The test inputs are copied from those for
log1p and log2p1, plus a few more inputs in that wider underflow
range.
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the log2p1 functions (log2(1+x): like log1p, but for
base-2 logarithms).
This illustrates the intended structure of implementations of all
these function families: define them initially with a type-generic
template implementation. If someone wishes to add type-specific
implementations, it is likely such implementations can be both faster
and more accurate than the type-generic one and can then override it
for types for which they are implemented (adding benchmarks would be
desirable in such cases to demonstrate that a new implementation is
indeed faster).
The test inputs are copied from those for log1p. Note that these
changes make gen-auto-libm-tests depend on MPFR 4.2 (or later).
The bulk of the changes are fairly generic for any such new function.
(sysdeps/powerpc/nofpu/Makefile only needs changing for those
type-generic templates that use fabs.)
Tested for x86_64 and x86, and with build-many-glibcs.py.
Add support for a no-mathvec flag to gen-auto-libm-tests.c.
Update input test sin (-0.0) to be skipped in vector math libraries and
regenerate testcases.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
This patch adds following input to atanh accuracy test.
0x1.f80094p-8
Tested on x86-64 and i686 platforms.
Other platforms may have to regenerate ulps file.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
This patch adds following inputs:
0x1.bcab29da0e947p-54 0x1.bc41f4d2294b8p-54
0x1.a11891ec004d4p-348 0x1.814830510be26p-348
0x1.b836ed678be29p-588 0x1.b7be6f5a03a8cp-588
0x1.a83f842ef3f73p-633 0x1.a799d8a6677ep-633
to atan2 tests and updates x86_64 double atan2 ulps.
This fixes BZ #28765.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
The largest errors over the full binary32 range are after this
patch (on x86_64):
RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7
Inputs that were yielding huge errors have been added to "make check".
Reviewed-by: Adhemeral Zanella <adhemerval.zanella@linaro.org>
This patch adds the narrowing fused multiply-add functions from TS
18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal,
f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x,
f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128,
f64xfmaf128 for configurations with _Float64x and _Float128;
__f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case
(for calls to ffmal and dfmal when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.
The changes are mostly similar to those for the other narrowing
functions previously added, especially that for sqrt, so the
description of those generally applies to this patch as well. As with
sqrt, I reused the same test inputs in auto-libm-test-in as for
non-narrowing fma rather than adding extra or separate inputs for
narrowing fma. The tests in libm-test-narrow-fma.inc also follow
those for non-narrowing fma.
The non-narrowing fma has a known bug (bug 6801) that it does not set
errno on errors (overflow, underflow, Inf * 0, Inf - Inf). Rather
than fixing this or having narrowing fma check for errors when
non-narrowing does not (complicating the cases when narrowing fma can
otherwise be an alias for a non-narrowing function), this patch does
not attempt to check for errors from narrowing fma and set errno; the
CHECK_NARROW_FMA macro is still present, but as a placeholder that
does nothing, and this missing errno setting is considered to be
covered by the existing bug rather than needing a separate open bug.
missing-errno annotations are duly added to many of the
auto-libm-test-in test inputs for fma.
This completes adding all the new functions from TS 18661-1 to glibc,
so will be followed by corresponding stdc-predef.h changes to define
__STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support
for TS 18661-1 will be at a similar level to that for C standard
floating-point facilities up to C11 (pragmas not implemented, but
library functions done). (There are still further changes to be done
to implement changes to the types of fromfp functions from N2548.)
Tested as followed: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float). The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
As described in bug 28358, the round-to-odd computations used in the
libm functions that round their results to a narrower format can yield
spurious underflow exceptions in the following circumstances: the
narrowing only narrows the precision of the type and not the exponent
range (i.e., it's narrowing _Float128 to _Float64x on x86_64, x86 or
ia64), the architecture does after-rounding tininess detection (which
applies to all those architectures), the result is inexact, tiny
before rounding but not tiny after rounding (with the chosen rounding
mode) for _Float64x (which is possible for narrowing mul, div and fma,
not for narrowing add, sub or sqrt), so the underflow exception
resulting from the toward-zero computation in _Float128 is spurious
for _Float64x.
Fixed by making ROUND_TO_ODD call feclearexcept (FE_UNDERFLOW) in the
problem cases (as indicated by an extra argument to the macro); there
is never any need to preserve underflow exceptions from this part of
the computation, because the conversion of the round-to-odd value to
the narrower type will underflow in exactly the cases in which the
function should raise that exception, but it may be more efficient to
avoid the extra manipulation of the floating-point environment when
not needed.
Tested for x86_64 and x86, and with build-many-glibcs.py.
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).
Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.
The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average. Two different algorithms are used:
* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)
* for large inputs, an asymptotic formula from [1] is used
[1] Fast and Accurate Bessel Function Computation,
John Harrison, Proceedings of Arith 19, 2009.
Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).
Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master