Commit Graph

16539 Commits

Author SHA1 Message Date
Adhemerval Zanella ed608a40e2 math: Use asinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic asinhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      64.5128        56.9717        11.69%
x86_64v2                    63.3065        57.2666         9.54%
x86_64v3                    62.8719        51.4170        18.22%
i686                       189.1630        137.635        27.24%
aarch64 (Neoverse)          25.3551        20.5757        18.85%
power10                     17.9712        13.3302        25.82%

reciprocal-throughput        master        patched   improvement
x86_64                      20.0844        15.4731        22.96%
x86_64v2                    19.2919        15.4000        20.17%
x86_64v3                    18.7226        11.9009        36.44%
i686                       103.7670        80.2681        22.65%
aarch64 (Neoverse)          12.5005        8.68969        30.49%
power10                      7.2220        5.03617        30.27%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>:
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 5fb4b566ef math: Use asinf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic asinf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      42.8237        35.2460        17.70%
x86_64v2                    43.3711        35.9406        17.13%
x86_64v3                    35.0335        30.5744        12.73%
i686                       213.8780        104.4710       51.15%
aarch64 (Neoverse)          17.2937        13.6025        21.34%
power10                     12.0227        7.4241         38.25%

reciprocal-throughput        master        patched   improvement
x86_64                      13.6770        15.5231       -13.50%
x86_64v2                    13.8722        16.0446       -15.66%
x86_64v3                    13.6211        13.2753         2.54%
i686                       186.7670        45.4388        75.67%
aarch64 (Neoverse)          9.96089        9.39285         5.70%
power10                      4.9862        3.7819         24.15%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 673e6fe110 math: Use acoshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic acoshf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      61.2471        58.7742         4.04%
x86_64-v2                   62.6519        59.0523         5.75%
x86_64-v3                   58.7408        50.1393        14.64%
aarch64                     24.8580        21.3317        14.19%
power10                     17.0469        13.1345        22.95%

reciprocal-throughput        master        patched   improvement
x86_64                      16.1618        15.1864         6.04%
x86_64-v2                   15.7729        14.7563         6.45%
x86_64-v3                   14.1669        11.9568        15.60%
aarch64                      10.911        9.5486         12.49%
power10                     6.38196        5.06734        20.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 66fa7ad437 math: Use acosf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic acosf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.5098        36.6312        30.24%
x86_64v2                    53.0217        37.3091        29.63%
x86_64v3                    42.8501        32.3977        24.39%
i686                       207.3960       109.4000        47.25%
aarch64                     21.3694        13.7871        35.48%
power10                     14.5542         7.2891        49.92%

reciprocal-throughput        master        patched   improvement
x86_64                      14.1487        15.9508       -12.74%
x86_64v2                    14.3293        16.1899       -12.98%
x86_64v3                    13.6563        12.6161         7.62%
i686                       158.4060        45.7354        71.13%
aarch64                     12.5515        9.19233        26.76%
power10                      5.7868         3.3487        42.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Michael Jeanson eb8fa66d4e nptl: Add <thread_pointer.h> for sparc
This will be required by the rseq extensible ABI implementation on all
Linux architectures exposing the '__rseq_size' and '__rseq_offset'
symbols to set the initial value of the 'cpu_id' field which can be used
by applications to test if rseq is available and registered. As long as
the symbols are exposed it is valid for an application to perform this
test even if rseq is not yet implemented in libc for this architecture.

Compile tested with build-many-glibcs.py but I don't have access to any
hardware to run the tests.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-18 19:38:58 +00:00
Adhemerval Zanella 849c73fe2b powerpc: Update libm-test-ulps
Regen to add new functions acospi, asinpi, atan2pi, atanpi, and
tanpi.
2024-12-18 15:43:09 -03:00
Adhemerval Zanella 2872876d43 arm: Update libm-test-ulps
Regen to add new functions acospi, asinpi, atan2pi, atanpi, cospi,
sinpi, and tanpi.
2024-12-18 14:20:41 -03:00
Adhemerval Zanella 5a4c99163c i386: Update libm-test-ulps
Regen to add new functions acospi, asinpi, atan2pi, atanpi, cospi,
sinpi, and tanpi.
2024-12-18 14:20:41 -03:00
Joseph Myers e0a0fd64b5 Update syscall lists for Linux 6.12
Linux 6.12 has no new syscalls.  Update the version number in
syscall-names.list to reflect that it is still current for 6.12.

Tested with build-many-glibcs.py.
2024-12-18 15:12:36 +00:00
H.J. Lu a194871b13 sys/platform/x86.h: Do not depend on _Bool definition in C++ mode
Clang does not define _Bool for -std=c++98:

/usr/include/bits/platform/features.h:31:19: error: unknown type name '_Bool'
   31 | static __inline__ _Bool
      |                   ^

Change _Bool to bool to silence clang++ error.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-12-18 02:32:27 +08:00
H.J. Lu 54fe008ba6 ldbl-96: Set -1 to "int sign_exponent:16"
ieee_long_double_shape_type has

typedef union
{
  long double value;
  struct
  {
    ...
    int sign_exponent:16;
    ...
  } parts;
} ieee_long_double_shape_type;

Clang issues an error:

../sysdeps/ieee754/ldbl-96/test-totalorderl-ldbl-96.c:49:2: error: implicit truncation from 'int' to bit-field changes value from 65535 to -1 [-Werror,-Wbitfield-constant-conversion]
   49 |         SET_LDOUBLE_WORDS (ldnx, 0xffff,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   50 |                            tests[i] >> 32, tests[i] & 0xffffffffULL);
      |

Use -1, instead of 0xffff, to silence Clang.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-18 01:54:26 +08:00
H.J. Lu d4ee46b0cd tst-clone3[-internal].c: Add _Atomic to silence Clang
Add _Atomic to futex_wait argument and ctid in tst-clone3[-internal].c to
silence Clang error:

../sysdeps/unix/sysv/linux/tst-clone3-internal.c:93:3: error: address argument to atomic operation must be a pointer to _Atomic type ('pid_t *' (aka 'int *') invalid)
   93 |   wait_tid (&ctid, CTID_INIT_VAL);
      |   ^         ~~~~~
../sysdeps/unix/sysv/linux/tst-clone3-internal.c:51:21: note: expanded from macro 'wait_tid'
   51 |     while ((__tid = atomic_load_explicit (ctid_ptr,                     \
      |                     ^                     ~~~~~~~~
/usr/bin/../lib/clang/19/include/stdatomic.h:145:30: note: expanded from macro 'atomic_load_explicit'
  145 | #define atomic_load_explicit __c11_atomic_load
      |                              ^

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-18 01:54:26 +08:00
Florian Weimer 61c3450db9 x86: Avoid integer truncation with large cache sizes (bug 32470)
Some hypervisors report 1 TiB L3 cache size.  This results
in some variables incorrectly getting zeroed, causing crashes
in memcpy/memmove because invariants are violated.
2024-12-17 18:49:50 +01:00
H.J. Lu 0cc88d2327 Silence Clang #include_next error
Use "#include <...>" to silence Clang #include_next error:

In file included from ../sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c:19:
../sysdeps/x86_64/fpu/test-double-vlen4.h:19:2: error: #include_next in file found relative to primary source file or found by absolute path; will search from start of include path [-Werror,-Winclude-next-absolute-path]
   19 | #include_next <test-double-vlen4.h>
      |  ^
1 error generated.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-18 01:22:48 +08:00
H.J. Lu 215447f5cb cet: Pass -mshstk to compiler for tst-cet-legacy-10a[-static].c
Pass -mshstk to compiler to silence Clang:

In file included from ../sysdeps/x86_64/tst-cet-legacy-10a.c:2:
../sysdeps/x86_64/tst-cet-legacy-10.c:29:7: error: always_inline function '_get_ssp' requires target feature 'shstk', but would be inlined into function 'do_test' that is compiled without support for 'shstk'
   29 |   if (_get_ssp () != 0)
      |       ^

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-18 01:20:16 +08:00
Joana Cruz cff9648d0b AArch64: Improve codegen of AdvSIMD expf family
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
Also use intrinsics instead of native operations.
expf: 3% improvement in throughput microbenchmark on Neoverse V1, exp2f: 5%,
exp10f: 13%, coshf: 14%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2024-12-17 15:28:22 +00:00
Joana Cruz 6914774b9d AArch64: Improve codegen of AdvSIMD atan(2)(f)
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2024-12-17 15:28:22 +00:00
Joana Cruz d6e034f5b2 AArch64: Improve codegen of AdvSIMD logf function family
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1 for log2 and log,
and 2% for log10.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2024-12-17 15:25:58 +00:00
H.J. Lu dd413a4d2f Fix sysdeps/x86/fpu/Makefile: Split and sort tests
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-16 05:57:28 +08:00
H.J. Lu 57a44f27c4 sysdeps/x86/fpu/Makefile: Split and sort tests
Split and sort tests in sysdeps/x86/fpu/Makefile.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-16 05:51:02 +08:00
H.J. Lu 07e3eb1774 Use empty initializer to silence GCC 4.9 or older
Use empty initializer to silence GCC 4.9 or older:

getaddrinfo.c: In function ‘gaih_inet’:
getaddrinfo.c:1135:24: error: missing braces around initializer [-Werror=missing-braces]
       / sizeof (struct gaih_typeproto)] = {0};
                        ^

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-16 04:06:30 +08:00
Florian Weimer b933e5cef6 Linux: Check for 0 return value from vDSO getrandom probe
As of Linux 6.13, there is no code in the vDSO that declines this
initialization request with the special ~0UL state size.  If the vDSO
has the function, the call succeeds and returns 0.  It's expected
that the code would follow the “a negative value indicating an error”
convention, as indicated in the __cvdso_getrandom_data function
comment, so that INTERNAL_SYSCALL_ERROR_P on glibc's side would return
true.  This commit changes the commit to check for zero to indicate
success instead, which covers potential future non-zero success
return values and error returns.

Fixes commit 4f5704ea34 ("powerpc: Use
correct procedure call standard for getrandom vDSO call (bug 32440)").
2024-12-15 17:05:25 +01:00
John David Anglin 6f5e1e4e98 hppa: Update libm-test-ulps
Signed-off-by: John David Anglin <dave.anglin@bell.net>
2024-12-15 09:24:53 -05:00
H.J. Lu 20f8c5df56 Revert "Add braces in initializers for GCC 4.9 or older"
This reverts commit 8aa2a9e033.

as not all targets need braces.
2024-12-15 18:49:52 +08:00
Stafford Horne afac8b1311 or1k: Update libm-test-ulps
Regen to add new functions acospi, asinpi, atan2pi and atanpi.
2024-12-15 00:42:27 +00:00
gfleury 2716bd6b12 htl: move pthread_sigmask into libc.
Message-ID: <20241212220612.782313-3-gfleury@disroot.org>
2024-12-14 23:13:14 +01:00
gfleury 79cb83c7f9 htl: move __pthread_sigstate into libc.
Message-ID: <20241212220612.782313-2-gfleury@disroot.org>
2024-12-14 23:12:01 +01:00
gfleury dca0807a4d htl: move __pthread_sigstate_destroy into libc.
Message-ID: <20241212220612.782313-1-gfleury@disroot.org>
2024-12-14 23:11:45 +01:00
H.J. Lu 335ba9b6c1 Return EXIT_UNSUPPORTED if __builtin_add_overflow unavailable
Since GCC 4.9 doesn't have __builtin_add_overflow:

In file included from tst-stringtable.c:180:0:
stringtable.c: In function ‘stringtable_finalize’:
stringtable.c:185:7: error: implicit declaration of function ‘__builtin_add_overflow’ [-Werror=implicit-function-declaration]
       else if (__builtin_add_overflow (previous->offset,
       ^

return EXIT_UNSUPPORTED for GCC 4.9 or older.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-15 05:24:19 +08:00
H.J. Lu 8aa2a9e033 Add braces in initializers for GCC 4.9 or older
Add braces to silence GCC 4.9 or older:

getaddrinfo.c: In function ‘gaih_inet’:
getaddrinfo.c:1135:24: error: missing braces around initializer [-Werror=missing-braces]
       / sizeof (struct gaih_typeproto)] = {0};
                        ^

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-14 19:26:45 +08:00
Wilco Dijkstra ca7d48a80f AArch64: Update libm-test-ulps
Update ulps for acospi, asinpi, atanpi, atan2pi.
2024-12-13 17:14:58 +00:00
Stefan Liebler 97b74cbbb0 s390: Simplify elf_machine_{load_address, dynamic} [BZ #31799]
If an executable is static PIE and has a non-zero load address
(compare to elf/tst-pie-address-static), it segfaults as
elf_machine_load_address() returns 0x0 and elf_machine_dynamic()
returns the run-time instead of link-time address of _DYNAMIC.

Now rely on __ehdr_start and _DYNAMIC as also done on other
architectures.

Checked back to old arch-levels that this approach works fine:
- 31bit: -march=g5
- 64bit: -march=z900

Note, that there is no static-PIE support on 31bit, but this
approach cleans it also up.

Furthermore this cleanup in glibc does not change anything
regarding the first GOT-element as the s390 ABI
(https://github.com/IBM/s390x-abi) explicitely defines:
The doubleword at _GLOBAL_OFFSET_TABLE_[0] is set by the linkage
editor to hold the address of the dynamic structure, referenced
with the symbol _DYNAMIC. This allows a program, such as the dynamic
linker, to find its own dynamic structure without having yet processed
its relocation entries. This is especially important for the dynamic
linker, because it must initialize itself without relying on other
programs to relocate its memory image.
2024-12-13 09:44:38 +01:00
Stafford Horne e4e49583d9 or1k: Update libm-test-ulps
Pick up new functions cospi, "Imaginary part of csin", exp10m1, exp2m1,
log10p1, log2p1, sinpi and tanpi.
2024-12-13 07:20:32 +00:00
Michael Jeanson f2acd75b0e nptl: Add <thread_pointer.h> for or1k
This will be required by the rseq extensible ABI implementation on all
Linux architectures exposing the '__rseq_size' and '__rseq_offset'
symbols to set the initial value of the 'cpu_id' field which can be used
by applications to test if rseq is available and registered. As long as
the symbols are exposed it is valid for an application to perform this
test even if rseq is not yet implemented in libc for this architecture.

Compile tested with build-many-glibcs.py but I don't have access to any
hardware to run the tests.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
2024-12-13 07:20:32 +00:00
Joseph Myers 3374de9038 Implement C23 atan2pi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the atan2pi functions (atan2(y,x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-12 20:57:44 +00:00
Joseph Myers ffe79c446c Implement C23 atanpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the atanpi functions (atan(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-11 21:51:49 +00:00
Peter Bergner aec85b2557 powerpc64: Fix dl-trampoline.S big-endian / non-ROP build failure
Fix a big-endian / non-ROP build failure caused by commit 4d9a4c02 when
building dl-trampoline.S.

Reported-by: Joseph Myers <josmyers@redhat.com>
2024-12-11 23:15:13 +03:00
Florian Weimer 4f5704ea34 powerpc: Use correct procedure call standard for getrandom vDSO call (bug 32440)
A plain indirect function call does not work on POWER because
success and failure are signaled through a flag register, and
not via the usual Linux negative return value convention.

This has potential security impact, in two ways: the return value
could be out of bounds (EAGAIN is 11 on powerpc6le), and no
random bytes have been written despite the non-error return value.

Fixes commit 461cab1de7 ("linux: Add
support for getrandom vDSO").

Reported-by: Ján Stanček <jstancek@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-12-11 17:49:04 +01:00
H.J. Lu b79f257533 Add TEST_CC and TEST_CXX support
Support testing glibc build with a different C compiler or a different
C++ compiler with

$ ../glibc-VERSION/configure TEST_CC="gcc-6.4.1" TEST_CXX="g++-6.4.1"

1. Add LIBC_TRY_CC_AND_TEST_CC_OPTION, LIBC_TRY_CC_AND_TEST_CC_COMMAND
and LIBC_TRY_CC_AND_TEST_LINK to test both CC and TEST_CC.
2. Add check and xcheck targets to Makefile.in and override build compiler
options with ones from TEST_CC and TEST_CXX.

Tested on Fedora 41/x86-64:

1. Building with GCC 14.2.1 and testing with GCC 6.4.1 and GCC 11.2.1.
2. Building with GCC 15 and testing with GCC 6.4.1.

Support for GCC versions older than GCC 6.2 may need to change the test
sources.  Other targets may need to update configure.ac under sysdeps and
modify Makefile.in to override target build compiler options.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-11 18:31:00 +08:00
Peter Bergner 4d9a4c02f9 powerpc64le: ROP changes for the dl-trampoline functions
Add ROP protection for the _dl_runtime_resolve and _dl_profile_resolve
functions.
2024-12-10 23:25:56 -05:00
Joseph Myers f962932206 Implement C23 asinpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the asinpi functions (asin(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-10 20:42:20 +00:00
Joseph Myers 28d102d15c Implement C23 acospi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the acospi functions (acos(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-09 23:01:29 +00:00
Sachin Monga be13e46764 powerpc64le: ROP changes for the *context and setjmp functions
Add ROP protection for the getcontext, setcontext, makecontext, swapcontext
and __sigsetjmp_symbol functions.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-12-09 16:49:54 -05:00
Michael Jeanson 9e08698e4c nptl: Add <thread_pointer.h> for m68k
This will be required by the rseq extensible ABI implementation on all
Linux architectures exposing the '__rseq_size' and '__rseq_offset'
symbols to set the initial value of the 'cpu_id' field which can be used
by applications to test if rseq is available and registered. As long as
the symbols are exposed it is valid for an application to perform this
test even if rseq is not yet implemented in libc for this architecture.

Compile tested with build-many-glibcs.py but I don't have access to any
hardware to run the tests.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2024-12-09 20:24:26 +00:00
Michael Jeanson 8dd1588794 nptl: Add <thread_pointer.h> for RISC-V
This will be required by the rseq extensible ABI implementation on all
Linux architectures exposing the '__rseq_size' and '__rseq_offset'
symbols to set the initial value of the 'cpu_id' field which can be used
by applications to test if rseq is available and registered. As long as
the symbols are exposed it is valid for an application to perform this
test even if rseq is not yet implemented in libc for this architecture.

Both code paths tested on a Visionfive 2 with Debian sid.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-12-09 13:26:55 -05:00
Michael Jeanson d3b3a12258 nptl: add RSEQ_SIG for RISC-V
Enable RSEQ for RISC-V, support was added in Linux 5.18.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-12-09 13:26:55 -05:00
Pierre Blanchard 13a7ef5999 AArch64: Improve codegen in users of ADVSIMD expm1 helper
Add inline helper for expm1 and rearrange operations so MOV
is not necessary in reduction or around the special-case handler.
Reduce memory access by using more indexed MLAs in polynomial.
Speedup on Neoverse V1 for expm1 (19%), sinh (8.5%), and tanh (7.5%).
2024-12-09 16:20:34 +00:00
Pierre Blanchard ca0c0d0f26 AArch64: Improve codegen in users of ADVSIMD log1p helper
Add inline helper for log1p and rearrange operations so MOV
is not necessary in reduction or around the special-case handler.
Reduce memory access by using more indexed MLAs in polynomial.
Speedup on Neoverse V1 for log1p (3.5%), acosh (7.5%) and atanh (10%).
2024-12-09 16:20:34 +00:00
Pierre Blanchard 8eb5ad2ebc AArch64: Improve codegen in AdvSIMD logs
Remove spurious ADRP and a few MOVs.
Reduce memory access by using more indexed MLAs in polynomial.
Align notation so that algorithms are easier to compare.
Speedup on Neoverse V1 for log10 (8%), log (8.5%), and log2 (10%).
Update error threshold in AdvSIMD log (now matches SVE log).
2024-12-09 16:20:34 +00:00
Pierre Blanchard 569cfaaf49 AArch64: Improve codegen in AdvSIMD pow
Remove spurious ADRP. Improve memory access by shuffling constants and
using more indexed MLAs.

A few more optimisation with no impact on accuracy
- force fmas contraction
- switch from shift-aided rint to rint instruction

Between 1 and 5% throughput improvement on Neoverse
V1 depending on benchmark.
2024-12-09 16:20:34 +00:00