Commit Graph

1686 Commits

Author SHA1 Message Date
Adhemerval Zanella 95a01ea955 math: Use atanpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic atanpif.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                     master        patched   improvement
x86_64                     66.3296        52.7558        20.46%
x86_64v2                   66.0429        51.4007        22.17%
x86_64v3                   60.6294        48.7876        19.53%
aarch64 (Neoverse)         24.3163        20.9110        14.00%
power8                     16.5766        13.3620        19.39%
power10                    16.5115        13.4072        18.80%

reciprocal-throughput       master        patched   improvement
x86_64                     30.8599        16.0866        47.87%
x86_64v2                   29.2286        15.4688        47.08%
x86_64v3                   23.0960        12.8510        44.36%
aarch64 (Neoverse)         15.4619        10.6752        30.96%
power8                      7.9200         5.2483        33.73%
power10                     6.8539         4.6262        32.50%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
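The improvement column in these benchtest tables is consistent with the relative reduction from master to patched, that is (master - patched) / master. A minimal Python sketch of that arithmetic, assuming this formula; the function name is illustrative and is not part of glibc's benchtests:

```python
def improvement_percent(master, patched):
    """Relative reduction from master to patched, in percent.

    Assumed formula, inferred from the tables: a positive value means
    the patched build is faster, a negative value means it is slower.
    """
    return (master - patched) / master * 100.0


# x86_64 latency row of the atanpif table.
print(f"{improvement_percent(66.3296, 52.7558):.2f}%")  # prints 20.46%
```

Under this reading, a negative entry in a later table marks a row where the patched build measured slower than master.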
Adhemerval Zanella 1cd9ccd8c0 math: Use atan2pif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic atan2pif.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master        patched   improvement
x86_64                 79.4006        70.8726        10.74%
x86_64v2               77.5136        69.1424        10.80%
x86_64v3               71.8050        68.1637         5.07%
aarch64 (Neoverse)     27.8363        24.7700        11.02%
power8                 39.3893        17.2929        56.10%
power10                19.7200        16.8187        14.71%

reciprocal-throughput   master        patched   improvement
x86_64                 38.3457        30.9471        19.29%
x86_64v2               37.4023        30.3112        18.96%
x86_64v3               33.0713        24.4891        25.95%
aarch64 (Neoverse)     19.3683        15.3259        20.87%
power8                 19.5507        8.27165        57.69%
power10                9.05331        7.63775        15.64%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella ae679a0aca math: Use asinpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic asinpif.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master        patched   improvement
x86_64                 46.4996        41.6126        10.51%
x86_64v2               46.7551        38.8235        16.96%
x86_64v3               42.6235        33.7603        20.79%
aarch64 (Neoverse)     17.4161        14.3604        17.55%
power8                 10.7347         9.0193        15.98%
power10                10.6420         9.0362        15.09%

reciprocal-throughput   master        patched   improvement
x86_64                 24.7208        16.5544        33.03%
x86_64v2               24.2177        14.8938        38.50%
x86_64v3               20.5617        10.5452        48.71%
aarch64 (Neoverse)     13.4827        7.17613        46.78%
power8                 6.46134        3.56089        44.89%
power10                5.79007        3.49544        39.63%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella edb2a8f0ae math: Use acospif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic acospif.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                  master        patched   improvement
x86_64                  54.8281        42.9070        21.74%
x86_64v2                54.1717        42.7497        21.08%
x86_64v3                49.3552        34.1512        30.81%
aarch64 (Neoverse)      17.9395        14.3733        19.88%
power8                  20.3110         8.8609        56.37%
power10                 11.3113        8.84067        21.84%

reciprocal-throughput    master        patched   improvement
x86_64                  21.2301        14.4803        31.79%
x86_64v2                20.6858        13.9506        32.56%
x86_64v3                16.1944        11.3377        29.99%
aarch64 (Neoverse)      11.4474        7.13282        37.69%
power8                  10.6916        3.57547        66.56%
power10                 4.64269        3.54145        23.72%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Florian Weimer 89e61e96b7 i386: Update ulps for *pi functions
As seen with GCC 11.5 on an AMD Ryzen 9 7950X CPU, with glibc built
using -mfpmath=sse and configured with --disable-multi-arch.
2025-01-20 11:34:38 +01:00
Michael Jeanson 494d65129e nptl: Introduce <rseq-access.h> for RSEQ_* accessors
In preparation to move the rseq area to the 'extra TLS' block, we need
accessors based on the thread pointer and the rseq offset. The ONCE
variant of the accessors ensures single-copy atomicity for loads and
stores which is required for all fields once the registration is active.

A separate header is required to allow including <atomic.h> which
results in an include loop when added to <tcb-access.h>.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-10 20:20:17 +00:00
Florian Weimer 4a9a8a5098 Add missing include guards to <dl-tls.h>
Some architecture-specific variants lack header inclusion guards.
Add them for consistency with the generic version.
2025-01-10 19:02:47 +01:00
Florian Weimer d1da011118 elf: Always define TLS_TP_OFFSET
This will be needed to compute __rseq_offset outside of the TLS
relocation machinery.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 19:30:44 +01:00
Adhemerval Zanella 9cc9f8e11e math: Fix acosf when building with gcc <= 11
GCC <= 11 wrongly assumes round-to-nearest and constant-folds an
expression that should be evaluated at run time, since its result is
not exact [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57245
2025-01-09 12:53:58 -03:00
H.J. Lu 502a71c578 i686: Regenerate multiarch ulps
Regenerate i686 multiarch ulps on Intel Core i7-1195G7 compiled with
-O2 -march=i686 using GCC 14.2.1.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-01-09 07:06:35 +08:00
Adhemerval Zanella 15b7a675bd i386: Update libm-test-ulps
gcc version 14.2.1 targeting '-m32 -march=i586'.
2025-01-06 16:04:04 -03:00
Andreas K. Hüttel 2af56da855 math: update i686 multiarch ulps
Linux waikiki 6.6.53-gentoo #1 SMP Wed Oct  2 13:21:27 CEST 2024 x86_64 AMD EPYC 7532 32-Core Processor AuthenticAMD GNU/Linux

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2025-01-06 19:24:01 +01:00
Florian Weimer ceae7e2770 elf: Introduce generic <dl-tls.h>
On arc, the definition of TLS_DTV_UNALLOCATED now comes from
<dl-dtv.h>.

For x86-64 x32, a separate version is needed because unsigned long int
is 32 bits on this target.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-01-02 13:45:27 +01:00
Paul Eggert 2642002380 Update copyright dates with scripts/update-copyrights
2025-01-01 11:22:09 -08:00
Florian Weimer 9a6533429e i386: Regenerate ulps
As seen on an Intel i9-9900K CPU, with glibc built with GCC 11.5,
configured with and without --disable-multi-arch.
2024-12-20 12:40:17 +01:00
Adhemerval Zanella 0e0be3ed80 math: Use tanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic tanhf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      51.5273        41.0951        20.25%
x86_64v2                    47.7021        39.1526        17.92%
x86_64v3                    45.0373        34.2737        23.90%
i686                       133.9970        83.8596        37.42%
aarch64 (Neoverse)          21.5439        14.7961        31.32%
power10                     13.3301         8.4406        36.68%

reciprocal-throughput        master        patched   improvement
x86_64                      24.9493        12.8547        48.48%
x86_64v2                    20.7051        12.7761        38.29%
x86_64v3                    19.2492        11.0851        42.41%
i686                        78.6498        29.8211        62.08%
aarch64 (Neoverse)          11.6026        7.11487        38.68%
power10                      6.3328         2.8746        54.61%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 1751c0519a math: Use sinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic sinhf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.6819        49.1489         6.71%
x86_64v2                    49.1162        42.9447        12.57%
x86_64v3                    46.9732        39.9157        15.02%
i686                       141.1470       129.6410         8.15%
aarch64 (Neoverse)          20.8539        17.1288        17.86%
power10                     14.5258        9.1906         36.73%

reciprocal-throughput        master        patched   improvement
x86_64                      27.5553        23.9395        13.12%
x86_64v2                    21.6423        20.3219         6.10%
x86_64v3                    21.4842        16.0224        25.42%
i686                        87.9709        86.1626         2.06%
aarch64 (Neoverse)          15.1919        12.2744        19.20%
power10                      7.2188         5.2611        27.12%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 9583836785 math: Use coshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode),
although it shows worse performance than the current one.  The current
implementation's performance comes mainly from its internal use of
the optimized expf implementation, and it shows a maximum error of
2 ULPs for FE_TONEAREST and 3 ULPs for other rounding modes.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      40.6995        49.0737       -20.58%
x86_64v2                    40.5841        44.3604        -9.30%
x86_64v3                    39.3879        39.7502        -0.92%
i686                       112.3380       129.8570       -15.59%
aarch64 (Neoverse)          18.6914        17.0946         8.54%
power10                     11.1343        9.3245         16.25%

reciprocal-throughput        master        patched   improvement
x86_64                      18.6471        24.1077       -29.28%
x86_64v2                    17.7501        20.2946       -14.34%
x86_64v3                    17.8262        17.1877         3.58%
i686                        64.1454        86.5645       -34.95%
aarch64 (Neoverse)          9.77226        12.2314       -25.16%
power10                      4.0200        5.3316        -32.63%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
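The ULP figures quoted in this message express error in units of the spacing between adjacent floating-point values; a correctly rounded result in round-to-nearest stays within 0.5 ULP. A rough Python sketch of that measure for binary32, assuming normal-range inputs and a reference value held in higher precision. The helper names are hypothetical, and this is not how glibc's test suite derives its ulps files:

```python
import math


def ulp_binary32(x):
    """Spacing between adjacent binary32 values near x (normal range only)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 24)  # binary32 has a 24-bit significand


def ulp_error(computed, reference):
    """Error of `computed` against `reference`, in binary32 ULPs."""
    return abs(computed - reference) / ulp_binary32(reference)


# A result one binary32 step above 1.0 is off by one ULP.
print(ulp_error(1.0 + 2.0**-23, 1.0))  # prints 1.0
```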
Adhemerval Zanella 7cfd8b5698 math: Use atanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanhf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      59.4930        45.8568        22.92%
x86_64v2                    59.5705        45.5804        23.48%
x86_64v3                    53.1838        37.7155        29.08%
i686                        169.354       133.5940        21.12%
aarch64 (Neoverse)          26.0781        16.9829        34.88%
power10                     15.6591        10.7623        31.27%

reciprocal-throughput        master        patched   improvement
x86_64                      23.5903        18.5766        21.25%
x86_64v2                    22.6489        18.2683        19.34%
x86_64v3                    19.0401        13.9474        26.75%
i686                        97.6034       107.3260        -9.96%
aarch64 (Neoverse)          15.3664        9.57846        37.67%
power10                      6.8877        4.6242         32.86%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 6f9bacf36b math: Use atan2f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atan2f.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      68.1175        69.2014        -1.59%
x86_64v2                    66.9884        66.0081         1.46%
x86_64v3                    57.7034        61.6407        -6.82%
i686                       189.8690        152.7560       19.55%
aarch64 (Neoverse)          32.6151        24.5382        24.76%
power10                     21.7282        17.1896        20.89%

reciprocal-throughput        master        patched   improvement
x86_64                      34.5202        31.6155         8.41%
x86_64v2                    32.6379        30.3372         7.05%
x86_64v3                    34.3677        23.6455        31.20%
i686                       157.7290        75.8308        51.92%
aarch64 (Neoverse)          27.7788        16.2671        41.44%
power10                     15.5715         8.1588        47.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella a357d6273f math: Use atanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      56.8265        53.6842         5.53%
x86_64v2                    54.8177        53.6842         2.07%
x86_64v3                    46.2915        48.7034        -5.21%
i686                       158.3760        108.9560       31.20%
aarch64 (Neoverse)           21.687        20.5893         5.06%
power10                     13.1903        13.5012        -2.36%

reciprocal-throughput        master        patched   improvement
x86_64                      16.6787        16.7601        -0.49%
x86_64v2                    16.6983        16.7601        -0.37%
x86_64v3                    16.2268        12.1391        25.19%
i686                       138.6840        36.0640        74.00%
aarch64 (Neoverse)          11.8012        10.3565        12.24%
power10                      5.3212         4.2894        19.39%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella ed608a40e2 math: Use asinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinhf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      64.5128        56.9717        11.69%
x86_64v2                    63.3065        57.2666         9.54%
x86_64v3                    62.8719        51.4170        18.22%
i686                       189.1630        137.635        27.24%
aarch64 (Neoverse)          25.3551        20.5757        18.85%
power10                     17.9712        13.3302        25.82%

reciprocal-throughput        master        patched   improvement
x86_64                      20.0844        15.4731        22.96%
x86_64v2                    19.2919        15.4000        20.17%
x86_64v3                    18.7226        11.9009        36.44%
i686                       103.7670        80.2681        22.65%
aarch64 (Neoverse)          12.5005        8.68969        30.49%
power10                      7.2220        5.03617        30.27%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 5fb4b566ef math: Use asinf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      42.8237        35.2460        17.70%
x86_64v2                    43.3711        35.9406        17.13%
x86_64v3                    35.0335        30.5744        12.73%
i686                       213.8780        104.4710       51.15%
aarch64 (Neoverse)          17.2937        13.6025        21.34%
power10                     12.0227        7.4241         38.25%

reciprocal-throughput        master        patched   improvement
x86_64                      13.6770        15.5231       -13.50%
x86_64v2                    13.8722        16.0446       -15.66%
x86_64v3                    13.6211        13.2753         2.54%
i686                       186.7670        45.4388        75.67%
aarch64 (Neoverse)          9.96089        9.39285         5.70%
power10                      4.9862        3.7819         24.15%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 673e6fe110 math: Use acoshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acoshf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      61.2471        58.7742         4.04%
x86_64-v2                   62.6519        59.0523         5.75%
x86_64-v3                   58.7408        50.1393        14.64%
aarch64                     24.8580        21.3317        14.19%
power10                     17.0469        13.1345        22.95%

reciprocal-throughput        master        patched   improvement
x86_64                      16.1618        15.1864         6.04%
x86_64-v2                   15.7729        14.7563         6.45%
x86_64-v3                   14.1669        11.9568        15.60%
aarch64                      10.911        9.5486         12.49%
power10                     6.38196        5.06734        20.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 66fa7ad437 math: Use acosf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acosf.

The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.5098        36.6312        30.24%
x86_64v2                    53.0217        37.3091        29.63%
x86_64v3                    42.8501        32.3977        24.39%
i686                       207.3960       109.4000        47.25%
aarch64                     21.3694        13.7871        35.48%
power10                     14.5542         7.2891        49.92%

reciprocal-throughput        master        patched   improvement
x86_64                      14.1487        15.9508       -12.74%
x86_64v2                    14.3293        16.1899       -12.98%
x86_64v3                    13.6563        12.6161         7.62%
i686                       158.4060        45.7354        71.13%
aarch64                     12.5515        9.19233        26.76%
power10                      5.7868         3.3487        42.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 5a4c99163c i386: Update libm-test-ulps
Regen to add new functions acospi, asinpi, atan2pi, atanpi, cospi,
sinpi, and tanpi.
2024-12-18 14:20:41 -03:00
Joseph Myers 3374de9038 Implement C23 atan2pi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the atan2pi functions (atan2(y,x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-12 20:57:44 +00:00
Joseph Myers ffe79c446c Implement C23 atanpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the atanpi functions (atan(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-11 21:51:49 +00:00
Joseph Myers f962932206 Implement C23 asinpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the asinpi functions (asin(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-10 20:42:20 +00:00
Joseph Myers 28d102d15c Implement C23 acospi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the acospi functions (acos(x)/pi).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-09 23:01:29 +00:00
Joseph Myers f9e90e4b4c Implement C23 tanpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the tanpi functions (tan(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-05 21:42:10 +00:00
H.J. Lu 09d07f16a7 i686: Update libm-test-ulps
Update i686 libm-test-ulps to fix

FAIL: math/test-float64x-cospi
FAIL: math/test-float64x-sinpi
FAIL: math/test-ldouble-cospi
FAIL: math/test-ldouble-sinpi

when building glibc with GCC 7.4.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-05 20:10:58 +08:00
Joseph Myers 776938e8b8 Implement C23 sinpi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the sinpi functions (sin(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-04 20:04:04 +00:00
Joseph Myers 0ae0af68d8 Implement C23 cospi
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the cospi functions (cos(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-04 10:20:44 +00:00
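The seven "Implement C23 ..." messages above each give the mathematical definition of the function being added. The naive Python formulas below simply restate those definitions as a reference sketch. The `ref_*` names are hypothetical, the formulas work in double precision, and they are not how glibc implements these functions:

```python
import math


def ref_cospi(x):
    return math.cos(math.pi * x)       # cospi(x) = cos(pi*x)


def ref_sinpi(x):
    return math.sin(math.pi * x)       # sinpi(x) = sin(pi*x)


def ref_tanpi(x):
    return math.tan(math.pi * x)       # tanpi(x) = tan(pi*x)


def ref_acospi(x):
    return math.acos(x) / math.pi      # acospi(x) = acos(x)/pi


def ref_asinpi(x):
    return math.asin(x) / math.pi      # asinpi(x) = asin(x)/pi


def ref_atanpi(x):
    return math.atan(x) / math.pi      # atanpi(x) = atan(x)/pi


def ref_atan2pi(y, x):
    return math.atan2(y, x) / math.pi  # atan2pi(y, x) = atan2(y,x)/pi


# The results are fractions of pi: the angle of the point (1, 1) is a quarter of pi.
print(ref_atan2pi(1.0, 1.0))
```

Because `math.pi` is itself a rounded value, these formulas pick up extra rounding error; for example, `ref_sinpi(1.0)` yields a tiny nonzero value where the mathematical result is 0. That is one reason dedicated implementations are preferable to the naive form.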
Adhemerval Zanella bccb0648ea math: Use tanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic tanf.

The code was adapted to glibc style, to use the definitions in
math_config.h, to remove errno handling, and to use a generic
128-bit routine for ABIs that do not support it natively.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella d846f4c12d math: Use lgammaf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic lgammaf.

The code was adapted to glibc style, to use the definitions in
math_config.h, to remove errno handling, to use math_narrow_eval
on overflow, and to make it reentrant.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       86.5609       70.3278       18.75%
x86_64v2                     78.3030       69.9709       10.64%
x86_64v3                     74.7470       59.8457       19.94%
i686                         387.355       229.761       40.68%
aarch64                      40.8341       33.7563       17.33%
power10                      26.5520       16.1672       39.11%
powerpc                      28.3145       17.0625       39.74%

reciprocal-throughput         master       patched  improvement
x86_64                       68.0461       48.3098       29.00%
x86_64v2                     55.3256       47.2476       14.60%
x86_64v3                     52.3015       38.9028       25.62%
i686                         340.848       195.707       42.58%
aarch64                      36.8000       30.5234       17.06%
power10                      20.4043       12.6268       38.12%
powerpc                      22.6588       13.8866       38.71%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella baa495f231 math: Use erfcf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic erfcf.

The code was adapted to glibc style and to use the definitions in
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       98.8796       66.2142       33.04%
x86_64v2                     98.9617       67.4221       31.87%
x86_64v3                     87.4161       53.1754       39.17%
aarch64                      33.8336       22.0781       34.75%
power10                      21.1750       13.5864       35.84%
powerpc                      21.4694       13.8149       35.65%

reciprocal-throughput         master       patched  improvement
x86_64                       48.5620       27.6731       43.01%
x86_64v2                     47.9497       28.3804       40.81%
x86_64v3                     42.0255       18.1355       56.85%
aarch64                      24.3938       13.4041       45.05%
power10                      10.4919        6.1881       41.02%
powerpc                       11.763       6.76468       42.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella 994fec2397 math: Use erff from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic erff.

The code was adapted to glibc style and to use the definitions in
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       85.7363       45.1372       47.35%
x86_64v2                     86.6337       38.5816       55.47%
x86_64v3                     71.3810       34.0843       52.25%
i686                         190.143       97.5014       48.72%
aarch64                      34.9091       14.9320       57.23%
power10                      38.6160        8.5188       77.94%
powerpc                      39.7446       8.45781       78.72%

reciprocal-throughput         master       patched  improvement
x86_64                       35.1739       14.7603       58.04%
x86_64v2                     34.5976       11.2283       67.55%
x86_64v3                     27.3260        9.8550       63.94%
i686                         91.0282       30.8840       66.07%
aarch64                      22.5831        6.9615       69.17%
power10                      18.0386        3.0918       82.86%
powerpc                      20.7277       3.63396       82.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella f338c7c5f5 math: Use log10p1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs slightly better than the generic log10p1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      68.5251        32.2627        52.92%
x86_64v2                    68.8912        32.7887        52.41%
x86_64v3                    59.3427        27.0521        54.41%
i686                        162.026        103.383        36.19%
aarch64                     26.8513        14.5695        45.74%
power10                     12.7426         8.4929        33.35%
powerpc                     16.6768        9.29135        44.29%

reciprocal-throughput        master        patched   improvement
x86_64                      26.0969        12.4023        52.48%
x86_64v2                    25.0045        11.0748        55.71%
x86_64v3                    20.5610        10.2995        49.91%
i686                        89.8842        78.5211        12.64%
aarch64                     17.1200         9.4832        44.61%
power10                      6.7814         6.4258         5.24%
powerpc                      15.769         7.6825        51.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:40 -03:00
Adhemerval Zanella 8ae9e51376 math: Use log1pf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs slightly better than the generic log1pf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      71.8142        38.9668        45.74%
x86_64v2                    71.9094        39.1321        45.58%
x86_64v3                    60.1000        32.4016        46.09%
i686                        147.105        104.258        29.13%
aarch64                     26.4439        14.0050        47.04%
power10                     19.4874         9.4146        51.69%
powerpc                     17.6145        8.00736        54.54%

reciprocal-throughput        master        patched   improvement
x86_64                      19.7604        12.7254        35.60%
x86_64v2                    19.0039        11.9455        37.14%
x86_64v3                    16.8559        11.9317        29.21%
i686                        82.3426        73.9718        10.17%
aarch64                     14.4665         7.9614        44.97%
power10                     11.9974         8.4117        29.89%
powerpc                     7.15222         6.0914        14.83%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella c369580814 math: Use log2p1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic log2p1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      70.1462        47.0090        32.98%
x86_64v2                    70.2513        47.6160        32.22%
x86_64v3                    60.4840        39.9443        33.96%
i686                        164.068        122.909        25.09%
aarch64                     25.9169        16.9207        34.71%
power10                     18.1261        9.8592         45.61%
powerpc                     17.2683        9.38665        45.64%

reciprocal-throughput        master        patched   improvement
x86_64                      26.2240        16.4082        37.43%
x86_64v2                    25.0911        15.7480        37.24%
x86_64v3                    20.9371        11.7264        43.99%
i686                        90.4209        95.3073        -5.40%
aarch64                     16.8537        8.9561         46.86%
power10                     12.9401        6.5555         49.34%
powerpc                     9.01763        7.54745        16.30%

The i686 reciprocal-throughput regression is mostly due to the use of the
x87 FPU.  Building with '-msse2 -mfpmath=sse' instead gives:

                             master        patched   improvement
latency                     164.068        102.982        37.23%
reciprocal-throughput       89.1968        82.5117         7.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella 9247f53219 math: Use log10f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic log10f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      49.9017        33.5143        32.84%
x86_64v2                    50.4878        33.5623        33.52%
x86_64v3                    50.0991        27.6078        44.89%
i686                        140.874        106.086        24.69%
aarch64                     19.2846        11.3573        41.11%
power10                     14.0994        7.7739        44.86%
powerpc                     14.2898        7.92497        44.54%

reciprocal-throughput        master        patched   improvement
x86_64                      17.8336        12.9074        27.62%
x86_64v2                    16.4418        11.3220        31.14%
x86_64v3                    15.6002        10.5158        32.59%
i686                        66.0678        80.2287        -21.43%
aarch64                      9.4906        6.8393        27.94%
power10                      7.5255        5.5084        26.80%
powerpc                      9.5204        6.98055        26.68%

The i686 reciprocal-throughput regression is mostly due to the use of the
x87 FPU.  Building with '-msse2 -mfpmath=sse' instead gives:

                             master        patched   improvement
latency                     140.874        77.1137        45.26%
reciprocal-throughput        64.481        56.4397        12.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella bbd578b38d math: Use expm1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic expm1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      96.7402        36.4026        62.37%
x86_64v2                    97.5391        33.4625        65.69%
x86_64v3                    82.1778        30.8668        62.44%
i686                         120.58        94.8302        21.35%
aarch64                     32.3558        12.8881        60.17%
power10                     23.5087        9.8574         58.07%
powerpc                     23.4776        9.06325        61.40%

reciprocal-throughput        master        patched   improvement
x86_64                      27.8224        15.9255        42.76%
x86_64v2                    27.8364        9.6438         65.36%
x86_64v3                    20.3227        9.6146         52.69%
i686                        63.5629        59.4718         6.44%
aarch64                     17.4838        7.1082         59.34%
power10                     12.4644        8.7829         29.54%
powerpc                     14.2152        5.94765        58.16%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:35 -03:00
Adhemerval Zanella 5c22fd25c1 math: Use exp2m1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic exp2m1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).  The
only change is to handle FLT_MAX_EXP for FE_DOWNWARD or FE_TOWARDZERO.

The benchmark inputs are based on exp2f ones.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      40.6042        48.7104       -19.96%
x86_64v2                    40.7506        35.9032        11.90%
x86_64v3                    35.2301        31.7956         9.75%
i686                        102.094        94.6657         7.28%
aarch64                     18.2704        15.1387        17.14%
power10                     11.9444         8.2402        31.01%

reciprocal-throughput        master        patched   improvement
x86_64                      20.8683        16.1428        22.64%
x86_64v2                    19.5076        10.4474        46.44%
x86_64v3                    19.2106        10.4014        45.86%
i686                        56.4054        59.3004        -5.13%
aarch64                     12.0781         7.3953        38.77%
power10                      6.5306         5.9388         9.06%

The generic implementation calls __ieee754_exp2f and x86_64 provides
an optimized ifunc version (built with -mfma -mavx2, not correctly
rounded).  This explains the performance difference for x86_64.

The same applies to i686, where the ABI provides an optimized
__ieee754_exp2f version built with '-msse2 -mfpmath=sse'.  When built
with the same flags, the new algorithm shows better performance:

                            master        patched    improvement
latency                    102.094        91.2823         10.59%
reciprocal-throughput      56.4054        52.7984          6.39%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:35 -03:00
Adhemerval Zanella 5fa89852fa math: Use exp10m1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic exp10m1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).  I mostly
fixed some small issues in corner cases (sNaN handling, -INFINITY,
a specific overflow check).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      45.4690        49.5845        -9.05%
x86_64v2                    46.1604        36.2665        21.43%
x86_64v3                    37.8442        31.0359        17.99%
i686                        121.367        93.0079        23.37%
aarch64                     21.1126        15.0165        28.87%
power10                     12.7426        8.4929         33.35%

reciprocal-throughput        master        patched   improvement
x86_64                      19.6005        17.4005        11.22%
x86_64v2                    19.6008        11.1977        42.87%
x86_64v3                    17.5427        10.2898        41.34%
i686                        59.4215        60.9675        -2.60%
aarch64                     13.9814        7.9173         43.37%
power10                      6.7814        6.4258          5.24%

The generic implementation calls __ieee754_exp10f, which has an
optimized version that is not correctly rounded.  This is the main
cause of the latency difference for x86_64 and of the throughput
difference for i686.

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:26 -03:00
Paul Zimmermann 392b3f0971 replace tgammaf by the CORE-MATH implementation
The CORE-MATH implementation is correctly rounded (for any rounding mode).
This can be checked by exhaustive tests in a few minutes, since there are
fewer than 2^32 values to check against a reference such as GNU MPFR.
This patch also adds some bench values for tgammaf.
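
As an illustration of why the exhaustive check is feasible, the sketch below
walks a range of binary32 bit patterns and counts inputs where a candidate
function disagrees with a reference.  It is only a sketch: the helper names are
hypothetical, the stand-in functions use sqrt rather than tgammaf, and a real
run would call the libm function under each rounding mode and compare against
GNU MPFR, typically from C rather than Python.

```python
import math
import struct


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_to_float32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct refuses finite values outside the binary32 range.
        return math.copysign(math.inf, x)


def count_mismatches(candidate, reference, first_bits: int, last_bits: int) -> int:
    """Count finite inputs in [first_bits, last_bits] where the results differ."""
    mismatches = 0
    for bits in range(first_bits, last_bits + 1):
        x = float32_from_bits(bits)
        if math.isnan(x) or math.isinf(x):
            continue  # special values need separate, explicit checks
        if candidate(x) != reference(x):
            mismatches += 1
    return mismatches


def sqrt_f32(x: float) -> float:
    """Stand-in for both the function under test and the reference."""
    return round_to_float32(math.sqrt(x))


# Check the first 1000 bit patterns starting at 1.0f (0x3F800000).  An
# exhaustive run would use range(0, 2**32), which is slow in pure Python.
print(count_mismatches(sqrt_f32, sqrt_f32, 0x3F800000, 0x3F800000 + 999))
```

Enumerating bit patterns rather than stepping through values guarantees that
every representable input is visited exactly once, including subnormals.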

Tested on x86_64 and x86 (cfarm26).

With the initial GNU libc code, the benchmark gave on an Intel(R) Core(TM) i7-8700:

      "tgammaf": {
       "": {
        "duration": 3.50188e+09,
        "iterations": 2e+07,
        "max": 602.891,
        "min": 65.1415,
        "mean": 175.094
       }
      }

With the new code:

      "tgammaf": {
       "": {
        "duration": 3.30825e+09,
        "iterations": 5e+07,
        "max": 211.592,
        "min": 32.0325,
        "mean": 66.1649
       }
      }

With the initial GNU libc code, the benchmark gave on cfarm26 (i686):

  "tgammaf": {
   "": {
    "duration": 3.70505e+09,
    "iterations": 6e+06,
    "max": 2420.23,
    "min": 243.154,
    "mean": 617.509
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.24497e+09,
    "iterations": 1.8e+07,
    "max": 1238.15,
    "min": 101.155,
    "mean": 180.276
   }
  }
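
In the reports above, the "mean" field matches duration divided by iterations,
so the two runs can be compared directly.  The Python sketch below does this
for the Intel Core i7-8700 pair.  The fragments quoted in this message are not
complete JSON documents, so the sketch wraps them in braces, and the helper
name mean_time is illustrative only.

```python
import json

# Fields copied from the i7-8700 reports above, wrapped in braces so that
# they parse as standalone JSON documents.
OLD_REPORT = '{"tgammaf": {"": {"duration": 3.50188e+09, "iterations": 2e+07}}}'
NEW_REPORT = '{"tgammaf": {"": {"duration": 3.30825e+09, "iterations": 5e+07}}}'


def mean_time(report: str, func: str = "tgammaf", variant: str = "") -> float:
    """Mean time per call, computed as duration / iterations."""
    entry = json.loads(report)[func][variant]
    return entry["duration"] / entry["iterations"]


old_mean = mean_time(OLD_REPORT)
new_mean = mean_time(NEW_REPORT)
reduction = (old_mean - new_mean) / old_mean * 100.0
print(f"old={old_mean:.3f} new={new_mean:.3f} reduction={reduction:.1f}%")
```

Applied to the cfarm26 (i686) means of 617.509 and 180.276, the same formula
gives a reduction of roughly 71%.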

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

Changes in v2:
    - include <math.h> (fix the linknamespace failures)
    - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file
    - restored original wrapper code (math/w_tgammaf_compat.c),
      except for the handling of the sign
    - removed the tgammaf/float entries in all libm-test-ulps files
    - address other comments from Joseph Myers
      (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html)

Changes in v3:
    - pass NULL argument for signgam from w_tgammaf_compat.c
    - use of math_narrow_eval
    - added more comments

Changes in v4:
    - initialize local_signgam to 0 in math/w_tgamma_template.c
    - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file

Changes in v5:
    - do not mention local_signgam any more in math/w_tgammaf_compat.c
    - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c
      and added comment

Changes in v6:
    - pass NULL as 2nd argument of __ieee754_gammaf_r in
      w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c

Changes in v7:
    - added Signed-off-by line for Alexei Sibidanov (author of the code)

Changes in v8:
    - added Signed-off-by line for Paul Zimmermann (submitter of the patch)

Changes in v9:
    - address comments from review by Adhemerval Zanella
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-10-11 11:12:32 +02:00
Carlos O'Donell cae9944a6c Fix whitespace related license issues.
Several copies of the licenses in source files contained whitespace-related
problems.  Two cases are addressed here, the first is two spaces
after a period which appears between "PURPOSE." and "See". The other
is a space after the last forward slash in the URL. Both issues are
corrected and the licenses now match the official textual description
of the license (and the other license in the sources).

Since these whitespace changes do not alter the paragraph structure of
the license, nor create new sentences, they do not change the license.
2024-10-07 18:08:16 -04:00
Florian Weimer a8c433856f i386: Update ulps
As seen on an AMD Ryzen 9 7950X CPU when building with GCC 14
with SSE2 math.
2024-09-05 22:25:55 +02:00
Florian Weimer ed416ee402 i386: Update ulps
As seen on an unspecified Intel system with glibc compiled
with GCC 8.
2024-09-05 09:57:25 +02:00
Adhemerval Zanella f8aafb5a16 i386: Regenerate ULPs
From new tests added by 0797283910.
2024-08-07 11:02:03 -03:00