Commit Graph

92 Commits

Author SHA1 Message Date
Adhemerval Zanella 5fb4b566ef math: Use asinf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic asinf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      42.8237        35.2460        17.70%
x86_64v2                    43.3711        35.9406        17.13%
x86_64v3                    35.0335        30.5744        12.73%
i686                       213.8780        104.4710       51.15%
aarch64 (Neoverse)          17.2937        13.6025        21.34%
power10                     12.0227        7.4241         38.25%

reciprocal-throughput        master        patched   improvement
x86_64                      13.6770        15.5231       -13.50%
x86_64v2                    13.8722        16.0446       -15.66%
x86_64v3                    13.6211        13.2753         2.54%
i686                       186.7670        45.4388        75.67%
aarch64 (Neoverse)          9.96089        9.39285         5.70%
power10                      4.9862        3.7819         24.15%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 673e6fe110 math: Use acoshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic acoshf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      61.2471        58.7742         4.04%
x86_64-v2                   62.6519        59.0523         5.75%
x86_64-v3                   58.7408        50.1393        14.64%
aarch64                     24.8580        21.3317        14.19%
power10                     17.0469        13.1345        22.95%

reciprocal-throughput        master        patched   improvement
x86_64                      16.1618        15.1864         6.04%
x86_64-v2                   15.7729        14.7563         6.45%
x86_64-v3                   14.1669        11.9568        15.60%
aarch64                      10.911        9.5486         12.49%
power10                     6.38196        5.06734        20.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella 66fa7ad437 math: Use acosf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic acosf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.5098        36.6312        30.24%
x86_64v2                    53.0217        37.3091        29.63%
x86_64v3                    42.8501        32.3977        24.39%
i686                       207.3960       109.4000        47.25%
aarch64                     21.3694        13.7871        35.48%
power10                     14.5542         7.2891        49.92%

reciprocal-throughput        master        patched   improvement
x86_64                      14.1487        15.9508       -12.74%
x86_64v2                    14.3293        16.1899       -12.98%
x86_64v3                    13.6563        12.6161         7.62%
i686                       158.4060        45.7354        71.13%
aarch64                     12.5515        9.19233        26.76%
power10                      5.7868         3.3487        42.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Andreas K. Hüttel 3a9b4b4aeb
math: Add sinpi,cospi,tanpi sparc64 ulps
Linux catbus 6.1.112 #1 SMP Sun Oct 13 10:52:08 PDT 2024 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux

gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-12-08 22:01:51 +01:00
Adhemerval Zanella bccb0648ea math: Use tanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic tanf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella d846f4c12d math: Use lgammaf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic lgammaf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, to use math_narrow_eval
on overflow usage, and to adapt to make it reentrant.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       86.5609       70.3278       18.75%
x86_64v2                     78.3030       69.9709       10.64%
x86_64v3                     74.7470       59.8457       19.94%
i686                         387.355       229.761       40.68%
aarch64                      40.8341       33.7563       17.33%
power10                      26.5520       16.1672       39.11%
powerpc                      28.3145       17.0625       39.74%

reciprocal-throughput         master       patched  improvement
x86_64                       68.0461       48.3098       29.00%
x86_64v2                     55.3256       47.2476       14.60%
x86_64v3                     52.3015       38.9028       25.62%
i686                         340.848       195.707       42.58%
aarch64                      36.8000       30.5234       17.06%
power10                      20.4043       12.6268       38.12%
powerpc                      22.6588       13.8866       38.71%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella baa495f231 math: Use erfcf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic erfcf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       98.8796       66.2142       33.04%
x86_64v2                     98.9617       67.4221       31.87%
x86_64v3                     87.4161       53.1754       39.17%
aarch64                      33.8336       22.0781       34.75%
power10                      21.1750       13.5864       35.84%
powerpc                      21.4694       13.8149       35.65%

reciprocal-throughput         master       patched  improvement
x86_64                       48.5620       27.6731       43.01%
x86_64v2                     47.9497       28.3804       40.81%
x86_64v3                     42.0255       18.1355       56.85%
aarch64                      24.3938       13.4041       45.05%
power10                      10.4919        6.1881       41.02%
powerpc                       11.763       6.76468       42.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella 994fec2397 math: Use erff from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic erff.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       85.7363       45.1372       47.35%
x86_64v2                     86.6337       38.5816       55.47%
x86_64v3                     71.3810       34.0843       52.25%
i686                         190.143       97.5014       48.72%
aarch64                      34.9091       14.9320       57.23%
power10                      38.6160        8.5188       77.94%
powerpc                      39.7446       8.45781       78.72%

reciprocal-throughput         master       patched  improvement
x86_64                       35.1739       14.7603       58.04%
x86_64v2                     34.5976       11.2283       67.55%
x86_64v3                     27.3260        9.8550       63.94%
i686                         91.0282       30.8840       66.07%
aarch64                      22.5831        6.9615       69.17%
power10                      18.0386        3.0918       82.86%
powerpc                      20.7277       3.63396       82.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella c5d241f06b math: Use cbrtf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic cbrtf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master        patched       improvement
x86_64                       68.6348        36.8908            46.25%
x86_64v2                     67.3418        36.6968            45.51%
x86_64v3                     63.4981        32.7859            48.37%
aarch64                      29.3172        12.1496            58.56%
power10                      18.0845         8.8893            50.85%
powerpc                      18.0859        8.79527            51.37%

reciprocal-throughput         master        patched       improvement
x86_64                       36.4369        13.3565            63.34%
x86_64v2                     37.3611        13.1149            64.90%
x86_64v3                     31.6024        11.2102            64.53%
aarch64                      18.6866        7.3474             60.68%
power10                       9.4758        3.6329             61.66%
powerpc                      9.58896        3.90439            59.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-22 10:01:03 -03:00
Adhemerval Zanella f338c7c5f5 math: Use log10p1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic log10p1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      68.5251        32.2627        52.92%
x86_64v2                    68.8912        32.7887        52.41%
x86_64v3                    59.3427        27.0521        54.41%
i686                        162.026        103.383        36.19%
aarch64                     26.8513        14.5695        45.74%
power10                     12.7426         8.4929        33.35%
powerpc                     16.6768        9.29135        44.29%

reciprocal-throughput        master        patched   improvement
x86_64                      26.0969        12.4023        52.48%
x86_64v2                    25.0045        11.0748        55.71%
x86_64v3                    20.5610        10.2995        49.91%
i686                        89.8842        78.5211        12.64%
aarch64                     17.1200         9.4832        44.61%
power10                      6.7814         6.4258         5.24%
powerpc                      15.769         7.6825        51.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:40 -03:00
Adhemerval Zanella 8ae9e51376 math: Use log1pf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic log1pf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      71.8142        38.9668        45.74%
x86_64v2                    71.9094        39.1321        45.58%
x86_64v3                    60.1000        32.4016        46.09%
i686                        147.105        104.258        29.13%
aarch64                     26.4439        14.0050        47.04%
power10                     19.4874         9.4146        51.69%
powerpc                     17.6145        8.00736        54.54%

reciprocal-throughput        master        patched   improvement
x86_64                      19.7604        12.7254        35.60%
x86_64v2                    19.0039        11.9455        37.14%
x86_64v3                    16.8559        11.9317        29.21%
i686                        82.3426        73.9718        10.17%
aarch64                     14.4665         7.9614        44.97%
power10                     11.9974         8.4117        29.89%
powerpc                     7.15222         6.0914        14.83%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella c369580814 math: Use log2p1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic log2p1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      70.1462        47.0090        32.98%
x86_64v2                    70.2513        47.6160        32.22%
x86_64v3                    60.4840        39.9443        33.96%
i686                        164.068        122.909        25.09%
aarch64                     25.9169        16.9207        34.71%
power10                     18.1261        9.8592         45.61%
powerpc                     17.2683        9.38665        45.64%

reciprocal-throughput        master        patched   improvement
x86_64                      26.2240        16.4082        37.43%
x86_64v2                    25.0911        15.7480        37.24%
x86_64v3                    20.9371        11.7264        43.99%
i686                        90.4209        95.3073        -5.40%
aarch64                     16.8537        8.9561         46.86%
power10                     12.9401        6.5555         49.34%
powerpc                     9.01763        7.54745        16.30%

The performance decrease for i686 is mostly due the use of x87 fpu,
when building with '-msse2 -mfpmath=sse:

                             master        patched   improvement
latency                     164.068        102.982        37.23%
reciprocal-throughput       89.1968        82.5117         7.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella bbd578b38d math: Use expm1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic expm1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      96.7402        36.4026        62.37%
x86_64v2                    97.5391        33.4625        65.69%
x86_64v3                    82.1778        30.8668        62.44%
i686                         120.58        94.8302        21.35%
aarch64                     32.3558        12.8881        60.17%
power10                     23.5087        9.8574         58.07%
powerpc                     23.4776        9.06325        61.40%

reciprocal-throughput        master        patched   improvement
x86_64                      27.8224        15.9255        42.76%
x86_64v2                    27.8364        9.6438         65.36%
x86_64v3                    20.3227        9.6146         52.69%
i686                        63.5629        59.4718         6.44%
aarch64                     17.4838        7.1082         59.34%
power10                     12.4644        8.7829         29.54%
powerpc                     14.2152        5.94765        58.16%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:35 -03:00
Adhemerval Zanella 5c22fd25c1 math: Use exp2m1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic exp2m1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).  The
only change is to handle FLT_MAX_EXP for FE_DOWNWARD or FE_TOWARDZERO.

The benchmark inputs are based on exp2f ones.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      40.6042        48.7104       -19.96%
x86_64v2                    40.7506        35.9032        11.90%
x86_64v3                    35.2301        31.7956        9.75%
i686                        102.094        94.6657        7.28%
aarch64                     18.2704        15.1387        17.14%
power10                     11.9444         8.2402        31.01%

reciprocal-throughput        master        patched   improvement
x86_64                      20.8683        16.1428        22.64%
x86_64v2                    19.5076        10.4474        46.44%
x86_64v3                    19.2106        10.4014        45.86%
i686                        56.4054        59.3004        -5.13%
aarch64                     12.0781         7.3953        38.77%
power10                      6.5306         5.9388         9.06%

The generic implementation calls __ieee754_exp2f and x86_64 provides
an optimized ifunc version (built with -mfma -mavx2, not correctly
rounded).  This explains the performance difference for x86_64.

Same for i686, where the ABI provides an optimized __ieee754_exp2f
version built with '-msse2 -mfpmath=sse'.  When built wth same
flags, the new algorithm shows a better performance:

                            master        patched    improvement
latency                    102.094        91.2823         10.59%
reciprocal-throughput      56.4054        52.7984          6.39%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:35 -03:00
Adhemerval Zanella 5fa89852fa math: Use exp10m1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic exp10m1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).  I mostly
fixed some small issues in corner cases (sNaN handling, -INFINITY,
a specific overflow check).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      45.4690        49.5845        -9.05%
x86_64v2                    46.1604        36.2665        21.43%
x86_64v3                    37.8442        31.0359        17.99%
i686                        121.367        93.0079        23.37%
aarch64                     21.1126        15.0165        28.87%
power10                     12.7426        8.4929         33.35%

reciprocal-throughput        master        patched   improvement
x86_64                      19.6005        17.4005        11.22%
x86_64v2                    19.6008        11.1977        42.87%
x86_64v3                    17.5427        10.2898        41.34%
i686                        59.4215        60.9675        -2.60%
aarch64                     13.9814        7.9173         43.37%
power10                      6.7814        6.4258          5.24%

The generic implementation calls __ieee754_exp10f which has an
optimized version, although it is not correctly rounded, which is
the main culprit of the the latency difference for x86_64 and
throughp for i686.

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:26 -03:00
Paul Zimmermann 392b3f0971 replace tgammaf by the CORE-MATH implementation
The CORE-MATH implementation is correctly rounded (for any rounding mode).
This can be checked by exhaustive tests in a few minutes since there are
less than 2^32 values to check against for example GNU MPFR.
This patch also adds some bench values for tgammaf.

Tested on x86_64 and x86 (cfarm26).

With the initial GNU libc code it gave on an Intel(R) Core(TM) i7-8700:

      "tgammaf": {
       "": {
        "duration": 3.50188e+09,
        "iterations": 2e+07,
        "max": 602.891,
        "min": 65.1415,
        "mean": 175.094
       }
      }

With the new code:

      "tgammaf": {
       "": {
        "duration": 3.30825e+09,
        "iterations": 5e+07,
        "max": 211.592,
        "min": 32.0325,
        "mean": 66.1649
       }
      }

With the initial GNU libc code it gave on cfarm26 (i686):

  "tgammaf": {
   "": {
    "duration": 3.70505e+09,
    "iterations": 6e+06,
    "max": 2420.23,
    "min": 243.154,
    "mean": 617.509
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.24497e+09,
    "iterations": 1.8e+07,
    "max": 1238.15,
    "min": 101.155,
    "mean": 180.276
   }
  }

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

Changes in v2:
    - include <math.h> (fix the linknamespace failures)
    - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file
    - restored original wrapper code (math/w_tgammaf_compat.c),
      except for the dealing with the sign
    - removed the tgammaf/float entries in all libm-test-ulps files
    - address other comments from Joseph Myers
      (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html)

Changes in v3:
    - pass NULL argument for signgam from w_tgammaf_compat.c
    - use of math_narrow_eval
    - added more comments

Changes in v4:
    - initialize local_signgam to 0 in math/w_tgamma_template.c
    - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file

Changes in v5:
    - do not mention local_signgam any more in math/w_tgammaf_compat.c
    - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c
      and added comment

Changes in v6:
    - pass NULL as 2nd argument of __ieee754_gammaf_r in
      w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c

Changes in v7:
    - added Signed-off-by line for Alexei Sibidanov (author of the code)

Changes in v8:
    - added Signed-off-by line for Paul Zimmermann (submitted of the patch)

Changes in v9:
    - address comments from review by Adhemerval Zanella
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-10-11 11:12:32 +02:00
Adhemerval Zanella 1dcc107a1f sparc: Regenerate ULPs
From new tests added by 0797283910.
2024-08-07 11:02:03 -03:00
Adhemerval Zanella fe94080875 sparc: Regenerate ULPs
From new tests added by 4dc22baa84.
2024-07-25 11:06:53 -03:00
Andreas K. Hüttel 4f1cf0c0e1
sparc: Regenerate ULPs
Linux catbus 5.15.110-gentoo-r1 #1 SMP Fri Jun 9 17:53:23 PDT 2023 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-06-19 14:58:32 +02:00
Joseph Myers bb014f50c4 Implement C23 logp1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p).  As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.

Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.

The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either).  It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately.  For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:47:09 +00:00
Adhemerval Zanella 5a70ac9d39 Update sparc libm-test-ulps 2023-07-17 10:09:44 -03:00
Andreas K. Hüttel 33f0f58b59 sparc (64bit): Regenerate ulps
Linux catbus 5.15.69-gentoo #1 SMP Sat Sep 24 07:56:24 PDT 2022 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux
gcc (Gentoo 11.3.1_p20221209 p3) 11.3.1 20221209
GNU ld (Gentoo 2.38 p4) 2.38
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-01-24 11:21:50 -05:00
Adhemerval Zanella c52eb066bc Update sparc libm-test-ulps 2021-08-04 17:38:38 -03:00
Paul Zimmermann 9acda61d94 Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.

The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average.  Two different algorithms are used:

* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
  degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)

* for large inputs, an asymptotic formula from [1] is used

[1] Fast and Accurate Bessel Function Computation,
    John Harrison, Proceedings of Arith 19, 2009.

Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).

Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-02 06:15:48 +02:00
Adhemerval Zanella 42624c7dc7 Update sparc libm-tests-ulps
Required after db3f7bb558 "math: Remove slow paths from asin and
acos [BZ #15267]".
2021-03-30 14:04:11 -03:00
Adhemerval Zanella 1a1c854338 Update sparc libm-test-ulps 2021-03-05 08:34:40 -03:00
Adhemerval Zanella 341063c3b1 Update sparc libm-test-ulps 2021-01-28 14:05:14 -03:00
Adhemerval Zanella 9efac04341 Update sparc libm-test-ulps 2020-09-11 14:39:03 -03:00
Adhemerval Zanella e379112b68 Update sparc libm-test-ulps 2020-04-08 13:52:44 -03:00
Adhemerval Zanella 1c15464ca0 math: Remove inline math tests
With mathinline removal there is no need to keep building and testing
inline math tests.

The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries.  The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2020-03-19 11:45:44 -03:00
David S. Miller 0a271c8f25 Regenerate sparc ulps.
* sysdeps/sparc/fpu/libm-test-ulps: Regenerated.
2018-10-18 11:43:56 -07:00
Adhemerval Zanella bbc2062a75 Update SPARC libm-test-ulps.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2018-04-05 17:10:03 -03:00
Adhemerval Zanella 4cdc25a4a1 Update sparc ulps
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2018-02-22 16:44:58 -03:00
Patrick McGehearty 1cbf818d25 Update sparc ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Update
        cpow, ctan, ctanh, j0, j1, y0, yn ulps.
2018-01-26 08:23:01 -08:00
David S. Miller 7ec094e801 Sparc ulp update.
* sysdeps/sparc/fpu/libm-test-ulps: Update
	exp_{downward,towardzero,upward} ulps.
2017-12-06 12:18:51 -08:00
Joseph Myers 5a80d39d0d Obsolete pow10 functions.
This patch obsoletes the pow10, pow10f and pow10l functions (makes
them into compat symbols, not available for new ports or static
linking).  The exp10 names for these functions are standardized (in TS
18661-4) and were added in the same glibc version (2.1) as pow10 so
source code can change to use them without any loss of portability.
Since pow10 is deliberately not provided for _Float128, only exp10,
this slightly simplifies moving to the new wrapper templates in the
!LIBM_SVID_COMPAT case, by avoiding needing to arrange for pow10,
pow10f and pow10l to be defined by those templates.

Tested for x86_64, and with build-many-glibcs.py.

	* manual/math.texi (pow10): Do not document.
	(pow10f): Likewise.
	(pow10l): Likewise.
	* math/bits/mathcalls.h [__USE_GNU] (pow10): Do not declare.
	* math/bits/math-finite.h [__USE_GNU] (pow10): Likewise.
	* math/libm-test-exp10.inc (pow10_test): Remove.
	(do_test): Do not call pow10.
	* math/w_exp10_compat.c (pow10): Make into compat symbol.
	[NO_LONG_DOUBLE] (pow10l): Likewise.
	* math/w_exp10f_compat.c (pow10f): Likewise.
	* math/w_exp10l_compat.c (pow10l): Likewise.
	* sysdeps/ia64/fpu/e_exp10.S: Include <shlib-compat.h>.
	(pow10): Make into compat symbol.
	* sysdeps/ia64/fpu/e_exp10f.S: Include <shlib-compat.h>.
	(pow10f): Make into compat symbol.
	* sysdeps/ia64/fpu/e_exp10l.S: Include <shlib-compat.h>.
	(pow10l): Make into compat symbol.
	* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Remove
	pow10.
	(CFLAGS-nldbl-pow10.c): Remove variable..
	* sysdeps/ieee754/ldbl-opt/nldbl-pow10.c: Remove file.
	* sysdeps/ieee754/ldbl-opt/w_exp10_compat.c (pow10l): Condition on
	[SHLIB_COMPAT (libm, GLIBC_2_1, GLIBC_2_27)].
	* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (compat_symbol):
	Undefine and redefine.
	(pow10l): Make into compat symbol.
	* sysdeps/aarch64/libm-test-ulps: Remove pow10 ulps.
	* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
	* sysdeps/arm/libm-test-ulps: Likewise.
	* sysdeps/hppa/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/microblaze/libm-test-ulps: Likewise.
	* sysdeps/mips/mips32/libm-test-ulps: Likewise.
	* sysdeps/mips/mips64/libm-test-ulps: Likewise.
	* sysdeps/nios2/libm-test-ulps: Likewise.
	* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
	* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
	* sysdeps/s390/fpu/libm-test-ulps: Likewise.
	* sysdeps/sh/libm-test-ulps: Likewise.
	* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
	* sysdeps/tile/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-09-01 21:13:18 +00:00
Adhemerval Zanella ee8c6cc5f9 Update sparc ulps
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2017-07-19 15:56:02 -03:00
David S. Miller a9d460a977 Update sparc ULPS.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2016-01-24 21:12:58 -08:00
David S. Miller d87630edc4 Regenerate SPARC ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Regenerated.
2015-07-16 12:03:33 -07:00
David S. Miller a8b6a3a6c1 Rebuilt fresh sparc ULPS to get rid of removed tests.
* sysdeps/sparc/fpu/libm-test-ulps: Regenerate from scratch.
2015-04-15 09:42:54 -07:00
David S. Miller 23ebf74307 Update SPARC ulps.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2015-04-08 20:34:49 -07:00
David S. Miller 5efe06a176 Update sparc ULPS.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2014-07-09 17:20:01 -07:00
David S. Miller 6957fb3150 Update Sparc ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2014-06-14 13:14:54 -07:00
David S. Miller 6b4d7a909f Update Sparc ULPS.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2014-06-01 19:19:18 -07:00
David S. Miller 809bd45fa9 Regenerate sparc ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2014-04-13 18:27:19 -07:00
Joseph Myers e6b6a85705 Don't include individual test ulps in libm-test-ulps.
As recently discussed
<https://sourceware.org/ml/libc-alpha/2014-02/msg00670.html>, it
doesn't seem particularly useful for libm-test-ulps files to contain
huge amounts of data on ulps for individual tests; just the global
maximum observed ulps for each function, together with the
verification of exceptions, errno and special results such as
infinities and NaNs for each test, suffices to verify that a
function's behavior on the given test inputs is within the expected
accuracy.  Removing this data reduces source tree churn caused by
updates to these files when libm tests are added, and reduces the
frequency with which testsuite additions actually need libm-test-ulps
changes at all.

Accordingly, this patch removes that data, so that individual tests
get checked against the global bounds for the given function and only
generate an error if those are exceeded.  Tested x86_64 (including
verifying that if an ulps value is artificially reduced, the tests do
indeed fail as they should and "make regen-ulps" generates the
expected changes).

	* math/libm-test.inc (struct ulp_data): Don't refer to ulps for
	individual tests in comment.
	(libm-test-ulps.h): Don't refer to test_ulps in #include comment.
	(prev_max_error): New variable.
	(prev_real_max_error): Likewise.
	(prev_imag_max_error): Likewise.
	(compare_ulp_data): Don't refer to test names in comment.
	(find_test_ulps): Remove function.
	(find_function_ulps): Likewise.
	(find_complex_function_ulps): Likewise.
	(init_max_error): Take function name as argument.  Look up ulps
	for that function.
	(print_ulps): Remove function.
	(print_max_error): Use prev_max_error instead of calling
	find_function_ulps.
	(print_complex_max_error): Use prev_real_max_error and
	prev_imag_max_error instead of calling find_complex_function_ulps.
	(check_float_internal): Take max_ulp parameter instead of calling
	find_test_ulps.  Don't call print_ulps.
	(check_float): Update call to check_float_internal.
	(check_complex): Update calls to check_float_internal.
	(START): Pass argument to init_max_error.
	* math/gen-libm-test.pl (%results): Don't include "kind"
	information.
	(parse_ulps): Don't handle ulps of individual tests.
	(print_ulps_file): Likewise.
	(output_ulps): Likewise.
	* math/README.libm-test: Update.
	* manual/libm-err-tab.pl (parse_ulps): Don't handle ulps of
	individual tests.
	* sysdeps/aarch64/libm-test-ulps: Remove individual test ulps.
	* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
	* sysdeps/arm/libm-test-ulps: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/ia64/fpu/libm-test-ulps: Likewise.
	* sysdeps/m68k/coldfire/fpu/libm-test-ulps: Likewise.
	* sysdeps/m68k/m680x0/fpu/libm-test-ulps: Likewise.
	* sysdeps/microblaze/libm-test-ulps: Likewise.
	* sysdeps/mips/mips32/libm-test-ulps: Likewise.
	* sysdeps/mips/mips64/libm-test-ulps: Likewise.
	* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
	* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
	* sysdeps/s390/fpu/libm-test-ulps: Likewise.
	* sysdeps/sh/libm-test-ulps: Likewise.
	* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
	* sysdeps/tile/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.

	* sysdeps/hppa/fpu/libm-test-ulps: Remove individual test ulps.
2014-03-05 15:02:38 +00:00
David S. Miller ae2f53e1eb Adjust sparc ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Update for some 64-bit differences from
	32-bit.
2014-02-04 20:54:58 -08:00
David S. Miller 6e697ff746 Rebuild sparc ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Regenerate.
2014-01-24 18:32:53 -08:00
David S. Miller 3bf78b7174 Update sparc ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2013-10-10 14:42:58 -07:00
David S. Miller da2d62df77 Update Sparc ULPs.
* sysdeps/sparc/fpu/libm-test-ulps: Update ULPs to handle minor
	difference between 32-bit and 64-bit.
2013-07-20 16:47:28 -07:00