glibc/sysdeps
Adhemerval Zanella c055c54e96 x86_64: Optimize modf/modff for x86_64-v2
SSE4.1 provides a direct instruction for trunc, which improves
modf/modff performance and reduces text size.  On a Ryzen 9 (Zen 3) with
gcc 14.2.1:

x86_64-v2
reciprocal-throughput        master        patch       difference
workload-0_1                 7.9610       7.7914            2.13%
workload-1_maxint            9.4323       7.8021           17.28%
workload-maxint_maxfloat     8.7379       7.8049           10.68%
workload-integral            7.9492       7.7991            1.89%

latency                      master        patch       difference
workload-0_1                 7.9511      10.8910          -36.97%
workload-1_maxint           15.8278      10.9048           31.10%
workload-maxint_maxfloat    11.3495      10.9139            3.84%
workload-integral           11.5938      10.9071            5.92%

x86_64-v3
reciprocal-throughput        master        patch       difference
workload-0_1                 8.7522       7.9781            8.84%
workload-1_maxint            9.6690       7.9872           17.39%
workload-maxint_maxfloat     8.7634       7.9857            8.87%
workload-integral            8.7397       7.9893            8.59%

latency                      master        patch       difference
workload-0_1                 8.7447       9.5589           -9.31%
workload-1_maxint           13.7480       9.5690           30.40%
workload-maxint_maxfloat    10.0092       9.5680            4.41%
workload-integral            9.7518       9.5743            1.82%

For x86_64-v1 the optimization is done through a new ifunc selector.
The AVX variant is added, following other SSE4_1 optimizations (like
trunc), to avoid the ifunc for x86_64-v3.

Checked on x86_64-linux-gnu.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-07-11 13:01:31 -03:00
..
aarch64 i386: Update ___tls_get_addr to preserve vector registers 2025-06-19 04:30:31 +08:00
alpha
arc
arm
csky
generic elf: Restore support for _r_debug interpositions and copy relocations 2025-07-05 20:15:12 +02:00
gnu Add TCPI_OPT_USEC_TS from Linux 6.14 and TCPI_OPT_TFO_CHILD from 6.15 to netinet/tcp.h. 2025-06-17 09:57:44 -03:00
hppa
htl htl: move __pthread_get_cleanup_stack to libc 2025-07-06 19:56:15 +00:00
hurd
i386 i386: Update ___tls_get_addr to preserve vector registers 2025-06-19 04:30:31 +08:00
ieee754 powerpc: Remove modff optimization 2025-06-25 15:05:30 -03:00
loongarch i386: Update ___tls_get_addr to preserve vector registers 2025-06-19 04:30:31 +08:00
m68k
mach htl: move __pthread_get_cleanup_stack to libc 2025-07-06 19:56:15 +00:00
microblaze
mips
nptl
or1k
posix stdlib: Fix __libc_message_impl iovec size (BZ 32947) 2025-06-30 13:51:41 -03:00
powerpc powerpc: Remove modf optimization 2025-06-25 15:05:30 -03:00
pthread
riscv
s390
sh
sparc sparc: Fix sparc32 Fix argument passing to __libc_start_main (BZ 32981) 2025-06-18 11:20:34 -03:00
unix Linux: Keep termios ioctl constants strictly internal 2025-07-11 16:04:07 +02:00
wordsize-32
wordsize-64
x86 x86: Avoid vector/r16-r31 registers and memcpy/memset in mcount_internal 2025-07-09 05:33:05 +08:00
x86_64 x86_64: Optimize modf/modff for x86_64-v2 2025-07-11 13:01:31 -03:00