mirror of git://sourceware.org/git/glibc.git
The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. This optimization enables us to migrate the implementation to a C version. The performance on a Zen3 chip is similar to the SVID one.

The m68k port provided an optimized version through __m81_u(remainderf) (mathimpl.h), and gcc does not implement it through a builtin (unlike i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

| reciprocal-throughput | input | master | NO-SVID | improvement |
|---|---|---|---|---|
| x86_64 | subnormals | 18.8522 | 16.2506 | 13.80% |
| x86_64 | normal | 421.8260 | 403.9270 | 4.24% |
| x86_64 | close-exponent | 21.0579 | 18.7642 | 10.89% |
| i686 | subnormals | 21.3443 | 21.4229 | -0.37% |
| i686 | normal | 525.8380 | 538.807 | -2.47% |
| i686 | close-exponent | 21.6589 | 21.7983 | -0.64% |

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

| | | |
|---|---|---|
| .. | ||
| bits | ||
| coldfire | ||
| fpu | ||
| m680x0 | ||
| nptl | ||
| sys | ||
| Implies | ||
| Makefile | ||
| Versions | ||
| __longjmp.c | ||
| abort-instr.h | ||
| asm-syntax.h | ||
| backtrace.c | ||
| bsd-_setjmp.c | ||
| bsd-setjmp.c | ||
| configure | ||
| configure.ac | ||
| crti.S | ||
| crtn.S | ||
| dl-machine.h | ||
| dl-tls.h | ||
| dl-trampoline.S | ||
| elf-initfini.h | ||
| fpu_control.h | ||
| gccframe.h | ||
| jmpbuf-unwind.h | ||
| ldsodefs.h | ||
| libc-tls.c | ||
| math-use-builtins-ffs.h | ||
| memchr.S | ||
| memcopy.h | ||
| preconfigure | ||
| preconfigure.ac | ||
| rawmemchr.S | ||
| setjmp.c | ||
| shlib-versions | ||
| sotruss-lib.c | ||
| stackinfo.h | ||
| start.S | ||
| strchr.S | ||
| strchrnul.S | ||
| symbol-hacks.h | ||
| sysdep.h | ||
| thread_pointer.h | ||
| tst-audit.h | ||
| unwind-arch.h | ||
| utmp-size.h | ||
| wcpcpy_chk.c | ||
| wordcopy.c | ||