glibc/sysdeps
H.J. Lu d2cf37c0a2 x86-64: Use _dl_runtime_resolve_opt only with AVX512F [BZ #21871]
On AVX machines with XGETBV (ECX == 1) like Skylake processors,

(gdb) disass _dl_runtime_resolve_avx_opt
Dump of assembler code for function _dl_runtime_resolve_avx_opt:
   0x0000000000015890 <+0>:	push   %rax
   0x0000000000015891 <+1>:	push   %rcx
   0x0000000000015892 <+2>:	push   %rdx
   0x0000000000015893 <+3>:	mov    $0x1,%ecx
   0x0000000000015898 <+8>:	xgetbv
   0x000000000001589b <+11>:	mov    %eax,%r11d
   0x000000000001589e <+14>:	pop    %rdx
   0x000000000001589f <+15>:	pop    %rcx
   0x00000000000158a0 <+16>:	pop    %rax
   0x00000000000158a1 <+17>:	and    $0x4,%r11d
   0x00000000000158a5 <+21>:	bnd je 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.

is slower than:

(gdb) disass _dl_runtime_resolve_avx_slow
Dump of assembler code for function _dl_runtime_resolve_avx_slow:
   0x0000000000015850 <+0>:	vorpd  %ymm0,%ymm1,%ymm8
   0x0000000000015854 <+4>:	vorpd  %ymm2,%ymm3,%ymm9
   0x0000000000015858 <+8>:	vorpd  %ymm4,%ymm5,%ymm10
   0x000000000001585c <+12>:	vorpd  %ymm6,%ymm7,%ymm11
   0x0000000000015860 <+16>:	vorpd  %ymm8,%ymm9,%ymm9
   0x0000000000015865 <+21>:	vorpd  %ymm10,%ymm11,%ymm10
   0x000000000001586a <+26>:	vpcmpeqd %xmm8,%xmm8,%xmm8
   0x000000000001586f <+31>:	vorpd  %ymm9,%ymm10,%ymm10
   0x0000000000015874 <+36>:	vptest %ymm10,%ymm8
   0x0000000000015879 <+41>:	bnd jae 0x158b0 <_dl_runtime_resolve_avx>
   0x000000000001587c <+44>:	vzeroupper
   0x000000000001587f <+47>:	bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.
(gdb)

since xgetbv takes many more cycles than single-cycle operations like
vorpd/vpcmpeqd/vptest.  _dl_runtime_resolve_opt should be used only with
AVX512F, where AVX512 instructions lead to a lower CPU frequency on
Skylake server.

	[BZ #21871]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.
2017-08-04 11:14:33 -07:00
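
The rule stated in the ChangeLog entry above can be sketched in C roughly as follows. This is only an illustrative model, not the code in sysdeps/x86/cpu-features.c: `struct cpu_caps`, its fields, `enum resolver_kind`, and `choose_resolver` are hypothetical names introduced here, and the sketch assumes that machines without AVX512F fall back to the "slow" (vorpd/vptest) variant shown in the second disassembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the state that init_cpu_features
   inspects.  These field names are assumptions, not glibc's own.  */
struct cpu_caps
{
  bool avx512f_usable;  /* AVX512F is usable on this machine.  */
  bool xgetbv_ecx_1;    /* XGETBV accepts ECX == 1.  */
};

/* Hypothetical labels for the two resolver flavours discussed above.  */
enum resolver_kind
{
  RESOLVER_SLOW,  /* vorpd/vptest check of the upper YMM halves.  */
  RESOLVER_OPT    /* xgetbv-based check.  */
};

/* Models the rule from the commit: the xgetbv-based "opt" resolver is
   chosen only when AVX512F is usable; XGETBV (ECX == 1) support alone
   is no longer sufficient.  */
enum resolver_kind
choose_resolver (const struct cpu_caps *caps)
{
  assert (caps != NULL);
  if (caps->xgetbv_ecx_1 && caps->avx512f_usable)
    return RESOLVER_OPT;
  return RESOLVER_SLOW;
}
```

Under these assumptions, a machine that supports XGETBV (ECX == 1) but lacks AVX512F gets `RESOLVER_SLOW`, which matches the intent of the commit; only a machine with both capabilities gets `RESOLVER_OPT`.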
aarch64
alpha Update Alpha libm-test-ulps 2017-07-27 14:21:28 -03:00
arm [ARM] Fix ld.so crash when built using Binutils 2.29 2017-07-13 15:48:41 +01:00
generic Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
gnu
hppa Remove extra semicolons in struct pthread_mutex (bug 21804) 2017-07-24 12:22:05 +02:00
i386 x86: Remove __memset_zero_constant_len_parameter [BZ #21790] 2017-08-04 10:56:51 -07:00
ia64
ieee754 Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
init_array
m68k Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
mach [hurd]: Add __libc_init_secure stub 2017-08-02 23:29:57 +02:00
microblaze Update Microblaze libm-test-ulps 2017-07-28 09:19:40 -03:00
mips
nios2 Update Nios II ULPs file. 2017-07-28 03:54:35 -07:00
nptl Remove extra semicolons in struct pthread_mutex (bug 21804) 2017-07-24 12:22:05 +02:00
posix getaddrinfo: Release resolver context on error in gethosts [BZ #21885] 2017-08-03 12:33:00 +02:00
powerpc tst-tlsopt-powerpc as a shared lib 2017-08-03 15:39:21 +09:30
pthread Single threaded stdio optimization 2017-07-04 16:05:12 +01:00
s390
sh
sparc Update sparc ulps 2017-07-19 15:56:02 -03:00
tile
unix microblaze: Resolve non-relocatable branch in pt-vfork.S (BZ#21779) 2017-07-28 09:21:14 -03:00
wordsize-32
wordsize-64
x86 x86-64: Use _dl_runtime_resolve_opt only with AVX512F [BZ #21871] 2017-08-04 11:14:33 -07:00
x86_64 x86: Remove __memset_zero_constant_len_parameter [BZ #21790] 2017-08-04 10:56:51 -07:00