Commit Graph

559 Commits

Author SHA1 Message Date
H.J. Lu 5a4573be6f x86 (__HAVE_FLOAT128): Defined to 0 for Intel SYCL compiler [BZ ]
Intel compiler always defines __INTEL_LLVM_COMPILER.  When SYCL is
enabled by -fsycl, it also defines SYCL_LANGUAGE_VERSION.  Since Intel
SYCL compiler doesn't support _Float128:

https://github.com/intel/llvm/issues/16903

define __HAVE_FLOAT128 to 0 for Intel SYCL compiler.

This fixes BZ .

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2025-02-20 08:42:37 +08:00
Florian Weimer 9b71570c46 x86: Add missing #include <features.h> to <thread_pointer.h>
It is required for __GNUC_PREREQ.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 19:30:41 +01:00
Florian Weimer 7a3e2e877a Move <thread_pointer.h> to kernel-independent sysdeps directories
Hurd is expected to use the same thread ABI as Linux.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 19:30:16 +01:00
Paul Eggert 2642002380 Update copyright dates with scripts/update-copyrights 2025-01-01 11:22:09 -08:00
Adhemerval Zanella a2b0ff98a0 include/sys/cdefs.h: Add __attribute_optimization_barrier__
Add __attribute_optimization_barrier__ to disable inlining and cloning on a
function.  For Clang, expand it to

__attribute__ ((optnone))

Otherwise, expand it to

__attribute__ ((noinline, clone))

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-23 06:28:55 +08:00
Fangrui Song d773aff467 x86: Define __HAVE_FLOAT128 for Clang and use __builtin_*f128 code path
Clang supports __builtin_fabsf128 (despite not supporting _Float128) but
it does not support __builtin_fabsq.  Fallback to back to
`typedef __float128 _Float128;` it clang is used.

Originally developed by Fangrui Song <maskray@google.com>.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 16:07:11 +08:00
Adhemerval Zanella 6412d8cc46 x86: Use inhibit_stack_protector on tst-ifunc-isa.h
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 13:19:12 +08:00
H.J. Lu f5fb9fa011 x86: Include test-flt-eval-method-387 if -mfpmath=387 works
Since Clang doesn't support -mfpmath=387 on x86-64, on x86, include
test-flt-eval-method-387 only if -mfpmath=387 works.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 12:54:44 +08:00
Florian Weimer 2b1dba3eb3 elf: Introduce is_rtld_link_map
Unconditionally define it to false for static builds.

This avoids the awkward use of weak_extern for _dl_rtld_map
in checks that cannot be possibly true on static builds.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-20 15:52:57 +01:00
H.J. Lu a194871b13 sys/platform/x86.h: Do not depend on _Bool definition in C++ mode
Clang does not define _Bool for -std=c++98:

/usr/include/bits/platform/features.h:31:19: error: unknown type name '_Bool'
   31 | static __inline__ _Bool
      |                   ^

Change _Bool to bool to silence clang++ error.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-12-18 02:32:27 +08:00
Florian Weimer 61c3450db9 x86: Avoid integer truncation with large cache sizes (bug 32470)
Some hypervisors report 1 TiB L3 cache size.  This results
in some variables incorrectly getting zeroed, causing crashes
in memcpy/memmove because invariants are violated.
2024-12-17 18:49:50 +01:00
H.J. Lu dd413a4d2f Fix sysdeps/x86/fpu/Makefile: Split and sort tests
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-16 05:57:28 +08:00
H.J. Lu 57a44f27c4 sysdeps/x86/fpu/Makefile: Split and sort tests
Split and sort tests in sysdeps/x86/fpu/Makefile.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-16 05:51:02 +08:00
Alejandro Colomar 53fcdf5f74 Silence most -Wzero-as-null-pointer-constant diagnostics
Replace 0 by NULL and {0} by {}.

Omit a few cases that aren't so trivial to fix.

Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
Link: <https://software.codidact.com/posts/292718/292759#answer-292759>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2024-11-25 16:45:59 -03:00
Feifei Wang ca90758b2a x86: Enable non-temporal memset for Hygon processors
This patch uses 'Avoid_Non_Temporal_Memset' flag to access
the non-temporal memset implementation for hygon processors.

Test Results:

hygon1 arch
x86_memset_non_temporal_threshold = 8MB
size                          new performance time / old performance time
1MB                           0.994
4MB                           0.996
8MB                           0.670
16MB                          0.343
32MB                          0.355

hygon2 arch
x86_memset_non_temporal_threshold = 8MB
size                          new performance time / old performance time
1MB                           1
4MB                           1
8MB                           1.312
16MB                          0.822
32MB                          0.830

hygon3 arch
x86_memset_non_temporal_threshold = 8MB
size                          new performance time / old performance time
1MB                           1
4MB                           0.990
8MB                           0.737
16MB                          0.390
32MB                          0.401

For hygon arch with this patch, non-temporal stores can improve
performance by 20% - 65%.

Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
Reviewed-by: Jing Li <lijing@hygon.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-26 10:01:58 -07:00
Feifei Wang d14aecbffc x86: Add cache information support for Hygon processors
Add hygon branch in dl_init_cacheinfo function to initialize
cache size variables for hygon processors. In the meanwhile,
add handle_hygon() function to get cache information.

Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
Reviewed-by: Jing Li <lijing@hygon.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-26 10:01:58 -07:00
Feifei Wang 6b08116b2d x86: Add new architecture type for Hygon processors
Add a new architecture type arch_kind_hygon to spilt Hygon branch
from AMD. This is to facilitate the Hygon processors to make settings
that are suitable for its own characteristics.

Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
Reviewed-by: Jing Li <lijing@hygon.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-26 10:01:58 -07:00
Noah Goldstein f446d90fe6 x86: Add `Avoid_STOSB` tunable to allow NT memset without ERMS
The goal of this flag is to allow targets which don't prefer/have ERMS
to still access the non-temporal memset implementation.

There are 4 cases for tuning memset:
    1) `Avoid_STOSB && Avoid_Non_Temporal_Memset`
        - Memset with temporal stores
    2) `Avoid_STOSB && !Avoid_Non_Temporal_Memset`
        - Memset with temporal/non-temporal stores. Non-temporal path
          goes through `rep stosb` path. We accomplish this by setting
          `x86_rep_stosb_threshold` to
          `x86_memset_non_temporal_threshold`.
    3) `!Avoid_STOSB && Avoid_Non_Temporal_Memset`
        - Memset with temporal stores/`rep stosb`
    3) `!Avoid_STOSB && !Avoid_Non_Temporal_Memset`
        - Memset with temporal stores/`rep stosb`/non-temporal stores.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-15 08:19:15 -07:00
Noah Goldstein b93dddfaf4 x86: Use `Avoid_Non_Temporal_Memset` to control non-temporal path
This is just a refactor and there should be no behavioral change from
this commit.

The goal is to make `Avoid_Non_Temporal_Memset` a more universal knob
for controlling whether we use non-temporal memset rather than having
extra logic based on vendor.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-15 08:19:15 -07:00
Florian Weimer 7a630f7d33 x86: Tunables may incorrectly set Prefer_PMINUB_for_stringop (bug 32047)
Fixes commit 5bcf6265f2 ("x86:
Disable non-temporal memset on Skylake Server").

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-08-02 18:08:14 +02:00
Florian Weimer 0df48472ff x86: Add missing switch/case fall-through markers to init_cpu_features
The commits introducing these fall-throughs intended them to
happen.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-08-02 18:08:14 +02:00
Noah Goldstein 5bcf6265f2 x86: Disable non-temporal memset on Skylake Server
The original commit enabling non-temporal memset on Skylake Server had
erroneous benchmarks (actually done on ICX).

Further benchmarks indicate non-temporal stores may in fact by a
regression on Skylake Server.

This commit may be over-cautious in some cases, but should avoid any
regressions for 2.40.

Tested using qemu on all x86_64 cpu arch supported by both qemu +
GLIBC.

Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-07-16 17:20:18 +08:00
MayShao-oc 9dc645cb56 x86: Set default non_temporal_threshold for Zhaoxin processors
Current 'non_temporal_threshold' set to 'non_temporal_threshold_lowbound'
on Zhaoxin processors without ERMS. The default
'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000
Zhaoxin processors, this patch updates the value to
'shared / cachesize_non_temporal_divisor'.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-30 06:26:43 -07:00
MayShao-oc 44d757eb9f x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors
Fix code formatting under the Zhaoxin branch and add comments for
different Zhaoxin models.

Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
the AVX_Fast_Unaligned_Load.

Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
use sse2_unaligned version of memset,strcpy and strcat.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-30 06:26:43 -07:00
Stefan Liebler e260ceb4aa elf: Remove HWCAP_IMPORTANT
Remove the definitions of HWCAP_IMPORTANT after removal of
LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask.  There HWCAP_IMPORTANT
was used as default value.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler 343439a31e elf: Remove _DL_PLATFORMS_COUNT
Remove the definitions of _DL_PLATFORMS_COUNT as those are not used
anymore after removal in elf/dl-cache.c:search_cache().

Note: On x86, we can also get rid of the definitions
HWCAP_PLATFORMS_START and HWCAP_PLATFORMS_COUNT.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler 43c7c5e62d elf: Remove _DL_FIRST_PLATFORM
Remove the definitions of _DL_FIRST_PLATFORM as those were only used
in the _DL_HWCAP_PLATFORM definitions and in _dl_string_platform().
Both were removed.

Note: Removed on every architecture despite of powerpc, where
_dl_string_platform() is still used.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler ed23449dac elf: Remove _DL_HWCAP_PLATFORM
Remove the definitions of _DL_HWCAP_PLATFORM as those are not used
anymore after removal in elf/dl-cache.c:search_cache().
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler 374c8b4483 elf: Remove platform strings in dl-procinfo.c
Remove the platform strings in dl-procinfo.c where also
the implementation of _dl_string_platform() was removed.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler 8faada8302 elf: Remove _dl_string_platform
Despite of powerpc where the returned integer is stored in tcb,
and the diagnostics output, there is no user anymore.

Thus this patch removes the diagnostics output and
_dl_string_platform for all other platforms.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler f14b6dfc87 x86: Remove HWCAP_START and HWCAP_COUNT
Both defines are not used anymore.  Those were only used for
_dl_string_hwcap(), which itself was removed with commit
ab40f20364
"elf: Remove _dl_string_hwcap"

Just clean up.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Andreas K. Hüttel 98ffc1bfeb
Convert to autoconf 2.72 (vanilla release, no distribution patches)
As discussed at the patch review meeting

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
2024-06-17 21:15:28 +02:00
Noah Goldstein 5b54a33435 x86: Fix value for `x86_memset_non_temporal_threshold` when it is undesirable
When we don't want to use non-temporal stores for memset, we set
`x86_memset_non_temporal_threshold` to SIZE_MAX.

The current code, however, we using `maximum_non_temporal_threshold`
as the upper bound which is `SIZE_MAX >> 4` so we ended up with a
value of `0`.

Fix is to just use `SIZE_MAX` as the upper bound for when setting the
tunable.
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-06-14 17:25:05 -05:00
H.J. Lu 29807a271e x86: Properly set x86 minimum ISA level [BZ ]
Properly set libc_cv_have_x86_isa_level in shell for MINIMUM_X86_ISA_LEVEL
defined as

(__X86_ISA_V1 + __X86_ISA_V2 + __X86_ISA_V3 + __X86_ISA_V4)

Also set __X86_ISA_V2 to 1 for i386 if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
is defined.  There are no changes in config.h nor in config.make on x86-64.
On i386, -march=x86-64-v2 with GCC generates

 #define MINIMUM_X86_ISA_LEVEL 2

in config.h and

have-x86-isa-level = 2

in config.make.  This fixes BZ .

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-12 14:27:54 -07:00
H.J. Lu 09bc68b0ac x86: Properly set MINIMUM_X86_ISA_LEVEL for i386 [BZ ]
On i386, set the default minimum ISA level to 0, not 1 (baseline which
includes SSE2).  There are no changes in config.h nor in config.make on
x86-64.  This fixes BZ .

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Tested-by: Ian Jordan <immoloism@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-11 00:10:08 -07:00
Joe Damato bef2a827a5 x86: Enable non-temporal memset tunable for AMD
In commit 46b5e98ef6 ("x86: Add seperate non-temporal tunable for
memset") a tunable threshold for enabling non-temporal memset was added,
but only for Intel hardware.

Since that commit, new benchmark results suggest that non-temporal
memset is beneficial on AMD, as well, so allow this tunable to be set
for AMD.

See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
which has been updated to include data using different stategies for
large memset on AMD Zen2, Zen3, and Zen4.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-10 16:18:18 -05:00
Noah Goldstein 46b5e98ef6 x86: Add seperate non-temporal tunable for memset
The tuning for non-temporal stores for memset vs memcpy is not always
the same. This includes both the exact value and whether non-temporal
stores are profitable at all for a given arch.

This patch add `x86_memset_non_temporal_threshold`. Currently we
disable non-temporal stores for non Intel vendors as the only
benchmarks showing its benefit have been on Intel hardware.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-30 12:36:09 -05:00
Sunil K Pandey 1b713c9a53 i386: Disable Intel Xeon Phi tests for GCC 15 and above (BZ 31782)
This patch disables Intel Xeon Phi tests for GCC 15 and above.

GCC 15 removed Intel Xeon Phi ISA support.
commit e1a7e2c54d52d0ba374735e285b617af44841ace
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Mon May 20 10:43:44 2024 +0800

    i386: Remove Xeon Phi ISA support

Fixes BZ 31782.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-27 12:28:13 -07:00
Adhemerval Zanella 1e1ad714ee support: Add envp argument to support_capture_subprogram
So tests can specify a list of environment variables.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2024-05-07 12:16:36 -03:00
Florian Weimer b62928f907 x86: In ld.so, diagnose missing APX support in APX-only builds
At this point, this is mainly a tool for testing the early ld.so
CPU compatibility diagnostics: GCC uses the new instructions in most
functions, so it's easy to spot if some of the early code is not
built correctly.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-25 17:20:28 +02:00
H.J. Lu 46c9997413 x86: Define MINIMUM_X86_ISA_LEVEL in config.h [BZ ]
Define MINIMUM_X86_ISA_LEVEL at configure time to avoid

/usr/bin/ld: …/build/elf/librtld.os: in function `init_cpu_features':
…/git/elf/../sysdeps/x86/cpu-features.c:1202: undefined reference to `_dl_runtime_resolve_fxsave'
/usr/bin/ld: …/build/elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

when glibc is built with -march=x86-64-v3 and configured with
--with-rtld-early-cflags=-march=x86-64, which is used to allow ld.so to
print an error message on unsupported CPUs:

Fatal glibc error: CPU does not support x86-64-v3

This fixes BZ .
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-04-24 04:50:56 -07:00
Florian Weimer 9abdae94c7 login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS.  This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer 4d4da5aab9 login: Check default sizes of structs utmp, utmpx, lastlog
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer 7a430f40c4 x86: Add generic CPUID data dumper to ld.so --list-diagnostics
This is surprisingly difficult to implement if the goal is to produce
reasonably sized output.  With the current approaches to output
compression (suppressing zeros and repeated results between CPUs,
folding ranges of identical subleaves, dealing with the %ecx
reflection issue), the output is less than 600 KiB even for systems
with 256 logical CPUs.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-08 16:48:55 +02:00
Adhemerval Zanella 44ccc2465c math: x86 trunc traps when FE_INEXACT is enabled (BZ 31603)
The implementations of trunc functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella 932544efa4 math: x86 floor traps when FE_INEXACT is enabled (BZ 31601)
The implementations of floor functions using x87 floating point (i386 and
86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella 637bfc392f math: x86 ceill traps when FE_INEXACT is enabled (BZ 31600)
The implementations of ceil functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
H.J. Lu 717ebfa85c x86-64: Allocate state buffer space for RDI, RSI and RBX
_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11.  Define
TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
STATE_SAVE_OFFSET(%rsp).

   +==================+<- stack frame start aligned at 8 or 16 bytes
   |                  |<- RDI saved in the red zone
   |                  |<- RSI saved in the red zone
   |                  |<- RBX saved in the red zone
   |                  |<- paddings for stack realignment of 64 bytes
   |------------------|<- xsave buffer end aligned at 64 bytes
   |                  |<-
   |                  |<-
   |                  |<-
   |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
   |                  |<- 8-byte padding for 64-byte alignment
   |                  |<- 8-byte padding for 64-byte alignment
   |                  |<- R11
   |                  |<- R10
   |                  |<- R9
   |                  |<- R8
   |                  |<- RDX
   |                  |<- RCX
   +==================+<- RSP aligned at 64 bytes

Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
and RBX are saved onto stack without adjusting stack pointer first, using
the red-zone.  This fixes BZ .
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-03-18 19:45:13 -07:00
Sunil K Pandey b6e3898194 x86-64: Simplify minimum ISA check ifdef conditional with if
Replace minimum ISA check ifdef conditional with if.  Since
MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants,
compiler will perform constant folding optimization, getting same
results.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-03-03 15:47:53 -08:00
H.J. Lu 9b7091415a x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers
_dl_tlsdesc_dynamic should also preserve AMX registers which are
caller-saved.  Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID
to x86-64 TLSDESC_CALL_STATE_SAVE_MASK.  Compute the AMX state size
and save it in xsave_state_full_size which is only used by
_dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec.  This fixes
the AMX part of BZ .  Tested on AMX processor.

AMX test is enabled only for compilers with the fix for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

GCC 14 and GCC 11/12/13 branches have the bug fix.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-02-29 04:30:01 -08:00