This patch addresses the actual cause of CVE-2025-5745.
The non-volatile vector registers are no longer used for the
32-byte load and compare operations.
Additionally, the earlier assembler workaround for the lxvp
instruction is replaced with the actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
(cherry picked from commit 2ea943f7d4)
This patch addresses the actual cause of CVE-2025-5702.
The non-volatile vector registers are no longer used for the
32-byte load and compare operations.
Additionally, the earlier assembler workaround for the lxvp
instruction is replaced with the actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
(cherry picked from commit 9a40b1cda5)
powf:
Update the scalar special-case function to make best use of the new
interface.
pow:
Make specialcase NOINLINE to prevent str/ldr leaking into the fast path.
Remove the dependency on sv_call2, as the new callback implementation
is not a performance gain.
Replace it with a vectorised specialcase, since the structure of the
scalar routine is fairly simple.
Throughput gain of about 5-10% on V1 for large values and 25% for
subnormal `x`.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit bb6519de1e)
Fix svld1rq being used with incorrect predicates (BZ #33642).
Next to no performance variation (tested on V1).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit e889160273)
Previously, the presence of special cases in one lane could affect the
results in other lanes due to an unconditional scalar fallback. The old
WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has
been removed from AOR, making it easier to spot and fix
this. No measured change in performance. This patch applies cleanly as
far back as 2.41; however, there are conflicts with 2.40, where sinh was
first introduced.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit e45af510bc)
Previously, the presence of special cases in one lane could affect the
results in other lanes due to an unconditional scalar fallback. The old
WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has
been removed from AOR, making it easier to spot and fix this. 4%
improvement in throughput with GCC 14 on Neoverse V1. This bug is
present as far back as 2.39 (where tan was first introduced).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 6c22823da5)
Instead of using SVE instructions to marshal special results into the
correct lane, just write the entire vector (and the predicate) to
memory, then use cheaper scalar operations.
Geomean speedup of 16% in special intervals on Neoverse with GCC 14.
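Below is a minimal sketch of this technique (illustrative only, not the
actual glibc code): the scalar fallback here is exp, and the predicate
is materialized as a 0/1 vector since ACLE offers no direct predicate
store.
  #include <arm_sve.h>
  #include <math.h>
  #include <stdint.h>

  /* Dump inputs, results and a lane mask to memory, patch the special
     lanes with scalar calls, then reload the whole vector.  */
  static svfloat64_t
  special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
  {
    double xs[32], ys[32];      /* 32 lanes covers up to 2048-bit SVE */
    uint64_t lanes[32];
    svbool_t all = svptrue_b64 ();
    svst1 (all, xs, x);
    svst1 (all, ys, y);
    svst1 (all, lanes, svdup_u64_z (special, 1));
    for (uint64_t i = 0; i < svcntd (); i++)
      if (lanes[i])
        ys[i] = exp (xs[i]);    /* cheap scalar fix-up per lane */
    return svld1 (all, ys);
  }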
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 5b82fb1882)
Use the correct include for the SIGCHLD macro: signal.h
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit a9c426bcca)
(cherry picked from commit 17c3eab387)
If GCC does not generate BTI markers by default, the main program
for the conform runtime tests will not have the BTI marker that
-Wl,-z,force-bti requires. Without -Wl,-z,force-bti, the link editor
will not tell the dynamic linker to enable BTI, and the missing BTI
marker is harmless.
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
This commit adds tests for the following use cases relevant to handling of
the SME state:
- fork() and vfork()
- clone() and clone3()
- signal handler
While most cases are trivial, the case of clone3() is more complicated since
the clone3() symbol is not public in Glibc.
To avoid having to check all the possible ways clone3() may be called
via other public functions (e.g. vfork() or pthread_create()), we put
together a test that links directly with clone3.o. The existing
functions that have calls to clone3() may not actually use it, in which
case the outcome of such tests would be unexpected. Having a direct call
to the clone3() symbol in the test allows us to check precisely what we
need to test: that the __arm_za_disable() function is indeed called and
has the desired effect.
Linking to clone3.o also requires linking to __arm_za_disable.o, which
in turn requires the hidden _dl_hwcap2 symbol, which the test has to
provide and initialise before use.
Co-authored-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit ecb0fc2f0f)
(cherry picked from commit 71874f167a)
This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).
The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().
When this change was originally proposed [2,3], it generated a long
discussion where several questions and concerns were raised. Here we
will address these concerns and explain why this change is useful and,
in fact, necessary.
In a nutshell, a C library that conforms to the AAPCS64 spec [1]
(pertinent to this change are mainly chapters 6.2 and 6.6) should have a
call to the __arm_za_disable() function in the clone() and clone3()
wrappers. The following explains in detail why this is the case.
When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we talk about the C library subroutines clone()
and clone3() rather than the syscalls with similar names. In the current
version of Glibc, clone() is public and clone3() is private, but it being
private is not pertinent to this discussion.
We will begin by stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come
from the kernel. Nor is it needed to satisfy a contract between the
kernel and userspace. This is why it is not for the kernel documentation
to describe this requirement. The requirement is instead needed to
satisfy a pure userspace scheme outlined in [1], and to make sure that
software that uses Glibc (or any other C library that handles SME states
correctly (see below)) conforms to [1] without unnecessarily having to
become SME-aware and thus lose portability.
To recap (see [1] (6.2)), the SME extension defines SME state, which is
part of the processor state. Part of this SME state is the ZA state,
which is necessary to manage the ZA storage register in the context of
the ZA lazy saving scheme [1] (6.6). This scheme exists because it would
be challenging to handle the ZA storage of SME in either a callee-saved
or a caller-saved manner.
There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA
bit and the TPIDR2_EL0 register (see [1] (6.6.3)):
- "off": PSTATE.ZA == 0
- "active": PSTATE.ZA == 1 TPIDR2_EL0 == null
- "dormant": PSTATE.ZA == 1 TPIDR2_EL0 != null
As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface
depending on the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.
In particular, clone() and clone3() (the C library functions) have the
ZA-private interface. This means that the permitted ZA-states on entry
are "off" and "dormant", and that the permitted states on return are
"off" or "dormant" (the latter if and only if the state was "dormant"
on entry).
This means that both functions in question should correctly handle both
the "off" and "dormant" ZA-states on entry. The conforming states on
return are "off" and "dormant" (if the inbound state was already
"dormant").
This change ensures that the ZA-state on return is always "off". Note
that, in the context of clone() and clone3(), "on return" means the
point where execution resumes after transferring from clone() or
clone3(). For the caller (we may refer to it as "parent") this is the
return address in the link register, where the RET instruction jumps.
For the "child", this is the branch target address.
So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory, we can, but we shouldn't, here is
why.
Every subroutine with a private-ZA interface, including clone() and clone3(),
must comply with the lazy saving scheme [1] (6.7.2). This puts additional
responsibility on a subroutine if ZA-state on return is "dormant" because
this state has special meaning. The "caller" (that is the place in code
where execution is transferred to, so this include both "parent" and "child")
may check the ZA-state and use it as per the spec of the "dormant" state that
is outlined in [1] (6.6.6 and 6.6.7).
Conforming to this would require more code inside of clone() and clone3()
which hardly is desirable.
For the return to "parent" this could be achieved in theory, but given
that neither clone() nor clone3() is supposed to be used in the middle
of an SME operation, it wouldn't be useful. For the "return" to "child"
this would be particularly difficult to achieve given the complexity of
these functions and their interfaces. Most importantly, it would be
illegal and somewhat meaningless to allow a "child" to start execution
in the "dormant" ZA-state, because the very essence of the "dormant"
state implies that there is a place to return to and that there is some
outer context that we are allowed to interact with.
To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when
the execution resumes after a call to clone() or clone3() is correct and
also the simplest way to conform to [1].
Can there be situations when we can avoid calling __arm_za_disable()?
Calling __arm_za_disable() implies a certain (sufficiently small)
overhead, so one might rightly ponder avoiding a call to this function
when we can afford not to make it. The most trivial such cases (e.g.
when the calling thread doesn't have access to SME or to the TPIDR2_EL0
register) are already handled by this function (see [1] (8.1.3 and
8.1.2)). Reasoning about other possible use cases would require making
the code inside clone() and clone3() more complicated, which would
defeat the point of trying to optimise away the call to
__arm_za_disable().
Why can't the kernel do this instead?
The handling of SME state by the kernel is described in [4]. In short,
the kernel must not impose a specific ZA-interface onto a userspace
function. Interaction with the kernel happens (among other things) via
system calls.
In Glibc many of the system calls (notably, including SYS_clone and
SYS_clone3) are used via wrappers, and the kernel has no control of them
and, moreover, it cannot dictate how these wrappers should behave because
it is simply outside of the kernel's remit.
However, in certain cases, the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in the 6.16 kernel [5]. This is not enough to ensure that code
that uses the clone() and clone3() functions conforms to [1] when it
runs on a system that provides SME, hence this change.
[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 27effb3d50)
(cherry picked from commit 256030b984)
A common sequence of instructions is used in several places
in assembly files, so define it in one place as an assembly
macro.
Note that PAC instructions are not included in the new macro
because they are redundant given how we call the arm_za_disable
function (the return address is not saved on the stack, so there is
no need to sign it).
(based on commits 6de12fc9ad
and c0f0db2d59)
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Add a test that checks that the ZA state is disabled after setjmp and
sigsetjmp.
Update the existing SME test that uses setjmp.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 251f932624)
Due to the nature of the ZA state, setjmp() should clear it in the
same manner as longjmp() already does.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit a7f6fd976c)
The ifunc selector for wmemset had a stray '!' in the
X86_ISA_CPU_FEATURES_ARCH_P(...) check:
  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                      AVX_Fast_Unaligned_Load, !))
This effectively negated the predicate and caused the AVX2/AVX512
paths to be skipped, making the dispatcher fall back to the SSE2
implementation even on CPUs where AVX2/AVX512 are available. The
regression leads to noticeable throughput loss for wmemset.
Remove the stray '!' so the AVX_Fast_Unaligned_Load capability is
tested as intended and the correct AVX2/EVEX variants are selected.
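After the fix, the check presumably reads as follows; the now-empty
final macro argument leaves the predicate un-negated:
  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                      AVX_Fast_Unaligned_Load, ))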
Impact:
- On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly
falls back to SSE2; perf now shows __wmemset_evex/avx2 variants.
Testing:
- benchtests/bench-wmemset shows improved bandwidth across sizes.
- perf confirms the selected symbol is no longer SSE2.
Signed-off-by: xiejiamei <xiejiamei@hygon.com>
Signed-off-by: Li jing <lijing@hygon.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d86b6cdd8)
The break statement in CHECK_MERGE is expected to exit the surrounding
while loop, not the do-while loop within the macro. Remove the
do-while loop from the macro. It is not needed to turn the macro
expansion into a single statement due to the way CHECK_MERGE is used
(and the statement expression would cover this anyway).
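A minimal self-contained illustration of the pitfall (CHECK_MERGE's real
body differs; all names here are made up):
  #include <stdio.h>

  /* `break` exits the innermost enclosing loop, so inside a do-while
     wrapper it leaves only the wrapper itself.  */
  #define CHECK_BROKEN(cond) do { if (cond) break; } while (0)
  #define CHECK_FIXED(cond)  { if (cond) break; }

  int
  main (void)
  {
    int i = 0;
    while (i < 10)
      {
        CHECK_BROKEN (i == 3);  /* no effect on the while loop */
        i++;
      }
    printf ("broken: %d\n", i); /* prints 10 */
    i = 0;
    while (i < 10)
      {
        CHECK_FIXED (i == 3);   /* exits the while loop as intended */
        i++;
      }
    printf ("fixed: %d\n", i);  /* prints 3 */
    return 0;
  }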
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
(cherry picked from commit 0fceed2545)
Fix a bug in the predicate logic introduced in the last change.
A slight performance improvement comes from relying on all-true
predicates during the conversion from single to double.
This fixes BZ #33299.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit aac077645a)
These variables are not exported, and libc.so TLS is initial-exec
anyway. Declare these variables as hidden and use the initial-exec
TLS model.
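The resulting pattern is roughly the following (variable name
illustrative):
  /* Hidden visibility keeps the variable out of the dynamic symbol
     table; the initial-exec TLS model yields the cheapest access.  */
  extern __thread int __libc_example_var
    __attribute__ ((visibility ("hidden"), tls_model ("initial-exec")));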
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit a894f04d87)
On i386, programs and shared libraries with __thread usage may fail
silently at run-time against glibc without the TLS run-time fix for:
https://sourceware.org/bugzilla/show_bug.cgi?id=32996
Add the GLIBC_ABI_GNU_TLS version to indicate that glibc has the working
GNU TLS run-time. The linker can add the GLIBC_ABI_GNU_TLS version to
binaries which depend on the working TLS run-time, so that such programs
and shared libraries will fail to load and run at run-time against
libc.so without the GLIBC_ABI_GNU_TLS version, instead of failing
silently at random.
This fixes BZ #33221.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit ed1b7a5a48)
The compiler generates the following instruction sequence for dynamic
TLS access:
	leal	tls_var@tlsgd(,%ebx,1), %eax
	call	___tls_get_addr@PLT
The CALL instruction is transparent to the compiler, which assumes that
all registers, except for EFLAGS, AX, CX, and DX, are unchanged after
CALL. But ___tls_get_addr is a normal function which doesn't preserve
any vector registers.
1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for
FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:
   _dl_tlsdesc_dynamic:
	/* Like all TLS resolvers, preserve call-clobbered registers.
	   We need two scratch regs anyway.  */
	subl	$32, %esp
	cfi_adjust_cfa_offset (32)
It is wrong to use
	movl	%ebx, -28(%esp)
	movl	%esp, %ebx
	cfi_def_cfa_register(%ebx)
	...
	mov	%ebx, %esp
	cfi_def_cfa_register(%esp)
	movl	-28(%esp), %ebx
to preserve EBX on the stack, since -28(%esp) lies below the stack
pointer and can be clobbered, e.g. by a signal handler (i386 has no
red zone). Fix it with:
	movl	%ebx, 28(%esp)
	movl	%esp, %ebx
	cfi_def_cfa_register(%ebx)
	...
	mov	%ebx, %esp
	cfi_def_cfa_register(%esp)
	movl	28(%esp), %ebx
4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with the
traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.
This fixes BZ #32996.
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 848f0e46f0)
This call must not complete initialization of all shared objects
in the global scope because the ELF constructor which makes the call
likely has not finished initialization. Calling more constructors
at this point would expose those to a partially constructed
dependency.
This completes the revert of commit 9897ced8e7
("elf: Run constructors on cyclic recursive dlopen (bug 31986)").
(cherry picked from commit d604f9c500)
This was found through code inspection. No application impact is
known.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 46d3198094)
When the linker -z mark-plt option is used to add DT_X86_64_PLT,
DT_X86_64_PLTSZ and DT_X86_64_PLTENT, the r_addend field of the
R_X86_64_JUMP_SLOT relocation stores the offset of the indirect
branch instruction. However, glibc versions without the commit:
	commit f8587a6189
	Author: H.J. Lu <hjl.tools@gmail.com>
	Date:   Fri May 20 19:21:48 2022 -0700
	    x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT
	    According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT
	    and R_X86_64_JUMP_SLOT. Since linkers always set their r_addends to 0, we
	    can ignore their r_addends.
	    Reviewed-by: Fangrui Song <maskray@google.com>
won't ignore the r_addend value in the R_X86_64_JUMP_SLOT relocation.
Such programs and shared libraries will fail at run-time randomly.
Add the GLIBC_ABI_DT_X86_64_PLT version to indicate that glibc is
compatible with DT_X86_64_PLT.
The linker can add the glibc GLIBC_ABI_DT_X86_64_PLT version dependency
whenever -z mark-plt is passed to the linker. The resulting programs and
shared libraries will fail to load at run-time against libc.so without
the GLIBC_ABI_DT_X86_64_PLT version, instead of failing randomly.
This fixes BZ #33212.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 399384e0c8)
Programs and shared libraries compiled with -mtls-dialect=gnu2 may fail
silently at run-time against glibc without the GNU2 TLS run-time fix
for:
https://sourceware.org/bugzilla/show_bug.cgi?id=31372
Add the GLIBC_ABI_GNU2_TLS version to indicate that glibc has the working
GNU2 TLS run-time. The linker can add the GLIBC_ABI_GNU2_TLS version to
binaries which depend on the working GNU2 TLS run-time:
https://sourceware.org/bugzilla/show_bug.cgi?id=33130
so that such programs and shared libraries will fail to load and run at
run-time against libc.so without the GLIBC_ABI_GNU2_TLS version, instead
of failing silently at random.
This fixes BZ #33129.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
(cherry picked from commit 9df8fa397d)
This ensures that the compiler will not inline it, so that
debuggers which do not use the Systemtap probes can reliably
set a breakpoint on it.
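As an illustration of the technique (the function below is hypothetical,
not the symbol this commit changes):
  /* Marking the hook noinline guarantees it survives as a distinct
     symbol with a stable address that a debugger can break on.  */
  __attribute__ ((noinline)) void
  debugger_hook (void)
  {
    /* An empty asm stops the compiler from discarding the "empty"
       function body and the calls to it.  */
    __asm__ __volatile__ ("" ::: "memory");
  }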
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 620f0730f3)
The changes in commit a93d9e03a3
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool. It brings its
own definition of _r_debug (rather than a declaration).
Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC. To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play. Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.
Only perform the copy relocation initialization if libc has been
loaded. Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol may fail.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit ea85e7d550)
It combines updating r_state with the debugger notification.
The second change to _dl_open introduces an additional debugger
notification for dlmopen, but debuggers are expected to ignore it.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 8329939a37)
It replaces the ns_debug member of the namespaces. Previously,
the base namespace had an unused ns_debug member.
This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed. (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 7278d11f3a)
Commit 10a66a8e42 ("Remove <libc-tsd.h>") removed the TLS initial-exec
(IE) model attribute from the __libc_tsd_CTYPE_* thread variable declarations
and definitions. Commit a894f04d87 ("Optimize __libc_tsd_* thread
variable access") restored it on declarations.
Restore the TLS initial-exec model attribute on __libc_tsd_CTYPE_* thread
variable definitions.
This resolves test tst-locale1 failure on s390 32-bit, when using a
GNU linker without the fix from GNU binutils commit aefebe82dc89
("IBM zSystems: Fix offset relative to static TLS").
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit e5363e6f46)
This ensures that the ctype data pointers in TLS are valid
in secondary namespaces even without initialization via
__ctype_init.
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit 2745db8dd3)
The existing initializers already contain explicit casts. Keep them
due to int/uint32_t mismatch.
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit e0c0f856f5)
Use __thread variables directly instead. The macros do not save any
typing. It seems unlikely that a future port will lack __thread
variable support.
Some of the __libc_tsd_* variables are referenced from assembler
files, so keep their names. Previously, <libc-tls.h> included
<tls.h>, which in turn included <errno.h>, so a few direct includes
of <errno.h> are now required.
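Sketch of the replacement, using a hypothetical variable (the real
__libc_tsd_* names are kept, as noted above):
  /* Before: accessor macros from the removed <libc-tsd.h>.
     After: direct access to a plain thread-local variable.  */
  static __thread void *__libc_tsd_EXAMPLE;

  void *
  get_example (void)
  {
    return __libc_tsd_EXAMPLE;
  }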
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
(cherry picked from commit 10a66a8e42)
Improve codegen by packing coefficients.
4% and 2% improvement in throughput microbenchmark on Neoverse V1, for acosh
and atanh respectively.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 6849c5b791)
Rework the SVE FP64 hyperbolics to use the SVE FEXPA
instruction.
Also update the special case handling for large
inputs to be entirely vectorised.
Performance improvements on Neoverse V1:
cosh_sve: 19% for |x| < 709, 5x otherwise
sinh_sve: 24% for |x| < 709, 5.9x otherwise
tanh_sve: 12% for |x| < 19, 9x otherwise
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit dee22d2a81)
Improve the performance of the SVE exp routines by making better use
of the SVE FEXPA instruction.
Performance improvement on Neoverse V1:
exp2_sve: 21%
exp2f_sve: 24%
exp10f_sve: 23%
expm1_sve: 25%
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 1e3d1ddf97)
Improve the performance of the inverse trig functions by altering how
coefficients are loaded.
Performance improvement on Neoverse V1:
SVE acos 14%
AdvSIMD acos 6%
AdvSIMD asin 6%
SVE asin 5%
AdvSIMD asinf 2%
AdvSIMD atanf 22%
SVE atanf 20%
SVE atan 11%
AdvSIMD atan 5%
SVE atan2 7%
SVE atan2f 4%
AdvSIMD atan2f 3%
AdvSIMD atan2 2%
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit 1e84509e00)
The polynomial order was unnecessarily high; reducing it unlocked
multiple optimizations.
Max error for new SVE expf is 0.88 +0.5ULP.
Max error for new SVE coshf is 2.56 +0.5ULP.
Performance improvement on Neoverse V1: expf (30%), coshf (26%).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
(cherry picked from commit cf56eb28fa)
During early startup, memcpy or memset must not be called, since many
targets use ifuncs for them which won't have been initialized yet.
Security hardening may use -ftrivial-auto-var-init=zero, which inserts
calls to memset. Redirect memset to memset_generic by including
dl-symbol-redir-ifunc.h in cpu-features.c.
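The mechanism behind that header is an assembler-level symbol alias,
along these lines (simplified sketch; the exact alias target may differ
per target):
  /* Every memset call emitted in this translation unit, including
     compiler-generated ones from -ftrivial-auto-var-init=zero, binds
     directly to the non-ifunc generic implementation.  */
  asm ("memset = __memset_generic");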
This fixes BZ #33112.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 681a24ae4d)
The SYSCALL_CANCEL macro calls __syscall_cancel, which in turn
calls __internal_syscall_cancel with an 'int' return type instead of
the expected 'long int'. This causes issues with syscalls that return
values larger than INT_MAX, such as copy_file_range [1].
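A hypothetical reduction of the bug class (not the glibc code): a 64-bit
result funnelled through an 'int' return loses its high bits.
  #include <limits.h>
  #include <stdio.h>

  static int
  truncating_wrapper (long int syscall_result)
  {
    return syscall_result;      /* implicit conversion truncates */
  }

  int
  main (void)
  {
    long int r = (long int) INT_MAX + 1;  /* e.g. a large copy_file_range */
    printf ("%ld becomes %d\n", r, truncating_wrapper (r));
    return 0;
  }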
Checked on x86_64-linux-gnu.
[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79139
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
(cherry picked from commit 7107bebf19)
Detect whether ld.so is not contiguous and handle that case in
_dl_find_object.
Set l_find_object_processed even for initially loaded link maps;
otherwise, dlopen of an initially loaded object adds it to
_dlfo_loaded_mappings (where maps are expected to be contiguous),
in addition to _dlfo_nodelete_mappings.
Test elf/tst-link-map-contiguous-ldso iterates over the loader
image, reading every word to make sure memory is actually mapped.
It only does that if the l_contiguous flag is set for the link map.
Otherwise, it finds gaps with mmap and checks that _dl_find_object
does not return the ld.so mapping for them.
The test elf/tst-link-map-contiguous-main does the same thing for
the libc.so shared object. This only works if the kernel loaded
the main program because the glibc dynamic loader may fill
the gaps with PROT_NONE mappings in some cases, making it contiguous,
but accesses to individual words may still fault.
Test elf/tst-link-map-contiguous-libc is again slightly different
because the dynamic loader always fills the gaps with PROT_NONE
mappings, so a different form of probing has to be used.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 20681be149)
Remove the historic binutils reference from the comment and update the
description of how this data is used by applications.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 2cac9559e0)
This reduces code size and dependencies on ld.so internals from
libc.so.
Fixes commit f4c142bb9f
("arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 96429bcc91)
The __abort_fork_reset_child call (introduced in
d40ac01cbb) resets the lock after the
fork. This causes a DRD regression in valgrind
(https://bugs.kde.org/show_bug.cgi?id=503668), as it is effectively a
double initialization, despite being actually fine in this case. As
suggested in https://sourceware.org/bugzilla/show_bug.cgi?id=32994#c2,
we replace it here with a memcpy of another initialized lock instead,
which makes valgrind happy.
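A minimal sketch of the approach (lock type and names illustrative):
  /* Copying the bytes of a statically initialized, never-used lock has
     the same effect as re-initialization, without tripping DRD's
     double-init detection.  */
  static int lock_template = LLL_LOCK_INITIALIZER;
  memcpy (&child_lock, &lock_template, sizeof child_lock);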
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit d9a348d092)
The mistake is that open must use mode 0666 so that the umask applies,
and not 0777 (which is what mkdir requires).
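For illustration (file name hypothetical): the kernel clears the umask
bits from the requested mode, so a regular file should be created with
0666:
  /* Effective mode is (mode & ~umask); with the common umask of 022,
     0666 yields rw-r--r-- (0644) for the new file.  */
  int fd = open ("out.txt", O_WRONLY | O_CREAT | O_EXCL, 0666);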
Fixes commit 8ef3cff9d1
("iconv: Support in-place conversions (bug 10460, bug 32033)").
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit cdcf24ee14)