clang might generate an abort call when a cleanup function (set by
__attribute__ ((cleanup))) calls functions not marked as nothrow.
The Hurd already provides abort for the loader at
sysdeps/mach/hurd/dl-sysdep.c, and adding it to rtld-stubbed-symbols
triggers duplicate symbols.
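As a minimal sketch of the pattern that triggers this (names here are
illustrative, not taken from the patch): the cleanup callback carries no
nothrow annotation, so clang must assume it may throw and can emit a
termination path that calls abort:

    #include <fcntl.h>
    #include <unistd.h>

    /* Not annotated __attribute__ ((nothrow)), so clang may emit an
       exception-termination path calling abort around its uses.  */
    static void
    close_fd (int *fd)
    {
      if (*fd >= 0)
        close (*fd);
    }

    void
    with_cleanup (void)
    {
      int fd __attribute__ ((cleanup (close_fd)))
        = open ("/dev/null", O_RDONLY);
      /* ...  */
    }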
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
It improves latency by about 1.5% and throughput by about 2-4%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-6% and throughput by about 5-12%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.
The PowerPC optimization for the PPC 601/603 (30 years old) is removed.
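As a rough sketch of the math-use-builtins convention this relies on (the
macro name follows the existing USE_*_BUILTIN pattern and is shown here
for illustration): the per-architecture header enables the builtin and the
generic C implementation dispatches on it:

    /* Illustrative per-architecture math-use-builtins-sqrt.h.  */
    #define USE_SQRT_BUILTIN 1

    double
    __ieee754_sqrt (double x)
    {
    #if USE_SQRT_BUILTIN
      /* The compiler expands this to the native square-root
         instruction (e.g. fsqrt on i386 and m68k).  */
      return __builtin_sqrt (x);
    #else
      /* generic soft-float implementation  */
    #endif
    }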
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-10% and throughput by about 5-15%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and gcc
implements it through a builtin, which allows us to migrate the
implementation to a C version. The performance on a Zen3 chip is
similar to the SVID one.
The m68k port provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin there
(unlike i386).
Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):
reciprocal-throughput   input            master     NO-SVID    improvement
x86_64                  subnormals       18.8522    16.2506         13.80%
x86_64                  normal          421.8260   403.9270          4.24%
x86_64                  close-exponent   21.0579    18.7642         10.89%
i686                    subnormals       21.3443    21.4229         -0.37%
i686                    normal          525.8380   538.8070         -2.47%
i686                    close-exponent   21.6589    21.7983         -0.64%
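An illustrative sketch of the migration (the macro name is assumed to
follow the usual math-use-builtins pattern and is not taken verbatim from
the patch): where gcc provides the builtin, the C version reduces to it
and keeps the performance of the old assembly file:

    #define USE_REMAINDERF_BUILTIN 1   /* assumed; set by i386 headers */

    float
    __remainderf (float x, float y)
    {
    #if USE_REMAINDERF_BUILTIN
      return __builtin_remainderf (x, y);
    #else
      /* generic C implementation  */
    #endif
    }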
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and gcc
implements it through a builtin, which allows us to migrate the
implementation to a C version. The performance on a Zen3 chip is
similar to the SVID one.
The m68k port provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin there
(unlike i386).
Performance improves a bit on both x86_64 and i686 (Zen3, gcc 15.2.1):

reciprocal-throughput   input            master     NO-SVID    improvement
x86_64                  subnormals       17.5349    15.6125         10.96%
x86_64                  normal           53.8134    52.5754          2.30%
x86_64                  close-exponent   20.0211    18.6656          6.77%
i686                    subnormals       21.8105    20.1856          7.45%
i686                    normal           73.1945    71.2199          2.70%
i686                    close-exponent   22.2141    20.3310          8.48%
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-10% and throughput by about 5-15%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 1-10% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-7% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 2% and throughput by about 5%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 2-10% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency by about 3-10% and throughput by about 5-10%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and gcc
implements it through a builtin, which allows us to move the
implementation to C.
The performance on a Zen3 chip is slightly better:
reciprocal-throughput   input            master     no-SVID    improvement
i686                    subnormals       22.4741    20.1571         10.31%
i686                    normal           74.1631    70.3606          5.13%
i686                    close-exponent   22.5625    20.2435         10.28%
Tested on i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and gcc
implements it through a builtin, which allows us to move the
implementation to C. The performance on a Zen3 chip is
similar to the SVID one.
Tested on i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The C2y function uimaxabs has been renamed to umaxabs. Implement this
change in glibc, keeping a compat symbol under the old name, copying
the test to test the new name and changing the old test to test the
compat symbol. Jakub has made the corresponding change to the
built-in function in GCC.
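A minimal sketch of the rename-with-compat pattern described above (the
version numbers and the internal alias name are illustrative, not taken
from the patch):

    #include <inttypes.h>
    #include <shlib-compat.h>

    uintmax_t
    umaxabs (intmax_t n)   /* the new C2y name is the real symbol */
    {
      return n < 0 ? -(uintmax_t) n : (uintmax_t) n;
    }

    #if SHLIB_COMPAT (libc, GLIBC_2_42, GLIBC_2_43)
    /* Binaries linked against the release that shipped uimaxabs keep
       working through a compat symbol under the old name.  */
    strong_alias (umaxabs, __uimaxabs)
    compat_symbol (libc, __uimaxabs, uimaxabs, GLIBC_2_42);
    #endif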
Tested for x86_64 and x86.
The constant should be used with c_cc, which for all supported ABIs
is defined as unsigned char. Because it is defined as a literal char
constant, clang triggers an error when it is compared with a signed
literal on ABIs that define 'char' as unsigned.
On aarch64, clang shows:
../sysdeps/posix/fpathconf.c:118:21: error: right side of operator
converted from negative value to unsigned: -1 to 18446744073709551615
[-Werror]
#if _POSIX_VDISABLE == -1
~~~~~~~~~~~~~~~ ^ ~~
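A condensed illustration of the failure mode (the definition shown here
is an assumption for illustration, not the actual glibc one):

    #define _POSIX_VDISABLE '\377'   /* 255 where plain char is unsigned */

    #if _POSIX_VDISABLE == -1        /* unsigned 255 vs signed -1: the -1
                                        is converted to unsigned and clang
                                        rejects it under -Werror */
    #endif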
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Add the C2y memalignment function (query the alignment of a pointer)
to glibc.
Given how simple this operation is, it would make sense for compilers
to inline calls to this function, but I'm treating that as a compiler
matter (compilers should add it as a built-in function) rather than
adding an inline version to glibc headers (although such an inline
version would be reasonable as well). I've filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122117 for this feature
in GCC.
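As a sketch of the semantics (not necessarily the exact glibc
implementation): the result is the largest power of two that divides the
pointer's address, and 0 for a null pointer:

    #include <stddef.h>
    #include <stdint.h>

    size_t
    memalignment (const void *p)
    {
      uintptr_t u = (uintptr_t) p;
      /* Isolating the lowest set bit of the address yields the largest
         power of two dividing it; a null pointer gives 0.  */
      return u & -u;
    }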
Tested for x86_64 and x86.
Add the C23 memset_explicit function to glibc. Everything here is
closely based on the approach taken for explicit_bzero. This includes
the bits that relate to internal uses of explicit_bzero within glibc
(although we don't currently have any such internal uses of
memset_explicit), and also includes the nonnull attribute (when we
move to nonnull_if_nonzero for various functions following C2y, this
function should be included in that change).
The function is declared both for __USE_MISC and for __GLIBC_USE (ISOC23)
(so it is available by default, not just for compilers defaulting to C23
mode).
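A short usage sketch (buffer name illustrative): unlike plain memset, the
write is guaranteed not to be optimized away even though the buffer is
dead afterwards:

    #include <string.h>

    void
    handle_secret (void)
    {
      char key[32];
      /* ... derive and use key ... */
      memset_explicit (key, 0, sizeof key);   /* not elided by
                                                 dead-store elimination */
    }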
Tested for x86_64 and x86.
Check for VM limit RPCs
* config.h.in: Add #undef for HAVE_MACH_VM_GET_SIZE_LIMIT and
HAVE_MACH_VM_SET_SIZE_LIMIT.
* sysdeps/mach/configure.ac: Use mach_RPC_CHECK to check for the
vm_set_size_limit and vm_get_size_limit RPCs in gnumach.defs.
* sysdeps/mach/configure: Regenerate.
Use vm_get_size_limit to initialize RLIMIT_AS
* hurd/hurdrlimit.c (init_rlimit): Use vm_get_size_limit to initialize
the RLIMIT_AS entry of the _hurd_rlimits array.
Notify the kernel of the new VM size limits
* sysdeps/mach/hurd/setrlimit.c: Use the vm_set_size_limit RPC,
if available, to notify the kernel of the new limits. Retry RPC
calls if they were interrupted by a signal.
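A hypothetical sketch of the retry-on-interrupt pattern described above
(the vm_set_size_limit argument list is an assumption for illustration;
the real signature comes from gnumach.defs):

    /* Reissue the RPC while a signal keeps interrupting it; host port
       argument and argument order are assumed, not copied.  */
    kern_return_t err;
    do
      err = vm_set_size_limit (mach_host_self (), mach_task_self (),
                               _hurd_rlimits[RLIMIT_AS].rlim_cur,
                               _hurd_rlimits[RLIMIT_AS].rlim_max);
    while (err == EINTR);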
Message-ID: <03fb90a795b354a366ee73f56f73e6ad22a86cda.1755220108.git.dnietoc@gmail.com>
On stack overflow, we typically do not have room left on the stack to
trampoline back from the signal handler. We have to detect this before
locking the ss; otherwise the signal thread will be stuck taking the
ss lock while trying to post SIGSEGV.
This patch replaces the _dl_stack_flags global variable with
_dl_stack_prot_flags.
The advantage is that the conversion from p_flags to the final mprotect
flags happens once, when p_flags is loaded. This avoids repeated
spurious conversions of _dl_stack_flags, for example in
allocate_thread_stack.
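A sketch of the conversion that now happens once at load time (the helper
name is illustrative):

    #include <link.h>
    #include <sys/mman.h>

    /* Map the PT_GNU_STACK segment flags to mprotect protection bits.  */
    static int
    stack_prot_from_p_flags (ElfW(Word) p_flags)
    {
      return ((p_flags & PF_R ? PROT_READ : 0)
              | (p_flags & PF_W ? PROT_WRITE : 0)
              | (p_flags & PF_X ? PROT_EXEC : 0));
    }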
This modification was suggested in:
https://sourceware.org/pipermail/libc-alpha/2025-March/165537.html
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>