It improves some interception tools such as valgrind, however on
multithread the __syscall_cancel_arch is called.
The result libc.so has a slight larger code size:
ABI master patched diff increase
aarch64 1658673 1669121 10448 0.63%
x86_64 1976656 1985744 9088 0.46%
i686 2233622 2251130 17508 0.78%
powerpc64le 2382448 2396768 14320 0.60%
It mimics internally how cancellable entrypoints were implemented
before 89b53077d2, where cancellation
handlign were done inline in the syscall wraper.
On x86-64 and compiling with -O2 using stdc_leading_zeros compiles to
the bsr instruction. The fls function removed by this patch is inlined
but still loops while checking each bit individually.
* nss/getaddrinfo.c: Include <stdbit.h>.
(fls): Remove function. This function contains a left shift of 31 on an
'int' which is undefined.
(rfc3484_sort): Use stdc_leading_zeros instead of fls.
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These routines are not extensively used (gnulib documentation even
recommend use a replacement [1]), and there is already a POWER8
version that uses proper vectorized instructions.
[1] https://www.gnu.org/software/gnulib/manual/gnulib.html#C-strings
Checked with a build for some powerpc variations.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
Add stubs and partial docs for many undocumented pthreads functions.
While neither exhaustive nor complete, gives minimal usage docs
for many functions and expands the pthreads chapters, making it
easier to continue improving this section in the future.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
The glibc-hwcaps subdirectories are extended by "z17". Libraries are loaded if
the z17 facility bits are active:
- Miscellaneous-instruction-extensions facility 4
- Vector-enhancements-facility 3
- Vector-Packed-Decimal-Enhancement Facility 3
- CPU: Concurrent-Functions Facility
tst-glibc-hwcaps.c is extended in order to test z17 via new marker6.
In case of running on a z17 with a kernel not recognizing z17 yet,
AT_PLATFORM will be z900 but vector-bit in AT_HWCAP is set. This situation
is now recognized and this testcase does not fail.
A fatal glibc error is dumped if glibc was build with architecture
level set for z17, but run on an older machine (See dl-hwcap-check.h).
Note, you might get an SIGILL before this check if you don't use:
configure --with-rtld-early-cflags=-march=<older-machine>
ld.so --list-diagnostics now also dumps information about s390.cpu_features.
Independent from z17, the s390x kernel won't introduce new HWCAP-Bits if there
is no special handling needed in kernel itself. For z17, we don't have new
HWCAP flags, but have to check the facility bits retrieved by
stfle-instruction.
Instead of storing all the stfle-bits (currently four 64bit values) in the
cpu_features struct, we now only store those bits, which are needed within
glibc itself. Note that we have this list twice, one with original values and
the other one which can be filtered with GLIBC_TUNABLES=glibc.cpu.hwcaps.
Those new fields are stored in so far reserved space in cpu_features struct.
Thus processes started in between the update of glibc package and we e.g. have
a new ld.so and an old libc.so, won't crash. The glibc internal ifunc-resolvers
would not select the best optimized variant.
The users of stfle-bits are also updated:
- parsing of GLIBC_TUNABLES=glibc.cpu.hwcaps
- glibc internal ifunc-resolvers
- __libc_ifunc_impl_list
- sysconf
While working on implementing compoundn, I noticed that
libm-test-pown.inc was wrongly using TEST_ff_f and AUTO_TESTS_ff_f
when the actual types involved meant fL_f should be used instead of
ff_f; fix to use the correct descriptor strings for pown. (These
strings affect how gen-libm-test.py generates a C file in some cases.
The structure type test_fL_f_data for expected results and the use of
RUN_TEST_LOOP_fL_f in the ALL_RM_TEST call were already correct.)
Tested for x86_64. The generated libm-test-pown.c was actually
unchanged, but the old descriptor strings were still logically
incorrect.
Inline tcache_try_malloc into calloc since it is the only caller. Also fix
usize2tidx and use it in __libc_malloc, __libc_calloc and _mid_memalign.
The result is simpler, cleaner code.
Reviewed-by: DJ Delorie <dj@redhat.com>
The left shift overflows for 'int', use uint32_t instead. It syncs
with CORE-MATH commit bbfabd99.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The left shift overflows for 'int', use uint64_t instead. It syncs
with CORE-MATH commit d0a2be200cbc1344d800d9ef0ebee9ad67dd3ad8.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The left shift overflows for 'int', use uint32_t instead. It syncs
with CORE-MATH commit bbfabd993a71b049c210b0febfd06d18369fadc1.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The left shift overflows for 'int64_t', use unsigned instead. It syncs
with CORE-MATH commit f7c7408d1749ec2859ea249495af699359ae559b.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The left shift overflows for 'int', use uint64_t instead. It syncs
with CORE-MATH commit bbfabd99.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The left shift overflows for 'int', use a literal instead. It syncs
with OPTIMIZED-ROUTINES commit 0f87f607b976820ef41fe64d004fe67dc7af8236.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The left shift overflows for 'int', use uint64_t instead. It syncs
with CORE-MATH commit 4d6192d2.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The left shift overflows for 'int', use unsigned instead. It syncs
with CORE-MATH commit 4d6192d2.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The BZ 32653 fix (12a497c716) kept the
stack pointer zeroing from make_main_stack_executable on
_dl_make_stack_executable. However, previously the 'stack_endp'
pointed to temporary variable created before the call of
_dl_map_object_from_fd; while now we use the __libc_stack_end
directly.
Since pthread_getattr_np relies on correct __libc_stack_end, if
_dl_make_stack_executable is called (for instance, when
glibc.rtld.execstack=2 is set) __libc_stack_end will be set to zero,
and the call will always fail.
The __libc_stack_end zero was used a mitigation hardening, but since
52a01100ad it is used solely on
pthread_getattr_np code. So there is no point in zeroing anymore.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Sam James <sam@gentoo.org>
Hardware ctz instructions are available in the RISC-V Zbb and XTheadBb extension. With special `-march` flags defined, we can generate more simplified code compared to the generic implementation of `ffs`/`ffsll`.
Signed-off-by: Julian Zhu <julian.oerv@isrc.iscas.ac.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The __printf_fp_buffer_1 issues count_leading_zeros with 0 argument,
which might leads to call __builtin_ctz depending of the ABI.
Replace with stdbit.h function instead.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
This patch changes the shell script that selects which arguments are used
for the execution of bench-malloc-thread.
The problem seems to have been introduced in commit:
commit 2d6427a63c
Author: Wangyang Guo <wangyang.guo@intel.com>
Date: Fri Nov 29 16:05:35 2024 +0800
benchtests: Add calloc test
With current condition, the following error "/bin/sh: 3: [[: not found"
occurs when executing `make bench BENCHSET="malloc-thread"` and the else
path is taken, using incorrect arguments for bench test execution.
Error is reproducible in Debian based distros.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The <termio.h> interface is absolutely ancient: it was obsoleted by
<termios.h> already in the first version of POSIX (1988) and thus
predates the very first version of Linux. Unfortunately, some constant
macros are used both by <termio.h> and <termios.h>; particularly
problematic is the baud rate constants since the termio interface
*requires* that the baud rate is set via an enumeration as part of
c_cflag.
In preparation of revamping the termios interface to support the
arbitrary baud rate capability that the Linux kernel has supported
since 2008, remove <termio.h> in the hope that no one still uses this
archaic interface.
Note that there is no actual code in glibc to support termio: it is
purely an unabstracted ioctl() interface.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
"Recent" GCC versions (since commit fc62716fe8d1, backported to stable
branches) emit a vzeroupper instruction at the end of functions
containing AVX instructions. This causes the tst-audit10 test to fail
on CPUs lacking AVX instructions, despite the AVX512F check. The crash
occurs in the pltenter function of tst-auditmod10b.c.
Fix that by moving the code guarded by the check_avx512 function into
specific functions using the target ("avx512f") attribute. Note that
since commit 5359c3bc91 ("x86-64: Remove compiler -mavx512f check") it
is safe to assume that the compiler has AVX512F support, thus the
__AVX512F__ checks can be dropped.
Tested on non-AVX, AVX2 and AVX512F machines.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Linux 6.12 adds AT_RENAME_* aliases for RENAME_* flags for renameat2,
and also AT_HANDLE_MNT_ID_UNIQUE. Add the first set of aliases to
stdio.h alongside the RENAME_* names, and AT_HANDLE_MNT_ID_UNIQUE to
bits/fcntl-linux.h.
Tested for x86_64.
8ef1791950 ("hurd: Fix EINVAL error on linking to a slash-trailing path
[BZ #32569]) made symlink return ENOTDIR, but the gnulib testsuite does
not recognize it for such a situation, and EEXIST is indeed more
comprehensible to users.
This avoids SIGFPE handlers (or code longjmp-ed to) getting disturbed by the
exception that generated it.
Note: gcc's unwinding depends on the rpc_wait_trampoline/trampoline exact
code, so we here avoid breaking it.
If the process has never used fp before getting a signal, xstate is set
(and thus the x87 state is not initialized) but xstate->initialized is still
0, and we should not restore anything.
* hurd/Makefile: add new tests
* hurd/test-sig-rpc-interrupted.c: check xstate save and restore in
the case where a signal is delivered to a thread which is waiting
for an rpc. This test implements the rpc interruption protocol used
by the hurd servers. It was so far passing on Debian thanks to the
local-intr-msg-clobber.diff patch, which is now obsolete.
* hurd/test-sig-xstate.c: check xstate save and restore in the case
where a signal is delivered to a running thread, making sure that
the xstate is modified in the signal handler.
* hurd/test-xstate.h: add helpers to test xstate
* sysdeps/mach/hurd/i386/bits/sigcontext.h: add xstate to the
sigcontext structure.
+ sysdeps/mach/hurd/i386/sigreturn.c: restore xstate from the saved
context
* sysdeps/mach/hurd/x86/trampoline.c: save xstate if
supported. Otherwise we fall back to the previous behaviour of
ignoring xstate.
* sysdeps/mach/hurd/x86_64/bits/sigcontext.h: add xstate to the
sigcontext structure.
* sysdeps/mach/hurd/x86_64/sigreturn.c: restore xstate from the saved
context
Signed-off-by: Luca Dariz <luca@orpolo.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-ID: <20250319171118.142163-1-luca@orpolo.org>
This patch moves any calls of tcache_init away after tcache hot paths.
Since there is no reason to initialize tcaches in the hot path and since
we need to be able to check tcache != NULL in any case, because of
tcache_thread_shutdown function, moving tcache_init away from hot path
can only be beneficial.
The patch also removes the initialization of tcaches within the
__libc_free call. It only makes sense to initialize tcaches for the
thread after it calls one of the allocation functions. Also the patch
removes the save/restore of errno from tcache_init code, as it is no
longer needed.
I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memset for large sizes. It
was not recommended for their use with device memory and memset is already
not valid to be used with device memory.
This reverts commit e6590f0c86.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memcpy for large sizes. It
was not recommended for their use with device memory and memcpy is already
not valid to be use with device memory.
This reverts commit eb5eeb4740.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Use tailcalls to avoid the overhead of a frame on the free fastpath.
Move tcache initialization to _int_free_chunk(). Add malloc_printerr_tail()
which can be tailcalled without forcing a frame like no-return functions.
Change tcache_double_free_verify() to retry via __libc_free() after clearing
the key.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Inline tcache_free since it's only used by __libc_free. Add __glibc_likely
for the tcache checks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The checks on size can be merged and use __builtin_add_overflow. Since
tcache only handles small sizes (and rejects sizes < MINSIZE), delay this
check until after tcache.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Currently __libc_free checks for a freed mmap chunk in the fast path.
Also errno is always saved and restored to preserve it. Since mmap chunks
are larger than the largest tcache chunk, it is safe to delay this and
handle tcache, smallbin and medium bin blocks first. Move saving of errno
to cases that actually need it. Remove a safety check that fails on mmap
chunks and a check that mmap chunks cannot be added to tcache.
Performance of bench-malloc-thread improves by 9.2% for 1 thread and
6.9% for 32 threads on Neoverse V2.
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
There is a spelling mistake in a test filename. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
There are spelling mistakes in assert messages. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
There is a spelling mistake in a puts statement. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* manual/stdio.texi (Line Input): Document that getline and getdelim
where GNU extensions until standardized in POSIX.1-2008. Add restrict
to function prototypes.
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Since one path uses _IO_SYNC and the other _IO_OVERFLOW, the newly added
test cases verifies that `fflush (FILE)` and `fflush (NULL)` are
semantically equivalent from the FILE perspective.
Reviewed-by: Joseph Myers <josmyers@redhat.com>
This is required so that fclose, when trying to seek to the right
position after filling the input buffer, does not fail with EINVAL.
This fclose code path only ignores ESPIPE errors.
Reported by Petr Pisar on
<https://bugzilla.redhat.com/show_bug.cgi?id=2358265>.
Fixes commit be6818be31 ("Make fclose
seek input file to right offset (bug 12724)").
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
Detect Intel Diamond Rapids and tune it similar to Intel Granite Rapids.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Enable default tuning for unknown Intel processor.
Tested on x86, no regression.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>