Noah Goldstein 12fec8aae5 x86/string: Fixup alignment of main loop in str{n}cmp-evex [BZ #32212]
The loop should be aligned to 32 bytes so that it can ideally run out
of the DSB (Decoded Stream Buffer, the decoded-uop cache). This is
particularly important on Skylake-Server, where deficiencies in its
DSB implementation make it prone to not being able to run loops out of
the DSB.

For example, running strcmp-evex on a 200Mb string:

32-byte aligned loop:
    - 43,399,578,766      idq.dsb_uops
not 32-byte aligned loop:
    - 6,060,139,704       idq.dsb_uops

This results in a 25% performance degradation for the non-aligned
version.

The fix is simply to ensure that the code layout keeps the loop
aligned (which was previously the case but was accidentally dropped
in 84e7c46df).

NB: The fix actually uses 64-byte alignment. This is because 64-byte
alignment generally produces more stable performance than 32-byte
alignment (cache-line crossings can affect performance), so if we are
going past 16-byte alignment, we might as well go to 64. 64-byte
alignment also matches most other functions we over-align, so it
creates a common point of optimization.

Times are reported as ratio of Time_With_Patch /
Time_Without_Patch. Lower is better.

Each reported value is the geometric mean of the ratio across
all tests in bench-strcmp and bench-strncmp.

Note this patch is only attempting to improve the Skylake-Server
strcmp for long strings. The rest of the numbers are only to test for
regressions.

Tigerlake Results Strings <= 512:
    strcmp : 1.026
    strncmp: 0.949

Tigerlake Results Strings > 512:
    strcmp : 0.994
    strncmp: 0.998

Skylake-Server Results Strings <= 512:
    strcmp : 0.945
    strncmp: 0.943

Skylake-Server Results Strings > 512:
    strcmp : 0.778
    strncmp: 1.000

The 2.6% regression on TGL-strcmp is due to slowdowns caused by
changes in the alignment of code handling small sizes (mostly in the
page-cross logic). These should be safe to ignore because 1) we
previously only 16-byte aligned the function, so this behavior is not
new and was essentially up to chance before this patch, and 2) this
type of alignment-related regression on small sizes really only comes
up in tight micro-benchmark loops and is unlikely to have any effect
on real-world performance.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 483443d321)
2025-01-09 17:23:28 -08:00
ChangeLog.old Create ChangeLog.old/ChangeLog.27 2023-07-30 21:45:27 +02:00
argp
assert
benchtests
bits login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701) 2024-04-19 18:38:24 +02:00
catgets
conform
crypt
csu Add crt1-2.0.o for glibc 2.0 compatibility tests 2024-10-01 10:33:51 +08:00
ctype
debug misc/bits/select2.h: Clearly separate declaration from definitions 2023-07-05 16:59:48 +02:00
dirent
dlfcn
elf elf: Fix slow tls access after dlopen [BZ #19924] 2025-01-09 07:31:25 -08:00
gmon
gnulib
grp
gshadow
hesiod
htl
hurd
iconv iconv: restore verbosity with unrecognized encoding names (bug 30694) 2023-09-15 23:55:01 +02:00
iconvdata iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence (CVE-2024-2961) 2024-04-17 14:05:00 -03:00
include malloc: Use __get_nprocs on arena_get2 (BZ 30945) 2024-02-12 09:53:27 -03:00
inet nscd: Do not rebuild getaddrinfo (bug 30709) 2023-08-11 10:55:10 +02:00
intl
io io: Fix record locking contants for powerpc64 with __USE_FILE_OFFSET64 2023-09-07 22:45:43 +02:00
libio Add crt1-2.0.o for glibc 2.0 compatibility tests 2024-10-01 10:33:51 +08:00
locale
localedata support: Add FAIL test failure helper 2024-08-28 16:43:33 -04:00
login login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701) 2024-04-19 18:38:24 +02:00
mach
malloc malloc: Use __get_nprocs on arena_get2 (BZ 30945) 2024-02-12 09:53:27 -03:00
manual ungetc: Fix uninitialized read when putting into unused streams [BZ #27821] 2024-08-28 16:44:54 -04:00
math Add crt1-2.0.o for glibc 2.0 compatibility tests 2024-10-01 10:33:51 +08:00
mathvec
misc Add mremap tests 2024-08-01 14:42:07 +02:00
nis
nptl nptl: initialize rseq area prior to registration 2024-12-06 16:01:45 +00:00
nptl_db
nscd nscd: Use time_t for return type of addgetnetgrentX 2024-05-02 19:02:08 +02:00
nss Fix leak in getaddrinfo introduced by the fix for CVE-2023-4806 [BZ #30843] 2023-09-26 10:14:37 -04:00
po translations: update cs, nl, vi 2023-07-27 00:21:13 +02:00
posix posix: Use <support/check.h> facilities in tst-truncate and tst-truncate64 2024-08-30 15:28:38 -04:00
pwd
resolv resolv: Fix tst-resolv-short-response for older GCC (bug 32042) 2024-08-01 21:09:49 +02:00
resource
rt Exclude routines from fortification 2023-07-05 16:59:48 +02:00
scripts scripts: Fix fortify checks if compiler does not support _FORTIFY_SOURCE=3 2023-07-20 17:58:26 -03:00
setjmp Exclude routines from fortification 2023-07-05 16:59:48 +02:00
shadow
signal
socket Fix name space violation in fortify wrappers (bug 32052) 2024-08-06 08:23:15 +02:00
soft-fp
stdio-common ungetc: Fix backup buffer leak on program exit [BZ #27821] 2024-08-28 16:45:25 -04:00
stdlib Fix name space violation in fortify wrappers (bug 32052) 2024-08-06 08:23:15 +02:00
string x86: Fix bug in strchrnul-evex512 [BZ #32078] 2024-08-15 14:41:17 -07:00
sunrpc sunrpc: Fix netname build with older gcc 2023-07-26 09:45:22 -03:00
support support: Add FAIL test failure helper 2024-08-28 16:43:33 -04:00
sysdeps x86/string: Fixup alignment of main loop in str{n}cmp-evex [BZ #32212] 2025-01-09 17:23:28 -08:00
sysvipc
termios
time Add checks for wday, yday and new date formats 2023-06-30 11:25:39 +02:00
timezone
wcsmbs Fix name space violation in fortify wrappers (bug 32052) 2024-08-06 08:23:15 +02:00
wctype
.clang-format
.gitattributes
.gitignore
CONTRIBUTED-BY
COPYING
COPYING.LIB
INSTALL INSTALL: regenerate 2023-07-30 21:16:02 +02:00
LICENSES
MAINTAINERS
Makeconfig Add crt1-2.0.o for glibc 2.0 compatibility tests 2024-10-01 10:33:51 +08:00
Makefile scripts: Fix fortify checks if compiler does not support _FORTIFY_SOURCE=3 2023-07-20 17:58:26 -03:00
Makefile.help
Makefile.in
Makerules Fix tests-clean Makefile target (bug 30545) 2023-06-26 10:37:25 -03:00
NEWS x86: Avoid integer truncation with large cache sizes (bug 32470) 2024-12-17 18:53:16 +01:00
README
Rules Add crt1-2.0.o for glibc 2.0 compatibility tests 2024-10-01 10:33:51 +08:00
SECURITY.md
SHARED-FILES
abi-tags
aclocal.m4 configure: Use autoconf 2.71 2023-07-17 10:08:10 -04:00
config.h.in LoongArch: config: Added HAVE_LOONGARCH_VEC_ASM. 2023-07-11 10:56:01 +08:00
config.make.in Allow glibc to be built with _FORTIFY_SOURCE 2023-07-05 16:59:34 +02:00
configure scripts: Fix fortify checks if compiler does not support _FORTIFY_SOURCE=3 2023-07-20 17:58:26 -03:00
configure.ac scripts: Fix fortify checks if compiler does not support _FORTIFY_SOURCE=3 2023-07-20 17:58:26 -03:00
extra-lib.mk
gen-locales.mk
libc-abis
libof-iterator.mk
o-iterator.mk
shlib-versions
test-skeleton.c
version.h Increase version numbers 2023-07-30 21:35:28 +02:00

README

This directory contains the sources of the GNU C Library.
See the file "version.h" for what release version you have.

The GNU C Library is the standard system C library for all GNU systems,
and is an important part of what makes up a GNU system.  It provides the
system API for all programs written in C and C-compatible languages such
as C++ and Objective C; the runtime facilities of other programming
languages use the C library to access the underlying operating system.

In GNU/Linux systems, the C library works with the Linux kernel to
implement the operating system behavior seen by user applications.
In GNU/Hurd systems, it works with a microkernel and Hurd servers.

The GNU C Library implements much of the POSIX.1 functionality in the
GNU/Hurd system, using configurations i[4567]86-*-gnu and x86_64-gnu.

When working with Linux kernels, this version of the GNU C Library
requires Linux kernel version 3.2 or later.

Also note that the shared version of the libgcc_s library must be
installed for the pthread library to work correctly.

The GNU C Library supports these configurations for using Linux kernels:

	aarch64*-*-linux-gnu
	alpha*-*-linux-gnu
	arc*-*-linux-gnu
	arm-*-linux-gnueabi
	csky-*-linux-gnuabiv2
	hppa-*-linux-gnu
	i[4567]86-*-linux-gnu
	x86_64-*-linux-gnu	Can build either x86_64 or x32
	ia64-*-linux-gnu
	loongarch64-*-linux-gnu	Hardware floating point, LE only.
	m68k-*-linux-gnu
	microblaze*-*-linux-gnu
	mips-*-linux-gnu
	mips64-*-linux-gnu
	or1k-*-linux-gnu
	powerpc-*-linux-gnu	Hardware or software floating point, BE only.
	powerpc64*-*-linux-gnu	Big-endian and little-endian.
	s390-*-linux-gnu
	s390x-*-linux-gnu
	riscv32-*-linux-gnu
	riscv64-*-linux-gnu
	sh[34]-*-linux-gnu
	sparc*-*-linux-gnu
	sparc64*-*-linux-gnu

If you are interested in doing a port, please contact the glibc
maintainers; see https://www.gnu.org/software/libc/ for more
information.

See the file INSTALL to find out how to configure, build, and install
the GNU C Library.  You might also consider reading the WWW pages for
the C library at https://www.gnu.org/software/libc/.

The GNU C Library is (almost) completely documented by the Texinfo manual
found in the `manual/' subdirectory.  The manual is still being updated
and contains some known errors and omissions; we regret that we do not
have the resources to work on the manual as much as we would like.  For
corrections to the manual, please file a bug in the `manual' component,
following the bug-reporting instructions below.  Please be sure to check
the manual in the current development sources to see if your problem has
already been corrected.

Please see https://www.gnu.org/software/libc/bugs.html for bug reporting
information.  We are now using the Bugzilla system to track all bug reports.
This web page gives detailed information on how to report bugs properly.

The GNU C Library is free software.  See the file COPYING.LIB for copying
conditions, and LICENSES for notices about a few contributions that require
these additional notices to be distributed.  License copyright years may be
listed using range notation, e.g., 1996-2015, indicating that every year in
the range, inclusive, is a copyrightable year that would otherwise be listed
individually.