JIRA: https://issues.redhat.com/browse/RHEL-85517
commit 7802fce7dc18394d041a1310fe4ad76120e08145
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Mon Jan 27 14:07:12 2025 +0100
cpufreq: intel_pstate: Make it possible to avoid enabling CAS
Capacity-aware scheduling (CAS) is enabled by default by intel_pstate on
hybrid systems without SMT, but in some usage scenarios it may be more
attractive to place tasks for maximum CPU performance regardless of the
extra cost in terms of energy, which is the case on such systems when
CAS is not enabled, so introduce a command line option to forbid
intel_pstate to enable CAS.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2781262.mvXUDI8C0e@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6398
JIRA: https://issues.redhat.com/browse/RHEL-78821
Proactive fixes and minor updates for scheduler related
code. This includes needed commits up to v6.14-rc1. There
are not as many since there are a few features upstream
which we are not taking into rhel9 at this point.
Signed-off-by: Phil Auld <pauld@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: Herton R. Krzesinski <herton@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Juri Lelli <juri.lelli@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Augusto Caringi <acaringi@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-78821
commit 1174b9344bc7e7989439cad207fcd94eaab028db
Author: Waiman Long <longman@redhat.com>
Date: Wed Oct 30 13:52:51 2024 -0400
sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"
The "isolcpus=nohz" boot parameter and flag were used to disable tick
when running a single task. Nowadays, this "nohz" flag is seldom used
as it is included as part of the "nohz_full" parameter. Extend this
flag to cover other kernel noises disabled by the "nohz_full" parameter
to make them equivalent. This also eliminates the need to use both the
"isolcpus" and the "nohz_full" parameters to fully isolate a given
set of CPUs.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20241030175253.125248-3-longman@redhat.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-76143
Conflicts: A context diff in the include/linux/clocksource.h hunk due
to the presence of later upstream commit 6b2e29977518
("timekeeping: Provide infrastructure for converting to/from
a base clock").
commit 2ed08e4bc53298db3f87b528cd804cb0cce066a9
Author: Feng Tang <feng.tang@intel.com>
Date: Wed, 21 Feb 2024 14:08:59 +0800
clocksource: Scale the watchdog read retries automatically
On an 8-socket server the TSC is wrongly marked as 'unstable' and disabled
during boot time on about one out of 120 boot attempts:
clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns,
wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152)
clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896.
clocksource: Switched to clocksource hpet
The reason is that on platforms with a large number of CPUs, there are
sporadic big or huge read latencies while reading the watchdog/clocksource
during boot or when the system is under a stress workload, and the frequency
and maximum value of the latency go up with the number of online CPUs.
The current code already has logic to detect and filter such high latency
cases by reading the watchdog twice and checking the two deltas. Due to the
randomness of the latency, there is a low probability that the first delta
(latency) is big, but the second delta is small and looks valid. The
watchdog code retries the readouts by default twice, which is not
necessarily sufficient for systems with a large number of CPUs.
There is a command line parameter 'max_cswd_read_retries' which allows
increasing the number of retries, but that's not user friendly as it needs to
be tweaked per system. As the number of required retries is proportional to
the number of online CPUs, this parameter can be calculated at runtime.
Scale and enlarge the number of retries according to the number of online
CPUs and remove the command line parameter completely.
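The automatic scaling described above can be sketched as follows. This is a minimal illustration rather than the kernel code: the helper name `scaled_watchdog_retries` is hypothetical, and the logarithmic rule is an assumption chosen to show a retry budget that grows slowly with the online CPU count.

```python
def scaled_watchdog_retries(online_cpus: int) -> int:
    """Return a retry budget that grows with the number of online CPUs.

    Hypothetical helper: the rule floor(log2(cpus)) // 2 + 1 is an
    assumption for illustration; the changelog only states that the
    retry count is scaled by the number of online CPUs.
    """
    if online_cpus < 1:
        raise ValueError("online_cpus must be at least 1")
    # For n >= 1, n.bit_length() - 1 equals floor(log2(n)).
    log2_cpus = online_cpus.bit_length() - 1
    return log2_cpus // 2 + 1
```

Under this assumed rule a single-CPU system gets one retry, while a system with around a thousand CPUs gets several, without any per-system tuning.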
[ tglx: Massaged change log and comments ]
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jin Wang <jin1.wang@intel.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com
Signed-off-by: Waiman Long <longman@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6045
# Merge Request Required Information
## Summary of Changes
Backport more patches, mostly from 6.12, that are needed to enable TDX support in KVM. These prerequisites are less self-contained, but are enough to have a mostly conflict-free TDX backport.
## Approved Development Ticket(s)
All submissions to CentOS Stream must reference a ticket in [Red Hat Jira](https://issues.redhat.com/).
```
JIRA: https://issues.redhat.com/browse/RHEL-71541
Depends: https://issues.redhat.com/browse/RHEL-64444
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Omitted-fix: 3f749befb0998472470d850b11b430477c0718cc (irrelevant series of changes for odd Kconfigs)
Omitted-fix: ea4290d77bda2bd1f173a86f07aa79b568e0a6f8 (irrelevant series of changes for odd Kconfigs)
Omitted-fix: 2a5fe5a01668e831af1de3951718fbf88b9a9b9c (irrelevant series of changes for odd Kconfigs)
Omitted-fix: 338b655a1178900ac05aca7ac66dc28b05100430 (irrelevant series of changes for odd Kconfigs)
Omitted-fix: 341e4023032fba6c02326bfc6babd63ef4039712 (irrelevant series of changes for odd Kconfigs)
Omitted-fix: 1331343af6f502aecd274d522dd34bf7c965f484 (irrelevant series of changes for odd Kconfigs)
Omitted-fix: 9ee62c33c0fe017ee02501a877f6f562363122fa (irrelevant series of changes for odd Kconfigs)
Omitted-fix: d822ca29a4fc5278fb511790dace44836e8cc40d (can be backported via perf)
Omitted-fix: 979956bc681105f34642971448c4cda048954a07 (irrelevant with RHEL gcc)
Omitted-fix: e120829dbf927c8b93cd5e06acfec0332cc82e02 (can be backported via perf)
```
Approved-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Patrick Talbert <ptalbert@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-71541
Add an on-by-default module param, enable_virt_at_load, to let userspace
force virtualization to be enabled in hardware when KVM is initialized,
i.e. just before /dev/kvm is exposed to userspace. Enabling virtualization
during KVM initialization allows userspace to avoid the additional latency
when creating/destroying the first/last VM (or more specifically, on the
0=>1 and 1=>0 edges of creation/destruction).
Now that KVM uses the cpuhp framework to do per-CPU enabling, the latency
could be non-trivial as the cpuhp bringup/teardown is serialized across
CPUs, e.g. the latency could be problematic for use cases that need to spin
up VMs quickly.
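The difference between enabling at load and toggling on the 0=>1 and 1=>0 edges can be modeled with a small counter. This is a toy sketch under stated assumptions, not KVM's implementation: the class `VirtState` and its `toggles` counter are hypothetical, and each toggle stands in for one costly per-CPU enable or disable pass.

```python
class VirtState:
    """Toy model of hardware virtualization enablement (hypothetical names)."""

    def __init__(self, enable_virt_at_load: bool = True) -> None:
        self.enable_virt_at_load = enable_virt_at_load
        self.vm_count = 0
        self.hw_enabled = False
        self.toggles = 0  # number of costly per-CPU enable/disable passes
        if enable_virt_at_load:
            self._set_hw(True)  # cost is paid once, at module load

    def _set_hw(self, on: bool) -> None:
        self.hw_enabled = on
        self.toggles += 1

    def create_vm(self) -> None:
        if self.vm_count == 0 and not self.hw_enabled:
            self._set_hw(True)  # 0=>1 edge: first VM pays the latency
        self.vm_count += 1

    def destroy_vm(self) -> None:
        if self.vm_count == 0:
            raise RuntimeError("no VM to destroy")
        self.vm_count -= 1
        if self.vm_count == 0 and not self.enable_virt_at_load:
            self._set_hw(False)  # 1=>0 edge: last VM pays the latency
```

With `enable_virt_at_load=True`, creating and destroying VMs never touches the hardware state again; with `False`, every first creation and last destruction does.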
Prior to commit 10474ae894 ("KVM: Activate Virtualization On Demand"),
KVM _unconditionally_ enabled virtualization during load, i.e. there's no
fundamental reason KVM needs to dynamically toggle virtualization. These
days, the only known argument for not enabling virtualization is to allow
KVM to be autoloaded without blocking other out-of-tree hypervisors, and
such use cases can simply change the module param, e.g. via command line.
Note, the aforementioned commit also mentioned that enabling SVM (AMD's
virtualization extensions) can result in "using invalid TLB entries".
It's not clear whether the changelog was referring to a KVM bug, a CPU
bug, or something else entirely. Regardless, leaving virtualization off
by default is not a robust "fix", as any protection provided is lost the
instant userspace creates the first VM.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b4886fab6fb620b96ad7eeefb9801c42dfa91741)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5812
JIRA: https://issues.redhat.com/browse/RHEL-27745
JIRA: https://issues.redhat.com/browse/RHEL-15601
JIRA: https://issues.redhat.com/browse/RHEL-28873
JIRA: https://issues.redhat.com/browse/RHEL-54929
JIRA: https://issues.redhat.com/browse/RHEL-61137
JIRA: https://issues.redhat.com/browse/RHEL-62336
JIRA: https://issues.redhat.com/browse/RHEL-66627
JIRA: https://issues.redhat.com/browse/RHEL-66794
JIRA: https://issues.redhat.com/browse/RHEL-66818
JIRA: https://issues.redhat.com/browse/RHEL-66950
JIRA: https://issues.redhat.com/browse/RHEL-66977
JIRA: https://issues.redhat.com/browse/RHEL-68011
JIRA: https://issues.redhat.com/browse/RHEL-68909
JIRA: https://issues.redhat.com/browse/RHEL-69683
JIRA: https://issues.redhat.com/browse/RHEL-70053
CVE: CVE-2023-52490
CVE: CVE-2024-42316
CVE: CVE-2024-50182
CVE: CVE-2024-50199
CVE: CVE-2024-50200
CVE: CVE-2024-50219
CVE: CVE-2024-50228
CVE: CVE-2024-50272
CVE: CVE-2024-53097
CVE: CVE-2024-53105
CVE: CVE-2024-53136
This set proactively brings into RHEL9 core MM code a set of follow-up
fixes as they were pushed into upstream's stable v6.6 LTS branch, but
Mainline commits are backported instead in order to keep it easy to
track the RHEL backports against upstream. Dependencies were also
selectively backported where it made sense to do so, and all the
selected commits are sorted in upstream's topological order.
Omitted-fix: c567f2948f57 ("Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."")
Omitted-fix: 4b944f8ef996 ("Revert "mm/filemap: avoid buffered read/write race to read inconsistent data"")
Omitted-fix: 9d08ec41a064 ("mm: allow set/clear page_type again")
Omitted-fix: cc9bc36ebef7 ("mm: zswap: remove nr_zswap_stored atomic")
Omitted-fix: 0e4008447242 ("zswap: track swapins from disk more accurately")
Omitted-fix: 6359c39c9de6 ("mm: remove unused hugepage for vma_alloc_folio()")
Omitted-fix: 9b5c87d47949 ("mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount")
Omitted-fix: 1390a3334a48 ("mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio")
Omitted-fix: f708f6970cc9 ("mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio")
Omitted-fix: 4de22b2a6a74 ("mm: open-code PageTail in folio_flags() and const_folio_flags()")
Omitted-fix: 6a7de1bf218d ("mm: open-code page_folio() in dump_page()")
Omitted-fix: 40a024b81d1c ("ALSA: core: Drop superfluous no_free_ptr() for memdup_user() errors")
Omitted-fix: 9d197b627e5f ("docs/zh_CN: update the translation of mm/page_table_check.rst")
Omitted-fix: ce8f9fb651fa ("comedi: Flush partial mappings in error case")
Signed-off-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: Herton R. Krzesinski <herton@redhat.com>
Approved-by: Jerry Snitselaar <jsnitsel@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Jocelyn Falempe <jfalempe@redhat.com>
Approved-by: Lucas Zampieri <lzampier@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Gavin Shan <gshan@redhat.com>
Approved-by: Andrea Claudi <aclaudi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
* arch/*/Kconfig: all hunks dropped as there were only text blurbs and comments
being changed with no functional changes whatsoever, and RHEL9 is missing
several (unrelated) commits to these arches that transform the text blurbs in
the way these non-functional hunks were expecting;
* drivers/accel/qaic/qaic_data.c: hunk dropped due to RHEL-only commit
083c0cdce2 ("Merge DRM changes from upstream v6.8..v6.9");
* drivers/gpu/drm/i915/gem/selftests/huge_pages.c: hunk dropped due to RHEL-only
commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
* drivers/gpu/drm/ttm/tests/ttm_pool_test.c: all hunks dropped due to RHEL-only
commit ca8b16c11b ("Merge DRM changes from upstream v6.7..v6.8");
* drivers/video/fbdev/vermilion/vermilion.c: hunk dropped as RHEL9 misses
commit dbe7e429fe ("vmlfb: framebuffer driver for Intel Vermilion Range");
* include/linux/pageblock-flags.h: differences due to out-of-order backport
of upstream commits 72801513b2bf ("mm: set pageblock_order to HPAGE_PMD_ORDER
in case with !CONFIG_HUGETLB_PAGE but THP enabled"), and 3a7e02c040b1
("minmax: avoid overly complicated constant expressions in VM code");
* mm/mm_init.c: differences on the 3rd, and 4th hunks are due to RHEL
backport commit 1845b92dcf ("mm: move most of core MM initialization to
mm/mm_init.c") ignoring the out-of-order backport of commit 3f6dac0fd1b8
("mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks")
thus partially reverting the changes introduced by the latter;
This patch is a backport of the following upstream commit:
commit 5e0a760b44417f7cadd79de2204d6247109558a0
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Thu Dec 28 17:47:04 2023 +0300
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
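The change in meaning can be illustrated with a short sketch. The numeric values below are assumptions for illustration (a typical default), not taken from this changelog; the point is only that a loop written for the old exclusive bound silently skips the largest order when given the new inclusive value.

```python
# Assumed values for illustration only.
OLD_MAX_ORDER = 11   # old, exclusive meaning: valid orders are 0 .. MAX_ORDER - 1
MAX_PAGE_ORDER = 10  # new, inclusive meaning: valid orders are 0 .. MAX_PAGE_ORDER


def orders_exclusive(max_order: int) -> list[int]:
    """Orders visited by code written against the old, exclusive bound."""
    return list(range(max_order))


def orders_inclusive(max_page_order: int) -> list[int]:
    """Orders visited by code written against the new, inclusive bound."""
    return list(range(max_page_order + 1))
```

Code that still iterates with the exclusive pattern but reads the new value visits one order too few, which is the class of bug the rename is meant to make visible.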
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5592
JIRA: https://issues.redhat.com/browse/RHEL-59051
CVE: CVE-2024-44960
CVE JIRA: https://issues.redhat.com/browse/RHEL-57138
CVE: CVE-2024-46675
CVE JIRA: https://issues.redhat.com/browse/RHEL-64322
This MR rebases supported USB/TBT drivers to upstream kernel v6.11. By
design, changes on this rebase are limited to supported USB/Thunderbolt
drivers and infrastructure. Changes which happen to touch the drivers but
are tree-wide are selectively or partially pulled in, whenever relevant.
Notes:
I) Omits:
Omitted-fix: aefa036be8c2 ("phy: freescale: imx8qm-hsio: Include bitfield.h for FIELD_PREP")
Omitted-fix: 2d6213bd592b ("crypto: spacc - Add ifndef around MIN")
Omitted-fix: b8fc70ab7b5f ("Revert "crypto: spacc - Add SPAcc Skcipher support"")
Omitted-fix: bf791751162a ("thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING")
II) This MR drops the `rtsx_pci_ms` driver because it became dead code with
commit <c0e5f4e73a71> ("misc: rtsx: Add support for RTS5261") and was
consequently dropped later by commit <d0f459259c13> ("memstick:
rtsx_pci_ms: Remove Realtek PCI memstick driver"). The latter is being
merged here.
III) This MR also includes minmax updates to fix these build and test errors:
1 - Signedness error:
```
drivers/usb/typec/ucsi/ucsi.c: In function 'ucsi_get_pd_message':
./include/linux/build_bug.h:78:41: error: static assertion failed: "min(bytes, (((con->ucsi)->version < 0x0200) ? 0x10 : 0xff)) signedness error, fix types or consider umin() before min_t()"
78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
```
2 - ISO C90 error:
```
drivers/scsi/Makefile:196: FORCE prerequisite is missing
lib/vsprintf.c: In function 'resource_string':
lib/vsprintf.c:1068:9: error: ISO C90 forbids variable length array 'sym' [-Werror=vla]
1068 | char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
| ^~~~
```
3 - Oops on drm_gem_shmem CKI testing:
```
Unable to handle kernel paging request at virtual address ffffffff80000000
...
Internal error: Oops: 0000000096000146 [#1] SMP
...
drm_gem_shmem_test_obj_create_private+0x1cc/0x41c [drm_gem_shmem_test]
...
# drm_gem_shmem_test_obj_create_private: try faulted: last line seen drivers/gpu/drm/tests/drm_gem_shmem_test.c:120
# drm_gem_shmem_test_obj_create_private: internal error occurred preventing test case from running: -4
```
Signed-off-by: Desnes Nunes <desnesn@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Bastien Nocera <bnocera@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: Adam Jackson <ajax@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5391
JIRA: https://issues.redhat.com/browse/RHEL-55461
JIRA: https://issues.redhat.com/browse/RHEL-55465
JIRA: https://issues.redhat.com/browse/RHEL-55462
Depends: !5252
Updated the respective arch mm directories to v6.6. Most of the patches
have already been updated or included by the respective arch teams and by
Rafael's mm update to v6.6.
Dropped the following to avoid issues with the ppc64le build:
41b7a347bf14 powerpc: Book3S 64-bit outline-only KASAN support
c7b9ed7c34a9 powerpc/64e: KASAN Full support for BOOK3E/64
Omitted-fix: 7bd6680b47fa Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""
Omitted-fix: 7b59e8ae92fe arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
Omitted-fix: a54b7fa6b9ab arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
Omitted-fix: 9a5f0b11e49e arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
Omitted-fix: cd87d9f58439 x86/mm: further clarify switch_mm_irqs_off() documentation
Signed-off-by: Audra Mitchell <audra@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Vladis Dronov <vdronov@redhat.com>
Approved-by: Herton R. Krzesinski <herton@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Nico Pache <npache@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59051
commit 804da867ad016d53bf33373cfeaae041775455f1
Author: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
Date: Wed, 15 May 2024 09:43:39 +0900
The current storage scan delay was reduced by the following old commit:
a4a47bc03f ("Lower USB storage settling delay to something more reasonable")
As a result, the delay is at least one second, or zero with delay_use=0.
One second is still a long delay, especially for embedded systems, but
errors are still observed on some USB drives when delay_use is set to 0
(no delay). So delay_use should not be set to 0, yet one second is too long.
On embedded systems in particular, how quickly a USB drive becomes
accessible after it is connected matters to the end user, so it is worth
minimizing this constant delay.
This patch allows the scan delay to be tuned more precisely, minimizing
the delay without causing problems on USB drives, by handling the module
parameter 'delay_use' in milliseconds internally.
The parameter 'delay_use' optionally accepts a value in milliseconds
if it ends with 'ms'.
This reduces the range of the internal 32-bit value to 1/1000 of what it
was, but that is still enough to set the delay time.
By default, the delay is one second for backward compatibility.
For example, delay_use=100ms, that is, a 100 millisecond delay, appears
to work without issues for most USB pen drives.
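The parsing rule for 'delay_use' can be sketched as below. This is a minimal illustration, not the driver's parser: the function name `parse_delay_use` is hypothetical, and the sketch only handles non-negative whole numbers.

```python
def parse_delay_use(value: str) -> int:
    """Return the delay in milliseconds.

    A plain number is read as seconds (the backward-compatible form);
    a number ending in 'ms' is read as milliseconds.
    """
    text = value.strip()
    if text.endswith("ms"):
        number = text[:-2]
        scale = 1          # already in milliseconds
    else:
        number = text
        scale = 1000       # seconds to milliseconds
    if not number.isdecimal():
        raise ValueError(f"invalid delay_use value: {value!r}")
    return int(number) * scale
```

Because a plain number is multiplied by 1000 to be stored in milliseconds, the largest delay expressible in seconds is 1/1000 of the internal value's range, as the message notes.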
Signed-off-by: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
Link: https://lore.kernel.org/r/20240515004339.29892-1-Norihiko.Hama@alpsalpine.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Desnes Nunes <desnesn@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-61942
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit f0295913c4b4f377c454e06f50c1a04f2f80d9df
Author: Joerg Roedel <jroedel@suse.de>
Date: Thu Sep 5 09:22:40 2024 +0200
iommu/amd: Add kernel parameters to limit V1 page-sizes
Add two new kernel command line parameters to limit the page-sizes
used for v1 page-tables:
nohugepages - Limits page-sizes to 4KiB
v2_pgsizes_only - Limits page-sizes to 4KiB/2MiB/1GiB; The
same as the sizes used with v2 page-tables
This is needed for multiple scenarios. When assigning devices to
SEV-SNP guests the IOMMU page-sizes need to match the sizes in the RMP
table, otherwise the device will not be able to access all shared
memory.
Also, some ATS devices do not work properly with arbitrary IO
page-sizes as supported by AMD-Vi, so limiting the sizes used by the
driver is a suitable workaround.
All-in-all, these parameters are only workarounds until the IOMMU core
and related APIs gather the ability to negotiate the page-sizes in a
better way.
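The effect of the two options can be pictured as narrowing a bitmap of supported page sizes. The sketch below is an assumption-laden illustration, not the driver's code: the names `v1_pgsize_bitmap` and `allows`, the 52-bit upper bound, and the default of every power-of-two size from 4 KiB upward are all hypothetical stand-ins for the "arbitrary" sizes mentioned above.

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30

ADDRESS_BITS = 52  # assumed upper bound, for illustration only
# Assumed v1 default: every power-of-two size from 4 KiB upward.
V1_ALL_PGSIZES = sum(1 << bit for bit in range(12, ADDRESS_BITS))
# Sizes named in the changelog for v2 page tables.
V2_PGSIZES = (4 * KIB) | (2 * MIB) | (1 * GIB)


def v1_pgsize_bitmap(option=None):
    """Return the page-size bitmap used for v1 page tables under an option."""
    if option is None:
        return V1_ALL_PGSIZES
    if option == "nohugepages":
        return 4 * KIB
    if option == "v2_pgsizes_only":
        return V2_PGSIZES
    raise ValueError(f"unknown option: {option!r}")


def allows(bitmap: int, size: int) -> bool:
    """True if a mapping of exactly `size` bytes is permitted by the bitmap."""
    return bool(bitmap & size)
```

For example, a 64 KiB mapping is allowed by the assumed default bitmap but rejected under either option.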
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240905072240.253313-1-joro@8bytes.org
(cherry picked from commit f0295913c4b4f377c454e06f50c1a04f2f80d9df)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55462
This patch is a backport of the following upstream commit:
commit 6b34a099faa123488b13caf704562f4dbe483fc4
Author: Nicholas Piggin <npiggin@gmail.com>
Date: Mon Oct 24 13:01:50 2022 +1000
powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults
This option increases the number of hash misses by limiting the number
of kernel HPT entries, by keeping a per-CPU record of the last kernel
HPTEs installed, and removing that from the hash table on the next hash
insertion. A timer round-robins CPUs removing remaining kernel HPTEs and
clearing the TLB (in the case of bare metal) to increase and slightly
randomise kernel fault activity.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add comment about NR_CPUS usage, fixup whitespace]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221024030150.852517-1-npiggin@gmail.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5430
JIRA: https://issues.redhat.com/browse/RHEL-57113
Upstream Status: up to v6.11 and fixes up to v6.12-rc5 \
Tested: kvm-unit-tests, kselftest, migration test.
This is the first round of rebasing kvm-arm up to v6.11, which contains the series below:
1. KVM: arm64: pKVM host proxy FF-A fixes (part of them)
2. KVM: arm64: nv: Shadow stage-2 page table handling
3. KVM: arm64: Allow userspace to modify CTR_EL0
4. KVM: arm64: nv: FPSIMD/SVE, plus some other CPTR goodies
5. KVM: arm64: fix warnings in W=1 build
6. Misc commits
Besides that, it also takes the fix commit `4155539bc5ba ("KVM: arm64: nv: Enforce S2 alignment when contiguous bit is set")`, which is in v6.12-rc1.
* 42fb33dde42b KVM: arm64: Use FF-A 1.1 with pKVM \
This commit belongs to series 1; it is not picked because downstream doesn't support FF-A 1.1 (the related upstream commit is `1609626c32c4 ("firmware: arm_ffa: Update the FF-A command list with v1.1 additions")`).
The `KVM: arm64: Fix handling of TCR2_EL1` series could be taken by the kvm-arm rebase, but since it depends on the arm64 rebase, it will be picked in the second round once the arm64 rebase is merged.
* 838d992b8448 KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather() \
Don't pick this commit since downstream doesn't support bitmap_gather().
Changelog: \
v2 -> v3: \
Add commits:
* eb9d53d4a949 KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarity
* cb52b5c8b81b Revert "KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarity"
* 810ecbefdd54 KVM: Documentation: Correct the VGIC V2 CPU interface addr space size
* 03bd36a387b8 KVM: Documentation: Enumerate allowed value macros of irq_type
* ae8f8b376102 KVM: arm64: Unregister redistributor for failed vCPU creation
* c6c167afa090 KVM: arm64: Fix shift-out-of-bounds bug
* 78a005555500 KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
v1 -> v2: \
Add these two commits to avoid conflicts when backporting `894376385a2d KVM: arm64: Add support for FFA_PARTITION_INFO_GET`.
* 3fad96e9b21b ("firmware: arm_ffa: Declare ffa_bus_type structure in the header")
* 989e8661dc45 ("firmware: arm_ffa: Make ffa_bus_type const")
Add commits:
* b26e484b8bb3 ("arm64: Add CFI error handling")
* 7a928b32f1de arm64: Introduce esr_brk_comment, esr_is_cfi_brk
* 8f3873a39529 KVM: arm64: Introduce print_nvhe_hyp_panic helper
* eca4ba5b6dff KVM: arm64: nVHE: Support CONFIG_CFI_CLANG at EL2
Add commits:
* f26a525b77e0 KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer
* a1d402abf8e3 KVM: arm64: Fix kvm_has_feat'*'() handling of negative features
* 78fee4198bb4 KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
* a9f41588a902 KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
* dc0dddb1d66d KVM: arm64: Invalidate EL1&0 TLB entries for all VMIDs in nvhe hyp init
* ed49fe5a6fb9 KVM: arm64: Ensure TLBI uses correct VMID after changing context
* e0b7de4fd18c KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
* ae41d7dbaeb4 KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
* 38753cbc4dca KVM: arm64: Move data barrier to end of split walk
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
Approved-by: Gavin Shan <gshan@redhat.com>
Approved-by: Sebastian Ott <sebott@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-40604
commit 1279e8d0dcead53cf1f51e926a1cf6d2a79332d6
Author: Andrea della Porta <andrea.porta@suse.com>
Date: Mon, 29 Apr 2024 12:28:33 +0200
Introduce the field 'el0' to the idreg-override for register
ID_AA64PFR0_EL1. This field is also aliased to the new kernel
command line option 'arm64.no32bit_el0' as a more recognizable
and mnemonic name to disable the execution of 32-bit userspace
applications (i.e. avoid the AArch32 execution state in EL0) from
the kernel command line.
Link: https://lore.kernel.org/all/20240207105847.7739-1-andrea.porta@suse.com/
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Link: https://lore.kernel.org/r/20240429102833.6426-1-andrea.porta@suse.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Mark Salter <msalter@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-57113
Conflicts:
- Documentation/admin-guide/kernel-parameters.txt
Contextual conflicts due to missing commit
600716592a3a ("doc: Add EARLY flag to early-parsed kernel boot parameters").
commit 0b5afe05377d7993f19292bf49dd13e959000790
Author: Colton Lewis <coltonlewis@google.com>
Date: Thu May 23 17:40:55 2024 +0000
KVM: arm64: Add early_param to control WFx trapping
Add early_params to control WFI and WFE trapping. This is to
control the degree to which guests can wait for interrupts on their own without
being trapped by KVM. Options for each param are trap and notrap. trap
enables the trap. notrap disables the trap. Note that when enabled,
traps are allowed but not guaranteed by the CPU architecture. Absent
an explicitly set policy, default to current behavior: disabling the
trap if only a single task is running and enabling otherwise.
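The policy described above reduces to a three-way decision. The following is a minimal sketch, assuming a hypothetical helper `should_trap_wfx` and a `"default"` label for the case where no policy is set; it is not the KVM code, and, as noted above, an enabled trap is permitted but not guaranteed by the architecture.

```python
def should_trap_wfx(policy: str, runnable_tasks: int) -> bool:
    """Decide whether WFI/WFE trapping is enabled for a vCPU.

    policy: "trap", "notrap", or "default" (no explicit policy set).
    runnable_tasks: tasks runnable on the CPU the vCPU is running on.
    """
    if policy == "trap":
        return True
    if policy == "notrap":
        return False
    if policy == "default":
        # Existing behavior: no trap when the vCPU has the CPU to itself.
        return runnable_tasks > 1
    raise ValueError(f"unknown policy: {policy!r}")
```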
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20240523174056.1565133-1-coltonlewis@google.com
[ oliver: rework kvm_vcpu_should_clear_tw*() for readability ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5250
JIRA: https://issues.redhat.com/browse/RHEL-56494
JIRA: https://issues.redhat.com/browse/RHEL-57142
CVE: CVE-2024-44958
Tested: Ran scheduler tests and general stress testing. Have asked
perf QE for sanity tests.
Omitted-fix: c049acee3c71 ("selftests/ftrace: Fix test to handle both old and new kernels"): Somewhat out of scope for this MR and should not need to run test against old kernels in RHEL.
Series of scheduler related fixes and updates, up to v6.11. A large
number of these are refactoring (making naming consistent, breaking out
code into new files etc) with no functional changes. Otherwise, primarily
bug fixes and cleanups, no real feature additions.
Signed-off-by: Phil Auld <pauld@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Juri Lelli <juri.lelli@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-20288
commit 68d124b0999919015e6d23008eafea106ec6bb40
Author: Paul E. McKenney <paulmck@kernel.org>
Date: 2024-05-08 20:11:58 -0700
rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
If a CPU is running either a userspace application or a guest OS in
nohz_full mode, it is possible for a system call to occur just as an
RCU grace period is starting. If that CPU also has the scheduling-clock
tick enabled for any reason (such as a second runnable task), and if the
system was booted with rcutree.use_softirq=0, then RCU can add insult to
injury by awakening that CPU's rcuc kthread, resulting in yet another
task and yet more OS jitter due to switching to that task, running it,
and switching back.
In addition, in the common case where that system call is not of
excessively long duration, awakening the rcuc task is pointless.
This pointlessness is due to the fact that the CPU will enter an extended
quiescent state upon returning to the userspace application or guest OS.
In this case, the rcuc kthread cannot do anything that the main RCU
grace-period kthread cannot do on its behalf, at least if it is given
a few additional milliseconds (for example, given the time duration
specified by rcutree.jiffies_till_first_fqs, give or take scheduling
delays).
This commit therefore adds a rcutree.nohz_full_patience_delay kernel
boot parameter that specifies the grace period age (in milliseconds,
rounded to jiffies) before which RCU will refrain from awakening the
rcuc kthread. Preliminary experimentation suggests a value of 1000,
that is, one second. Increasing rcutree.nohz_full_patience_delay will
increase grace-period latency and in turn increase memory footprint,
so systems with constrained memory might choose a smaller value.
Systems with less-aggressive OS-jitter requirements might choose the
default value of zero, which keeps the traditional immediate-wakeup
behavior, thus avoiding increases in grace-period latency.
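As a rough illustration of "in milliseconds, rounded to jiffies", the sketch below converts a millisecond value into jiffies for a given tick rate. The helper name ms_to_jiffies, the round-up behavior, and the sample HZ values are assumptions made for illustration; the kernel's own conversion (and any clamping it applies) may differ.

```shell
#!/bin/sh
# Hypothetical helper (not a kernel interface): convert milliseconds to
# jiffies for a tick rate of HZ ticks per second, rounding up.
ms_to_jiffies() {
    ms=$1
    hz=$2
    echo $(( (ms * hz + 999) / 1000 ))
}

ms_to_jiffies 1000 1000   # the suggested one-second delay at HZ=1000
ms_to_jiffies 1000 250    # the same delay at HZ=250
ms_to_jiffies 5 250       # 5 ms is not a whole number of 4 ms ticks
```

On a system booted with rcutree.use_softirq=0, the suggested setting would be passed on the kernel command line as rcutree.nohz_full_patience_delay=1000.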
[ paulmck: Apply Leonardo Bras feedback. ]
Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/
Reported-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
Conflicts: Minor context differences.
commit c793a62823d1ce8f70d9cfc7803e3ea436277cda
Author: Sean Christopherson <seanjc@google.com>
Date: Mon May 27 17:34:48 2024 -0700
sched/core: Drop spinlocks on contention iff kernel is preemptible
Use preempt_model_preemptible() to detect a preemptible kernel when
deciding whether or not to reschedule in order to drop a contended
spinlock or rwlock. Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
preemption model is "none" or "voluntary". In short, make kernels with
dynamically selected models behave the same as kernels with statically
selected models.
Somewhat counter-intuitively, NOT yielding a lock can provide better
latency for the relevant tasks/processes. E.g. KVM x86's mmu_lock, a
rwlock, is often contended between an invalidation event (takes mmu_lock
for write) and a vCPU servicing a guest page fault (takes mmu_lock for
read). For _some_ setups, letting the invalidation task complete even
if there is mmu_lock contention provides lower latency for *all* tasks,
i.e. the invalidation completes sooner *and* the vCPU services the guest
page fault sooner.
But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
can vary depending on the host VMM, the guest workload, the number of
vCPUs, the number of pCPUs in the host, why there is lock contention, etc.
In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
opposite and removing contention yielding entirely) needs to come with a
big pile of data proving that changing the status quo is a net positive.
Opportunistically document this side effect of preempt=full, as yielding
contended spinlocks can have significant, user-visible impact.
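To see which behavior a running PREEMPT_DYNAMIC kernel would get, one can look at the live preemption model. The sketch below assumes the model is reported as a line such as "none voluntary (full)", with the active model in parentheses (the format I believe /sys/kernel/debug/sched/preempt uses); the function names are hypothetical, and the sample lines are illustrative rather than captured output.

```shell
#!/bin/sh
# Print the active model from a line such as "none voluntary (full)".
active_preempt_model() {
    printf '%s\n' "$1" | sed -n 's/.*(\([a-z]*\)).*/\1/p'
}

# Succeed only when the live model is "full", i.e. the case in which
# contended spinlocks and rwlocks are yielded after this change.
yields_contended_locks() {
    [ "$(active_preempt_model "$1")" = "full" ]
}

for line in "(none) voluntary full" "none voluntary (full)"; do
    if yields_contended_locks "$line"; then
        echo "$(active_preempt_model "$line"): contended locks are yielded"
    else
        echo "$(active_preempt_model "$line"): contended locks are not yielded"
    fi
done
```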
Fixes: c597bfddc9e9 ("sched: Provide Kconfig support for default dynamic preempt mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56494
commit 97450eb909658573dcacc1063b06d3d08642c0c1
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date: Tue Mar 26 10:16:16 2024 +0100
sched/pelt: Remove shift of thermal clock
The optional shift of the clock used by thermal/hw load avg has been
introduced to handle the case where the signal was not always a high frequency
hw signal. Now that cpufreq provides a signal for firmware and
SW pressure, we can remove this exception and always keep this PELT signal
aligned with other signals.
Mark the sched_thermal_decay_shift boot parameter as deprecated.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20240326091616.3696851-6-vincent.guittot@linaro.org
Signed-off-by: Phil Auld <pauld@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-48601
Upstream Status: 47c8846a49baa8c0b7a6a3e7e7eacd6e8d119d25
commit 47c8846a49baa8c0b7a6a3e7e7eacd6e8d119d25
Author: Vidya Sagar <vidyas@nvidia.com>
Date: Tue Jun 25 21:01:50 2024 +0530
PCI: Extend ACS configurability
PCIe ACS settings control the level of isolation and the possible P2P paths
between devices. With greater isolation the kernel will create smaller
iommu_groups and with less isolation there is more HW that can achieve P2P
transfers. From a virtualization perspective all devices in the same
iommu_group must be assigned to the same VM as they lack security
isolation.
There is no way for the kernel to automatically know the correct ACS
settings for any given system and workload. Existing command line options
(e.g., disable_acs_redir) allow only for large scale change, disabling all
isolation, but this is not sufficient for more complex cases.
Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to set up the right
level of isolation to achieve the desired P2P configuration. The
definition is future proof; when new ACS bits are added to the spec the
open syntax can be extended.
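A sketch of how a config_acs flag string might be read. It assumes the flag string is a sequence of '0', '1' and 'x' characters with bit 0 as the rightmost character, and the bit order listed in the script; both are recalled from kernel-parameters.txt and should be checked against the documentation shipped with the kernel in use. The function and variable names are hypothetical.

```shell
#!/bin/sh
# Bit names, lowest bit first (assumed order; verify against
# Documentation/admin-guide/kernel-parameters.txt).
ACS_BITS="SourceValidation TranslationBlocking P2PRequestRedirect P2PCompletionRedirect UpstreamForwarding P2PEgressControl DirectTranslatedP2P"

# Decode a flag string, consuming it from the right (bit 0) to the left.
decode_acs_flags() {
    flags=$1
    for name in $ACS_BITS; do
        [ -n "$flags" ] || break
        last=${flags#"${flags%?}"}   # rightmost character is the current bit
        flags=${flags%?}             # drop it for the next iteration
        case $last in
            1) echo "$name: force enabled" ;;
            0) echo "$name: force disabled" ;;
            x) echo "$name: unchanged" ;;
            *) echo "$name: invalid flag '$last'" ;;
        esac
    done
}

decode_acs_flags 10x
```

With the flag string 10x this reports Source Validation as unchanged, Translation Blocking as force disabled, and P2P Request Redirect as force enabled; bits that are not listed are not reported.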
ACS needs to be set up early in the kernel boot as the ACS settings affect
how iommu_groups are formed. iommu_group formation is a one time event
during initial device discovery, so changing ACS bits after kernel boot can
result in an inaccurate view of the iommu_groups compared to the current
isolation configuration.
ACS applies to PCIe Downstream Ports and multi-function devices. The
default ACS settings are strict and deny any direct traffic between two
functions. This results in the smallest iommu_group the HW can support.
Frequently these values result in slow or non-working P2PDMA.
ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
- Full prevention
- Translated requests can be direct, with various options
- Asymmetric direct traffic, A can reach B but not the reverse
- All traffic can be direct
Along with some other less common ones for special topologies.
The intention is that this option would be used with expert knowledge of
the HW capability and workload to achieve the desired configuration.
Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[bhelgaas: add example, tidy printk formats]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-54248
commit 37002bc6b6039e1491140869c6801e0a2deee43e
Author: Costa Shulyupin <costa.shul@redhat.com>
Date: Tue Jul 18 07:55:02 2023 +0300
docs: move s390 under arch
and fix all in-tree references.
Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
Documentation/arch/index.rst
MAINTAINERS
(contextual conflicts due to missing other patches in downstream)
Signed-off-by: Thomas Huth <thuth@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-54248
commit 1f3307cf3aac88763077fac90404f2c57bc5181a
Author: Thomas Richter <tmricht@linux.ibm.com>
Date: Tue Sep 20 14:26:16 2022 +0200
s390/con3215: Drop console data printout when buffer full
Using z/VM the 3270 terminal emulator also emulates an IBM 3215 console
which outputs line by line. When the screen is full, the console enters
the MORE... state and waits for the operator to confirm the data
on the screen by pressing a clear key. If this does not happen in the
default time frame (currently 50 seconds) the console enters the HOLDING
state.
It then waits another time frame (currently 10 seconds) before the output
continues on the next screen. When the operator presses the clear key
during these wait times, the output continues immediately.
This may lead to a very long boot time when the console
has to print many messages. The system may also hang because of the
console's limited buffer space: the system waits for the console
output to drain and finally to finish. This problem can only occur
when a terminal emulator is actually connected to the 3215 console
driver. If not, z/VM simply drops console output.
Remedy this rare situation and add a kernel boot command line parameter
con3215_drop. It can be set to 0 (do not drop) or 1 (do drop) which is
the default. This instructs the kernel to drop console data when the
console buffer is full. This speeds up the boot time considerably and
also does not hang the system anymore.
Add a sysfs attribute file for console IBM 3215 named con_drop.
This allows for changing the behavior after the boot, for example when
during interactive debugging a panic/crash is expected.
Here is a test of the new behavior using the following test program:
#!/bin/bash
declare -i cnt=4
mode=$(cat /sys/bus/ccw/drivers/3215/con_drop)
[ "$mode" = yes ] && cnt=25
echo "cons_drop $(cat /sys/bus/ccw/drivers/3215/con_drop)"
echo "vmcp term more 5 2"
vmcp term more 5 2
echo "Run $cnt iterations of "'echo t > /proc/sysrq-trigger'
for i in $(seq $cnt)
do
echo "$i. command 'echo t > /proc/sysrq-trigger' at $(date +%F,%T)"
echo t > /proc/sysrq-trigger
sleep 1
done
echo "droptest done" > /dev/kmsg
#
Output with sysfs attribute con_drop set to 1:
# ./droptest.sh
cons_drop yes
vmcp term more 5 2
Run 25 iterations of echo t > /proc/sysrq-trigger
1. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:09
2. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:10
3. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:11
4. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:12
5. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:13
6. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:14
7. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:15
8. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:16
9. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:17
10. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:18
11. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:19
12. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:20
13. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:21
14. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:22
15. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:23
16. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:24
17. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:25
18. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:26
19. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:27
20. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:28
21. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:29
22. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:30
23. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:31
24. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:32
25. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:33
#
There are no hangs anymore.
Output with sysfs attribute con_drop set to 0 and identical
setting for z/VM console 'term more 5 2'. Sometimes hitting the
clear key at the x3270 console to progress output.
# ./droptest.sh
cons_drop no
vmcp term more 5 2
Run 4 iterations of echo t > /proc/sysrq-trigger
1. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:20:58
2. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:24:32
3. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:28:04
4. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:31:37
#
Details:
Enable function raw3215_write() to handle tab expansion and newlines
and feed it with input not larger than the console buffer of 65536
bytes. Function raw3215_putchar() just forwards its character for
output to raw3215_write().
This moves tab to blank conversion to one function raw3215_write()
which also does call raw3215_make_room() to wait for enough free
buffer space.
Function handle_write() loops over all its input and segments input
into chunks of console buffer size (should the input be larger).
Rework tab expansion handling logic to avoid code duplication.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-43147
commit 5c5682b9f87a3b7bd4833884f300ec673685f6a6
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Tue Feb 13 22:05:54 2024 +0100
x86/cpu: Detect real BSP on crash kernels
When a kdump kernel is started from a crashing CPU then there is no
guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel
tries to online the BSP then the INIT sequence will reset the machine.
There is a command line option to prevent this, but in case of nested kdump
kernels this is wrong.
But that command line option is not required at all because the real
BSP is enumerated as the first CPU by firmware. Support for the only
known system which was different (Voyager) got removed long ago.
Detect whether the boot CPU APIC ID is the first APIC ID enumerated by
the firmware. If the first APIC ID enumerated is not matching the boot
CPU APIC ID then skip registering it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.348542071@linutronix.de
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit 988f569ae041ccc93a79d98d1b0043dff4d7e9b7
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date: Fri, 8 Mar 2024 18:34:05 +0100
rcu: Reduce synchronize_rcu() latency
A call to synchronize_rcu() can be optimized from a latency
point of view. Workloads which depend on this can benefit from it.
The delay of wakeme_after_rcu() callback, which unblocks a waiter,
depends on several factors:
- how fast a process of offloading is started. Combination of:
- !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
- !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
- other.
- when started, invoking path is interrupted due to:
- time limit;
- need_resched();
- if limit is reached.
- where in a nocb list it is located;
- how fast previous callbacks completed;
Example:
1. On our embedded devices I can easily trigger the scenario when
it is the last in the list out of ~3600 callbacks:
<snip>
<...>-29 [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
...
<...>-29 [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
<...>-29 [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
<snip>
2. We use cpuset/cgroup to classify tasks and assign them into
different cgroups. For example the "background" group which binds tasks
only to little CPUs or "foreground" which makes use of all CPUs.
Tasks can be migrated between groups by a request if an acceleration
is needed.
See below an example of how the "surfaceflinger" task gets migrated.
Initially it is located in the "system-background" cgroup which
allows it to run only on little cores. In order to speed it up it
can be temporarily moved into the "foreground" cgroup which allows
it to use big/all CPUs:
cgroup_attach_task():
-> cgroup_migrate_execute()
-> cpuset_can_attach()
-> percpu_down_write()
-> rcu_sync_enter()
-> synchronize_rcu()
-> now move tasks to the new cgroup.
-> cgroup_migrate_finish()
<snip>
rcuop/1-29 [000] ..... 7030.528570: rcu_invoke_callback: rcu_preempt rhp=00000000461605e0 func=wakeme_after_rcu.cfi_jt
PERFD-SERVER-1855 [000] d..1. 7030.530293: cgroup_attach_task: dst_root=3 dst_id=22 dst_level=1 dst_path=/foreground pid=1900 comm=surfaceflinger
TimerDispatch-2768 [002] d..5. 7030.537542: sched_migrate_task: comm=surfaceflinger pid=1900 prio=98 orig_cpu=0 dest_cpu=4
<snip>
"Boosting a task" depends on synchronize_rcu() latency:
- first trace shows a completion of synchronize_rcu();
- second shows attaching a task to a new group;
- last shows a final step when migration occurs.
3. To address this drawback, maintain a separate track that consists
of synchronize_rcu() callers only. After completion of a grace period
users are deferred to a dedicated worker to process requests.
4. This patch reduces the latency of synchronize_rcu() by approximately
30-40% on synthetic tests. The real test case, camera launch time,
shows (time is in milliseconds):
1-run 542 vs 489 improvement 9%
2-run 540 vs 466 improvement 13%
3-run 518 vs 468 improvement 9%
4-run 531 vs 457 improvement 13%
5-run 548 vs 475 improvement 13%
6-run 509 vs 484 improvement 4%
Synthetic test(no "noise" from other callbacks):
Hardware: x86_64 64 CPUs, 64GB of memory
Linux-6.6
- 10K tasks(simultaneous);
- each task does(1000 loops)
synchronize_rcu();
kfree(p);
default: CONFIG_RCU_NOCB_CPU: takes 54 seconds to complete all users;
patch: CONFIG_RCU_NOCB_CPU: takes 35 seconds to complete all users.
Running 60K gives approximately the same results on my setup. Please note
that this is without any interaction with other types of callbacks; otherwise
the default case would be impacted a lot.
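The quoted improvements can be reproduced from the raw numbers with integer arithmetic. The sketch below is only a cross-check of the figures in this message; the helper name pct_improvement is hypothetical, and the results are truncated rather than rounded, which is what matches the percentages listed for the camera launch runs.

```shell
#!/bin/sh
# Integer percentage reduction from "before" ($1) to "after" ($2),
# truncated toward zero.
pct_improvement() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Camera launch times in milliseconds, copied from the runs listed above.
for pair in "542 489" "540 466" "518 468" "531 457" "548 475" "509 484"; do
    set -- $pair
    echo "$1 ms -> $2 ms: $(pct_improvement "$1" "$2")%"
done

# Synthetic test: 54 seconds by default versus 35 seconds with the patch.
echo "54 s -> 35 s: $(pct_improvement 54 35)%"
```

The synthetic result works out to 35%, which sits inside the 30-40% range claimed for synthetic tests.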
5. By default it is disabled. To enable it, do one of the
following:
echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit 7f66f099de4dc4b1a66a3f94e6db16409924a6f8
Author: Qais Yousef <qyousef@layalina.io>
Date: Sun, 3 Dec 2023 01:12:52 +0000
rcu: Provide a boot time parameter to control lazy RCU
To allow more flexible arrangements while still providing a single kernel
for distros, provide a boot time parameter to enable/disable lazy RCU.
Specify:
rcutree.enable_rcu_lazy=[y|1|n|0]
Which also requires
rcu_nocbs=all
at boot time to enable/disable lazy RCU.
To disable it by default at build time when CONFIG_RCU_LAZY=y, the new
CONFIG_RCU_LAZY_DEFAULT_OFF can be used.
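A minimal sketch of checking a kernel command line for the combination described above. The function name lazy_rcu_requested is hypothetical, the check only recognizes the literal spellings rcutree.enable_rcu_lazy=1 or =y together with rcu_nocbs=all, and the sample command lines are made up for illustration.

```shell
#!/bin/sh
# Succeed if the command line in $1 turns lazy RCU on and also passes
# rcu_nocbs=all. Padding with spaces lets each pattern match whole words.
lazy_rcu_requested() {
    padded=" $1 "
    case $padded in
        *" rcutree.enable_rcu_lazy=1 "*|*" rcutree.enable_rcu_lazy=y "*) ;;
        *) return 1 ;;
    esac
    case $padded in
        *" rcu_nocbs=all "*) return 0 ;;
        *) return 1 ;;
    esac
}

for cmdline in \
    "ro rcutree.enable_rcu_lazy=1 rcu_nocbs=all" \
    "ro rcutree.enable_rcu_lazy=1" \
    "ro rcutree.enable_rcu_lazy=n rcu_nocbs=all"; do
    if lazy_rcu_requested "$cmdline"; then
        echo "lazy RCU requested:     $cmdline"
    else
        echo "lazy RCU not requested: $cmdline"
    fi
done
```

On a running system the same check could be applied to the contents of /proc/cmdline.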
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit 51823ca651364f68bd3ad33d848c1542fffdd627
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Tue, 21 Mar 2023 17:28:40 -0700
doc: Get rcutree module parameters back into alpha order
This commit puts the rcutree module parameters back into proper
alphabetical order.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit 89f7f29140da767f4675efbbe7892f38786451ec
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 27 Apr 2022 09:24:31 -0700
doc: Document rcutree.nocb_nobypass_lim_per_jiffy kernel parameter
This commit provides documentation for the kernel parameter controlling
RCU's handling of callback floods on offloaded (rcu_nocbs) CPUs.
This parameter might be obscure, but it is always there when you need it.
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-55557
commit 71de1e34f1dfc31ab3cb052cdd7038950aae06e7
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 20 Apr 2022 08:59:46 -0700
doc: Document the rcutree.rcu_divisor kernel boot parameter
This commit adds kernel-parameters.txt documentation for the
rcutree.rcu_divisor kernel boot parameter, which controls the softirq
callback-invocation batch limit.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31230
Conflicts:
1) The net/netfilter/Makefile hunk is dropped due to missing
nft_ct_fast.c file first introduced by commit d9e789147605
("netfilter: nf_tables: avoid retpoline overhead for some ct
expression calls").
2) A merge conflict in the tools/objtool/check.c hunk due to missing
upstream commit 9bb2ec608a20 ("objtool: Update Retpoline validation").
3) First hunk of net/netfilter/nf_tables_core.c is dropped and a merge
conflict in the second hunk due to missing upstream commit
d8d760627855 ("netfilter: nf_tables: add static key to skip retpoline
workarounds").
4) The net/netfilter/nft_ct.c hunks are dropped due to missing upstream
commit d9e789147605 ("netfilter: nf_tables: avoid retpoline overhead
for some ct expression calls").
commit aefb2f2e619b6c334bcb31de830aa00ba0b11129
Author: Breno Leitao <leitao@debian.org>
Date: Tue, 21 Nov 2023 08:07:32 -0800
x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.
[ mingo: Converted a few more uses in comments/messages as well. ]
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ariel Miculas <amiculas@cisco.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-6-leitao@debian.org
Signed-off-by: Waiman Long <longman@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34875
commit 5b9d31ae1c925bb5f15975e31b31ff5ae3c81f8f
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Sat Sep 9 12:23:01 2023 -0400
NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY
When using a 'softerr' mount, the NFSv4 client can get stuck waiting
forever while the server just returns NFS4ERR_DELAY. Among other things,
this causes the knfsd server threads to busy wait.
Add a parameter that tells the NFSv4 client how many times to retry
before giving up.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4061
# Merge Request Required Information
## Summary of Changes
RHIVOS is running into early mm init performance issues, and a long-term set of solutions is to improve the kernel linear map when kernel security is set to a max-level, a RHIVOS FuSa requirement, where all of memory is -not- read/writeable via the linear map (all of memory mapping from PAGE_OFFSET), but has strict execute-only, rodata, rw-data and no-execute pages.
Although RHEL9 and upstream can support the latter functionally, it is a significant performance issue as page-level mapping of the kernel linear map has to be employed from the default huge-page mappings that the various arch's support. The boot kernel itself is relatively easy to know how to map for optimal page-mappings and protection, because it is the first to load and ELF sections can be scanned for needed info; the same can't be said for all the loadable kernel modules, which is the impetus for page-splitting of the linear map (on x86) and the per-page-mapping on ARM64, where page-splitting of the linear map is not supported, but is the long-term optimal solution.
In order to make a step in this long-term effort, this patch series attempts to take the existing RHEL9 kernel module load support, which is barely 12 patches past the initial v5.14 base, and bring it up to a current, v6.8 version.
Of course, such an update brings a lot of other needed backports to apply cleanly, if the goal is to get close to upstream, maintain RHEL kmod support, and not regress. Thus, this series results in major updates to dynamic-debug (since it involves modifying kernel module sections), kbuild, modpost, genksyms, and sprinkles in an odd livepatch, ftrace, and BPF patch, although the latter were trimmed or dropped wherever possible.
The split is approximately 150 kernel-module, 30 dyndbg, 80 modpost, 15 kbuild, 3 livepatch, 3 ftrace, 2 bpf (one being a fix for earlier kernel commit).
Note: modpost and related kbuild updates moved it to approximately v6.4. A full update to 6.8 wasn't deemed necessary, and was an additional 30+ commits, and more kbuild modifications. This effort was deemed sufficiently large and complete for the intended goal of making RHEL9 amenable to future updates to the kernel-module subsystem for posted patches on review now in linux-mm by Mike Rapoport. Those patches and expected follow-ons will be backported to RHEL-9 when upstream settles on final updates in this area; these updates will make the kernel-load subsystem more common, and less arch-specific.
One patch from v6.9-rc1 was taken, modules: wait do_free_init correctly, to repair a race seen in the module-load path on a RHIVOS platform, which needed to sit on top of this series for ease of backporting.
v6: Evidently the rebase to -457 kept a merge conflict, which was a duplicate patch already taken in. Latest series is now 299 patches vs 300. No functional changes!
v5: rebased to latest kernel (-457) since gitlab punted due to claimed merge conflict; only conflict was relative source, due to other MRs pulled into cs9/9.5 ahead of this MR; no code changes, and (tkdiff+)diff-ing v4 patches to v5, showed no diffs to the author's naked eye.
v4: Just updated 3rd patch's revert to put Upstream status *after* Subject, so it shows correctly in a git-format output. No code changes from v3. (although CKI running a-muck after push'd update w/only a commit-log change).
v3: Rebase to -455 kernel since v2 was 8300 commits behind and had merge conflicts with JoeL's objtool update MR.
v2: Pulling out of Draft.
: (Hopefully) fixed numerous nits (Jira: -> JIRA:; proper link so no more 404's, etc.)
: add new/latest Fixes, some id'd by reviewers, some new to v6.9
: Cleaned up/out bad merges that had introduced RHEL-only hunks
: Significantly re-ordered the series to make it more bisectable; still breaks where the upstream maintainer tore code out of modpost.c and into a sed script, and then put the functionality back into modpost.c, and removed the sed script, which this series didn't backport since it was already large enough.
: identified a failure with systemtap, that Will Cohen is repairing; thus, this MR has to wait for a systemtap update before it will pass its check in the (brew? cki?) builds.
v1: Draft!
This series has gone through some simple, preliminary testing, but it needs deep review by ftrace, BPF, livepatch, and rh-kabi support to ensure no regressions in these few, but corner kernel-modifying code paths. rh-kabi tooling is a bit unknown, as it isn't in the kernel, but there are RHEL-only patches in the kernel for it.
A patchreview run against the series was exec'd, and needed Fixes were added/included.
The list of self-documented omissions is listed below. If new ones have popped in v6.9-rc<n>,
please forward them for addition.
Bisectability: The series has known bisectability (patch-ordering) issues at the moment, but the plan is to re-shuffle the patches in v2 to improve it, if not make it completely bisectable.
Expected feedback will be incorporated in v2, and planned upgrade to full-MR/drop-Draft status.
Shout-out to Joe Lawrence who aided in debugging and providing fixes for well-hidden noarch build failures around Documentation generation, as well as warning cleanups for EXPORT'd init-tagged functions, which the update checks for now. Joe was instrumental in finding key chunks of the modpost update that appear to have closed gaps in my original backport efforts.
Intentionally Omitted Fix: 0aa24a79ee3b603f kbuild: do not try to parse *.cmd files for objects provided by compiler
-- for parisc & sky arch's, not needed in RHEL9
Intentionally Omitted Fix: f5983dab0ead modpost: define more R_ARM_* for old distributions
For old releases not having R_ARM_* in arch/arm/include/asm/elf.h, which RHEL9 has
Intentionally Omitted Fix: 08700ec705043e linux/export: fix reference to exported functions for parisc64
-- no parisc64 support in RHEL9
Intentionally Omitted Fix: 86495af1171e1feec79f media: dvb: symbol fixup for dvb_attach()
-- not included in this backport due to partner request not to include until RHEL-10
Intentionally Omitted Fix: d81f0d7b8 Subject: kunit: add KUNIT_INIT_TABLE to init link
-- will let KUNIT update bring in and enable as needed
## Approved Development Ticket
JIRA: https://issues.redhat.com/browse/RHEL-28063
Signed-off-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3614
# Merge Request Required Information
## Summary of Changes
Introduce csd tracepoints that help tracking IPIs that can be messing with latency.
Also, make the trace available for all smp_function_call*(), not only the ones that result in an IPI.
## Approved Development Ticket
JIRA: https://issues.redhat.com/browse/RHEL-13876
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Approved-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4014
JIRA: https://issues.redhat.com/browse/RHEL-28203
JIRA: https://issues.redhat.com/browse/RHEL-28209
CVE: CVE-2024-2201
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3961
Branch History Injection (BHI) attacks may allow a malicious application to
influence indirect branch prediction in kernel by poisoning the branch
history. eIBRS isolates indirect branch targets in ring0. The BHB can
still influence the choice of indirect branch predictor entry, and although
branch predictor entries are isolated between modes when eIBRS is enabled,
the BHB itself is not isolated between modes.
Alder Lake and newer processors support a hardware control BHI_DIS_S to
mitigate BHI. For older processors Intel has released a software sequence
to clear the branch history on parts that don't support BHI_DIS_S. Add
support to execute the software sequence at syscall entry and VMexit to
overwrite the branch history.
This MR extends the existing spectre_v2 mitigation to enable either
software or hardware BHI mitigation for vulnerable Intel processors,
if enabled. The spectre_v2 vulnerability sysfs file will now show the
status of the BHI mitigation like
...; SW sequence; BHI: SW loop, KVM: SW loop
As Linus has changed the default upstream to CONFIG_SPECTRE_BHI_ON,
the syscall hardening commit 1e3ad78334a6 ("x86/syscall: Don't force
use of indirect calls for system calls") is skipped for now. It may be
backported in the future, if necessary.
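A minimal sketch of how the BHI part of that status line could be picked out
in userspace. The sysfs path is the usual vulnerabilities file; the helper
name bhi_status() is hypothetical, and the exact wording of the line varies
with kernel version and hardware.

```python
from pathlib import Path
from typing import Optional

# Usual location of the spectre_v2 status line on Linux.
SPECTRE_V2 = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def bhi_status(line: str) -> Optional[str]:
    """Return the text following 'BHI:' in a spectre_v2 status line, or None.

    Hypothetical helper: fields are assumed to be separated by ';' as in
    the example status shown in this MR description.
    """
    for field in line.split(";"):
        field = field.strip()
        if field.startswith("BHI:"):
            return field[len("BHI:"):].strip()
    return None


if __name__ == "__main__":
    if SPECTRE_V2.exists():
        print(bhi_status(SPECTRE_V2.read_text()))
```

A result of None only means that the line carries no BHI field, which is
what a kernel without this MR would be expected to report.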
Signed-off-by: Waiman Long <longman@redhat.com>
Approved-by: Paolo Bonzini <bonzini@gnu.org>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-28063
commit 8660484ed1cf3261e89e0bad94c6395597e87599
Author: Luis Chamberlain <mcgrof@kernel.org>
Date: Thu Apr 13 22:28:39 2023 -0700
module: add debugging auto-load duplicate module support
The finit_module() system call can in the worst case use more than
twice a module's size in virtual memory. Duplicate finit_module()
system calls are non-fatal, however they unnecessarily strain virtual
memory during bootup and in the worst case can cause a system to fail
to boot. This is only known to currently be an issue on systems with
a larger number of CPUs.
To help debug this situation we need to consider the different sources for
finit_module(). Requests from the kernel that rely on module auto-loading,
i.e., the kernel's *request_module() API, are one source of calls. Although
modprobe checks to see if a module is already loaded prior to calling
finit_module(), a small race remains: userspace can trigger multiple
modprobe calls that race against each other, each not yet seeing the
module as loaded.
This adds debugging support to the kernel module auto-loader (*request_module()
calls) to easily detect duplicate module requests. To aid with possible bootup
failure issues incurred by this, it will converge duplicate requests to a
single request. This avoids any possible strain on virtual memory during
bootup which could be incurred by duplicate module autoloading requests.
Folks debugging virtual memory abuse on bootup can and should enable
this to see what pr_warn()s come on, to see if module auto-loading is to
blame for their woes. If they see duplicates they can further debug this
by enabling the module.enable_dups_trace kernel parameter or by enabling
CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE.
Current evidence seems to point to only a few duplicates for module
auto-loading. And so the source for other duplicates creating heavy
virtual memory pressure due to larger number of CPUs should be coming
from another place (likely udev).
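The idea of converging duplicate requests into a single request can be
sketched in userspace as follows. This is an illustration only, not the
kernel implementation: the class DupRequestTracker, its duplicates counter
and the load callback are all hypothetical names.

```python
import threading
from typing import Callable, Dict


class _Pending:
    """State shared between the first requester and its duplicates."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result = -1  # placeholder until the real load finishes


class DupRequestTracker:
    """Converge concurrent requests for the same module name into one load."""

    def __init__(self, load: Callable[[str], int]) -> None:
        self._load = load
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Pending] = {}
        self.duplicates = 0  # how many requests were folded into another

    def request(self, name: str) -> int:
        with self._lock:
            pending = self._inflight.get(name)
            owner = pending is None
            if owner:
                pending = _Pending()
                self._inflight[name] = pending
            else:
                self.duplicates += 1
        if owner:
            try:
                # Only the first requester performs the actual load.
                pending.result = self._load(name)
            finally:
                with self._lock:
                    del self._inflight[name]
                pending.done.set()
            return pending.result
        # Duplicates wait for the first request and reuse its result.
        pending.done.wait()
        return pending.result
```

Each duplicate costs only a wait on the in-flight request instead of a
second load, which is the property that avoids the extra memory pressure.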
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Donald Dutile <ddutile@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-28063
commit ae39e9ed964f8e450d0de410b5a757e19581dfc5
Author: Saravana Kannan <saravanak@google.com>
Date: Fri Jun 3 18:01:00 2022 -0700
module: Add support for default value for module async_probe
Add a module.async_probe kernel command line option that allows enabling
async probing for all modules. When this command line option is used,
there might still be some modules for which we want to explicitly force
synchronous probing, so extend <modulename>.async_probe to take an
optional bool input so that async probing can be disabled for a specific
module.
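As a rough illustration of how the global default and the per-module
override combine, here is a small userspace model of the command line
handling. It is a sketch, not kernel code: the function names are
hypothetical, the accepted bool spellings are a simplified subset, and
module-name normalization is ignored.

```python
from typing import Optional

_TRUE = {"1", "y", "yes", "on", "true"}
_FALSE = {"0", "n", "no", "off", "false"}


def parse_optional_bool(value: Optional[str]) -> bool:
    """A bare option (no '=value') counts as true; simplified bool parsing."""
    if value is None:
        return True
    lowered = value.lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"not a bool: {value!r}")


def async_probe_enabled(cmdline: str, module: str) -> bool:
    """Model of the effective async_probe setting for one module."""
    default = False   # module.async_probe, applies to all modules
    override = None   # <modulename>.async_probe, if present
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        val = value if sep else None
        if key == "module.async_probe":
            default = parse_optional_bool(val)
        elif key == f"{module}.async_probe":
            override = parse_optional_bool(val)
    return default if override is None else override
```

For example, "module.async_probe foo.async_probe=0" would leave every
module except the hypothetical module foo probing asynchronously.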
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Donald Dutile <ddutile@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-28063
commit 9c40e1aa84123750773a57c9cf39112459a952dd
Author: Andrew Halaney <ahalaney@redhat.com>
Date: Wed Oct 13 11:40:21 2021 -0400
dyndbg: Remove support for ddebug_query param
This param has been deprecated for a very long time now, let's rip it
out.
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: https://lore.kernel.org/r/1634139622-20667-3-git-send-email-jbaron@akamai.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Donald Dutile <ddutile@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-13876
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
conflicts: Fixes (some) conflicts introduced by downstream commit
aa5786b04d ("sched, smp: Trace smp callback causing an IPI")
by applying the original dependency commit, and making it easier to
cherry-pick the next upstream commits due to not having conflicts.
commit 1771257cb447a7b27a15ed9aaf332726c47fcbcf
Author: Paul E. McKenney <paulmck@kernel.org>
Date: 2023-03-20 17:55:14 -0700
locking/csd_lock: Remove added data from CSD lock debugging
The diagnostics added by this commit were extremely useful in one instance:
a5aabace5f ("locking/csd_lock: Add more data to CSD lock debugging")
However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.
Therefore, manually revert this commit, but leave a comment telling
people where to find these diagnostics.
[ paulmck: Apply Juergen Gross feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org
Signed-off-by: Leonardo Bras <leobras@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-13876
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit c52198601695851622f361d3f16456e9fc857629
Author: Paul E. McKenney <paulmck@kernel.org>
Date: 2023-03-20 17:55:13 -0700
locking/csd_lock: Add Kconfig option for csd_debug default
The csd_debug kernel parameter works well, but is inconvenient in cases
where it is more closely associated with boot loaders or automation than
with a particular kernel version or release. Therefore, provide a new
CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to
1 when selected and 0 otherwise, with this latter being the default.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org
Signed-off-by: Leonardo Bras <leobras@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3910
JIRA: https://issues.redhat.com/browse/RHEL-25103
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3910
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3847
The primary purpose of this MR is to backport those upstream workqueue
commits which enable ordered workqueues and rescuers to follow
changes in workqueue unbound cpumask which is necessary to make sure
that isolated CPUs won't be disturbed due to unbound work items being
handled by those CPUs.
These upstream commits were merged into the v6.9 kernel which also
contains some major changes in workqueue code. This makes the required
commits dependent on some of the v6.9 workqueue commits. It is less risky
to sync the workqueue code up to v6.9 instead of selective backports
of some dependent commits. This MR also includes some miscellaneous
commits in other subsystems due to changes in the underlying workqueue
implementations.
A follow-up proactive workqueue fixes MR will be created later on,
if necessary.
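One way to check from userspace that isolated CPUs are excluded is to
compare them against the unbound workqueue cpumask. The sketch below
assumes the mask is read from /sys/devices/virtual/workqueue/cpumask as
comma-separated hex words; the helper names are hypothetical.

```python
from typing import Set

# Sysfs file that exposes the global unbound workqueue cpumask.
WQ_CPUMASK = "/sys/devices/virtual/workqueue/cpumask"


def parse_cpumask(text: str) -> Set[int]:
    """Convert a hex cpumask such as '00000001,0000000f' into CPU ids."""
    mask = int(text.strip().replace(",", ""), 16)
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}


def isolated_cpus_in_mask(mask_text: str, isolated: Set[int]) -> Set[int]:
    """Return the isolated CPUs that the unbound cpumask still allows."""
    return parse_cpumask(mask_text) & isolated
```

An empty result from isolated_cpus_in_mask() means that no unbound work
item should be directed at the isolated CPUs by that mask.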
Signed-off-by: Waiman Long <longman@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Vladis Dronov <vdronov@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: Radu Rendec <rrendec@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4151
# Merge Request Required Information
JIRA: https://issues.redhat.com/browse/RHEL-28780
JIRA: https://issues.redhat.com/browse/RHEL-12083
JIRA: https://issues.redhat.com/browse/RHEL-12322
JIRA: https://issues.redhat.com/browse/RHEL-29105
JIRA: https://issues.redhat.com/browse/RHEL-29357
JIRA: https://issues.redhat.com/browse/RHEL-29359
Omitted-fix: ed8b94f6e0ac ("powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add")
- Reverted by 1fba2bf8e9d5 ("Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"")
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
branch: next
Tested: In progress
- general cki coverage
- Nvidia testing arm-smmu-v3 and iommufd related changes they have requested.
- Multiple rounds testing of amd_iommu, intel_iommu, and arm-smmu-v3 with
various iommu configurations with disk i/o using fio,
covering lazy iotlb invalidation, strict iotlb invalidation,
and passthrough. Also tested with forcedac set. Intel
Scalable Mode capable systems tested with the iotlb invalidation
policies, and passthrough with scalable mode enabled, and disabled.
AMD systems tested with v1 and v2 page tables.
- Tested booting with various iommu configurations, and verifying system
in correct state on AMD, Intel, and ARM.
- Limited test on ppc64le. The system I had access to was
setting up a 64-bit bypass window, and using dma_direct
calls. It ran, but since I don't normally touch ppc64le
iommu code, I need to investigate more or get IBM assistance
to more thoroughly test it.
- Working on getting testing assistance from IBM for the s390x changes.
## Summary of Changes
This brings iommu, iommufd, and dma mapping api up to 6.9 with some additions from Joerg's
next branch, minus some changes from commits in a 6.9 SEV-SNP pull for AMD. Some highlights:
- The removal of the amd_iommu_v2 code, and the addition of its replacement based on the
iommu core SVA api, along with a re-org of the amd_iommu code.
- The migration of s390 to the iommu core dma-iommu dma ops implementation, joining Intel,
AMD, and ARM as users of the same code base.
- The beginnings of a re-work of the arm-smmu-v3 driver by Jason, and others.
- A number of changes to iommufd as it continues to get fleshed out.
- IOPT memory usage observability (code that was the basis for a talk at LPC last year)
Example output in vmstat files:
```
# grep iommu /sys/devices/system/node/node*/vmstat
/sys/devices/system/node/node0/vmstat:nr_iommu_pages 342
/sys/devices/system/node/node1/vmstat:nr_iommu_pages 0
```
- Continued work on shared virtual addressing and io page faulting (PRI).
- Dynamic swiotlb memory pools. This is not enabled yet, as they still seem to be
shaking out issues upstream, but the code is in place now.
- Re-working of iommu core domain allocation.
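The per-node nr_iommu_pages counters shown in the vmstat example above can
be collected with a few lines of script. This is a sketch that parses the
grep output format from that example; the function name is hypothetical.

```python
import re
from typing import Dict

# Matches lines like:
#   /sys/devices/system/node/node0/vmstat:nr_iommu_pages 342
_LINE = re.compile(r"/node(\d+)/vmstat:nr_iommu_pages\s+(\d+)")


def iommu_pages_per_node(grep_output: str) -> Dict[int, int]:
    """Map NUMA node id to its nr_iommu_pages count."""
    pages: Dict[int, int] = {}
    for line in grep_output.splitlines():
        match = _LINE.search(line)
        if match:
            pages[int(match.group(1))] = int(match.group(2))
    return pages
```

Summing the values of the returned dict gives the system-wide page count
used for IOMMU page tables.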
Note: iommufd selftest is being enabled in separate work that has been delegated to
another engineer starting to help with iommu. So that will be enabled in the
next few weeks to add more coverage for iommufd.
Conflicts should be noted in the individual commits, but they are
not too bad overall. 13/30 were dropping unsupported bits, and another
8 were context diffs. A couple were caused by out-of-order backports due
to fixes, and a couple were upstream conflicts from colliding patchsets
that had to be resolved in the merge commits.
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Approved-by: Jan Stancek <jstancek@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: David Airlie <airlied@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4268
JIRA: https://issues.redhat.com/browse/RHEL-34114
This rebases supported USB and Thunderbolt drivers to upstream kernel v6.8.
By design, changes on this rebase are limited to supported usb/thunderbolt
drivers. Changes which happen to touch the drivers but are tree-wide are
selectively or partially pulled in, when relevant.
Omitted-fix: 9dc292413c56 ("usb: gadget: ncm: Fix endianness of wMaxSegmentSize variable in ecm_desc")
Omitted-fix: f90ce1e04cbc ("usb: gadget: ncm: Fix handling of zero block length packets")
Omitted-fix: 5b9e00a6004c ("powerpc/4xx: Fix warp_gpio_leds build failure")
Omitted-fix: 6f98e44984d5 ("spi: ppc4xx: Fix fallout from include cleanup")
Omitted-fix: 70e6163d17dd ("arm64: dts: qcom: qrb5165-rb5: use u16 for DP altmode svid")
Signed-off-by: Desnes Nunes <desnesn@redhat.com>
Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-34076
commit 4e58aaeebb3c27993c734c99eae6881b196b1ddb
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed, 1 Nov 2023 18:28:38 -0700
rcu: Restrict access to RCU CPU stall notifiers
The RCU CPU stall notifiers can be useful for dumping state when
tracking down delicate forward-progress bugs where NUMA effects cause
cache lines to be delivered to a given CPU regularly, but always in a
state that prevents that CPU from making forward progress. These bugs can
be detected by the RCU CPU stall-warning mechanism, but in some cases,
the stall-warnings printk()s disrupt the forward-progress bug before
any useful state can be obtained.
Unfortunately, the notifier mechanism added by commit 5b404fdabacf ("rcu:
Add RCU CPU stall notifier") can make matters worse if used at all
carelessly. For example, if the stall warning was caused by a lock not
being released, then any attempt to acquire that lock in the notifier
will hang. This will prevent not only the notifier from producing any
useful output, but it will also prevent the stall-warning message from
ever appearing.
This commit therefore hides this new RCU CPU stall notifier
mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that
depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the
rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must also
be specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text
contains a warning and explains the dangers of careless use, recommending
lockless notifier code. In addition, a WARN() is triggered each time
that an attempt is made to register a stall-warning notifier in kernels
built with CONFIG_RCU_CPU_STALL_NOTIFIER=y.
This combination of measures will keep use of this mechanism confined to
debug kernels and away from routine deployments.
[ paulmck: Apply Dan Carpenter feedback. ]
Fixes: 5b404fdabacf ("rcu: Add RCU CPU stall notifier")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>